
Python fundamentals in 2024

By Megan Lieu

Updated on January 2, 2024

Python's readability, simplicity, and vast community support have made it a preferred language for data scientists, enabling efficient data processing, analysis, and model development. This article guides you through the fundamentals of using Python for data programming in 2024.


Setting up Python in your Jupyter notebook

Getting your working environment set up when working with Python can often feel like half the battle. The following are steps for installing Python and setting up a data analysis environment.

First, select a cloud-based notebook service such as JupyterHub, Deepnote, or Google Colab. Then launch a new notebook and use pip or conda commands in the notebook to install Python libraries such as Pandas, NumPy, and Matplotlib. Next, upload your data files to the cloud platform or access data from cloud storage. Last, import the libraries you need for your use case (a short setup sketch follows the list below):

Pandas: For data manipulation and analysis. Ideal for working with structured data.

NumPy: For numerical computing. Extensively used in scientific computing, supporting large, multi-dimensional arrays and matrices.

Matplotlib: A plotting library for creating static, animated, and interactive visualizations in Python.

Scikit-Learn: Widely used for machine learning, offering simple and efficient tools for data mining and data analysis.

TensorFlow: A deep learning library, popular for building and training neural networks.

Seaborn: Built on Matplotlib, it provides a high-level interface for drawing attractive and informative statistical graphics.
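
As a rough sketch, the install-and-import step in a fresh notebook might look like the following (the exact package list depends on your project):

# Install libraries from inside the notebook (works in Jupyter-compatible environments)
%pip install pandas numpy matplotlib seaborn scikit-learn

# Import the libraries used throughout this article
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns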

Fundamentals of Python Programming

Once you’ve set up your Python environment, such as a Jupyter notebook, you’re ready to tackle the fundamentals of Python programming. We’ll cover them in three categories: syntax, data types, and data structures.

Basics of Python syntax

Variables and Data Types: In Python, you can define variables to store data without explicitly declaring their type. The data types include integers (int), floating-point numbers (float), strings (str), and booleans (bool).

x = 10         # integer
y = 3.14       # floating point number
name = "Alice" # string
is_valid = True # boolean

Comments: Comments in Python start with a # symbol. Anything following # on the line is ignored by the interpreter.

# This is a comment

Indentation: Python uses indentation to define blocks of code. This is crucial for defining function bodies, loops, if statements, etc.

if x > 5:
    print("x is greater than 5")

If-else statements: Conditional statements in Python are straightforward.

if x > 0:
    print("Positive")
elif x == 0:
    print("Zero")
else:
    print("Negative")

Loops: Python supports for and while loops.

for i in range(5):
    print(i)

while x > 0:
    print(x)
    x -= 1

Functions: Functions are defined using the def keyword.

def greet(name):
    return "Hello, " + name

Lists and dictionaries: Python has built-in support for lists (arrays) and dictionaries (hashmaps).

numbers = [1, 2, 3, 4, 5]      # List
person = {"name": "Alice", "age": 25} # Dictionary

Importing modules: You can import modules to use additional functions and classes.

import math
print(math.sqrt(16))

Error handling: Python uses try-except blocks for error handling.

try:
    x = 1 / 0
except ZeroDivisionError:
    print("Cannot divide by zero")

Classes and objects: Python is also an object-oriented language.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return "Hello, my name is " + self.name

alice = Person("Alice", 30)
print(alice.greet())

Basics of Python data types

The following are basic Python data types you should be familiar with.

Numbers:

  • Integers (int): Whole numbers without a decimal point. Example: 5, 3, 42.
  • Floating Point Numbers (float): Numbers with a decimal point. Example: 3.14, 0.001, 2.0.
  • Complex Numbers (complex): Numbers with a real and imaginary part. Example: 1 + 2j.

Booleans (bool): Represents truth values. There are only two Boolean values: True and False.

Strings (str): A sequence of characters used to store text. Example: "Hello, world!".

  • Strings in Python are immutable, meaning they cannot be changed after they are created.
  • Strings can be manipulated and combined in various ways, and Python provides a wealth of methods for string processing.

Lists: Ordered and mutable collections of items. Example: [1, 2.3, "hello"].

  • Lists can contain items of different types and support operations like appending, removing, and slicing.

Tuples (tuple): Similar to lists, but immutable. Example: (1, "a", 3.14).

  • Tuples are often used for data that should not change after creation, like the dimensions of an object or coordinates on a map.

Sets (set): Unordered collections of unique elements. Example: {1, 2, 3}.

  • Sets are mutable and are useful for operations like finding unique items or set operations like union and intersection.

Dictionaries (dict): Collections of key-value pairs. Example: {"name": "Alice", "age": 25}.

  • The keys in a dictionary must be unique and immutable, like strings, numbers, or tuples.
  • Dictionaries are mutable and provide fast access to data based on keys.

None (NoneType): A special type representing the absence of a value or a null value. It is denoted by the keyword None.
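
A quick sketch tying these types together (the values are arbitrary examples):

count = 42                      # int
pi = 3.14                       # float
z = 1 + 2j                      # complex
is_ready = True                 # bool
greeting = "Hello, world!"      # str (immutable)
items = [1, 2.3, "hello"]       # list (mutable, mixed types allowed)
point = (1, "a", 3.14)          # tuple (immutable)
unique = {1, 2, 3}              # set (unique elements)
person = {"name": "Alice", "age": 25}  # dict (key-value pairs)
nothing = None                  # NoneType

print(type(count), type(items), type(nothing))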

Python data structures

Data structures are objects that organize collections of values so they can be accessed and updated throughout a program. Whereas a value of a simple data type can be assigned directly to a variable, data structures are typically populated and modified through operations such as push, pop, insert, and delete.
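
For example, a list can act as a simple stack and collections.deque as a queue; values enter and leave through operations rather than direct assignment:

from collections import deque

stack = []               # list used as a stack
stack.append(1)          # push
stack.append(2)
top = stack.pop()        # pop -> 2

queue = deque()          # deque used as a queue
queue.append("a")        # enqueue
queue.append("b")
first = queue.popleft()  # dequeue -> "a"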

Exploratory Data Analysis in Python

Exploratory Data Analysis (EDA) in Python is a critical step in the data science workflow. It involves analyzing datasets to summarize their main characteristics, often using visual methods. The goal is to understand the data, find patterns, spot anomalies, test hypotheses, and check assumptions. Here's an overview of how EDA is typically conducted using Python:

Setting Up the Environment

First, you set up your Python environment with the necessary libraries. The most commonly used libraries for EDA are:

  • Pandas: For data manipulation and analysis.
  • NumPy: For numerical operations.
  • Matplotlib: For creating static, interactive, and animated visualizations.
  • Seaborn: For making statistical graphics in Python.
  • SciPy: For scientific and technical computing.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Data Collection

Load the dataset into a Pandas DataFrame. Data can be loaded from various sources like CSV files, SQL databases, JSON files, etc.

df = pd.read_csv('data.csv')

Data Cleaning

Prepare the data for analysis:

  • Handling missing values.
  • Correcting data types.
  • Removing duplicates.
  • Renaming columns for clarity.

df.dropna(inplace=True)           # Remove rows with missing values
df.drop_duplicates(inplace=True)  # Remove duplicate rows
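
Correcting data types and renaming columns can be handled in the same pass; the column names below are hypothetical placeholders:

df['order_date'] = pd.to_datetime(df['order_date'])            # Correct a date column's type
df['price'] = df['price'].astype(float)                        # Correct a numeric type
df.rename(columns={'cust_nm': 'customer_name'}, inplace=True)  # Rename for clarity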

Data Exploration

Gain an understanding of the data's characteristics and structure:

  • Check the shape of the dataset (df.shape).
  • View a few rows of the dataset (df.head()).
  • Get a summary of the data (df.describe()).
  • Check the data types (df.dtypes).
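
In a notebook, that first look usually amounts to a few quick calls:

print(df.shape)   # (rows, columns)
df.head()         # first five rows
df.describe()     # summary statistics for numeric columns
df.dtypes         # data type of each column
df.info()         # non-null counts and memory usage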

Univariate Analysis

Analyze single variables to summarize and find patterns in the data:

  • For numerical variables: Use histograms, box plots.
  • For categorical variables: Use bar charts, frequency counts.

df['column_name'].hist() # For numerical data
df['category_column'].value_counts().plot(kind='bar') # For categorical data

Bivariate/Multivariate Analysis

Explore relationships between variables:

  • Scatter plots for examining relationships between two continuous variables.
  • Correlation matrices to understand the linear relationship between variables.
  • Pair plots in Seaborn to visualize relationships across the entire dataset.

sns.scatterplot(data=df, x='variable1', y='variable2')
sns.heatmap(df.corr(numeric_only=True), annot=True)  # Correlation matrix of numeric columns
sns.pairplot(df)

Grouping and Aggregation

Group data and aggregate information:

  • Use groupby to aggregate data by categories.
  • Compute summary statistics.

df.groupby('category').mean(numeric_only=True)  # Average of numeric columns per category
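
To compute several summary statistics at once, agg can be used instead of a single mean (the column names here are hypothetical):

df.groupby('category')['price'].agg(['mean', 'median', 'count'])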

Feature Engineering

Create new features that might be relevant for the analysis:

  • Binning numeric data.
  • Creating date-time features from timestamps.
  • Deriving new categories.

df['new_feature'] = df['original_feature'].apply(some_function)
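
Two common examples are binning a numeric column and extracting parts of a timestamp; the column names here are hypothetical:

# Bin a numeric column into labeled ranges
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 60, 120],
                         labels=['child', 'young adult', 'adult', 'senior'])

# Derive date-time features from a timestamp column
df['order_date'] = pd.to_datetime(df['order_date'])
df['order_month'] = df['order_date'].dt.month
df['order_weekday'] = df['order_date'].dt.day_name()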

Dealing with Outliers

Detect and handle outliers in the data:

  • Use IQR (Interquartile Range) or Z-scores.
  • Visualize outliers using box plots.

sns.boxplot(x=df['variable'])
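
A common IQR-based rule flags values more than 1.5 times the interquartile range beyond the quartiles; a minimal sketch, assuming a numeric column named 'variable':

q1 = df['variable'].quantile(0.25)
q3 = df['variable'].quantile(0.75)
iqr = q3 - q1
outliers = df[(df['variable'] < q1 - 1.5 * iqr) | (df['variable'] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers")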

Data Visualization

Create visualizations to understand the data better and communicate findings:

  • Use Matplotlib and Seaborn for custom graphs and plots.
  • Tailor visualizations for the audience and the specific questions being addressed.
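
For example, a labeled histogram built with Matplotlib and Seaborn together (again assuming the hypothetical column from above):

plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='variable', bins=30)
plt.title('Distribution of variable')
plt.xlabel('variable')
plt.ylabel('count')
plt.show()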

Data analytics with SQL and Python together

SQL and Python are often used together to first query the data and then transform it for data analytics and data science use cases. Traditionally, using both tools would require working in separate editors. Alternatively, a more complicated solution is running SQL queries from Python through libraries like sqlite3 and SQLAlchemy.
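
For example, the standard-library sqlite3 module can run a query and hand the result straight to Pandas; a minimal sketch, assuming a local SQLite file and a table named orders:

import sqlite3
import pandas as pd

conn = sqlite3.connect('sales.db')  # hypothetical database file
query = "SELECT category, SUM(amount) AS total FROM orders GROUP BY category"
df = pd.read_sql_query(query, conn)  # SQL handles the querying
conn.close()

print(df.sort_values('total', ascending=False))  # Python takes over for analysis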

Modern Python notebooks, however, are also SQL notebooks. The advantage of a notebook that treats both SQL and Python as first-class citizens is that you can combine SQL’s ease of use with Python’s flexibility without having to context switch between multiple tools.

The result is a seamless integration of SQL and Python in one place, so you always have the right tool for the data task at hand.

Best Practices and Tips

Adhering to Python's coding best practices helps keep your codebase consistently clean and legible. It promotes code reuse, reduces the chances of introducing bugs (while making them easier to detect and fix), and simplifies maintaining and refactoring your code.

Given Python's core principles of readability and simplicity, aligning with these established standards and practices empowers developers to fully leverage the elegant syntax that Python offers.

PEP 8 style guide: Follow the Python Enhancement Proposal (PEP) 8 style guide for code formatting and naming conventions. This helps make your code more readable and maintainable.

Use descriptive variable names: Choose meaningful and descriptive names for your variables, functions, and classes. This makes your code self-documenting and easier to understand.

Comment and document: Add comments to explain complex logic or to provide context for your code. Additionally, use docstrings to document your functions and classes. Tools like Sphinx can generate documentation from docstrings.

Modularize code: Break your code into smaller, reusable functions and modules. This promotes code reusability, readability, and maintainability.

List comprehensions: Utilize list comprehensions for concise and efficient ways to create lists or modify existing ones. They are more Pythonic than traditional for loops.
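
For example, reusing the numbers list from earlier, squaring only the even values takes a single line:

even_squares = [n ** 2 for n in numbers if n % 2 == 0]  # [4, 16]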

Use built-in functions: Python has a rich standard library with many built-in functions and modules. Make use of them instead of reinventing the wheel.

Virtual environments: Use virtual environments to manage dependencies for different projects. The venv or virtualenv modules are helpful for isolating project-specific packages.

Unit testing: Write unit tests using frameworks like unittest or pytest to ensure your code functions correctly and to catch regressions early.
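
A minimal pytest-style test for the greet function defined earlier might look like this (it assumes greet lives in a hypothetical module named my_module):

# test_greet.py -- run with the pytest command
from my_module import greet  # hypothetical module containing greet

def test_greet_returns_greeting():
    assert greet("Alice") == "Hello, Alice"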

Refactoring: Periodically review and refactor your code to eliminate redundancy, improve performance, and maintain a clean codebase.

Good coding practices often vary depending on the specific project and team, but these tips should provide a solid foundation for writing clean and maintainable Python code in 2024.

When it comes to implementing these fundamentals for Python, make Deepnote your go-to cloud-based coding interface. Get started for free today.

Megan Lieu

Data Advocate

Follow Megan on LinkedIn
