
Ultimate guide to the SQLAlchemy library in Python

By Katerina Hynkova

Updated on August 22, 2025

SQLAlchemy is a popular open-source SQL toolkit and object-relational mapper (ORM) for Python, created by Michael Bayer and first released in 2006.


Its core purpose is to provide Python developers with a flexible and powerful way to interact with relational databases using Pythonic code instead of raw SQL. SQLAlchemy includes both a lower-level SQL Expression Language (the “Core”) and a higher-level ORM that maps Python classes to database tables. This dual-layer design makes SQLAlchemy a central tool in the Python ecosystem for database access, allowing developers to work at the level of abstraction that suits their needs. As of 2025, SQLAlchemy remains actively maintained and widely used (the latest stable release is SQLAlchemy 2.0.43, released August 11, 2025), underscoring its continued relevance and evolution nearly two decades since its creation.

Originally developed to address the “object-relational impedance mismatch,” SQLAlchemy’s design philosophy emphasizes both flexibility and transparency. Unlike some ORMs that attempt to hide SQL entirely, SQLAlchemy exposes the full power of SQL and gives developers control over exactly what is happening under the hood. It embraces the idea that relational databases and object-oriented applications have different strengths – SQLAlchemy provides tools to bridge them without compromising on either side. The library follows well-established patterns (like the Unit of Work and Data Mapper patterns) to ensure that Python objects and database rows stay in sync in a predictable way. By automating repetitive tasks (such as generating SQL for CRUD operations) while keeping the developer in control of queries and transactions, SQLAlchemy has become a de facto standard for database programming in Python.

In the Python ecosystem, SQLAlchemy plays a central role wherever a project needs to talk to an SQL database outside of a heavyweight framework. It is the default choice for many web frameworks (e.g. Flask’s database extension is built on SQLAlchemy) and is widely used in data analysis, backend services, and desktop applications. Its longevity and robust feature set have fostered a large community and a rich collection of extensions. Notably, the Alembic project (for database migrations) was created by the same author to integrate closely with SQLAlchemy, and libraries like Flask-SQLAlchemy provide convenient integration with web frameworks. Given its adaptability, SQLAlchemy remains highly relevant in 2024–2025: it has added modern features like asyncio support and improved type annotations in version 2.0, ensuring it stays up-to-date with contemporary Python development practices. In summary, SQLAlchemy’s history, purpose, and sustained community support have cemented its role as a cornerstone of database access in Python.

In-depth technical architecture (core and ORM)

How SQLAlchemy works: Technically, SQLAlchemy is more than just an ORM – it’s a comprehensive database toolkit. Its architecture is split into two main components: SQLAlchemy Core and the SQLAlchemy ORM. The Core provides a low-level, expressive SQL abstraction layer, while the ORM builds on Core to provide high-level object-relational mapping. At the heart of SQLAlchemy Core is the concept of an Engine, which manages database connections (including a connection pool) and knows how to speak the SQL dialect of your chosen database. The Engine uses a specific Dialect internally to adapt generic SQL expressions to the database’s SQL flavor (e.g. PostgreSQL, MySQL, SQLite, Oracle, etc., all of which are supported by SQLAlchemy via built-in or plugin dialects). When you create an Engine (via create_engine()), you provide a database URL that specifies the database type and connection details; SQLAlchemy then loads the appropriate dialect and DB-API driver to connect. The Engine acts as a factory for database Connection objects and as a holder of the connection pool for efficiency.

On top of the Engine, SQLAlchemy’s Core offers a SQL Expression Language – a domain-specific language in Python for constructing SQL queries. This is a system of Python objects (like Table, Column, select(), insert(), etc.) that correspond to SQL constructs. For example, you can define tables and columns in Python, then build a query using Python expressions rather than raw SQL strings. These expressions compile to actual SQL statements at execution time. Importantly, this approach is database-agnostic: the same Core query can run on SQLite or PostgreSQL, with the dialect translating it appropriately. SQLAlchemy Core handles details like quoting identifiers and parameter styles for each DB. It also provides transaction management (via Connection objects or Engine contexts) and reflection (ability to load database schema into Python objects). In summary, the Core’s architecture centers on the Engine for connectivity and the SQL Expression system for composable query building.
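To make the Core layer concrete, here is a minimal sketch (the table and column names are purely illustrative) showing how a single statement object renders to each database's SQL flavor and parameter style:

from sqlalchemy import MetaData, Table, Column, Integer, String, select
from sqlalchemy.dialects import postgresql, sqlite

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50))
)

stmt = select(users.c.name).where(users.c.id == 1)
# The same expression compiles differently per dialect:
print(stmt.compile(dialect=sqlite.dialect()))      # ... WHERE users.id = ?
print(stmt.compile(dialect=postgresql.dialect()))  # ... WHERE users.id = %(id_1)s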

The ORM layer is an optional component built on the Core. It uses the Core’s capabilities to perform database operations but allows you to work with Python classes and objects instead of raw SQL constructs. You start by defining mapping classes (often using the declarative system via declarative_base() or the newer DeclarativeBase in SQLAlchemy 2.0). Each mapped class corresponds to a database table, and class attributes are mapped to columns. Under the hood, SQLAlchemy sets up a MetaData object to hold the schema (table definitions) and uses Python descriptors to link class attributes to Column objects. The ORM introduces the concept of a Session, which is a working unit that caches objects, tracks changes (using an identity map so each database row corresponds to at most one object in the session), and orchestrates commits (the Unit of Work pattern). When you query via the ORM (e.g. session.query(SomeClass) or using the new select(SomeClass)), SQLAlchemy uses the Core to build an appropriate SELECT statement, executes it, and then instantiates objects of your classes for each row, populating them with database data. The Session will also transparently add new objects or update existing ones to the database on commit, generating the appropriate INSERT/UPDATE/DELETE statements behind the scenes.
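As a small illustration of the identity map just described – assuming the User model and engine defined later in this guide – fetching the same primary key twice within one Session yields the very same object:

from sqlalchemy.orm import Session

with Session(engine) as session:
    a = session.get(User, 1)  # emits a SELECT and stores the object in the identity map
    b = session.get(User, 1)  # no second SELECT; the identity map returns the same object
    print(a is b)             # True – one row maps to at most one object per session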

Integrations and extensions: Because of its layered architecture, SQLAlchemy integrates with a variety of systems. The Engine and Core can be used directly by libraries like Pandas to read from/write to databases – for instance, Pandas can accept a SQLAlchemy connectable (Engine or Connection) to read a SQL query into a DataFrame, benefiting from SQLAlchemy’s support for many databases. The ORM integrates with web frameworks: libraries such as Flask-SQLAlchemy provide a simplified interface to tie a Session’s scope to the web request lifecycle. SQLAlchemy’s design also encourages external extensions; for example, geoalchemy extends Column types for spatial data, and dogpile.cache (by the same author) can be used alongside SQLAlchemy for caching query results in applications. Another key integration is with Alembic for schema migrations – while SQLAlchemy Core can reflect and generate schemas, it does not natively handle schema evolution (ALTER TABLE, etc.) in application code. Alembic fills this gap by using SQLAlchemy’s metadata to generate migration scripts and apply them in a controlled way. The ecosystem also includes community dialects for less common databases (e.g. Amazon Redshift, Google BigQuery) which can be installed to let SQLAlchemy communicate with those systems. In short, SQLAlchemy’s architecture of a powerful Core and an optional ORM makes it both flexible and extensible – you can use it for anything from executing raw SQL with connection pooling to full-fledged ORM with rich object models, and it plays well with other tools in the Python data stack.
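For instance, the Pandas integration mentioned above can look like the following sketch (assuming pandas is installed and example.db contains a users table):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")
# Pandas accepts a SQLAlchemy connectable, so this code works unchanged across databases
df = pd.read_sql("SELECT id, name, email FROM users", con=engine)
print(df.head())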

Benefits and use cases of SQLAlchemy

Using SQLAlchemy brings numerous benefits compared to working with lower-level database APIs or simpler ORMs, especially in real-world application contexts. First, SQLAlchemy provides a high level of abstraction without sacrificing power. Instead of writing raw SQL strings and manually handling result sets, developers can work with Python objects and expressions – this leads to clearer, more maintainable code. Yet, because SQLAlchemy’s queries ultimately compile to SQL, you have access to the full capabilities of the underlying database (complex joins, subqueries, window functions, etc.). The ability to drop down to literal SQL when needed (via text() or raw connection executes) means you’re never boxed in: it’s virtually never the case that “the ORM generated a bad query” that you can’t fix, since you retain full control if necessary. In practice, this means you get the convenience of an ORM for common operations and the flexibility to optimize or hand-tune critical queries – the best of both worlds.
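As a quick sketch of that escape hatch (assuming an engine and a users table as set up later in this guide), text() lets you run hand-written SQL while keeping safe parameter binding:

from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT name, email FROM users WHERE name = :name"),
        {"name": "Alice"},  # bound parameter – never string-formatted into the SQL
    )
    for row in result:
        print(row.name, row.email)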

Second, SQLAlchemy is database-agnostic and promotes portability of code. With a single codebase, you can support SQLite (for local development or testing) and a production database like PostgreSQL or MySQL, simply by changing the database URL. The ORM classes and Core queries do not need to change because SQLAlchemy’s dialect system handles differences in SQL syntax and datatypes. This is a huge benefit for projects that may start with one database and need to migrate to another, or for applications that need to support multiple database backends. It also simplifies testing: you can use an in-memory SQLite database for fast tests and be confident that the queries you run are analogous to those on your production database (with some caveats for DB-specific features). This portability makes SQLAlchemy a safe choice for libraries and frameworks – many open-source projects (like Apache Superset and Airflow) use SQLAlchemy precisely to leverage its wide database support.

Third, using SQLAlchemy can drastically improve developer productivity and code maintainability. The ORM lets you work in terms of your domain model (Python classes) rather than writing repetitive SQL for CRUD. For example, adding a new record can be as simple as instantiating an object and adding it to a session, and SQLAlchemy will generate the appropriate INSERT statement. This leads to fewer errors (especially SQL injection risks, since SQLAlchemy uses parameter binding safely by default) and more focus on business logic instead of SQL syntax. Moreover, SQLAlchemy’s query language is expressive: filtering, joining, and aggregating can be done through a fluent interface that many find easier to read than raw SQL strings embedded in code. The library also handles many edge cases and efficiencies internally – e.g. it will not emit unnecessary SQL if an object hasn’t changed, it will transparently expire objects to reflect fresh data when needed, and it uses a unit-of-work that batches SQL operations to avoid half-finished updates. All of this reduces the chance for bugs like forgetting to commit, leaking connections, or making SQL typos.

From a use-case perspective, SQLAlchemy shines in a variety of scenarios. In web applications, it’s used to manage persistent data like user accounts, blog posts, e-commerce orders, etc., often in conjunction with frameworks like Flask or FastAPI. Compared to using a low-level DB-API (like psycopg2) directly, SQLAlchemy provides connection pooling, transaction management, and a nicer API – meaning web developers don’t have to manually open/close connections on each request or hand-write SQL for every query. In data pipelines and scientific computing, SQLAlchemy’s Core is useful for analysts who want to perform complex queries or data transformations in-database and fetch results into pandas DataFrames. For example, one can use pandas.read_sql() with a SQLAlchemy engine to load data from a remote database, benefiting from SQLAlchemy’s support for multiple databases and avoiding the need to switch to different library APIs for Postgres vs. SQLite. In scripting or tool development, SQLAlchemy allows quick construction of internal databases (for caching results, for example) without worrying about SQL syntax differences. Even if the project is not heavy on object modeling, the Core can be used for its robust query construction and safe parameter handling (preventing SQL injection by default).

Comparison to not using an ORM: Without SQLAlchemy (or any ORM), you would typically use the Python DB-API directly with raw SQL strings. This quickly becomes laborious and error-prone – you’d be writing many similar SELECT/INSERT/UPDATE statements, manually mapping query results (tuples) to Python objects or dictionaries, and handling connections and transactions explicitly. This approach often leads to duplicated code and potential mistakes (like forgetting to escape a string properly, leading to injection vulnerabilities). By contrast, SQLAlchemy eliminates a lot of that boilerplate: for instance, it will automatically parameterize queries (avoiding injection issues), and you can centralize your schema definitions in one place (your model classes) instead of scattering CREATE TABLE and SELECT statements around your code. Additionally, SQLAlchemy’s Session provides a unit-of-work that can batch multiple operations into one transaction and will intelligently order SQL statements based on dependencies (e.g. insert parents before children, etc.). Without an ORM, the developer has to remember to commit or roll back transactions and properly sequence operations. In summary, the benefit of SQLAlchemy is not that it hides SQL (you can always see or log the SQL), but that it streamlines database interactions and reduces the likelihood of mistakes, while still giving you full control when you need it.

Real-world contexts illustrate these benefits: for example, in a banking or financial application, using SQLAlchemy can ensure that complex transactions (transferring funds between accounts, etc.) are handled safely with rollback on error, using Python constructs rather than manual SQL SAVEPOINTs. In a high-traffic web API, SQLAlchemy’s connection pooling and optimized SQL generation can help maintain performance and avoid database overload. Conversely, without SQLAlchemy, developers might spend extra time writing and testing SQL for each new feature and debugging subtle issues like connection timeouts or transaction isolation problems. Therefore, the benefits of using SQLAlchemy include increased development speed, improved code quality, database independence, and access to a mature body of knowledge and community support on best practices for database code. All of these make it a strong choice for Python projects that interact with relational databases.

Installation and setup for local development

Installing SQLAlchemy is straightforward, and it can be done in multiple ways depending on your environment and preferences. The primary method is using pip, Python’s package installer. On any operating system, you can open a terminal and run:

pip install SQLAlchemy

This will download and install the latest SQLAlchemy release from PyPI (as of 2025, that would get SQLAlchemy 2.0.x). It’s recommended to do this inside a virtual environment (using venv or virtualenv) to avoid conflicts with other packages. For example, you might do python -m venv venv to create a virtual environment, activate it, and then pip install SQLAlchemy in that isolated environment. If you’re using Anaconda or Miniconda, you can install SQLAlchemy via conda instead. The Anaconda distribution often includes SQLAlchemy by default, but if not, you can run:

conda install -c anaconda sqlalchemy

This installs SQLAlchemy from Anaconda’s package channel. Alternatively, conda install -c conda-forge sqlalchemy will fetch it from conda-forge. The conda installation method is useful if you prefer managing dependencies through the Conda environment (especially on Windows, where conda can sometimes simplify installing database drivers and dependencies).

IDE integration (VS Code, PyCharm): If you are working in Visual Studio Code, installation is still done via pip or conda as above, but you should ensure that VS Code is using the correct Python interpreter (virtual environment) where SQLAlchemy is installed. In VS Code’s terminal, run the pip install command for your environment, and VS Code will then be able to import SQLAlchemy in that workspace. VS Code doesn’t require a special extension for SQLAlchemy, but you might consider installing the Python extension which provides linting and autocompletion. In PyCharm, you can install SQLAlchemy through the built-in package manager: go to the Python Packages or Interpreter settings, search for "SQLAlchemy", and install the latest version. PyCharm will handle using pip under the hood for the selected interpreter. You can also just use pip in PyCharm’s terminal, since it’s the same as any pip usage. After installation, PyCharm should automatically recognize the SQLAlchemy package and provide code completion for it.

Anaconda navigator: If you prefer a GUI approach and you have Anaconda Navigator, you can open Navigator, go to the Environments tab, search for "sqlalchemy" in the package list, and install it from there. This is essentially doing the conda install in the background. Make sure to choose the correct environment in Navigator (you might create a new environment for your project, then install SQLAlchemy into it).

Operating system specifics: SQLAlchemy itself is a pure Python library (with optional C extensions for speed), so installing it doesn’t require any special system packages. Wheels (pre-compiled binaries) are available for all major OSes, so pip install typically just works. On Windows, if you are using a database like MySQL or PostgreSQL, you might need to install their client drivers (e.g. pip install psycopg2 for Postgres, which may need Visual C++ Build Tools, or use psycopg2-binary which provides a precompiled wheel). For SQLite, no action is needed since Python comes with SQLite support out of the box. On macOS, pip will usually install SQLAlchemy without issues. If using Apple Silicon (M1/M2), most wheels including SQLAlchemy are available, but ensure you have an updated pip to grab the correct wheel. On Linux, pip/conda handle it similarly – you just need Python installed. If you are installing in a system-wide location, you might need sudo for pip (but using a virtual env is strongly recommended instead).

Docker: For Docker users, you can add SQLAlchemy installation to your Dockerfile. For example, if you use a Python base image, you might have a line RUN pip install SQLAlchemy (and any database drivers you need) in the Dockerfile. That way your container image will have SQLAlchemy ready to use. It’s also common to include the package in a requirements.txt file and use pip install -r requirements.txt in the Docker build process. Because SQLAlchemy is OS-independent and pure Python, it runs the same inside the container on Linux as it does on your local machine.

Virtual environments: No matter which method, it’s best practice to use isolated environments. If you use venv, after creating and activating the environment, pip install will put SQLAlchemy into that environment’s site-packages. If you use poetry or pipenv as dependency managers, you can add SQLAlchemy to your project by running poetry add SQLAlchemy or pipenv install SQLAlchemy respectively. These tools handle virtualenv creation and will pin a version for reproducibility.

After installing, you can verify by running python -c "import sqlalchemy; print(sqlalchemy.__version__)". This should print the version (e.g. "2.0.43"). If you encounter an installation error (which is rare for SQLAlchemy), read the message – sometimes it could be trying to compile the optional C extensions. You can usually still use SQLAlchemy without the extensions (the core functionality will still work, just slightly slower in some areas). Most often, installation issues are related to database-specific drivers, not SQLAlchemy itself. For example, pip install psycopg2 might require a Postgres client library on the system. But installing SQLAlchemy alone should be quick and easy on all platforms.

Cloud and other environments: In a cloud environment (like AWS EC2, Google Cloud VM, or an on-prem server), you would similarly install via pip in your application environment. For AWS Lambda or serverless environments, you’d typically include SQLAlchemy in your deployment package (possibly using a requirements.txt with SQLAlchemy listed). SQLAlchemy doesn’t have large binary dependencies, so it’s lightweight to include (the wheel is a few MB at most). If you are working in an environment that has restrictions (e.g. some managed services like AWS Glue), ensure that adding external packages is supported (often via a requirements file or an .egg/.zip library upload). Since this guide focuses on standalone and local setups, the key point is: SQLAlchemy installation is straightforward, using pip or conda, and does not require special handling. Once installed, you can begin using it in your Python scripts or applications.

Beginner-friendly example script (20+ lines)

Let’s walk through a complete, beginner-friendly example of using SQLAlchemy. This example will create a new SQLite database, define a table using the ORM, insert some data, and query it back. It will illustrate the typical workflow and include basic error handling and explanations in comments.

# example_script.py – A simple SQLAlchemy usage example
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session

# Define the base class for ORM models
Base = declarative_base()

# Define two ORM classes (tables): User and Address, in a one-to-many relationship
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
    email = Column(String(100), unique=True)
    # Relationship to Address (one user to many addresses)
    addresses = relationship("Address", back_populates="user", cascade="all, delete-orphan")

    def __repr__(self):
        return f"User(id={self.id}, name='{self.name}', email='{self.email}')"

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    email_address = Column(String(100), nullable=False)
    user_id = Column(Integer, ForeignKey('users.id'))
    # Relationship back to User
    user = relationship("User", back_populates="addresses")

    def __repr__(self):
        return f"Address(id={self.id}, email_address='{self.email_address}')"

# Create an SQLite engine (database will be stored in example.db file)
engine = create_engine("sqlite:///example.db", echo=True)

# Create all tables in the database (based on the classes defined above)
Base.metadata.create_all(engine)  # This issues CREATE TABLE statements for our models

# Start a session to interact with the database
with Session(engine) as session:
    try:
        # Create a new User object
        new_user = User(name="Alice", email="[email protected]")
        # Create two Address objects linked to the user
        new_user.addresses = [
            Address(email_address="[email protected]"),
            Address(email_address="[email protected]")
        ]
        session.add(new_user)  # Add the new User (and addresses via relationship) to the session
        session.commit()  # Commit the transaction to save data
    except Exception as e:
        session.rollback()  # Roll back the transaction on error
        print("Error during database operations:", e)
    # Session will auto-close here due to the context manager

# Start a new session to query the data
with Session(engine) as session:
    # Query all users
    users = session.query(User).all()  # Alternatively: session.execute(select(User)).scalars().all()
    for user in users:
        print(user)
        # Print each address associated with the user
        for addr in user.addresses:
            print(" ", addr)

What this script does: We first set up the ORM by creating a Base class using declarative_base(). Then we define two classes: User and Address. These classes have class attributes that are instances of Column, which SQLAlchemy uses to know how to create the database schema. We specify types like Integer and String, set one column as primary key, and establish a relationship: a User has many Addresses. The relationship() call with back_populates on both sides tells SQLAlchemy the ORM linkage (so accessing user.addresses gives a list of Address objects, and address.user gives the User). We’ve also included a cascade="all, delete-orphan" on the relationship so that if a User is deleted, their addresses are deleted too – this is a common ORM pattern to maintain referential integrity in-memory (note that in the database, the ForeignKey will prevent orphans if not null).

Next, we create an Engine pointing to a SQLite database file named example.db. We use echo=True to have SQLAlchemy print all the SQL statements it executes to the console for learning purposes. When Base.metadata.create_all(engine) is called, SQLAlchemy will issue the appropriate CREATE TABLE statements for each model if the table doesn’t already exist. You’ll see output (thanks to echo=True) showing SQL like CREATE TABLE users (...) and CREATE TABLE addresses (...). This is a convenient way to bootstrap a new database based on your models.

We then open a Session and perform some operations. In the first with Session(engine) as session: block, we create a new User object (Alice) and two related Address objects. Assigning the addresses to new_user.addresses establishes the relationship (SQLAlchemy will automatically set the user_id on each Address to link to the new User). We then add only the User to the session – because the Address instances are attached via the relationship, the session is aware of them as well (they are in the session through back-population). When we call session.commit(), the session will flush all new objects to the database. This results in an INSERT for the user, followed by INSERT for each address referencing the user’s primary key. The echo output will show these SQL statements and the transaction commit. We have wrapped this in a try/except: if any database error occurs (for example, if the email violated the unique constraint or the database was unreachable), we catch the exception, call session.rollback() to undo any partial changes, and print an error. This pattern ensures the session is not left in a broken state by an unhandled exception. After commit (or rollback), we exit the with block, which will close the session (even if an exception happened, the finally of context manager closes it).

In the second with Session(engine) as session: block, we demonstrate querying. We use session.query(User).all() to load all User records from the database. Under the hood, this will issue a SELECT * FROM users and return User objects. Because we defined a relationship, by default the addresses collection on a User is lazy-loaded – which means the addresses aren’t fetched until we access that attribute. In the loop, when we do for addr in user.addresses:, SQLAlchemy will emit a query to fetch addresses for that specific user (you’ll see a SELECT on the addresses table filtered by user_id). This lazy loading is convenient, though in some cases you might want to eager load relationships to avoid many small queries (that can be done with session.query(User).options(joinedload(User.addresses)), for example). Finally, we print out each user and their addresses. We defined __repr__ for our classes to make the printout informative. You might see something like:

BEGIN (implicit)
INSERT INTO users (id, name, email) VALUES (1, 'Alice', '[email protected]')
INSERT INTO addresses (id, email_address, user_id) VALUES (1, '[email protected]', 1)
INSERT INTO addresses (id, email_address, user_id) VALUES (2, '[email protected]', 1)
COMMIT
BEGIN (implicit)
SELECT users.id AS users_id, users.name AS users_name, users.email AS users_email
FROM users
[generated in 0.00096s] (SQL statement log)
COMMIT
User(id=1, name='Alice', email='[email protected]')
Address(id=1, email_address='[email protected]')
Address(id=2, email_address='[email protected]')

(The above is a conceptual combination of the echo logs and print output.) The exact SQL and formatting may differ, but the idea is that the script creates records and then retrieves them. The expected output when running the script will show the SQL statements (because of echo=True), and then the printed Python representations of the objects. We successfully inserted one User with two associated Addresses and then read them back.

Common pitfalls for beginners illustrated here: One pitfall is forgetting to call session.commit(). In our example, if we omitted session.commit(), the new user and addresses would not be saved to the database (the transaction would remain uncommitted and then be rolled back when the session closes). Always remember to commit (or use session.flush() for intermediate steps) to persist changes. Another pitfall is not handling exceptions – if an error occurs during commit(), the session will be in an “inactive” state where you must roll back. We showed the pattern of catching exceptions and rolling back to keep things clean. A third potential issue is lazy loading gotchas: if the session were closed before we accessed user.addresses, we would get an error (because the addresses would attempt to load but the session is closed). In our code, we accessed addresses within the same session that loaded the users, so it’s fine. Beginners should be mindful that once the session is closed, trying to access lazy-loaded relationships will raise a DetachedInstanceError. The solution is to access needed data while the session is open or use eager loading. Lastly, path names: we used a relative SQLite URL (sqlite:///example.db), which creates the file in the working directory – ensure you have write permissions there. If you see no output or file, check the path or use an absolute path.

Overall, this script demonstrates the typical pattern: define models, create engine, create tables, open a session, do some session.add and session.commit for inserts, then use session.query (or select) to get data back. All within about 30 lines of code, we accomplished what would take much more code in raw SQL (and it’s database-independent). This example can be expanded – for instance, you could add another user, or query with filters (session.query(User).filter_by(name="Alice").first() to get Alice specifically). But as a start, it gives a full cycle of operations using SQLAlchemy.

Database engine and connection management (core feature)

Establishing a connection (engine): In SQLAlchemy, the central object for database connectivity is the Engine. The Engine wraps the database connection pool and the dialect, and it provides an interface to execute SQL statements. To create an Engine, you use the create_engine() function with a database URL. The URL typically has the form dialect+driver://username:password@host:port/database. For example:

  • SQLite (file-based, no username/password needed): sqlite:///path/to/db.sqlite3

  • SQLite (in-memory): sqlite:// (with no file path, uses RAM)

  • PostgreSQL with psycopg2 driver: postgresql+psycopg2://user:pass@localhost/mydb

  • MySQL with PyMySQL driver: mysql+pymysql://user:pass@localhost/mydb

  • Oracle with cx_Oracle: oracle+cx_oracle://user:pass@host/dbname

SQLAlchemy includes dialects for all major databases, and choosing the correct driver in the URL ensures it uses the right DB-API. If you omit the driver (like postgresql://), SQLAlchemy will use a default (psycopg2 in the case of PostgreSQL). Once you call create_engine, SQLAlchemy does not immediately open a connection; it waits until you actually need to talk to the database. The Engine manages a pool of connections – by default, it will keep up to 5 connections open (for most DBs) to reuse, which improves performance for multiple database operations. You typically create one Engine per database in your application, and you can treat it as a singleton that can be used throughout your app (engines are designed to be thread-safe and shared).

Example:

engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/testdb", echo=True)

This would set up an Engine for a PostgreSQL database named "testdb" with username "scott", password "tiger". The echo=True flag instructs the Engine to log all SQL statements it executes, which is very useful for debugging and understanding what SQLAlchemy is doing.

Using connections and transactions: There are a couple of ways to use the Engine to interact with the database:

  • Imperative (Core) style: You can acquire a Connection from the Engine and execute SQL (either text or expression constructs). For example:

    from sqlalchemy import text

    conn = engine.connect()
    result = conn.execute(text("SELECT * FROM users"))
    for row in result:
        print(row)
    conn.close()

    Here, engine.connect() gives a connection object (basically a proxy to a DBAPI connection from the pool). When you execute a statement, SQLAlchemy will implicitly begin a transaction if one isn’t already begun. You can control transactions manually by calling conn.begin() to get a Transaction object, and then commit() or rollback() on that. A convenient method is engine.begin() which returns a Connection in a transaction context manager:

    with engine.begin() as conn:
        conn.execute(text("INSERT INTO users (name) VALUES ('x')"))
        conn.execute(text("INSERT INTO addresses (email_address, user_id) VALUES ('[email protected]', 1)"))
        # engine.begin() will auto-commit if the block succeeds, or auto-rollback if an exception occurs

    This pattern is nice for executing multiple statements transactionally without explicitly calling commit or rollback.

  • ORM (Session) style: If you’re using the ORM (covered in detail in later sections), you typically don’t use engine.connect() directly. Instead, you instantiate a Session with Session(engine) or use a sessionmaker bound to the engine. The Session will handle obtaining and releasing connections from the Engine’s pool as needed. For a beginner, it’s enough to know that the Session will grab a connection when you need to communicate with the DB (at query or commit time) and will hold it for the duration of a transaction.
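    As a minimal sketch of this pattern (reusing the engine and model classes from earlier examples), a sessionmaker is configured once and then used as a factory:

    from sqlalchemy.orm import sessionmaker

    SessionLocal = sessionmaker(bind=engine)  # configure once, alongside the engine

    with SessionLocal() as session:
        user = session.get(User, 1)  # a connection is checked out from the pool here
        print(user)
    # leaving the block closes the Session and returns the connection to the pool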

Connection pooling: Out of the box, SQLAlchemy engines use a connection pool (specifically, QueuePool). This means when you call engine.connect(), it will give you a connection that is either newly opened or reused from the pool if available. When you call .close() on a Connection (or the Session closes), the connection isn’t actually closed – it’s returned to the pool for reuse. This pooling greatly enhances performance in applications that make many short database requests (like web apps). The default pool size is 5, with a default max_overflow of 10, meaning up to 15 concurrent connections can be created if needed. You can adjust these using parameters in create_engine, for example:

engine = create_engine(url, pool_size=10, max_overflow=20, pool_timeout=30, pool_recycle=1800)

  • pool_size sets the number of connections to keep open in the pool.

  • max_overflow allows extra connections (above pool_size) to be opened if all pooled connections are busy.

  • pool_timeout is how long to wait (seconds) for a connection if the pool is all checked out before raising an error.

  • pool_recycle is a useful parameter particularly for MySQL or other databases with connection timeout limits – it will recycle (refresh) connections after a certain number of seconds, essentially closing and reopening them, to avoid "MySQL server has gone away" errors due to inactivity. For example, pool_recycle=1800 will ensure no connection object is older than 30 minutes.

If for some reason you want to disable pooling (each connect truly opens a new connection and closes immediately), you can use a different pool implementation. For example, create_engine(url, poolclass=NullPool) (with NullPool imported from sqlalchemy.pool) will not pool connections (useful for certain debugging scenarios or short-lived scripts). But generally, you want pooling enabled.

Dialect and DBAPI specifics: The Engine is aware of the dialect (the database type and driver). This means it knows how to adapt things like the syntax for LIMIT/OFFSET, or how to handle insertion of server-generated primary keys. For example, on PostgreSQL the dialect will use RETURNING clauses to efficiently get autogenerated primary keys on insert, whereas on SQLite it might call a specific function to get last_insert_rowid(). Most of this is automatic. But as a developer, you should pick the correct driver in the URL. If you don’t include a driver, SQLAlchemy often picks a default one:

  • PostgreSQL default driver: psycopg2.

  • MySQL default driver: mysqldb (which might not be installed; these days many use pymysql or mysqlconnector).

  • Microsoft SQL Server default: pyodbc.

    It’s often better to be explicit, e.g. mysql+pymysql:// or mssql+pyodbc:// so you know which DBAPI is in use and can ensure it’s installed. If the driver module isn’t installed, you’ll get an ImportError when creating the engine.

Logging and debugging connections: With echo=True on engine, you’ll see all SQL statements along with a note about the connection (like Connection [some hex id] in the log). If you want more fine-grained control, you can configure the Python logging for SQLAlchemy. The engine log output is on the logger sqlalchemy.engine, which you can set to INFO or DEBUG level to see SQL and transaction events. There’s also engine.pool logger to see pool checkouts/checkins if needed.
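A short sketch of that logging setup, using the standard logging module (these are the logger names SQLAlchemy emits on):

import logging

logging.basicConfig()  # send log records to stderr
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)  # SQL statements and parameters
logging.getLogger("sqlalchemy.pool").setLevel(logging.DEBUG)   # pool checkouts/checkins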

Error handling: When working with connections at the Core level, any DBAPI exceptions (like unique constraint violations, connection errors) will be raised as SQLAlchemy exceptions (which are mostly wrappers around DBAPI exceptions). For example, you might catch sqlalchemy.exc.IntegrityError if a unique constraint fails. If you’re using the with engine.begin(): context, it will automatically rollback on exceptions. If you’re manually controlling transactions, remember to commit or rollback appropriately. One common pattern if executing multiple statements via connection is:

trans = conn.begin()
try:
    conn.execute(...)
    conn.execute(...)
    trans.commit()
except:
    trans.rollback()
    raise

But using with engine.begin() simplifies that as shown earlier.

Multiprocessing or forking considerations: If your application forks processes (for example, using the multiprocessing module or running under uWSGI with prefork), it’s recommended to create separate Engine instances or dispose of the Engine in the child process. The reason is that if connections were open in the parent, after a fork the child process might inherit those sockets, leading to confusion. A quick way is to call engine.dispose() in the child to close any inherited connections (they’ll be reopened as needed). SQLAlchemy’s docs advise that engines and pools are not fork-safe. For threaded usage, however, Engine and pools are safe – multiple threads can share the Engine and it will synchronize access to the pool correctly.
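A hedged sketch of that advice (assuming an engine built as shown earlier): dispose of the inherited pool at the start of each child process so the child opens fresh connections of its own:

import multiprocessing
from sqlalchemy import text

def worker():
    engine.dispose()  # discard connections inherited from the parent; new ones open on demand
    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()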

Summary of engine usage: The Engine is the base of all SQLAlchemy operations. Beginners should understand that creating an engine does not immediately connect to the database – it simply prepares connectivity. You typically create it once (per database) and use it for the lifetime of your app. To execute statements in Core style, use engine.connect() and call execute() on the Connection, or use the engine.begin() context as shown (the 1.x-era engine.execute() method was removed in SQLAlchemy 2.0, so everything goes through a Connection or Session). The Engine will handle resource management (reuse connections, etc.) so you as a developer can focus on the SQL or ORM logic. Lastly, if you want to explicitly close out an Engine (for instance on application shutdown), you can call engine.dispose(), which will close all pooled connections. If you don’t, Python’s garbage collector will eventually clean it up, but it’s polite to dispose especially for long-lived processes shutting down gracefully.

Defining tables and models (SQLAlchemy core vs ORM)

Defining the database schema in SQLAlchemy can be done in two main ways: using the SQLAlchemy Core (declarative schema) or using the ORM’s declarative classes. Both approaches ultimately define tables and columns, but their usage differs in whether you immediately tie those definitions to Python classes.

Using SQLAlchemy core (table objects): In Core, you define tables by creating instances of the Table class, usually attaching them to a MetaData object. For example:

from sqlalchemy import MetaData, Table, Column, Integer, String, ForeignKey

metadata = MetaData()
users_table = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50), nullable=False),
    Column('email', String(100), unique=True)
)
addresses_table = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('email_address', String(100), nullable=False),
    Column('user_id', Integer, ForeignKey('users.id'))
)

Here we constructed users_table and addresses_table with their columns and constraints. We could then call metadata.create_all(engine) to create these tables in the database. This approach does not define any Python classes to represent rows – it’s a schema-centric definition. You would work with these tables by using SQL expressions (select, insert, update) on them. For instance, select(users_table.c.name).where(users_table.c.id == 1) would create a select query on the users table. Core is ideal if you want fine control or if you don’t need a full object layer (e.g., writing a script to transform data).

Using the ORM (declarative classes): The ORM’s declarative system allows you to define classes that map to tables, essentially combining the class definition and table definition in one step. We saw an example in the previous script: by subclassing Base = declarative_base(), and setting attributes as Column objects, SQLAlchemy under the hood creates a Table object and ties it to the class. For example:

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    email = Column(String(100))

When this class is defined, SQLAlchemy knows about a table named 'users' with the given columns. Base.metadata will include a Table for 'users' just as if we defined it with the Core approach (indeed, Base.metadata.tables['users'] would give you the Table object). The difference is that now we have a Python class User that we can instantiate and use to interact with data via the ORM. The declarative mapping automatically applies the Data Mapper pattern: instances of User correspond to rows in the 'users' table, and you can load them or save them via a Session.

Columns and types: When defining columns (in either style), you use column types from sqlalchemy (like Integer, String, Boolean, DateTime, etc.). These are SQLAlchemy’s TypeEngine objects that abstract DB-specific types. For example, String(50) will create a VARCHAR(50) on most databases. SQLAlchemy will adjust types where needed (e.g. a Boolean might become an INTEGER or CHAR on databases that don’t support a native boolean). There are also special types like JSON for databases that support JSON columns (MySQL, Postgres) – if a database doesn’t, SQLAlchemy might fall back to storing JSON as Text. You can also define custom types by subclassing TypeDecorator if needed.
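For example, a custom type via TypeDecorator might look like the following sketch – a dict stored as JSON text on databases without a native JSON column (the class name JSONText is just illustrative):

import json
from sqlalchemy import Text
from sqlalchemy.types import TypeDecorator

class JSONText(TypeDecorator):
    """Stores Python dicts as JSON strings in a plain Text column."""
    impl = Text      # the underlying database type
    cache_ok = True  # statement caching is safe for this stateless type

    def process_bind_param(self, value, dialect):
        return None if value is None else json.dumps(value)  # Python -> DB

    def process_result_value(self, value, dialect):
        return None if value is None else json.loads(value)  # DB -> Python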

Primary keys and constraints: It’s important to define a primary key on each table (especially for ORM use – the ORM requires a primary key for each mapped class to manage identity). In Core, you mark it in the Column (as primary_key=True). You can have composite primary keys by marking multiple columns; SQLAlchemy supports that. Unique constraints can be added per Column (unique=True) or as separate UniqueConstraint objects in the Table definition. Foreign keys are indicated by ForeignKey('other_table.column') on a Column. This not only enforces referential integrity in the database (when the schema is created), but also allows the ORM to know about relationships if you set up relationship() in the classes. If using Core only, ForeignKey is still important for things like metadata.create_all to generate proper DDL and also if you want to reflect and sort table creation by dependencies (SQLAlchemy’s MetaData can sort tables by foreign keys for create/drop order).
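A brief sketch of those constraint options in Core (the enrollments table here is purely illustrative):

from sqlalchemy import MetaData, Table, Column, Integer, String, UniqueConstraint

metadata = MetaData()
enrollments = Table('enrollments', metadata,
    Column('student_id', Integer, primary_key=True),  # two columns marked primary_key=True
    Column('course_id', Integer, primary_key=True),   # together they form a composite primary key
    Column('nickname', String(50)),
    UniqueConstraint('nickname', name='uq_enrollments_nickname')  # table-level unique constraint
)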

MetaData and declarative base: The MetaData object is a collection of Table objects. When you use declarative_base(), it creates a MetaData for you (accessible via Base.metadata). In bigger projects, you might organize models across several files – often they all import the same Base or the same MetaData to ensure all tables are collected together. You can also create multiple Base classes with different metadata if you need to manage multiple sets of tables or multiple databases.

Hybrid approach (classical mapping): There is a way to use the ORM without declarative, by first defining Tables (as in Core) and then mapping classes to them using mapper() or the registry.map_imperatively in 1.4/2.0. This is called classical mapping. It offers some advanced use cases (like mapping multiple classes to one table, or mapping to a subset of columns), but for most users the declarative approach is simpler and more common. Declarative essentially does two things: it declares the table structure and sets up the class <-> table mapping in one go.

Auto generation vs reflection: If you already have an existing database, you might not want to manually write out all the models. SQLAlchemy Core allows reflection: using MetaData.reflect(engine) will populate the MetaData with Table objects for all tables found in the database. Then you could use the automap extension (sqlalchemy.ext.automap) to automatically produce classes from those tables. Automap can generate classes with relationships by analyzing foreign keys. This is useful for quick start scenarios or integration with legacy databases. However, for a beginner, it’s often educational to write out the models by hand, as it makes the code self-documenting.
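As a rough sketch of reflection and automap against an existing database (the table and class names depend entirely on what the database actually contains):

from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.automap import automap_base

engine = create_engine("sqlite:///example.db")

metadata = MetaData()
metadata.reflect(engine)       # load every table definition from the live database
print(metadata.tables.keys())  # e.g. dict_keys(['users', 'addresses'])

Base = automap_base()
Base.prepare(autoload_with=engine)  # generate mapped classes from the reflected schema
User = Base.classes.users           # classes are named after the reflected tables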

When to use core vs ORM definitions: If your use case is mainly raw data processing or you just need to run some SQL queries and fetch results (and you don’t need complex object behavior), you might stick to Core. For example, if writing a migration script or a data import/export script, Core gives you fine control and might be more straightforward since you’re thinking in terms of tables and SQL. On the other hand, for building an application with business logic, using ORM classes is very convenient. You can add methods to your classes (like a User.full_name() method if you stored first and last names separately, etc.), and you can navigate relationships via attributes rather than writing joins for each interaction.

Defining relationships (ORM): While Core Table definitions include ForeignKeys, the ORM goes further by letting you define relationships between classes. In our previous example we used relationship("Address", back_populates="user") in User and similarly in Address. The relationship() function doesn’t create any database structure by itself (that’s what ForeignKey on columns is for), but it tells the ORM how to handle related objects. With that in place, if you access user.addresses, SQLAlchemy knows to execute a SQL query to fetch addresses where user_id = that user’s id. You can also specify loading strategies (lazy, eager) and cascade behaviors on relationships – for instance, cascade="all, delete-orphan" means when a User is deleted from the session or database, all Address objects associated should be deleted as well (and if an Address is removed from a User’s addresses collection, it should be deleted). These help mimic real-world rules in the object model.

Creating the schema: Once tables/models are defined, you typically create them in the actual database by calling Base.metadata.create_all(engine) for ORM, or metadata.create_all(engine) for Core. This will emit CREATE TABLE statements in an order that respects foreign key dependencies (SQLAlchemy will automatically sort tables – e.g., create users before addresses because addresses depends on users). If tables already exist, create_all will skip them (it doesn’t attempt to alter existing tables). If you need to drop tables, metadata.drop_all(engine) will drop them in an order that avoids dependency issues (dropping child tables first). One thing to note: SQLAlchemy’s create_all is good for initial schema or in-memory databases, but for schema migrations over time, you’d use Alembic (which can compare your models to the DB and generate ALTERs). SQLAlchemy Core itself does not provide ALTER commands directly via an API (the philosophy is to use migrations for that).

Reserved words and quoting: If a column name conflicts with a Python reserved word (say you have a column named "class" or "order"), in ORM you can still define it by using an alternative Python identifier and providing a key or name parameter. For example:

class Vehicle(Base):
    __tablename__ = 'vehicles'
    type = Column('type', String(50))  # 'type' is a Python built-in, but it works as a column name

Here we use Column('type', String(50)), which means the actual column name in the DB is "type", and the attribute on the class is also type. If that was problematic, we could do vehicle_type = Column('type', String(50)) – then vehicle_type is the Python attribute, and "type" is the column name. SQLAlchemy will quote reserved words or special characters in SQL (e.g., if a column is named "order", it will be quoted as "order" in PostgreSQL or [order] in MSSQL so it isn’t confused with the ORDER BY keyword). The official FAQ covers handling columns whose names are Python reserved words – typically by using the key argument in Column, adding underscores, etc.

Deferred reflection (for existing DB): If working with an existing schema, you might use autoload_with=engine in Table definitions to pull column definitions automatically. Or use automap. These topics go deeper, but it’s useful to know that SQLAlchemy can reflect tables: e.g., users_reflected = Table('users', metadata, autoload_with=engine) would populate columns from an existing users table.

Metadata vs database: Keep in mind, defining a table or model in SQLAlchemy does not by itself make any change to the database until you explicitly call create/drop or issue DDL. It’s just an in-memory description. This means you can define models for a database that already exists without recreating it – you’d just not call create_all. You’d instead perhaps use those models to query data.

In summary, defining schema in SQLAlchemy can be done via Core (Table objects) or ORM (classes). The ORM builds on Core and is usually more convenient for app development. Both share common constructs like Column, types, ForeignKey which ensure your Python representation of the schema matches the actual database schema. Good practices include always setting primary keys, naming your tables and columns clearly, and leveraging relationships in ORM to avoid manual join logic scattered through your code.

Creating, reading, updating, and deleting (CRUD) data

With the schema in place (tables defined and perhaps created in the database), the next core feature is performing CRUD operations: Create, Read, Update, Delete. SQLAlchemy provides multiple ways to do these, whether you use the ORM or the Core.

Inserting (create) data:

  • Using ORM (session.add): Once you have an ORM class, adding a new record is as simple as creating an instance of that class and adding it to a session. For example:

    new_user = User(name="Bob", email="[email protected]")
    session.add(new_user)
    session.commit()

    On commit, SQLAlchemy will issue an INSERT into the users table. If the table has an autoincrement primary key, SQLAlchemy will retrieve that new id and set it on the new_user object (so after commit, new_user.id will be populated). You can add multiple objects at once: session.add_all([obj1, obj2, ...]). Until you commit (or flush), these changes are pending in the session and not yet in the database.

  • Using core (insert statement): The SQL Expression Language offers an insert() function to create an INSERT SQL statement. Example:

    from sqlalchemy import insert
    stmt = insert(users_table).values(name="Charlie", email="[email protected]")
    with engine.begin() as conn:
        conn.execute(stmt)

    Here engine.begin() opens a connection and commits automatically when the block succeeds (the legacy engine.execute() method no longer exists in SQLAlchemy 2.0). After this executes, the row is in the database. If you need the primary key, on some DBs you can get it via result.inserted_primary_key. For bulk inserts, you can pass a list of dictionaries to .values() or use conn.execute(insert_stmt, [ {col1:val1, col2:val2}, {...}, ... ]) to insert multiple rows in one go.

  • Bulk operations: The ORM has some bulk methods like session.bulk_insert_mappings(User, [ {name:..., email:...}, {...} ]) which bypass some of the ORM unit-of-work and directly translate into executemany insert calls. These can be faster for huge inserts (no object construction overhead), but come with caveats (they don’t populate the identity map, etc.). For beginners, using normal add/add_all is fine for moderate amounts of data. For performance-critical bulk insertion (thousands of rows), consider Core or bulk mappings. The FAQ explicitly addresses inserting hundreds of thousands of rows – doing it through the ORM one object at a time will be slow because of Python overhead per row; in such cases, use bulk inserts or the Core for speed.
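    As a rough sketch of that fast path (reusing the Core users_table from the previous section), a single executemany through Core inserts many rows in one call:

    from sqlalchemy import insert

    rows = [{"name": f"user{i}", "email": f"user{i}@example.com"} for i in range(10000)]
    with engine.begin() as conn:                 # commits automatically on success
        conn.execute(insert(users_table), rows)  # one executemany-style INSERT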

Reading (querying) data:

  • Using ORM (session.query or session.execute): The ORM provides a rich query API. The classic way (SQLAlchemy 1.x style) is session.query(Model) which returns a Query object you can filter, order, etc. For example:

    session.query(User).filter(User.name == 'Alice').order_by(User.id).all()

    This would produce a SELECT with a WHERE clause and an ORDER BY, returning a list of User objects. You can chain multiple filter conditions (which combine with AND by default, or use or_ for OR conditions). There’s also filter_by(name='Alice') which is a shorthand for simple equals checks. In SQLAlchemy 2.0, the recommended approach is to use the select() function with the class:

    from sqlalchemy import select
    stmt = select(User).where(User.name == 'Alice').order_by(User.id)
    result = session.execute(stmt)
    users = result.scalars().all()

    The session.execute(select_stmt) returns a Result object; .scalars() is used because the raw result would be a Row object (or tuple if multiple columns), but since we selected an entity, .scalars() yields the User instances. .all() then materializes the list. Both approaches are fine; the 2.0 style has the advantage of more consistency with Core.

    You can also query specific columns or expressions. For example:

    session.query(User.name, User.email).filter(User.id < 5).all()

    would return a list of tuples of name and email for users with id < 5. If using select(User.name, User.email), the result would come as tuples as well (and scalars() would not be appropriate since there are multiple columns). The ORM’s query capabilities include joins (e.g. session.query(User).join(User.addresses).filter(Address.email_address.contains("@gmail.com")).all() would join users and addresses and filter addresses by those containing gmail in the email). The ORM can automatically translate relationship properties: session.query(User).filter(User.addresses.any(Address.email_address.ilike("%@yahoo.com"))).all() finds users who have at least one associated address with a yahoo email – behind the scenes it generates an EXISTS subquery.

  • Using core (select and text): With core Table objects or column expressions, you use select() function. For example:

    from sqlalchemy import select, or_
    stmt = select(users_table.c.name, users_table.c.email).where(
        or_(users_table.c.name == 'Alice', users_table.c.name == 'Bob')
    )
    with engine.connect() as conn:
        result = conn.execute(stmt)
        rows = result.fetchall()
        for row in rows:
            print(row.name, row.email)

    This uses select() from the SQL Expression API. The where clause can use Python bitwise operators & and | for AND/OR (SQLAlchemy overloads them) or use and_(), or_() functions. The result of conn.execute(select_stmt) yields a Result that can be iterated or fetched. Each row can be accessed like a tuple or by column name. The Core is very flexible: you can join tables by doing select(users_table.c.name, addresses_table.c.email_address).join_from(users_table, addresses_table, users_table.c.id == addresses_table.c.user_id) or shorter: users_table.join(addresses_table) in the select. The API mirrors SQL pretty directly.

  • Filtering and searching: Common filter operations:

    • == for equality, != for inequality.

    • .like("%foo%") for pattern matching, and .ilike() for case-insensitive matching (on backends that support it).

    • .in_([list]) for IN lists.

    • .between(x, y) for between.

    • .is_(None) / .is_not(None) for checking NULL (since == None is overloaded as .is_(None) internally).

    • .any() for relationships (as shown).

    • exists() subqueries for more complex conditions.

    Example of exists:

    from sqlalchemy import exists

    stmt = select(User).where(
        exists()
        .where(Address.user_id == User.id)
        .where(Address.email_address.contains("@gmail.com"))
    )
    gmail_users = session.scalars(stmt).all()

    This translates to a SQL EXISTS (subquery) checking for at least one address of the user with a gmail address.

  • Ordering and limits: You can use .order_by(Model.field.desc()) for descending order, etc. For limiting results, the ORM query has .limit(n) and .offset(n). The Core select has .limit and .offset methods as well. These correspond to SQL LIMIT n or the appropriate clause on each DB (e.g. TOP in MSSQL or FETCH FIRST in Oracle).
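
    For example, a small 2.0-style sketch combining ordering with limit/offset (assuming the User model from earlier):

    from sqlalchemy import select

    stmt = select(User).order_by(User.id.desc()).limit(5).offset(10)
    page_of_users = session.scalars(stmt).all()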

  • Count and aggregates: To count rows, one can do:

    total_users = session.query(User).count()

    This issues a SELECT COUNT(*) FROM users automatically (SQLAlchemy transforms the query). You can also do session.query(func.count(User.id)).scalar() for more control. In 2.0 style:

    from sqlalchemy import func
    count_stmt = select(func.count(User.id))
    total = session.scalar(count_stmt)

    The ORM .count() method has a caveat if you had joins or group_by – it might produce a subquery count; for simple queries it’s fine. Using func.count() is more explicit.

  • Eager loading: If you know you will need related objects, you can instruct the ORM to load them in the same query (to avoid the N+1 query problem). For example:

    from sqlalchemy.orm import joinedload
    user_with_addresses = session.query(User).options(joinedload(User.addresses)).filter_by(name='Alice').first()

    This will do one query joining users and addresses, loading the addresses collection eagerly. Alternatively, select(User).options(subqueryload(User.addresses)) will issue a separate second query to load addresses for all users loaded (which can be more efficient if you have many users, to not multiply data with a big join). Eager loading is an advanced feature to consider as your data access patterns grow.

Updating data:

  • Using ORM: To update an object, simply change its attributes in Python and commit:

    user = session.get(User, 1)  # fetch user with primary key 1
    user.email = "new.email@example.com"
    session.commit()

    The Session will detect that user.email was modified (SQLAlchemy instrumented the attribute) and on commit will emit an UPDATE users SET email='new' WHERE id=1. The flush/commit process only updates changed columns by default. If you change multiple fields, it will update them all in one statement. If an object is not in the session (detached), and you know it should be updated, you either reattach it (session.merge(detached_user) which copies its state in) or simply query it, change it, commit.

    The ORM also allows querying and updating in bulk without loading into objects using session.query(Model).filter(...).update({Model.field: value}). For example:

    session.query(User).filter(User.name == 'Bob').update({User.email: "bob@example.com"})

    This will issue a single UPDATE statement affecting all users named Bob. You may need to set synchronize_session=False or 'fetch' depending on how you want the in-session state handled. Bulk update is available, but use caution: by default it does not refresh objects already in the session, so their .email attribute may be stale in memory. It is often fine for maintenance tasks, and under the hood it is a single straightforward UPDATE.

  • Using core: Construct an update expression:

    from sqlalchemy import update
    stmt = update(users_table).where(users_table.c.name == 'Bob').values(email="bob@example.com")
    result = conn.execute(stmt)
    print(result.rowcount, "rows updated")

    This executes an UPDATE...WHERE in SQL. rowcount gives the number of rows matched/updated. Core updates are useful for mass updates or scripting. Keep in mind if you use the ORM and do core updates behind its back, the ORM Session won't know about those changes unless you refresh or expire objects.

Deleting data:

  • Using ORM: To delete an object with the ORM, you load it into a session and call session.delete(obj), then commit:

    user = session.get(User, 1)
    session.delete(user)
    session.commit()

    This emits a DELETE FROM users WHERE id=1. If you have cascades on relationships, SQLAlchemy will also delete related objects in the session (for instance, if User.addresses was set to delete-orphan cascade, it will delete those address objects in memory too, and as a result they’ll be deleted from DB due to foreign key ON DELETE or separate DELETE statements if ON DELETE CASCADE is not set). In our earlier example, since we used cascade="all, delete-orphan", deleting a User via session.delete will also issue deletes for the Address entries (or if the foreign key has ON DELETE CASCADE, the DB does it and SQLAlchemy just needs to expunge them from session).

    Bulk delete without loading objects is possible:

    session.query(User).filter(User.name == 'Charlie').delete()

    This will DELETE all users named Charlie directly in the DB. Like bulk update, you might need synchronize_session parameter if you care about remaining session state (e.g., if those objects were in session). Bulk deletion is often used in maintenance scripts and is straightforward (just be cautious with relationships – without ON DELETE CASCADE in the DB, deleting parent records might violate foreign key constraints if children exist; either delete children first or configure cascade).

  • Using core:

    from sqlalchemy import delete
    stmt = delete(users_table).where(users_table.c.id == 5)
    conn.execute(stmt)

    That will delete the user with id 5. The returned result has rowcount similarly.

Transactions in CRUD: If you’re using Session in ORM, by default it wraps operations in a transaction that ends at session.commit(). If an error happens mid-commit (say violation of a constraint), the Session will rollback and you can handle the exception. If you’re using engine.begin() context for core, it similarly ensures commit or rollback. You can manually begin transactions via session.begin() as a context manager too:

with session.begin():
    # do multiple operations
    session.add(u1)
    session.add(u2)
# the session.begin() context auto-commits on success or rolls back on exception

This is akin to an atomic block.

Common pitfalls in CRUD:

  • Forgetting to flush/commit – data might not actually be saved (as mentioned earlier).

  • Modifying an object outside of a session – if you do user = User(name="x") and then never add it to a session or commit, it won’t be in DB.

  • Deleting objects that still have relationships without proper cascade or foreign key config – leads to IntegrityError. For example, trying to delete a User that has Address rows without ON DELETE CASCADE will error. You must delete addresses first or use session.delete(user) after configuring relationship cascade which will delete addresses in memory. The official guidance is often to enable ON DELETE CASCADE in the schema if you want DB-enforced cascade, or handle it in code.

  • Using objects after commit – by default the Session expires all objects when you commit (expire_on_commit=True, the default in both 1.x and 2.0), so the next attribute access transparently re-fetches the row in a new transaction. That works fine while the session is open, but once the session is closed the objects become detached: column attributes that were already loaded remain accessible, while touching an unloaded lazy relationship raises an error because there is no session to load it with. Practical advice: do your work inside a with Session.begin(): block or while the session is still open, avoid relying on lazy loads after closing, or pass expire_on_commit=False if you want objects to keep their loaded state after commit (see the sketch after this list).

  • Concurrent modifications: If two sessions load the same object, modify different fields, and commit, the last commit wins (overwriting earlier changes without warning unless you implement version counters for optimistic locking). SQLAlchemy can do optimistic concurrency control via a version_id column if you set that up.
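
To sidestep the post-commit expiration pitfall above, one option is expire_on_commit=False – a minimal sketch, assuming the engine and User model from earlier:

from sqlalchemy.orm import sessionmaker

SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)
with SessionLocal() as session:
    user = session.get(User, 1)
    session.commit()
    print(user.name)  # attributes were not expired by the commit, so no re-fetch occurs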

In summary, performing CRUD with SQLAlchemy is intuitive: create objects and add to session, read/query using query API or select, update by modifying attributes or via update statements, delete by session.delete or delete statements. The Session’s unit-of-work handles batching those into efficient SQL (e.g. one insert per object, but you can optimize with bulk ops later). It’s important to manage the transaction lifecycle (open session, commit or rollback). With practice, these operations feel natural and you largely work with Python objects, letting SQLAlchemy generate SQL under the hood except when you need something custom.

Querying and filtering data (core feature)

Querying data is one of the most common activities, and SQLAlchemy provides a powerful and flexible system to retrieve exactly what you need from the database. We touched on basic querying in previous sections; here we’ll dive a bit deeper into different querying techniques, advanced filtering, and how to retrieve results efficiently.

Basic select queries (ORM): The simplest query in the ORM is to get all objects of a certain type:

users = session.query(User).all()

This will issue SELECT * FROM users and return a list of User objects. Often, you don’t actually want all records, especially if the table is large; you might use query(User).limit(100).all() to get a subset. More typically, you filter for specific records:

  • filter_by: session.query(User).filter_by(name="Alice", email="alice@example.com").first() translates to a WHERE name='Alice' AND email='alice@example.com'. filter_by uses keyword arguments matching column names.

  • filter: session.query(User).filter(User.name=="Alice", User.email.like("%@gmail.com%")) – here we use more expressive conditions (like with wildcards). Multiple conditions in .filter() are combined with AND automatically.

  • Comparison operators: SQLAlchemy allows Python operators to represent SQL ops:

    • == becomes =,

    • != becomes <>,

    • <, <=, >, >= as expected,

    • .like() and .ilike() for case-insensitive (on PG and others that support ILIKE),

    • .in_([list]) for IN,

    • .is_(None) for IS NULL (since == None is internally overridden to produce IS NULL).

      Example: session.query(User).filter(User.id.in_([1, 2, 3])) generates WHERE user.id IN (1, 2, 3).

  • Logical combination: For more complex logic:

    from sqlalchemy import or_, and_

    q = session.query(User).filter(
        or_(User.name == 'Alice', User.name == 'Bob'),
        User.email.ilike("%@example.com")
    )

    This finds users whose name is Alice or Bob and whose email ends with @example.com. The or_ and and_ are needed because Python’s and/or cannot be overloaded (so using | and & as infix operators is the alternative). You could write the same as ((User.name=='Alice') | (User.name=='Bob')) & User.email.ilike("%@example.com") – which might be more readable to some. The result, either way, is a single SQL WHERE clause.

  • exists() and subqueries: If you want to filter based on a condition involving a related table without actually loading the related objects (and maybe without having set up relationships in ORM), you can use a subquery or exists.

    Example: find users with no addresses:

    from sqlalchemy import not_, exists, select
    subq = select(Address.user_id).where(Address.user_id == User.id)
    users_no_addr = session.query(User).filter(not_(exists(subq))).all()

    This will generate a NOT EXISTS (SELECT 1 FROM addresses WHERE addresses.user_id = users.id) clause. If you have a relationship set up, the ORM gives an easier way: session.query(User).filter(~User.addresses.any()) – the ~ is negation, .any() yields an EXISTS for addresses. Similarly, User.addresses.any(Address.email_address.like('%gmail%')) yields exists with that condition.

  • Selecting columns vs entities: As mentioned, session.query(User.name, User.email) returns column tuples. Sometimes you want just one column:

    names = session.query(User.name).filter(User.name.ilike("A%")).all()

    That gives [('Alice',), ('Alan',), ...] list of one-tuples. If you want a flat list, you can use session.scalars(select(User.name).where(...)).all() in 2.x, or iterate and take the element in 1.x. There’s also .with_entities(User.name) method on query to change what’s being selected if needed. Keep in mind that when you select columns (not full entities), the result is not ORM objects but either tuples or whatever you specify.

  • Ordering and grouping:

    session.query(User).order_by(User.name.desc(), User.id).all()

    sorts by name descending, then id ascending. For grouping and aggregates:

    from sqlalchemy import func
    counts = session.query(User.name, func.count(Address.id)).join(User.addresses).group_by(User.id).all()

    This would give each user’s name and how many addresses they have (assuming the relationship is set up and the join works). The result might look like [('Alice', 2), ...]. An inner join excludes users with no addresses; use outerjoin to include them with a count of 0. Note that COUNT(Address.id) counts only non-NULL values, so a user whose outer-joined address columns are NULL correctly gets 0 – whereas COUNT(*) would count that NULL row as 1.

  • Limiting & offset:

    To paginate:

    page = 2
    per_page = 10
    users_page2 = session.query(User).order_by(User.id).limit(per_page).offset((page-1)*per_page).all()

    That yields rows 11–20 (page 2, with pages 1-indexed). Pagination helpers exist in extensions (e.g. Flask-SQLAlchemy’s paginate()), but plain limit/offset works everywhere. Note that not all databases handle large offsets efficiently; for deep pagination, keyset (“seek”) pagination using a bookmark column is often the better strategy.

  • Getting one or many:

    • .all() gives list,

    • .first() gives the first or None (and applies limit 1 in SQL),

    • .one() expects exactly one result or raises (if 0 or >1 results, it errors),

    • .one_or_none() returns one or None, and errors if more than one.

    • In 2.0 style, the same variants live on the Result object, e.g. session.execute(select(...)).scalar_one() or .scalars().first().

Querying with core (non-ORM): The SQL Expression language allows everything the ORM does and sometimes more (like CTEs, window functions easily, etc.). Example:

from sqlalchemy import select

stmt = (
    select(User.name, Address.email_address)
    .join_from(User, Address, User.id == Address.user_id)
    .where(Address.email_address.endswith("example.com"))
)

This joins users and addresses explicitly (like a hand-written SQL JOIN). Alternatively, since we have ORM classes, we could do select(User, Address).join(Address, User.id == Address.user_id), which returns rows of (User, Address) tuples. If working purely with Table objects:

stmt = select(users_table.c.name, addresses_table.c.email_address).select_from(
    users_table.join(addresses_table, users_table.c.id == addresses_table.c.user_id)
)
result = conn.execute(stmt).fetchall()

You’d get a list of tuple-like Row objects back.

Textual SQL: If needed, you can drop into raw SQL with from sqlalchemy import text:

result = session.execute(text("SELECT name FROM users WHERE name LIKE :pattern"), {"pattern": "A%"}).all()

This gives low-level control (and the returned results are in terms of column names or positions, not objects). Text is useful for things not easily expressed or if you already have an SQL snippet. Always use parameters (:pattern as above) to avoid injection issues; session.execute with text requires passing params separately.

Performance considerations in querying:

  • The N+1 problem: if you iterate over users and for each user access a lazy-loaded relationship (like user.addresses in a loop), you’ll issue 1 query for users + N queries (one per user) for addresses. For small N it’s fine; for large N this is slow. Solve by eager loading relationships (join or subquery load) or by restructuring query (maybe directly join and load data).

  • Large result sets: if you query a huge table without limit, fetching it can consume a lot of memory and time. If you only need part, always filter or limit. If you need to stream results without loading all into memory, the Core allows iteration or the yield_per() method in the ORM (e.g. for user in session.query(User).yield_per(1000)) to load 1000 objects at a time from the DB cursor. However, yield_per has caveats (it disables in-memory caching of relationships etc., but it’s helpful for batch processing).

  • Query count vs len: If you have a Query object q, calling q.count() issues a SELECT COUNT(*) on the database, which is efficient in the DB but if you had join loaded, it might complicate things. If you did len(q.all()), that would actually fetch all results into Python and then count them – much slower for large sets. So use count() for count.

  • exists() vs joining: Using an exists() construct in the WHERE clause is often the idiomatic way to check for the existence of related rows. The ORM .any() we discussed uses EXISTS under the hood. This can be more efficient than doing an outer join and checking for NULL, etc.

  • contains_eager: If you write a custom join in an ORM query, you can tell SQLAlchemy that the joined columns should be loaded into a relationship property. For example:

    from sqlalchemy.orm import contains_eager

    q = (
        session.query(User)
        .join(User.addresses)
        .filter(Address.email_address.endswith("@gmail.com"))
        .options(contains_eager(User.addresses))
    )
    users = q.all()

    The join here is one-to-many, so the raw result has one row per matching Address, and each User would otherwise repeat once per address. contains_eager(User.addresses) tells the ORM, "I have already joined addresses, so populate the addresses relationship from those columns rather than loading it separately." users then comes back as User objects with .addresses populated (duplicates consolidated). Had we written session.query(User, Address) without the option, we would get (User, Address) tuples and possibly duplicate user objects. This is an advanced technique for performance when writing manual join queries with the ORM – with one caveat: the eagerly populated collection contains only the addresses that matched the filter, not necessarily all of the user’s addresses.

Examples of practical queries:

  • Find all users who have no email on file (i.e., email is null): session.query(User).filter(User.email == None).all() – uses IS NULL automatically.

  • Find the three most recent addresses added (assuming an auto-increment PK roughly correlates to insertion order or a timestamp field): session.query(Address).order_by(Address.id.desc()).limit(3).all().

  • Case-insensitive search for users by name: session.query(User).filter(func.lower(User.name) == "john") – or use .ilike("john"): on PostgreSQL this renders ILIKE, and on backends without ILIKE SQLAlchemy falls back to a lower()-based comparison. Alternatively: filter(User.name.ilike("%jo%")) to find names containing "jo" in any casing.

  • Using .distinct(): Suppose you have duplicate entries or you join tables and get duplicates, you can add .distinct() to the query to get unique results (this translates to SELECT DISTINCT). It must be used carefully if you’re selecting whole objects with columns – often you want distinct on a specific column. For example, session.query(User.name).distinct().all() gets unique user names.

Integration with Pandas or others: We saw earlier you can simply use the Engine with pandas: pd.read_sql(select(User).where(...), engine) and it will handle fetching. If you want to convert a query result to DataFrame manually, you can do pd.DataFrame(result.fetchall(), columns=result.keys()) if using Core result. With ORM objects, you’d likely construct a DataFrame from a list of object attributes via comprehension.
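
For instance, a short sketch of the pandas round trip (assuming the engine and User model from earlier; pandas accepts a SQLAlchemy selectable as the query):

import pandas as pd
from sqlalchemy import select

df = pd.read_sql(select(User).where(User.name.ilike("a%")), engine)
df.to_sql("users_named_a", engine, if_exists="replace", index=False)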

Common pitfalls when querying:

  • Forgetting to join, or using filters that involve related tables without the join: if you write session.query(User).filter(Address.email_address == 'foo') with no join, SQLAlchemy adds addresses to the FROM clause as a cross join against users, which silently returns wrong results (recent versions emit a “cartesian product” warning). Either join explicitly, or use .filter(User.addresses.any(...)) if you want the relationship shortcut.

  • Ambiguous column names in multi-join: If two tables have a column with the same name, and you do query(User, Address).filter(User.id==Address.user_id).all(), you might get a warning about ambiguous column name if you had a filter or order_by just by name. It’s safer to use Model.column in filters, not bare string names, to avoid ambiguity (the ORM does a decent job labeling things though).

  • Using == None instead of .is_(None): User.email == None actually works – SQLAlchemy intercepts it and produces IS NULL (some versions emit a warning suggesting .is_(None)). But Python’s and/or will not work as expected: User.name == 'x' and User.email == 'y' tries to evaluate the truthiness of the first expression, which is a ClauseElement and raises an error. Use the bitwise operators (&, |) or pass separate arguments to filter().

  • Loading too much data inadvertently: e.g., calling .all() on a query without realizing the table has millions of rows. Always limit or iterate in chunks for very large data sets.

Working with results: The Result object in Core (and Session.execute in 2.0 returns a Result) provides methods like .fetchall(), .fetchone(), .scalar_one(), etc. If you use session.query, you typically get the objects or tuples directly rather than a Result. But in 2.x when you do session.execute(select(...)), you get a Result. To get ORM objects from it, you might use .scalars(). Example:

result = session.execute(select(User).where(User.name == 'Alice'))
alice = result.scalar_one()  # expects exactly one result 

This returns the User object.

In summary, SQLAlchemy’s querying API can handle simple to extremely complex queries. It allows you to remain in Python for most logic, resorting to raw SQL only for edge cases. The key is understanding how to construct filters and joins properly, and being mindful of performance (loading only what you need, using relationships wisely to avoid excessive queries, and leveraging the database to filter/aggregate as much as possible). With practice, writing a query like “find all users with at least 2 addresses whose emails are all confirmed” becomes a matter of combining these building blocks (exists or join with grouping and having count >= 2, etc.). The library’s flexibility in filtering and joining is one of its greatest strengths, giving you the full power of SQL in a Pythonic way.

Modeling relationships and joins (core feature)

Relational databases are all about relationships between tables, and SQLAlchemy’s ORM provides a rich set of tools to define and work with these relationships. We’ve already seen basic examples of one-to-many (User -> Address) relationships. Here we will discuss different types of relationships, how to configure them, and how to use joins effectively, both in ORM and Core contexts.

One-to-many relationships (Parent/Child): This is the most common type, e.g. one User has many Address entries. In the database, this is represented by a foreign key on the “many” side (addresses.user_id references users.id). In SQLAlchemy ORM, we set this up with:

  • A ForeignKey on the Address class: user_id = Column(Integer, ForeignKey('users.id')).

  • A relationship() on one or both classes. On the User side, addresses = relationship("Address", back_populates="user"), and on the Address side, user = relationship("User", back_populates="addresses").
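
Put together, a minimal sketch of this mapping (assuming the same declarative Base used earlier):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship("Address", back_populates="user",
                             cascade="all, delete-orphan")

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    email_address = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey('users.id'))
    user = relationship("User", back_populates="addresses")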

The back_populates (or alternatively backref) ensures that the two relationships are linked – so when you append an Address to user.addresses, SQLAlchemy will also set address.user for you. The difference between using back_populates vs backref:

  • back_populates requires you to declare the relationship on both sides and name each other (as we did).

  • backref is a shortcut that declares the relationship on one side and automatically creates the other side for you. For instance, addresses = relationship("Address", backref="user") on User would automatically add user = relationship("User") on Address class. Some prefer backref for brevity; others prefer back_populates for explicitness. Functionally, they are similar, but back_populates can be clearer in code and plays nicer with type checkers.

Using this relationship is straightforward: given a User object, user.addresses is a Python list (actually a special list-like collection) of Address objects. You can append to it or remove from it, and SQLAlchemy will manage the foreign keys. For example:

user = User(name="Daisy")
addr = Address(email_address="daisy@example.com")
user.addresses.append(addr)
session.add(user)
session.commit()

This will insert into users, insert into addresses with the appropriate user_id (SQLAlchemy set user_id on addr automatically when we appended). If we removed an address from user.addresses and committed, by default SQLAlchemy would just update the address’s foreign key to NULL (unless we instruct it to delete the address via cascade delete-orphan).

Cascade options: The relationship() function has a cascade parameter to control what happens to related objects on session operations. For example:

  • cascade="all, delete-orphan" on User.addresses means:

    • "all" = include related objects in all session operations (so if you add a User, it will add new addresses as well; if you delete a User via session.delete, it will delete the addresses in the session as well).

    • "delete-orphan" = if an Address is removed from the User.addresses collection and has no other parent, it will be marked for deletion. Essentially it means “addresses exist only as children of a user; if they are not attached to a user, they should be deleted.”

If you don’t include delete-orphan, removing an address from the collection will just orphan it (set its user_id to NULL if DB allows, or cause constraint error if not allowed). If you do include it, removing it signals SQLAlchemy to DELETE it from DB on flush.

Cascades also control things like if you call session.delete(user), whether to automatically delete the addresses or not. By default, session.delete(user) will attempt to delete user row. If the DB has foreign key constraints without ON DELETE CASCADE, that will error if addresses exist for that user. With cascade="all, delete-orphan", SQLAlchemy will issue deletes for the related addresses first, then the user, to satisfy constraints. Alternatively (or additionally), one could set the foreign key in the DB with ondelete="CASCADE" (and tell SQLAlchemy via ForeignKey(..., ondelete='CASCADE')), which means if user is deleted in DB, the DB will auto-delete the addresses. In that case, SQLAlchemy doesn’t need to explicitly delete addresses; one must be careful to either let DB do it or ORM do it, to avoid confusion. The FAQ recommends using ON DELETE CASCADE in the database for things like logically dependent children, and the ORM can accommodate that (you’d set passive_deletes=True on relationship so SQLAlchemy won’t fetch or delete children and just trust the DB to handle it).

One-to-One Relationships: In SQLAlchemy, one-to-one is basically a special case of one-to-many where the “many” side is restricted to at most one. For example, if each user has at most one address in a separate table. You can model it by setting uselist=False on the relationship. For instance:

class Employee(Base):
    __tablename__ = 'employees'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # one-to-one with Address
    address = relationship("Address", back_populates="employee", uselist=False)

class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    employee_id = Column(Integer, ForeignKey('employees.id'), unique=True)  # unique constraint enforces one-to-one
    employee = relationship("Employee", back_populates="address")

Here uselist=False tells the ORM to expect a single object instead of a list for employee.address. Also, making the foreign key unique (or primary key) ensures the DB enforces the one-to-one. Now, employee.address returns an Address or None, and address.employee returns the Employee. Operations and cascade work similarly; it's just not a list.

Many-to-many relationships: Many-to-many require an association table (join table) in the database. For example, if we had a Post and Tag many-to-many (a post can have many tags, a tag can be on many posts). We’d create a table:

post_tags = Table(
    'post_tags', Base.metadata,
    Column('post_id', ForeignKey('posts.id'), primary_key=True),
    Column('tag_id', ForeignKey('tags.id'), primary_key=True)
)

No additional columns needed for pure association (just two FKs, often composite PK). Then on the ORM classes:

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    tags = relationship("Tag", secondary=post_tags, back_populates="posts")

class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    posts = relationship("Post", secondary=post_tags, back_populates="tags")

We use secondary=post_tags to tell relationship() that this is a many-to-many via that table. Now post.tags is a list of Tag objects and tag.posts is a list of Post objects. When you append a Tag to post.tags, SQLAlchemy inserts a row into post_tags on flush. Removing a Tag from the collection deletes that row from post_tags (but not the Tag or Post themselves). If you delete a Post, SQLAlchemy will by default delete its association rows as well – this is built-in behavior when a relationship uses secondary.

One can also attach extra data to the association (like a timestamp for when the tag was added) by using the association object pattern – define a class for the association with its own table, and optionally use association_proxy for convenient access. That’s more advanced; a plain association table is often enough. A sketch of the pattern follows below.
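
For illustration, a minimal sketch of the association object pattern – the plain post_tags table becomes a mapped class so it can carry extra columns (the added_at column here is hypothetical):

from datetime import datetime
from sqlalchemy import Column, DateTime, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class PostTag(Base):
    __tablename__ = 'post_tags'
    post_id = Column(ForeignKey('posts.id'), primary_key=True)
    tag_id = Column(ForeignKey('tags.id'), primary_key=True)
    added_at = Column(DateTime, default=datetime.utcnow)  # extra data on the link
    tag = relationship("Tag")
    post = relationship("Post", back_populates="post_tags")

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    post_tags = relationship("PostTag", back_populates="post",
                             cascade="all, delete-orphan")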

Joins in queries (ORM): When querying with relationships, you often need to join tables. SQLAlchemy’s ORM allows you to join either by specifying the target class (if it can figure out the join condition via foreign keys), or by condition:

  • session.query(Post).join(Post.tags) will join the posts table to the association table and then to tags automatically, because it knows the relationship. With many-to-many, modern SQLAlchemy routes the join through the secondary table for you when you name the relationship property; in older code you may see the explicit form join(post_tags).join(Tag) instead.

  • For one-to-many, session.query(User).join(User.addresses) will produce a JOIN from users to addresses on users.id=addresses.user_id.

  • If you have multiple join paths or name clashes, you can specify target: session.query(User).join(Address, User.id == Address.user_id) or even session.query(User).join(Address, Address.user_id == User.id) (just need correct condition). The ORM also allows session.query(User).outerjoin(User.addresses) for left outer join.

After joining, you might want to filter on the related table. For example, to get users who have a gmail address, you could:

session.query(User).join(User.addresses).filter(Address.email_address.ilike("%gmail%")).all()

This returns users who have at least one gmail address. Watch for duplicates: the join produces one row per matching address, so a user with several matching addresses can appear multiple times (the legacy Query deduplicates full-entity results for you, but 2.0-style select() does not unless you call .unique() on the result). In such cases you might prefer .distinct(), or the any() form described earlier, which uses a subquery and returns each user at most once.

Using aliases in joins: If you need to join the same table multiple times (self-joins or multiple relationships to same table), you use aliased() to give an alternate name.

For instance, consider an Employee table with manager_id pointing to another Employee (self-referential one-to-many: one manager has many employees). If we want to get employees and their managers:

from sqlalchemy.orm import aliased

Manager = aliased(Employee)
pairs = session.query(Employee.name, Manager.name).join(
    Manager, Employee.manager_id == Manager.id
).all()

This would yield pairs of (employee_name, manager_name). The ORM could also be configured with relationship("Employee", remote_side=Employee.id) to manage hierarchy (so you can do emp.manager to get a manager and manager.team to get list of subordinates via one relationship).

Reflecting joins in results: As mentioned in the Query section, if you do session.query(User, Address).join(User.addresses), the result will be a list of (User, Address) tuples, where user may repeat if they have multiple addresses. If you want each user object once with addresses loaded, either use options(joinedload(User.addresses)) in the original query or process afterwards (group by user in Python, etc.). The contains_eager we discussed can also populate relationship if you do custom joins.

Core SQL joins: In SQL Expression (Core) usage, you explicitly join tables:

  • users_table.join(addresses_table, users_table.c.id == addresses_table.c.user_id, isouter=True) for left outer join if needed.

  • The select can chain joins: select(users_table.c.name, addresses_table.c.email_address).select_from(users_table.join(addresses_table, ...)).

  • If using the ORM select with multiple entities, it automatically knows how to join if you do .join(Address, User.id == Address.user_id) or .join(User.addresses).

Performance considerations for joins and relationships:

  • Eager loading (via joinedload or subqueryload) can reduce the number of queries but increases data per query. joinedload is best for one-to-many when you know the “many” side won’t blow up data size too much. subqueryload fetches the “many” side in a second query, which can avoid data explosion for one-to-many large lists.

  • Many-to-many with joinedload will join through association, which duplicates main objects as well. It may be fine for moderate sizes, but if each Post has 50 tags and you load 100 posts, that’s 5000 rows in join.

  • In such cases, subqueryload(Post.tags) might be better: it will first load 100 posts, then run a second query SELECT post_id, tag... FROM tags JOIN post_tags WHERE post_id IN (...) to get all tags for those posts, then assemble them in Python.

  • Fine-tune loading strategy depending on scenario: the ORM offers lazy='select' (the default lazy load one-by-one), lazy='joined' (same as joinedload by default on relationship, can set in relationship definition to always join load), etc. You can override per query with .options().

Examples of relationship queries:

  • Find all users with no addresses:

    • Option 1: session.query(User).filter(~User.addresses.any()) (translates to NOT EXISTS subquery).

    • Option 2: session.query(User).outerjoin(User.addresses).filter(Address.id == None).all() – outer join and keep only rows where the address side is NULL. Since a user with no addresses produces exactly one NULL row, this yields each such user once, with no duplicates.

  • Find all tags used by posts written this year:

    • could do: session.query(Tag).join(Tag.posts).filter(Post.date >= date(2025,1,1)).all() (but that yields duplicates if a tag is on multiple posts that match; adding .distinct() fixes that).

    • Or better: session.query(Tag).filter(Tag.posts.any(Post.date >= date(2025,1,1))).all().

  • Self-referential: e.g., find all employees who are managers (i.e., someone who has others referencing them):

    • session.query(Employee).filter(Employee.team.any()) if team = relationship("Employee", back_populates="manager") representing subordinates. That uses .any() on the reverse one-to-many.

Working with relationship collections in Python:

Once you have objects, you can use them naturally:

user = session.get(User, 1)
for addr in user.addresses:
    print(addr.email_address)

If addresses were not loaded yet, this triggers a lazy load query for that user’s addresses. If you had a lot of users and did that in a loop (N+1 problem), you’d want to instead query all addresses in one go or eager load them.

You can also filter a relationship in memory or use dynamic relationships. SQLAlchemy has lazy='dynamic' for relationships, which makes user.addresses not a list but a query object you can further filter (commonly used in Flask-SQLAlchemy). For example, addresses = user.addresses.filter(Address.email_address.endswith("@gmail.com")).all(). But dynamic relationships do not load the data until you call .all() etc., and you cannot iterate directly without loading. This is an option if you have extremely large collections and rarely need all children at once, or need additional filtering on them frequently.

Summary: Modeling relationships in SQLAlchemy involves setting up ForeignKey constraints and relationship() mappings. Once configured, you can navigate and query these relationships with ease. Joins can be done explicitly or implicitly via relationship property references in queries. It’s important to use the right loading strategy to avoid performance pitfalls. The ORM’s relationship system also handles referential integrity concerns in the unit-of-work (like preventing you from adding an object with a missing required parent, unless you explicitly allow it, etc.). All these tools allow you to treat related data in a more object-oriented way while still leveraging the power of relational joins under the hood.

Advanced usage and optimization techniques

As your application grows, you may need to optimize SQLAlchemy’s usage for better performance, memory efficiency, and maintainability. In this section, we cover advanced topics such as performance tuning, parallelism, caching, profiling, as well as best practices for organizing code, handling errors, documenting, and deploying SQLAlchemy-based applications.

Performance optimization

Lazy Loading vs Eager Loading: By default, SQLAlchemy’s ORM uses lazy loading for relationships – meaning it will load related objects on demand. While lazy loading keeps the initial query fast and memory footprint low, it can cause many small queries (N+1 problem) if you access lots of related objects in a loop. To optimize, identify scenarios where eager loading makes sense. Use joinedload for one-to-many or many-to-one when the expected collection size is modest and you need them immediately. Use subqueryload or selectinload (in newer SQLAlchemy) for one-to-many if the collection might be large – this issues a second query to load them all with a single IN clause, avoiding data explosion of a big join. Example:

from sqlalchemy.orm import selectinload
users = session.query(User).options(selectinload(User.addresses)).all()

This will load all users, then a second query will fetch all addresses for those users in one go (using WHERE user_id IN (list_of_user_ids)). This can significantly reduce query count and overall time for large sets. The trade-off is more memory usage at once and possibly redundant data if some relationships aren’t needed.

Batching and bulk operations: If you need to insert or update a lot of rows, the ORM can be slow doing it one object at a time because of Python overhead and flush overhead. We have bulk methods:

  • session.bulk_insert_mappings(Model, list_of_dicts) which directly translates to INSERT of many rows without constructing ORM objects. It’s much faster (SQLAlchemy’s docs note it can be an order of magnitude faster for large inserts) because it bypasses unit-of-work bookkeeping. However, it won’t put those objects in the session identity map (so you shouldn’t use them in the same session expecting to query them without re-querying).

  • Similarly, session.bulk_update_mappings(Model, list_of_dicts_with_keys_and_updates) for updates, or session.query(Model).filter(...).update({...}, synchronize_session=False) as mentioned earlier. The synchronize_session=False option tells SQLAlchemy not to try to update the in-memory state of objects (which is fine if you’ll expire or not use them further in this session). Bulk updates issue one UPDATE ... WHERE ... which can be far more efficient than loading objects and changing them.

  • Bulk deletes: session.query(Model).filter(...).delete(synchronize_session=False) does one DELETE query.

Use the Core for heavy data loads: The SQLAlchemy Core can also do bulk inserts by passing a list of parameter dictionaries to conn.execute(insert_stmt, list_of_params), which uses the DB-API executemany under the hood. For example:

with engine.begin() as conn:
    conn.execute(users_table.insert(), [{"name": "A"}, {"name": "B"}])  # ... more rows

This leverages the DB driver’s bulk insert capabilities. For something extremely performance-sensitive like inserting hundreds of thousands of rows, you might consider using the database’s bulk load utility (like PostgreSQL’s COPY command). SQLAlchemy doesn’t wrap COPY by default (you can use raw psycopg2 for that or libraries like csv with psycopg2 copy_expert). There are libraries and patterns to do that if needed. But often, executemany via SQLAlchemy is sufficient.

Connection pool tuning: For high-throughput applications (like a web service handling many concurrent requests to DB), tuning the pool can help:

  • Increase pool_size if you have a lot of concurrent DB usage and the default 5 becomes a bottleneck (and your DB can handle more connections).

  • Use max_overflow to allow temporary bursts beyond pool_size. By default max_overflow=10, so with pool_size=5 you can have up to 15 connections if needed. If beyond that, further requests wait for a connection or error out after pool_timeout.

  • If you experience connection timeouts in logs, it means pool is exhausted – consider raising pool_size or max_overflow, or check if connections are being returned properly (no leaks).

  • Use pool_recycle to avoid stale connections. For example, MySQL closes idle connections after a default timeout (~8 hours); the next use of such a connection gets "MySQL server has gone away". Setting pool_recycle=3600 makes the pool refresh connections older than an hour before handing them out, preventing that error. A combined example follows this list.
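
Putting those options together, a sketch of an engine configured for a busier service (all values are illustrative):

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:pass@dbserver/mydb",
    pool_size=10,       # steady-state connections held by the pool
    max_overflow=20,    # extra connections allowed during bursts
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=3600,  # refresh connections older than an hour
)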

Asynchronous IO and concurrency: In modern Python web apps (asyncio-based, like FastAPI), you can utilize SQLAlchemy’s asyncio support introduced in 1.4+ (SQLAlchemy 2.0 has it as well). This allows running DB operations without blocking the event loop (it uses an async driver such as asyncpg for Postgres). Example:

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

engine = create_async_engine("postgresql+asyncpg://user:pass@host/dbname")

async def get_user(user_id):
    async with AsyncSession(engine) as session:
        result = await session.execute(select(User).where(User.id == user_id))
        return result.scalar_one()

This can improve throughput in an async server by allowing other tasks to run while a DB query is in flight. However, if the database itself is the bottleneck, async doesn’t magically make queries faster; it just frees Python to do other work in the meantime. If you have CPU-bound tasks, consider multi-threading or multi-processing, and note the threading rules: an Engine can be shared safely among threads, but a Session is not thread-safe – each thread should use its own Session, or a scoped_session tied to the thread. For multiprocessing (e.g. parallelizing DB tasks with joblib or the multiprocessing library), do not share an Engine across a fork unless you call engine.dispose() in each child process, to avoid connections being shared across processes; it is often simpler to create a new engine in each child, or to dedicate a separate process to DB operations (depending on architecture).

Caching: SQLAlchemy itself doesn’t cache query results across sessions (that’s usually up to application logic or an external cache like Redis, etc.). However, it does have an internal query compilation cache that caches the SQL string for a given query structure, to avoid re-rendering the SQL each time. In 1.4/2.0, this is on by default. If you use custom types or constructs, you might see warnings about caching being skipped (like the new CacheKey warnings). Ensuring your custom TypeDecorator has cache_ok=True if it’s safe can re-enable caching. This compilation caching can improve performance of repeated similar queries significantly by skipping the overhead of building the SQL string again and again.

For caching actual data, consider:

  • Using a caching decorator at the application layer if data is static or rarely changes.

  • You can use dogpile.cache (by Mike Bayer, SQLAlchemy’s author) which integrates with regions of data. For example, caching the result of an expensive query for a short time to avoid hitting DB frequently.

  • Another pattern: relationship loader strategies can cache related collections in memory for the life of the session (that’s just normal identity map usage). But across sessions (like across requests in a web app), nothing is cached by default, which is usually fine as the DB cache (buffer pool) will handle a lot. If high read traffic on certain queries, external caching might help.

Profiling and monitoring: To identify performance issues:

  • Enable SQL logging (echo=True or better, configure logging to output to a file) to see if you have N+1 issues or slow queries. The log shows each SQL and the time in seconds if echo is True (SQLAlchemy logs like INFO SQL (0.0021s) SELECT ...).

  • Use Python profilers (cProfile, pyinstrument) to see if a lot of time is spent in SQLAlchemy’s own processing vs DB calls. If DB calls are slow, look at the database side: use EXPLAIN on slow queries (you can get the SQL from a query by doing str(query.statement) or enabling echo to copy it) and ensure indexes are in place. SQLAlchemy by itself can’t fix a missing index – that’s up to you to add to the model or in the DB. You can define indexes in models using Index() object or index=True on Column for a simple index.

  • Use the SQLAlchemy events system to track things like query execution times or connection pool checkouts if needed. For instance, the before_cursor_execute and after_cursor_execute events can be hooked to measure query durations at a finer grain, as sketched below. In most cases, though, the echo logs suffice.
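
As a sketch, pairing those two events to flag slow queries might look like this (the 500 ms threshold is arbitrary; assumes an engine as defined earlier):

import time
from sqlalchemy import event

@event.listens_for(engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault("query_start", []).append(time.perf_counter())

@event.listens_for(engine, "after_cursor_execute")
def _log_slow(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.perf_counter() - conn.info["query_start"].pop()
    if elapsed > 0.5:
        print(f"slow query ({elapsed:.3f}s): {statement[:100]}")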

Memory usage: The SQLAlchemy Session is a unit of work that keeps track of all objects you’ve loaded or added (in the identity map). If you load a huge number of rows (say thousands of objects) into one Session, they all remain in memory until the session is closed or they’re explicitly removed. If you are processing a lot of data in batches, it’s wise to:

  • Use session.expire_all() or session.expunge(obj) to remove objects you no longer need. For example, if iterating over a large table, after processing each object you could session.expunge(obj) to remove it from identity map, freeing memory.

  • Use query iteration with .yield_per(n) which will not load the entire result into memory at once. But note: yield_per doesn’t commit the transaction in between, so if it’s a long-running read, you might hold a transaction for a while. If that’s an issue (e.g., in a read committed isolation, it might be okay; in repeatable read, you might get a snapshot and not see new data mid-iteration which might be fine).

  • Break tasks into smaller transactions. You can commit periodically and then continue if appropriate. Committing will release transaction (and by default expunge objects in 2.0 as noted), which frees memory. If in 1.x, after commit the objects remain, you might call session.expunge_all() or simply create a new session for the next batch.

Parallelism: If you want to use Python threads to parallelize DB operations, you can create multiple sessions (one per thread, or better use scoped_session which creates a session per thread automatically). The Engine and its pool can be shared safely among threads, and will hand out distinct connections. Just ensure each thread doesn’t share session or objects with another thread (unless you implement locks around it, which is not usually worth it). For CPU-bound tasks, the GIL will limit parallelism in pure Python – consider multiprocessing or C extensions.
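
A small sketch of the session-per-thread pattern with scoped_session (assuming the engine and User model from earlier):

import threading
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(bind=engine))

def worker(user_id):
    session = Session()   # returns this thread's own session
    user = session.get(User, user_id)
    print(user.name if user else "not found")
    Session.remove()      # discard the thread-local session when done

threads = [threading.Thread(target=worker, args=(i,)) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()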

Best practices (code organization, testing, etc.)

Code organization:

  • Models module: Define your ORM classes (models) in one module (or package). It’s common to have a models.py or a models/ package for all your Table and class definitions. Ensure all models import the same Base (declarative base) or share the same registry/metadata if using the new 2.0 registry pattern. This way, Base.metadata.create_all(engine) will see all tables.

  • Session management: Don’t use a global Session object. Instead, use sessionmaker to configure a factory, e.g. SessionLocal = sessionmaker(bind=engine, autoflush=False) (autoflush default True is fine in most cases; sometimes turned off for bulk writes until commit). Then, in each request (for web apps), instantiate a session via SessionLocal() and close it after. Frameworks like Flask use context managers or teardown functions to remove the session (Flask-SQLAlchemy provides db.session which is a scoped_session hiding these details). In general, the pattern is:

    session = SessionLocal()
    try:
        # do stuff
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

    Or use a context manager: with SessionLocal.begin() as session: commits on success and rolls back on exception, equivalent to the pattern above. (A plain with SessionLocal() as session: only guarantees the session is closed – you still call commit() yourself.)

  • Avoid passing sessions all around your code if possible – instead, high-level functions can accept a session or get one from context. If using a web framework, consider dependency injection (FastAPI can provide a session per request as a dependency).

  • Separation of concerns: Perhaps use repository pattern if it suits (have functions that encapsulate queries so your business logic doesn’t contain query details all over). But don’t over-engineer; SQLAlchemy is itself a unit-of-work and repository. Many find it fine to call session.query in service code directly, while others prefer an abstraction layer.

  • Keep your SQLAlchemy configuration (engine, connection string, options) in one place (like config file or environment variables). It’s good to log or print the SQLAlchemy version and DB connection info on startup (excluding password) for troubleshooting.

Testing:

  • For testing database logic, a handy approach is to use an SQLite in-memory database if possible, because it’s fast and doesn’t require an external DB. However, an in-memory database lives only as long as a single connection – keep every session on that one connection (e.g. with StaticPool and check_same_thread=False when multiple threads are involved), or use a temporary SQLite file on disk for each test (or per test session).

  • Create all tables at test setup via Base.metadata.create_all.

  • Use a transaction-per-test strategy: begin a transaction at the start of each test and roll it back at the end, so the database is left unchanged (this is easiest if everything runs on one connection, e.g. using savepoints via session.begin_nested()). Rolling back at the end undoes all the test’s changes; see the fixture sketch after this list.

  • Another method: after each test, drop and recreate tables (slower but ensures clean slate).

  • Factories: It’s often convenient to use factories (e.g., Factory Boy library) to create test objects easily, or just small helper functions to add sample data.

  • Test concurrency issues by simulating multiple sessions or using threads if needed. Often, verifying that locks or unique constraints behave as expected might require real database integration tests (if something is critical, consider testing against the actual database brand you use, not just SQLite, since SQLite has different locking and type behaviors).
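
A sketch of the transaction-per-test idea as a pytest fixture (assuming your models share the Base from earlier):

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

@pytest.fixture()
def db_session():
    engine = create_engine("sqlite://")   # private in-memory database
    Base.metadata.create_all(engine)
    connection = engine.connect()
    transaction = connection.begin()
    session = sessionmaker(bind=connection)()
    yield session
    session.close()
    transaction.rollback()                # undo whatever the test changed
    connection.close()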

Error handling:

  • Always anticipate that any DB operation can raise an exception (DB down, constraint violation, etc.). Catch specific exceptions where appropriate:

    • sqlalchemy.exc.IntegrityError for things like unique-constraint or foreign-key violations. You can inspect the message for specifics or use e.orig to get the underlying DBAPI exception for more detail (see the sketch after this list).

    • sqlalchemy.exc.DataError for conversion issues (like too long value for column).

    • sqlalchemy.exc.OperationalError for connection issues, timeouts, etc.

    • If using Session.commit(), any exception will cause the session to be in a rollback state (transaction is already marked as failed). You should call session.rollback() then, to make session usable again for either another try or to close it out properly.

  • Use try/except around commit if you want to handle DB errors gracefully (for example, if you want to return a user-friendly message on unique constraint failure like "username already exists"). You might examine the IntegrityError to see which constraint name it failed (some DBs include it in message).

  • Don’t catch broad exceptions too early – let them bubble up to either a centralized error handler or at least ensure you always rollback to not leave transactions open.
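
For example, a sketch of handling a unique-constraint failure (assuming a unique index on User.email):

from sqlalchemy.exc import IntegrityError

try:
    session.add(User(name="Alice", email="alice@example.com"))
    session.commit()
except IntegrityError as e:
    session.rollback()  # the session must be rolled back before it can be reused
    print("Constraint violated:", e.orig)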

Documentation and maintainability:

  • Use meaningful names for your models and fields (avoid naming classes the same as Python built-ins or SQL reserved words; if you must, set __tablename__ explicitly so the table name doesn’t conflict).

  • Comment your models if there are any tricky aspects (like if a relationship is using a custom join condition or special cascade settings).

  • Keep an ERD (Entity-Relationship Diagram) or at least a mental model of how classes relate. Some tools can generate diagrams from SQLAlchemy models or MetaData (e.g. the eralchemy library).

  • Keep SQLAlchemy updated; newer versions often have bug fixes and performance improvements. (E.g., SQLAlchemy 1.3 to 1.4 introduced many deprecations in preparation for 2.0; it’s good to stay current to eventually move to 2.x which cleaned up a lot).

  • If using alembic for migrations, keep your migration scripts in VCS and test applying them to fresh schema to ensure they work (especially if models and migrations drift apart).

Deployment:

  • Manage database connections via config. For instance, in a web app, tune pool size for the number of app processes/threads. Ensure the database’s max connections is >= sum of all pools from all app instances.

  • Use environment variables for the DB URL (SQLAlchemy parses a URL easily). Example: DATABASE_URL=postgresql+psycopg2://user:pass@dbserver/mydb.
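
    A small sketch (pool_pre_ping is optional but helps drop dead connections):

    import os
    from sqlalchemy import create_engine

    engine = create_engine(os.environ["DATABASE_URL"], pool_pre_ping=True)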

  • If deploying an app with SQLAlchemy, ensure the database driver (e.g., psycopg2, PyMySQL, cx_Oracle) is installed in the environment.

  • Manage migrations (if using Alembic): typically, generate migration scripts when models change and apply them during deployment (some apps run migrations automatically on start, but it’s safer to run them as an explicit deployment step or through CI/CD).

  • Monitor performance: If you have long-running app, consider enabling logging at WARN level for sqlalchemy.engine to catch warnings like too many connections, or sqlalchemy.pool to catch connection timeout occurrences.

Profiling in production: If you can’t replicate an issue in dev, consider enabling echo or logging for a short period in production (or use an APM tool that tracks SQL queries) to see slow queries. Some ORMs log queries taking > X seconds as warnings; you could implement a similar thing with events.

In conclusion, advanced usage of SQLAlchemy involves carefully balancing its powerful features with the realities of database performance. By optimizing query patterns (lazy vs eager), leveraging bulk operations, tuning the connection pool, and following best practices in session management and error handling, you can scale your application efficiently. SQLAlchemy is a mature toolkit, and most performance issues can be mitigated by using it idiomatically and paying attention to what SQL is actually being emitted (the golden rule: if in doubt, look at the SQL log!). With the right techniques, SQLAlchemy can handle high load and complex interactions gracefully, as evidenced by user reports of it sustaining very high throughput when tuned properly.

Real-world applications and examples

SQLAlchemy is used in a vast array of industries and projects. Below are several examples of how it’s applied in real-world scenarios, along with performance metrics or integration notes where applicable:

  • Web applications and microservices: SQLAlchemy is a go-to solution for database access in Python web frameworks outside of Django. For instance, Reddit (which was built on Pylons, now Pyramid) historically used SQLAlchemy as part of its stack for certain services. Many Flask applications use Flask-SQLAlchemy (a thin integration layer on top of SQLAlchemy’s ORM). In these contexts, SQLAlchemy manages the core data models like users, posts, comments, etc. A key benefit is the ability to switch database backends easily – for example, a startup might prototype on SQLite and migrate to PostgreSQL in production with little more than a new connection URL. Performance-wise, SQLAlchemy can handle high traffic: one anecdotal report on Reddit’s r/Python described handling “50-100k writes and 200k reads per second” using SQLAlchemy Core with optimized patterns. The keys were bulk inserts, keeping the ORM out of hot paths (Core for hot loops, ORM for higher-level logic), and connection pooling for throughput.

  • Scientific research and data analysis: Data scientists often use SQLAlchemy to interface with databases for analysis. For example, Pandas integrates with SQLAlchemy for its read_sql and to_sql functions (a short sketch of this glue code appears after this list). A researcher might use SQLAlchemy to pull data from a Postgres database into a pandas DataFrame, do analysis, then write back results. In astronomy, projects like the SDSS (SkyServer) have used SQLAlchemy in Python tools to query their large SQL databases of celestial objects. The advantage is writing queries in Python while letting SQLAlchemy handle the DB-API differences. Performance is often not a limiting factor here because the heavy lifting is done by the database (e.g., doing aggregation in SQL) and by analytical libraries like numpy/pandas once data is fetched. SQLAlchemy just acts as glue and is efficient enough to stream results (especially using yield_per to avoid loading entire tables into memory).

  • Financial services and FinTech: Banks and trading firms use SQLAlchemy in tools for things like risk analysis and trade reconciliation. One example is quantitative trading systems, where data about trades and prices needs to be stored and analyzed. SQLAlchemy’s ORM can map complex schemas of financial instruments, and its support for multiple databases allows interfacing with whatever RDBMS the firm uses (Oracle, SQL Server, etc., thanks to dialects). One big open-source project in this domain was Zipline (a quantitative trading library), which used SQLAlchemy to store historical pricing data in SQLite for backtesting. Performance-wise, SQLite via SQLAlchemy handled millions of price records for backtesting when paired with proper indexing, and SQLAlchemy Core was used for bulk inserts (loading massive CSVs of historical prices was I/O bound, so Python’s overhead was negligible compared to file reading).

  • Automation and workflow systems: Apache Airflow, a popular workflow automation platform, uses SQLAlchemy under the hood for its metadata database (which stores task statuses, schedules, etc.). Specifically, Airflow’s ORM models (for DAG runs, tasks, etc.) are defined with SQLAlchemy, and it supports multiple database backends (SQLite for testing, Postgres or MySQL in production). The heavy use of SQLAlchemy allows Airflow to abstract the database layer and easily add support for another DB if needed. Airflow’s performance and stability are partly owed to SQLAlchemy handling connection pooling and transactions robustly for the web server and scheduler processes. Integration: Airflow also uses Alembic (by the same author) for database migrations, demonstrating how SQLAlchemy and Alembic together provide a full solution for evolving the schema across Airflow releases. Airflow typically issues many small queries (e.g., updating task state), which SQLAlchemy handles well – memory isn’t an issue because each query runs in an independent session and is committed immediately. The Airflow team explicitly notes that it uses SQLAlchemy, and you can configure the SQLAlchemy pool size and recycle settings in Airflow’s config to tune database connections.

  • Business intelligence and analytics platforms: Apache Superset, an open-source BI platform, uses SQLAlchemy extensively to connect to many different databases. It relies on SQLAlchemy’s dialect system to issue queries to various SQL engines (Postgres, MySQL, Presto, even non-SQL sources via community dialects). Superset doesn’t use the ORM for user queries; instead, it leverages SQLAlchemy Core/engine to connect and fetch data for dashboards. It also uses the ORM for its metadata (users, dashboards) under the hood. The ability to support dozens of database backends is enabled by SQLAlchemy dialects – Superset basically says: if there’s a SQLAlchemy dialect and a DB-API driver for your database, Superset can likely connect to it. This extensibility is a huge win in the BI space where you have everything from cloud data warehouses (Snowflake, BigQuery) to traditional RDBMS. Superset’s performance considerations: it mostly delegates heavy query execution to the database itself. The overhead of SQLAlchemy for these large result sets is minimal; Superset streams results and uses pandas or other means to process if needed. They do encourage usage of SQLAlchemy’s limit and other functions to ensure partial retrieval for previews.

  • Large open-source projects: The OpenStack project (cloud computing platform) makes significant use of SQLAlchemy. Many OpenStack components (Nova, Neutron, etc.) have database layers abstracted with SQLAlchemy’s ORM. OpenStack’s common library oslo.db provides helpers for using SQLAlchemy (like query filters and session management patterns). They even collaborated with SQLAlchemy’s author to fine-tune the library for their use cases. For example, OpenStack needed efficient bulk updates and high concurrency; SQLAlchemy’s session was configured for autocommit in some cases to handle their distributed transactions. They’ve documented patterns like using session.bulk_insert_mappings for performance and running eventlet (cooperative threads) with SQLAlchemy – notably, they often use with engine.begin() for transaction scoping in these concurrent contexts. Performance metrics from OpenStack: a component like Keystone (identity service) can handle thousands of token validations per second, where each is a small database check. By using connection pooling (and possibly dogpile.cache for tokens), they achieved high throughput. The Glance service (image storage) uses SQLAlchemy for metadata about images; switching from an older ORM (SQLObject) to SQLAlchemy improved stability and let them handle more concurrent API requests with fewer connection-leak issues.

  • CMS and publishing: Some content management systems in Python use SQLAlchemy. For instance, Kotti (a pyramid-based CMS) uses SQLAlchemy for its data models (pages, users, etc.). SQLAlchemy’s ability to easily do hierarchical queries (via adjacency list pattern or nested sets) is useful for CMS page trees. Through the ORM, something like page.children (one-to-many self-referential) can be navigated. Performance is managed by judicious use of relationship loading (often a CMS will lazy load content bodies to avoid pulling large text blobs until necessary). The result is a responsive editing interface that doesn’t do more queries than needed. A real example: Kotti might list all pages in the site – thanks to identity map, if you query all top-level pages and then for each do page.children, the children queries are separate but short; some implement an eager load to prefetch all pages to avoid N+1 in rendering menus.

  • Academic and education projects: The MIT OpenCourseWare project (and similar educational platforms) have used SQLAlchemy in backend tools to migrate and manage course data. Python apps built for universities often use SQLAlchemy to interact with Oracle or PostgreSQL databases. The benefit here is productivity – developers can quickly write data migration scripts or synchronization jobs without mastering each database’s API. The performance for these is typically I/O bound (reading files, etc.), and SQLAlchemy’s overhead is negligible. Where needed, they’ll use bulk inserts or even raw connections for maximum speed (and SQLAlchemy’s connection pooling still benefits them).
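
As a concrete illustration of the pandas integration mentioned in the data-analysis example above, here is a small sketch; the connection URL, table, and column names are hypothetical:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@host/research")  # hypothetical DB

# Let the database do the aggregation, then pull the result into a DataFrame
df = pd.read_sql("SELECT region, AVG(value) AS avg_value FROM measurements GROUP BY region", engine)

# Write derived results back to a new table
df.to_sql("region_averages", engine, if_exists="replace", index=False)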

These examples illustrate that SQLAlchemy is battle-tested in scenarios from small scale (single-user desktop apps managing a SQLite DB) to enterprise scale (web services at Fortune 500 companies, scientific data pipelines processing millions of records). The ability to integrate with different contexts is a major strength. For instance, a company might use SQLAlchemy in a web API, in a background worker, and in a Jupyter notebook for analysis – all sharing the same models and database, ensuring consistency across the stack.

Integration strategies seen in the wild include:

  • Using Celery (a Python task queue) with SQLAlchemy for long-running tasks. Each Celery worker might have its own engine and sessions. A best practice is to not pass ORM objects between tasks (serialize IDs instead), and re-fetch in the task if needed. SQLAlchemy plays well here as you can easily instantiate a Session in the task and do what’s needed.

  • Combining SQLAlchemy with Pydantic or other validation libraries to create API schemas. For example, read from DB with SQLAlchemy, then feed the data into Pydantic models for output serialization. This works nicely, and libraries exist to help with conversion if desired (or simple asdict).

  • Using SQLAlchemy with asyncio frameworks (as mentioned in the performance section). For example, an app using FastAPI might use SQLAlchemy’s async engine to talk to PostgreSQL, keeping the stack non-blocking end-to-end; see the sketch below.
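
A minimal sketch of that async pattern, assuming the asyncpg driver is installed and the URL is adjusted for your database:

import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def main():
    engine = create_async_engine("postgresql+asyncpg://user:pass@host/app")  # hypothetical URL
    async with engine.connect() as conn:
        result = await conn.execute(text("SELECT 1"))
        print(result.scalar())
    await engine.dispose()  # release pooled connections cleanly

asyncio.run(main())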

Finally, in terms of performance metrics: one might ask "How fast is SQLAlchemy compared to raw SQL or other ORMs?" Generally, the raw DBAPI calls will be the fastest possible (least overhead), but SQLAlchemy is typically within a small factor of that for well-optimized code. A Dev.to article showed some benchmarks (e.g., inserting 100k rows: raw psycopg2 vs SQLAlchemy Core vs ORM) and found Core was nearly as fast as raw (maybe 5-10% slower due to abstraction), and ORM was slower mainly due to object creation overhead. However, when using bulk inserts, the difference narrowed. Many community comparisons (like the one in Tortoise ORM’s repo) show SQLAlchemy’s performance is “surprisingly similar” to others and often limited by the database itself for large operations. For reads, the bottleneck is usually network/IO; SQLAlchemy’s overhead per row is quite low (especially in 2.0 with the new simplified result handling). And for extreme needs, one can mix in raw SQL for the hottest queries while still enjoying ORM for the rest – that flexibility is why so many projects stick with SQLAlchemy as they grow.

Comparison of SQLAlchemy with alternative libraries

There are several other ORM or database libraries in Python. Here we compare SQLAlchemy, Peewee, and Pony ORM aspect by aspect, highlighting key differences and features; the summary that follows also touches on Tortoise ORM and Django’s ORM.

Features & scope:

  • SQLAlchemy (Core & ORM): Very comprehensive – includes the SQL Expression Core (for advanced queries and custom SQL) and an optional ORM. Supports complex relationships, inheritance, custom types, and cross-database portability.

  • Peewee: Lightweight ORM with a simple, Pythonic API. Covers basic relations and joins, with some extensions (playhouse) for advanced features. Lacks a SQL builder, but you can execute raw SQL easily.

  • Pony ORM: ORM focused on declarative queries using generator syntax. Offers automatic query generation from Python comprehensions and an advanced in-memory object cache. Has no separate Core; it is ORM-centric.

Performance:

  • SQLAlchemy: High performance with tuning. Core is nearly as fast as raw SQL (especially with compilation caching). ORM overhead exists but can be mitigated with bulk operations and lazy-loading control. Proven at very high loads, and ships optional C extensions for speed.

  • Peewee: Generally fast for simpler queries, with low overhead. Known to be efficient for small-to-medium apps (its queries are straightforward). Very large operations may require manual optimization (it has insert_many, etc.).

  • Pony ORM: Emphasizes optimization – an internal translator tries to generate efficient SQL from comprehensions, backed by an object cache. In benchmarks Pony can be very fast for certain read-heavy workloads (it often outperforms others on complex queries). Write performance is similar to the others; it uses batch operations behind the scenes.

Learning curve:

  • SQLAlchemy: Steeper, due to its two-layer design and vast API. Requires understanding sessions, query building, etc.; however, excellent docs and a large community help. Once learned, very powerful.

  • Peewee: Very easy to start with – simple model definitions and intuitive queries (similar to Django but simpler). Small API surface; good for beginners or small projects.

  • Pony ORM: Moderate. Its unique approach (generator expressions for queries) can seem magical to new users, though after the initial examples it feels natural to some. The syntax is very Pythonic (no method chaining), which some love and others find non-standard. Documentation is decent but not as extensive.

Community & support:

  • SQLAlchemy: Huge community and long history. Extensive documentation and examples; help available on Stack Overflow (tens of thousands of questions), the mailing list, and Discord/Gitter. Under continuous development by its maintainers. MIT licensed (permissive).

  • Peewee: Smaller but active community, maintained primarily by one author (Charles Leifer). Docs are good and cover common “recipes”. Fewer Stack Overflow questions, though enough for common issues. MIT licensed.

  • Pony ORM: Niche but loyal following; the query syntax attracts enthusiasts. Support via forum and GitHub issues. Apache 2.0 licensed; there is a commercial aspect (an attempt was made to offer a cloud service for Pony), but the core is open.

Documentation & ecosystem:

  • SQLAlchemy: Extremely extensive docs (tutorials, reference, FAQs). Many third-party extensions (e.g., Flask-SQLAlchemy, Alembic for migrations). Supports multiple databases via included and external dialects. Stable API (the 2.0 changes were major but guided by 1.4 deprecations).

  • Peewee: Good, straightforward documentation with examples. Smaller ecosystem – a few extensions like playhouse (signals, SQLite full-text search, etc.). Fewer built-in integrations, but Peewee is simple enough not to need many.

  • Pony ORM: Decent documentation, including a chapter on the philosophy of its query approach. Has a unique interactive query editor in its ORM browser. Smaller ecosystem with fewer third-party integrations, though dev tools like PyCharm have added some support.

License:

  • SQLAlchemy: MIT License (permissive; easy to use commercially).

  • Peewee: MIT License (permissive).

  • Pony ORM: Apache License 2.0 (permissive).

Summary of comparison: SQLAlchemy stands out for its flexibility and power – it can handle simple use cases but also extremely complex ones (supporting multiple databases, custom query constructs, etc.). Peewee is great for smaller apps or those who want a quick and easy ORM with less boilerplate; it has fewer features but also less to learn, and it's very lightweight. Pony ORM offers an interesting approach with its Pythonic query syntax and is highly optimized in some scenarios; however, its community is smaller and some find debugging comprehensions trickier. Tortoise ORM is filling the need for an async-native ORM; it’s a good choice for new async projects, with performance and features catching up to synchronous ORMs. Django’s ORM is excellent within the Django ecosystem – it has a shallow learning curve there and lots of conveniences, but outside of Django it’s usually not used (if someone wants standalone, they pick SQLAlchemy or Peewee). Each has its niche:

  • SQLAlchemy: Best for when you need fine control, database agnosticism, or are building a non-Django app at scale. It's the generalist’s choice with enterprise capability.

  • Peewee: Best for small projects, quick prototyping, or applications where simplicity and minimal overhead are priorities.

  • Pony ORM: Good for developers who want to write queries in pure Python expressions and possibly benefit from its in-memory caching; also its performance on certain read queries might shine.

  • Tortoise ORM: Great for async web apps (FastAPI, etc.) where using SQLAlchemy asynchronously is possible but still new – Tortoise provides a Django-like comfort in async world.

  • Django ORM: If you are using Django, you get it by default and it works very well in that context. It’s less flexible outside it but within, it yields rapid development (admin UI, etc.).

In terms of support and longevity, SQLAlchemy and Django ORM have been around the longest and proven stable. Peewee and Pony are over a decade old too, with consistent maintenance. Tortoise is newer but active. License-wise, all are open-source and permissive, so no major restrictions there.

Migration guide: when and how to migrate to SQLAlchemy

Migrating an existing project to SQLAlchemy – whether from raw SQL or another ORM – can bring many benefits (robust architecture, vendor-agnostic code, richer features). However, it requires careful planning. Below we outline when you might want to migrate, and how to do it, with code examples and common pitfalls.

When to consider migrating to SQLAlchemy

  • Using raw SQL/DB-API everywhere: If your codebase is littered with raw SQL strings (e.g., "SELECT * FROM table WHERE ...") and manual cursor handling, you might face maintainability issues. Migrating to SQLAlchemy ORM can greatly improve readability and reduce error-prone boilerplate (like forgetting to parameterize a query or close a cursor). It’s time to migrate when queries become complex or database logic is duplicated across the code.

  • Outgrowing a lightweight ORM: Perhaps you started with a simple ORM (like Peewee or SQLObject) and now need features it doesn’t provide – such as support for a different database, or more complex relationship handling, or just better performance tuning. SQLAlchemy is a common upgrade path because of its flexibility and performance.

  • Switching frameworks or languages: If you are moving from Django to a Flask/FastAPI architecture, you’ll likely migrate from Django ORM to SQLAlchemy (since Django’s ORM is tied to Django). This often happens when splitting a monolithic Django app into microservices – new services might prefer SQLAlchemy.

  • Database vendor migration: If you plan to change your database (say from MySQL to PostgreSQL, or to an exotic SQL store), SQLAlchemy eases that because of its dialect abstraction. While other ORMs support multiple DBs too, SQLAlchemy is particularly known for broad dialect support (including niche ones). If your current solution doesn’t smoothly support the target DB, migrating to SQLAlchemy might be prudent before the DB switch, to use its dialect.

  • Needing advanced features: You might find yourself needing to do bulk operations, use database-specific constructs (hints, CTEs, window functions), or tune transaction scopes. SQLAlchemy’s Core and fine-grained session control make these possible where simpler ORMs can struggle.

  • Community and longevity: If your current ORM appears unmaintained or its community is dwindling, moving to the well-supported SQLAlchemy ensures future compatibility and access to help.

How to migrate – step by step

1. Plan the schema mapping: List out your current database tables and relationships. In SQLAlchemy, you’ll create classes for these. For example, if you have a table users and a table orders (with user_id foreign key), plan to create User and Order classes with a relationship.

2. Set up SQLAlchemy models side-by-side: You can introduce SQLAlchemy models into the project without immediately removing the old code. They can point to the same tables. For instance, if currently using raw SQL:

# Old way to get user
cursor.execute("SELECT id, name, email FROM users WHERE id=%s", (user_id,))
row = cursor.fetchone()

You introduce:

# Define SQLAlchemy model
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    email = Column(String(100))

# ... similarly an Order class with user_id = Column(ForeignKey('users.id')) ...

Now you can use session.query(User).get(some_id) (or, in 2.0 style, session.get(User, some_id)) to retrieve a user.

3. Gradual replacement of queries: Identify queries in the old system and replace them with SQLAlchemy one by one. For example:

  • Select queries: Replace raw SELECTs or alternative-ORM queries with SQLAlchemy Query. E.g.,
    Before (raw SQL):

    def get_user_by_email(email):
        cursor.execute("SELECT id, name, email FROM users WHERE email=%s", (email,))
        data = cursor.fetchone()
        return data  # perhaps returns a tuple

    After (SQLAlchemy):

    def get_user_by_email(email):
        return session.query(User).filter_by(email=email).first()

    Pitfall: The returned object is now a User instance, not a tuple. Ensure the calling code expects that. Possibly adjust calling code to use user.name instead of row[1], etc. This is a big shift: migrating to ORM often means refactoring code to use attributes instead of sequence indices or dictionary keys. It's worth it for clarity but requires careful testing.

  • Insert/Update queries: Replace raw INSERTs or updates with session.add or query update.
    Before:

    cursor.execute("INSERT INTO orders (user_id, total) VALUES (%s, %s)", (uid, total))
    conn.commit()

    After:

    new_order = Order(user_id=uid, total=total)
    session.add(new_order)
    session.commit()

    Pitfall: If the old code expected to immediately use the new order’s ID (some DBAPIs allow returning generated keys or doing SELECT LAST_INSERT_ID()), in SQLAlchemy after commit the new_order.id will be populated automatically. That’s a benefit, but just ensure to adapt the logic (maybe the old code manually retrieved the ID; now it's already there).

  • Complex transactions: If you had manual transaction demarcation (conn.begin() etc.), use session.begin() or the default session.commit() behavior. With SQLAlchemy you typically rely on session’s transaction, but you can still control if needed:

    # Old pseudo-code:
    conn.begin()
    cursor.execute("UPDATE accounts SET balance=balance-100 WHERE id=1")
    cursor.execute("UPDATE accounts SET balance=balance+100 WHERE id=2")
    conn.commit()

    After:

    try:
        sender = session.query(Account).get(1)
        receiver = session.query(Account).get(2)
        sender.balance -= 100
        receiver.balance += 100
        session.commit()
    except Exception:
        session.rollback()
        raise

    This leverages the unit-of-work to handle it in one transaction (which it will by default). Pitfall: If the old code relied on the database enforcing something (like triggers or cascades), ensure the same happens (SQLAlchemy will likewise just send the two updates above; triggers would still fire). If any statement raises an exception, we roll back explicitly so the transaction isn’t left open – handled as shown.

4. Migrate relationships: If moving from an ORM that uses different relationship access patterns (say, Django or Peewee), adapt to SQLAlchemy’s:

  • Peewee: uses attributes similarly, so user.orders in Peewee vs user.orders in SQLAlchemy will both be lists of related objects. But Peewee might require calling .execute() on a query to get them, whereas SQLAlchemy lazy loads on access. This is a fairly smooth migration: just be sure not to reuse Peewee model instances as SQLAlchemy ones (they are different classes).

  • Django: uses user.order_set.all() vs SQLAlchemy user.orders. So, you’d replace those calls. Possibly adjust template code or business logic accordingly.

  • Raw SQL: you might have been manually querying related data; now simply use user.orders after defining orders = relationship('Order', back_populates='user') on User. This is a big win: e.g., previously to get user and their orders you did two queries or a join manually, now:

    user = session.query(User).options(joinedload(User.orders)).get(uid)
    print(user.name, "has", len(user.orders), "orders")

    Pitfall: If you lazy load without joinedload, ensure the session is still open when accessing user.orders.

5. Dual-write during transition (if needed): In some migration scenarios (especially migrating ORMs in a running app, or migrating data stores), you might want to write to both systems for a while to ensure consistency before fully switching. For example, if you want to be cautious:

# When creating a new order, use both old and new methods (for the transition period)
def create_order(user_id, total):
    # Old way
    cursor.execute("INSERT INTO orders (user_id, total) VALUES (%s, %s)", (user_id, total))
    conn.commit()
    # New way
    new_order = Order(user_id=user_id, total=total)
    session.add(new_order)
    session.commit()

This way, whether code reads via the old method or the new one, the data is there. This is mainly relevant if you’re migrating data stores gradually or want a fallback – note that if both paths write to the same table you’d insert duplicate rows, so dual writes only make sense when the old and new paths target different stores. For an ORM switch within the same DB, you don’t need dual writes for long – just one deployment where both are active, then drop the old.

6. Testing and validation: This is crucial. Ensure that the new SQLAlchemy-based code returns the same results as the old code for given inputs. You can write parallel tests:

users_old = get_users_via_old_method()  # maybe returns list of dicts/tuples
users_new = session.query(User).all()   # list of User objects
assert len(users_old) == len(users_new)
for old, new in zip(users_old, users_new):
    assert old['name'] == new.name
    ...

Or for logic: if transferring money, assert final balances are the same with both approaches. If possible, run both systems in staging with the same DB to confirm behavior matches.

7. Migration of migrations: If your old approach had its own migrations (e.g., Django migrations, or manual SQL scripts), you can switch to Alembic for new migrations going forward. You might start Alembic with autogeneration against the existing schema, to get a baseline. Common pitfall: ensure naming conventions of constraints or indices are considered if you care about Alembic continuing nicely (you can configure SQLAlchemy naming_convention to match whatever your old system had, to avoid duplicate constraint creation etc. in future migrations).
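
A minimal sketch of configuring such a naming convention; the patterns shown are illustrative and should be adjusted to match whatever your legacy schema actually used:

from sqlalchemy import MetaData
from sqlalchemy.orm import declarative_base

# Hypothetical convention -- mirror your existing constraint/index names here
convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

Base = declarative_base(metadata=MetaData(naming_convention=convention))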

8. Deprecate/remove old code: Once you verify the SQLAlchemy code covers all functionality, remove the old raw SQL calls or old ORM usage to avoid confusion. This might be iterative – e.g., remove one module’s raw usage after migrating that module’s logic to SQLAlchemy, while others remain until converted.

Example migration: Peewee to SQLAlchemy

Suppose you have Peewee models:

from peewee import Model, SqliteDatabase, CharField, ForeignKeyField

db = SqliteDatabase('app.db')

class UserPW(Model):
    name = CharField()
    class Meta:
        database = db

class OrderPW(Model):
    user = ForeignKeyField(UserPW, backref='orders')
    total = CharField()
    class Meta:
        database = db

To migrate to SQLAlchemy:

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship

engine = create_engine('sqlite:///app.db')
Base = declarative_base()

class User(Base):
    __tablename__ = 'userpw'  # match Peewee's table name if it differs
    id = Column(Integer, primary_key=True)
    name = Column(String)
    orders = relationship("Order", back_populates="user")

class Order(Base):
    __tablename__ = 'orderpw'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('userpw.id'))
    total = Column(String)
    user = relationship("User", back_populates="orders")

Now data is shared (using same tables). You can now do:

# Peewee way
u = UserPW.get(UserPW.name == 'Alice')
orders = [o.total for o in u.orders]  # Peewee backref

# SQLAlchemy way
from sqlalchemy.orm import Session
session = Session(engine)
u2 = session.query(User).filter_by(name='Alice').one()
orders2 = [o.total for o in u2.orders]
assert orders == orders2

Both yield the same result list of totals. Then you redirect code to use the SQLAlchemy portion.

Pitfalls & common issues:

  • Session Management: Newcomers often forget to commit or close sessions. In raw DB-API, you might have auto-commit or you manually commit. With SQLAlchemy, if you don’t call commit, your changes are not saved. Conversely, if the old code relied on autocommit behavior after each statement, you might end up with open transactions if you don’t commit. Solution: Use context managers or explicit commit after each logical unit.

  • Identity Map surprises: If you load the same row twice in SQLAlchemy (same Session), you get the same object. In some old code, two queries for the same record might yield two separate result dicts which you modify independently. In ORM, modifying one object’s state means any references to it reflect that change (which is usually good and consistent with unit of work, but can surprise if not expected). Usually it's a benefit (no stale duplicates), but be aware.

  • Query differences: Some complex queries might need to be re-written. For example, if you have a tricky subselect or join, you might need SQLAlchemy Core to express it. If something truly doesn’t translate easily, you can use session.execute(text("RAW SQL")) as a stopgap. But try to embrace SQLAlchemy idioms for maintainability.

  • Performance differences: After migration, profile critical paths. It’s possible the ORM is doing more work than the old raw SQL did (e.g., selecting more columns or introducing an N+1). Optimize by adding lazy='dynamic' or eager loading as appropriate, or even using .with_entities() to limit columns if needed. The migration gives opportunity to improve performance as well (like using selectinload to reduce query count).

  • Transaction Isolation Pitfall: If your old approach did some locking or specific isolation handling (like SELECT ... FOR UPDATE), ensure you replicate it. SQLAlchemy can apply with_for_update() to queries to mimic SELECT FOR UPDATE (see the sketch after this list). If you don’t replicate it, you might lose some safety, so check any raw SQL that had special clauses.
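
A minimal sketch of that row-locking point, reusing the Account model and engine from the transfer example earlier (2.0-style select):

from sqlalchemy import select
from sqlalchemy.orm import Session

with Session(engine) as session:
    # SELECT ... FOR UPDATE: the row stays locked until commit/rollback
    stmt = select(Account).where(Account.id == 1).with_for_update()
    account = session.execute(stmt).scalar_one()
    account.balance -= 100
    session.commit()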

Migrating between SQLAlchemy versions (e.g., 1.x to 2.x)

This is another kind of migration (within SQLAlchemy). If you have old SQLAlchemy 1.3/1.4 code and want to fully adopt 2.x style:

  • Use the SQLAlchemy 2.0 migration guide. Key changes include:

    • session.query(Model) vs session.execute(select(Model)). In 2.0, either works, but query() is now a legacy interface (still available, just no longer the primary API). Converting is often just a find-replace to select(). One pitfall: .one() on a Result returns a Row (a tuple-like object) rather than the mapped object directly, unlike Query.one() – use .scalars().one() when selecting a single entity.

    • session.begin() context usage instead of explicit commit/rollback blocks can be adopted (not mandatory, but cleans up code).

    • No implicit autocommit: In 2.0, if you execute a statement outside of a transaction block, you must explicitly commit on the connection. In 1.x, engine.execute() calls would autocommit when not inside a transaction. The migration guide warns about this; if your code relied on autocommit, you’ll need to add commits.

    • The future=True flag in 1.4 can be used to test behavior before fully switching to 2.0.

  • In general, migrating between versions of SQLAlchemy is easier than migrating from a different ORM. The project maintains backwards compatibility well (e.g., there was a long deprecation period before 2.0). So this kind of migration is usually just updating calls and syntax – for example, relationship(..., uselist=False) is unchanged, while result handling shifts from session.execute(stmt).fetchall() to session.execute(stmt).scalars().all() when you want mapped objects; see the side-by-side sketch below.
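
For illustration, the same query in both styles (a sketch, assuming a mapped User model and an open session):

from sqlalchemy import select

# 1.x style -- legacy Query API, still available in 2.0
users = session.query(User).filter(User.name == "Alice").all()

# 2.0 style -- select() plus scalars() to get mapped objects
stmt = select(User).where(User.name == "Alice")
users = session.scalars(stmt).all()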

Finalizing the migration

Once everything is switched to SQLAlchemy and tested:

  • Remove old ORM libraries from requirements to avoid confusion.

  • If migrating from raw SQL, possibly drop a lot of custom DB handling code (closing connections, etc.), as SQLAlchemy sessions handle it.

  • Train the team in SQLAlchemy usage – ensure everyone knows how to do things “the SQLAlchemy way” to prevent reintroducing raw SQL out of habit.

  • Monitor after deployment: watch for any new DB errors or performance issues. If any appear, they can often be resolved by adjusting session usage or query patterns as described.

Migrating to SQLAlchemy takes some initial effort, but the payoff is cleaner, more robust database code and easier maintenance down the road. Many teams have gone through this and reported success, noting improvements like drastically fewer lines of query code and fewer SQL-related bugs (no more string-formatting errors or forgotten input sanitization, since SQLAlchemy parameterizes queries safely).

Resources and further reading

To deepen your understanding of SQLAlchemy and stay updated, here are comprehensive resources:

  • Official documentation: The primary source of truth.

    • SQLAlchemy 2.0 Documentation – Extensive and well-structured, includes a Unified Tutorial for ORM and Core, how-to guides, API reference, and migration notes. Accessible at docs.sqlalchemy.org. (Highly recommended to read the tutorial and FAQs there.)

    • SQLAlchemy Official Homepage – www.sqlalchemy.org provides links to documentation by version, release notes, and the philosophy behind SQLAlchemy.

    • Alembic Documentation – If you use Alembic for migrations (which is likely in a SQLAlchemy project), see alembic.sqlalchemy.org for guides on database migrations.

    • GitHub Repository: The SQLAlchemy source is on GitHub (sqlalchemy/sqlalchemy). Reading through issues can be informative for edge cases, and it’s the platform to report bugs or see upcoming features.

    • Release Change Logs: Whenever upgrading, consult the changelog in docs or on the site to see what changed.

  • PyPI and source:

    • PyPI (SQLAlchemy) – pypi.org/project/SQLAlchemy for latest version info (current version 2.0.43 as of Aug 2025) and install instructions.

    • SQLAlchemy on GitHub – github.com/sqlalchemy/sqlalchemy. You can find an active discussions section and issue tracker. The maintainers (including Mike Bayer) often answer questions there.

  • Tutorials and books:

    • The SQLAlchemy Unified Tutorial (in docs) – A step-by-step guide that covers from basic to advanced usage, highly recommended to go through.

    • “Essential SQLAlchemy” by Jason Myers and Rick Copeland – An O'Reilly book (2nd ed. covers up to SQLAlchemy 1.0, but concepts still apply). Good for learning fundamentals of ORM and Core.

    • “Mastering SQLAlchemy” – Check if newer books or online courses targeting SQLAlchemy 1.4/2.0 exist (the landscape evolves; e.g., Packt Publishing sometimes releases updated titles).

    • Tutorial sites:

      • RealPython has an article “Using SQLAlchemy to Manage Data in Python” and others.

      • DataCamp has a beginner tutorial “Discover SQLAlchemy”.

      • Full Stack Python (website) has a page on SQLAlchemy with links to resources.

      • Official tutorials and talks – The SQLAlchemy site links to conference talks and tutorials. E.g., Mike Bayer’s PyCon talks (“The SQLAlchemy Session In Depth” on YouTube) which can clarify internal workings.

  • Community Q&A and discussions:

    • Stack Overflow: Over 100k questions tagged [sqlalchemy]. Chances are any error or pattern you’re unsure about has an answer there. For example, questions like “What’s the difference between query.filter and filter_by?” or “How to eager load a nested relationship?” have excellent explanations. Search specifically if you have a problem (e.g., "SQLAlchemy idle in transaction issue") – often answered.

    • Reddit: Subreddits like r/Python and r/learnpython often have discussions or ask for advice on ORMs. E.g., a Reddit thread “Is SQLAlchemy recommended for highly transactional app?” had insightful answers (one citing real-world throughput with Core). Searching Reddit can reveal community sentiment and hidden gems (like performance tips).

    • Mailing List / Google Group: SQLAlchemy had a Google Group (now possibly migrated to a mailing list on groups.io) for asking questions. Not as active as SO for basic Qs, but for deeper discussion or announcement of release candidates, etc.

    • SQLAlchemy GitHub Discussions: The maintainers encourage usage of GitHub Discussions for support now. It’s a forum-like interface where you can ask how-to questions, and the community or devs respond.

  • Blogs and articles:

    • Mike Bayer’s Blog (zzzcomputing.com): Occasionally has articles on new releases or deep dives into design (for example, he wrote about the reasoning behind SQLAlchemy 2.0 changes).

    • Athelas Engineering Blog: They published "Six Ways to Optimize SQLAlchemy" – focusing on performance tweaks (like selecting specific fields, using yield_per).

    • Stack Overflow tag wiki: The tag wiki for sqlalchemy on Stack Overflow has a succinct description and some links.

    • Modern ORM comparisons: InfoWorld’s article “6 Python ORMs for every need” (which we referenced) is a good read for context and second opinions.

  • Video tutorials and podcasts:

    • YouTube: There are many talks – e.g., PyCon 2012 talk by Mike Bayer (a bit dated but still relevant basics), PyCon 2019 “SQLAlchemy: Beyond the Basics”, etc. Also, search for “SQLAlchemy tutorial” – some channels like Traversy Media or Programming with Mosh have Python-SQLAlchemy videos.

    • Talk Python To Me Podcast:

      • Episode #5 “SQLAlchemy and data access in Python” (early interview with Mike Bayer),

      • Episode #344 “SQLAlchemy 2.0” (recent interview about new features and future).

        Both are great for hearing the philosophy and improvements straight from the source.

    • Python Bytes / Real Python podcasts: They sometimes cover ORMs and SQLAlchemy updates in episodes, which can be insightful for real-world usage tips.

  • Open source projects for reference:

    • Check GitHub for exemplary projects: For example, Invenio (a large digital library framework by CERN) uses SQLAlchemy heavily; check out their models and usage to see how a complex application is structured.

    • Flask Mega-Tutorial: Miguel Grinberg’s Flask tutorial has several parts focusing on using Flask-SQLAlchemy (which is SQLAlchemy underneath). While Flask-SQLAlchemy abstracts some setup, the core query syntax and relationships are pure SQLAlchemy. His tutorial code is a good real-world-ish example on GitHub.

    • Django vs SQLAlchemy comparison: If coming from Django, you might find blog posts or stackoverflow answers comparing common patterns (e.g., migrations: Alembic vs Django’s, QuerySet vs Session.query). “Django ORM and SQLAlchemy comparison” articles exist and can be illuminating on differences and similarities.

  • Alembic and migrations:

    • Alembic Official Docs – covers how to integrate with SQLAlchemy models.

    • YouTube: “Schema Migrations with Alembic” – many conference talks or tutorials exist for Alembic, which is crucial for evolving your DB alongside SQLAlchemy models.

    • Stack Overflow (migrations tag) – for solving tricky migration issues (like how to detect changes, custom operations).

  • Community extensions:

    • Some listed on SQLAlchemy site: e.g., SQLAlchemy-Utils (additional field types and utilities), SQLAlchemy-Continuum (versioning data), GeoAlchemy (spatial extensions). These can be found on GitHub/PyPI and often have their own docs. For instance, SQLAlchemy-Utils provides handy things like aggregated attributes, GUID type, etc., which can save time.

  • F.A.Q and cookbook:

    • The official FAQ in docs covers a lot of "Why do I get this warning?" or "How to do X?". It’s a great resource to read through – you will often find answers to common pitfalls (like “Session is already flushed” errors).

    • The “Recipes” section in docs or on the wiki has example patterns (like how to do bulk inserts properly, or how to handle many-to-many with extra columns).

All these resources ensure that whether you are debugging an error message, looking for best practices, or trying to utilize a new SQL feature, you’ll find guidance. SQLAlchemy’s large user base and maintainers’ dedication (the project has been around since 2006 and is still actively updated) mean that new challenges (like asyncio support) are met with documentation and community solutions rapidly.

Finally, remember that learning by doing is key: try building a small project with SQLAlchemy or refactoring a section of an app to use it. Use the above references when you hit a snag. Over time, you’ll harness the full power and flexibility of SQLAlchemy in your projects.

Frequently asked questions (FAQs)

Below is a comprehensive FAQ with answers to common questions about SQLAlchemy. These cover installation, usage, features, errors, performance, integration, best practices, and comparisons. Each Q&A is concise and based on real queries developers often have:

  1. What is SQLAlchemy? – SQLAlchemy is an open-source SQL toolkit and Object-Relational Mapper for Python, allowing you to interact with databases using Python objects and expressions instead of raw SQL. It provides a high-level ORM for mapping classes to tables, as well as a lower-level Core for constructing SQL queries directly.

  2. Who created SQLAlchemy and when? – SQLAlchemy was created by Michael Bayer (also known as Mike Bayer) and first released in February 2006. He remains the lead developer. The project has been actively maintained for nearly two decades.

  3. What’s the latest version of SQLAlchemy? – As of 2025, the latest stable version is SQLAlchemy 2.0.43, released on August 11, 2025. The 2.x series introduced significant improvements like better async support and a simplified API.

  4. Under what license is SQLAlchemy released? – SQLAlchemy is released under the MIT License, which is a permissive open-source license. This means it can be freely used and integrated into both open-source and proprietary projects.

  5. Why use SQLAlchemy instead of writing raw SQL? – Using SQLAlchemy can greatly improve productivity and safety. It abstracts away database-specific SQL differences and uses Python objects, reducing errors (like SQL injection, which it prevents by automatic parameterization). It also provides an ORM so you can work in terms of objects and relationships, making code more maintainable than raw SQL strings scattered in your code.

  6. Is SQLAlchemy an ORM or something more? – SQLAlchemy is both an ORM and a core SQL toolkit. The ORM (Object-Relational Mapper) allows you to map Python classes to database tables and work with objects. The Core (SQL Expression Language) lets you construct SQL queries in a Pythonic way without using the ORM, if you need fine control or want to work at the SQL level.

  7. What databases does SQLAlchemy support? – SQLAlchemy supports virtually all major relational databases: PostgreSQL, MySQL, SQLite, Oracle, Microsoft SQL Server, MariaDB, Firebird, Sybase, and many others. It uses a dialect system – built-in dialects cover these, and there are community dialects for special databases (like Amazon Redshift, Google BigQuery, etc.).

  8. Can SQLAlchemy work with NoSQL databases? – Not directly. SQLAlchemy is designed for SQL relational databases. However, some NoSQL databases (like Cassandra or MongoDB) have their own Python ORMs or drivers. SQLAlchemy can’t be used for, say, MongoDB collections. It’s focused on SQL (the “Alchemy” refers to turning Python into SQL).

  9. How do I install SQLAlchemy? – Install via pip:

    pip install SQLAlchemy

    This will fetch the latest version from PyPI. For Conda users:

    conda install -c anaconda sqlalchemy

    works as well. Ensure you also install a database driver (for example, psycopg2 for PostgreSQL, PyMySQL for MySQL) as needed – SQLAlchemy will use the appropriate driver for the dialect.

  10. Why isn’t pip installing the latest SQLAlchemy? – Possibly because your environment has an older version pinned or an outdated pip. Check pip install --upgrade SQLAlchemy. If using PyPI, it should get the latest (SQLAlchemy 2.x). If you’re on an older Python (SQLAlchemy 2.x requires Python 3.7+), pip might be getting the latest compatible (for example, if you run Python 2, it will install SQLAlchemy 1.3). Upgrade your Python to use the latest SQLAlchemy.

  11. How do I connect to a database with SQLAlchemy? – Create an Engine using create_engine() with a URL. For example:

    engine = create_engine("postgresql+psycopg2://user:password@hostname/dbname")

    This engine can then be used to create sessions or execute SQL. The URL format is dialect+driver://username:password@host:port/database. Omit driver to use default (e.g., postgresql:// uses psycopg2 by default).

  12. What is the Engine and why is it important? – The Engine is the core interface to the database in SQLAlchemy. It manages the connection pool and database dialect. When you issue queries or transactions, they go through the Engine. Think of it as the home base for DB connectivity. You typically create one Engine per database, at application startup, and reuse it.

  13. How do I create tables in the database with SQLAlchemy? – Define your model classes (or Table objects) and then call Base.metadata.create_all(engine). This will issue CREATE TABLE statements for all models on that Base’s metadata. For example:

    Base = declarative_base()

    class User(Base):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        # ...

    Base.metadata.create_all(engine)

    will create a “users” table if it doesn’t exist. (Note: create_all doesn’t drop or modify existing tables – use Alembic for migrations).

  14. Does SQLAlchemy support database migrations? – Not within SQLAlchemy itself, but the Alembic library (authored by the same creator) is the recommended migrations tool. Alembic integrates with SQLAlchemy’s MetaData to generate migration scripts (ALTER tables, etc.). So, yes, you can do migrations, but you use Alembic in conjunction.

  15. What is a Session and why do I need it? – A Session is the ORM’s handle for database conversations. It manages a unit of work – a collection of objects being tracked for changes, and it maintains an in-memory identity map of those objects. You use a Session to query objects, add new objects, and flush changes. Think of it as a temporary workspace for objects you’re working with, which wraps a transaction. You need it to orchestrate ORM operations (while Engine can execute SQL, Session provides the ORM context).

  16. How do I create a session? – You typically use sessionmaker to configure a Session factory bound to your Engine:

    from sqlalchemy.orm import sessionmaker
    SessionLocal = sessionmaker(bind=engine)
    session = SessionLocal()  # this is a Session instance 

    In SQLAlchemy 2.0, you can also do with Session(engine) as session: directly (as a context manager). The key is binding the session to an engine/connection.

  17. When should I call session.commit()? – Call session.commit() when you want to finalize a transaction – i.e., persist all changes made in that session to the database. Typically, in web apps, you commit at the end of a request that made changes. Each commit begins a new transaction (the session stays usable unless closed). If you never call commit, your changes won’t be saved (they’ll be rolled back when the session or program ends). Conversely, if you call commit too frequently (after every tiny change), you might reduce performance by not batching operations. So commit at logical units of work.

  18. What does session.rollback() do?session.rollback() aborts the current transaction. It undoes any changes made in the database during that transaction (since the last commit/rollback). It also clears the session’s pending changes and expunges objects that were added. Use it after an exception or when you decide not to persist the changes. After rollback, the session can continue (it starts a new transaction as needed).

  19. Do I need to close a Session? – Yes, it’s good practice to close a Session when you’re done. Session is not a connection, but it holds onto connections from the pool. By closing (or using it in a with context which auto-closes), you release any resources and return connections to the pool. In web apps, you typically close the session at end of request (or use scoped_session that removes it). Not closing a session can lead to connection leaks (if it has an open transaction) or just unnecessarily long object lifetimes.

  20. Can I reuse a session for multiple operations? – In general, you use a Session for a series of operations as long as they are logically related (e.g., a single request or unit of work). You can reuse within that scope. But it’s not meant to be a long-lived (like application-wide) object. Reuse across requests or threads is not recommended. Create new sessions as needed (they are lightweight) and let them go out of scope or close after use.

  21. Is the Session thread-safe? – No, each Session is intended to be used by a single thread (or async task). If you need to work in multiple threads, create a separate Session in each thread or use scoped_session, which manages one session per thread behind the scenes (sketched below). Sharing one session across threads can lead to race conditions or errors.
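
    A minimal scoped_session sketch, assuming an engine and a mapped User model already exist:

    from sqlalchemy.orm import scoped_session, sessionmaker

    SessionLocal = scoped_session(sessionmaker(bind=engine))

    def worker():
        session = SessionLocal()  # each thread transparently gets its own Session
        try:
            session.add(User(name="from-a-thread"))
            session.commit()
        finally:
            SessionLocal.remove()  # discard this thread's session when done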

  22. What is the difference between SQLAlchemy Core and ORM? – The Core (SQL Expression Language) is a lower-level API where you work with Table objects, Column objects, and SQL functions to build queries; it returns result sets of tuples/dictionaries. The ORM builds on Core, letting you map classes to tables and work with objects. With Core you might write:

    stmt = select(users_table.c.name, users_table.c.email).where(users_table.c.id == 5)
    result = conn.execute(stmt)

    With ORM:

    user = session.query(User).get(5)
    print(user.name, user.email)

    The ORM is more convenient for complex data models, while Core is handy for ad-hoc queries, performance optimizations, or when you don’t need the overhead of object mapping.

  23. Can I use SQLAlchemy Core without the ORM? – Absolutely, yes. SQLAlchemy Core is a complete toolkit on its own. You can define tables via Table(...) objects and perform inserts, selects, updates without ever using the ORM’s class mapper. Some projects use SQLAlchemy purely as a lightweight SQL builder and connection manager (no ORM). Conversely, you can also use the ORM mostly and occasionally drop to Core or raw SQL for specific queries.

  24. How do I define a one-to-many relationship? – A one-to-many consists of a ForeignKey on the “many” side and a relationship() in the ORM classes. For example:

    class Parent(Base):
        __tablename__ = 'parents'
        id = Column(Integer, primary_key=True)
        children = relationship("Child", back_populates="parent")

    class Child(Base):
        __tablename__ = 'children'
        id = Column(Integer, primary_key=True)
        parent_id = Column(Integer, ForeignKey('parents.id'))
        parent = relationship("Parent", back_populates="children")

    Now, parent.children gives a list of Child objects, and child.parent gives the Parent. The ForeignKey in Child ensures referential integrity in the DB.

  25. How do I define a many-to-many relationship? – Many-to-many requires an association table. Example:

    association_table = Table('user_to_group', Base.metadata,
        Column('user_id', ForeignKey('users.id')),
        Column('group_id', ForeignKey('groups.id'))
    )

    class User(Base):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        groups = relationship("Group", secondary=association_table, back_populates="users")

    class Group(Base):
        __tablename__ = 'groups'
        id = Column(Integer, primary_key=True)
        users = relationship("User", secondary=association_table, back_populates="groups")

    Now user.groups is a list of Group, and group.users is a list of User. SQLAlchemy will handle the insertion into the association table when you append to the relationship and flush.

  26. How do I add a new object (row) to the database? – Create an instance of your mapped class and session.add() it:

    new_user = User(name="Alice", email="[email protected]")
    session.add(new_user)
    session.commit()

    This will INSERT a new row into the users table. After commit, new_user.id will be populated with the generated primary key (if any).

  27. How do I delete an object? – Query it and then call session.delete(obj) and commit:

    user = session.query(User).get(123)
    session.delete(user)
    session.commit()

    This issues a DELETE statement for that row. If there are related objects, you need to consider cascade rules – by default, SQLAlchemy will prevent deleting a parent if child rows still reference it (IntegrityError), unless you’ve set cascade deletes or configured foreign key ON DELETE CASCADE.
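
    For example, to have the ORM delete children along with their parent, you can configure the cascade on the relationship (a variation of the Parent/Child mapping from question 24, assuming the same Child class):

    class Parent(Base):
        __tablename__ = 'parents'
        id = Column(Integer, primary_key=True)
        # With delete-orphan cascade, session.delete(parent) also deletes its children
        children = relationship("Child", back_populates="parent",
                                cascade="all, delete-orphan")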

  28. Why is session.delete not removing the object from the session? – When you call session.delete(obj), SQLAlchemy marks the object as deleted, but it stays in memory until the flush/commit; it is removed from the session’s identity map at flush time. So if you iterate session.query(User) after the delete (the query autoflushes), you won’t see the row, even before commit (and obj in session returns False after the deletion is flushed). The Python object itself remains usable (its primary-key attribute is still set), but after commit it is detached, and accessing a lazy-loaded attribute may raise an error. In short: .delete() plus commit means the row is gone from the database; you don’t need to remove your own references to the object manually, but the session no longer tracks it.

  29. How do I update an object’s fields? – Simply change the attribute values on a retrieved object and commit:

    user = session.query(User).get(5)
    user.email = "[email protected]"
    session.commit()

    SQLAlchemy will issue an UPDATE for the changed columns. You don’t need to call anything like .save() on the object – the act of committing the session auto-flushes changes. (There is also a lower-level session.flush() if you want to push changes to DB without committing transaction.)

  30. What is flushing in SQLAlchemy? – Flushing is the process where the session sends SQL statements to the database for all pending changes in the unit of work. A flush happens automatically before commit, and can also happen before certain queries to ensure results reflect recent changes. You can call session.flush() to force it. After flush, objects still remain in the session, but their state is now synchronized with the database. A commit = flush + end transaction. A rollback = throw away unflushed changes (and if already flushed, revert the database via rollback).

  31. What is the difference between session.commit() and session.flush()?flush() writes changes to the DB but doesn’t end the transaction (no commit at the DB level). commit() will flush and then commit the transaction, releasing the connection. In other words, flush is like “apply changes now but keep transaction open”, commit is “apply and finish/lock-in changes”.
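
    A short sketch of the difference, assuming a mapped User model and an open session:

    user = User(name="temp")
    session.add(user)
    session.flush()        # INSERT is sent to the DB; the transaction stays open
    print(user.id)         # the generated primary key is already available
    session.rollback()     # undoes the INSERT, since nothing was committed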

  32. Why do I sometimes see SQL statements execute before session.commit()? – This is because of the autoflush behavior. By default, SQLAlchemy will flush pending changes at certain times, e.g., when you issue a query that might need to account for those changes. For example:

    user = User(name="x")
    session.add(user)
    # We query before commit:
    session.query(User).filter_by(name="x").all()

    Here, when the query is executed, SQLAlchemy flushes the new user to the DB so that the query will return it. Thus an INSERT goes out prior to commit. This is normal. You can disable autoflush on a session or within a with session.no_autoflush: block if needed for specific situations (but usually leave it on to avoid inconsistent reads).

  33. How do I filter or query data? – Use session.query() (SQLAlchemy 1.x) or session.execute(select()) in 2.x. Common usage:

    session.query(User).filter(User.name.ilike("%john%"), User.age > 30).order_by(User.age.desc()).all()

    This returns a list of User objects where name contains "john" (case-insensitive) and age > 30, sorted by age descending. The Query API allows .filter() with conditions (using Python operators ==, <, >, etc., which SQLAlchemy translates to SQL operators) and .filter_by(name="John") for simple equality. You can chain multiple filters (they AND together). Use session.query(Model).all() to get all results, .first() to get the first, .one() to assert exactly one result (or raise), etc. In SQLAlchemy 2.0, the equivalent is:

    stmt = select(User).where(User.name.ilike("%john%"), User.age > 30).order_by(User.age.desc())
    users = session.scalars(stmt).all()

    (Using .scalars() because select(User) yields scalar results of type User).

  34. What is the difference between filter_by and filter? – filter_by is a convenience method that uses keyword arguments for equality checks. For example, .filter_by(name="Alice", age=30) is the same as .filter(User.name == "Alice", User.age == 30). It's just shorter for simple comparisons. filter is more general – it takes SQL expression conditions, so you can express more complex criteria (like or_(User.age < 18, User.age > 60)). Under the hood, filter_by builds those expressions automatically.
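
    Both spellings side by side (a sketch; or_ comes from sqlalchemy):

    from sqlalchemy import or_

    # Equivalent simple equality filters:
    session.query(User).filter_by(name="Alice", age=30).all()
    session.query(User).filter(User.name == "Alice", User.age == 30).all()

    # Only filter() can express non-equality logic such as OR:
    session.query(User).filter(or_(User.age < 18, User.age > 60)).all()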

  35. How do I do JOINs in SQLAlchemy (ORM)? – If you have relationships set up, you typically don’t need to manually join for simple traversals – you can use user.orders or query via relationship filters. But you can join explicitly:

    session.query(User).join(User.orders).filter(Order.total > 100).all()

    This will join users and orders on the foreign key automatically (because the relationship is known) and filter orders with total > 100, returning User objects that have such orders. If you want results that include fields from both, you can query multiple entities:

    session.query(User, Order).join(User.orders).all()

    That returns a list of tuples (User, Order) for each matching join pair. You can also join to a model by specifying:

    session.query(Order).join(User, Order.user_id == User.id).filter(User.name == 'Bob').all()

    This returns Order objects for Bob. So, use .join(RelationshipProperty) or .join(OtherModel, condition) depending on need. The ORM will handle proper join conditions if relationships are set. For outer joins, use .outerjoin(...).

  36. How can I eager-load related objects to avoid N+1 queries? – Use joined loading or subquery loading via options() on the query. For example:

    from sqlalchemy.orm import joinedload
    user = session.query(User).options(joinedload(User.orders)).filter_by(id=1).one()

    This will do one SQL JOIN to load user #1 and all their orders in one go. The User.orders relationship collection will be populated without a separate query. Alternatively, selectinload(User.orders) (or subqueryload in 1.x) will issue a second separate query for orders but still avoid per-user queries. Eager loading is great to improve performance when you know you’ll need the related data for many objects. Use it to prevent the N+1 problem.

  37. What is the N+1 problem in ORMs? – It refers to the scenario where you load N parent objects, and then for each parent you lazy-load a collection or related object, resulting in N additional small queries. For example, loading 50 users and then accessing user.orders for each user would run 50 separate queries (N additional queries for N users). This is inefficient. The solution is eager loading (as above) or querying with joins to get all needed data in fewer queries. Many ORMs have this issue if used naïvely – it’s not specific to SQLAlchemy, but SQLAlchemy gives you tools like joinedload to fix it.
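
    To see the difference concretely, here is a sketch contrasting lazy loading with selectinload:

    from sqlalchemy.orm import selectinload

    # N+1 pattern: 1 query for the users, then one query per user for orders.
    users = session.query(User).all()
    for u in users:
        print(len(u.orders))  # each access may lazy-load with its own SELECT

    # Eager pattern: 2 queries total, no matter how many users there are.
    users = session.query(User).options(selectinload(User.orders)).all()
    for u in users:
        print(len(u.orders))  # already loaded; no extra SQL here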

  38. How do I get raw SQL from a query for debugging? – You can use the str() function on a SQL expression or query’s statement. For example:

    stmt = select(User).where(User.name == 'x')
    print(str(stmt))

    This prints the SQL (with parameter placeholders). Or for an ORM Query (1.x), str(session.query(User).filter_by(name='x').statement) gives the SQL statement string. Alternatively, set echo=True on create_engine to have all SQL printed to stdout with each execution.

  39. Why are there question marks or “%s” in the printed SQL? – SQLAlchemy uses parameter placeholders (the style depends on DBAPI). For instance, in SQLite or PostgreSQL logs, you might see SELECT * FROM users WHERE name = ?. In MySQL (pymysql) it might show %s. These are not Python format strings; they are DBAPI param tokens. The actual values are sent separately, not printed. This prevents SQL injection and is how prepared statements work. If you see ? or :param and want to see actual values, you can enable logging at the engine level; the SQLAlchemy log will show bound parameters either separately or inline depending on configuration (with echo=True, by default it shows something like INFO - [generated in 0.001s] (...) {'name': 'x'} indicating the param dictionary).

  40. Why are my percent signs doubled in the SQL string? – When you print or log a SQLAlchemy statement that includes a LIKE or ILIKE with a %, you might see it as %%. For example, filter(User.name.like('%abc%')) might print name LIKE %%abc%%. This is because the % is a special character in the SQLAlchemy text representation (it’s escaping it to not be confused with Python printf-style). It’s normal. When executed, it will use the proper single % for the SQL dialect. Just be aware printing it shows doubled to represent a literal percent sign.

  41. How do I do a case-insensitive filter (LIKE) in SQLAlchemy? – Use .ilike() for case-insensitive on databases that support ILIKE (Postgres). For others (like MySQL, which by default is case-insensitive for text depending on collation, or SQLite which can use COLLATE NOCASE), you could use func.lower():

    from sqlalchemy import func
    session.query(User).filter(func.lower(User.name) == "john").all()

    But .ilike is simplest and will be translated to appropriate SQL or emulated if needed.

  42. How do I do an OR filter or AND combination? – Use or_ and and_ from sqlalchemy.sql.expression (or from sqlalchemy import or_). For example:

    from sqlalchemy import or_
    session.query(User).filter(or_(User.age < 18, User.age > 60)).all()

    This will give users under 18 or over 60. You can also use the | and & operators between expressions: e.g., (User.age < 18) | (User.age > 60) inside a filter will produce the same OR condition, because SQLAlchemy overloads these operators. Ensure you group conditions properly with parentheses when using &/| to avoid Python precedence issues.

  43. How can I get a count of results? – Use session.query(Model).count() which emits a SELECT COUNT(*) for that query. For example:

    total_users = session.query(User).filter(User.active == True).count()

    This performs efficiently in the DB. Alternatively, func.count:

    from sqlalchemy import func
    session.query(func.count(User.id)).filter(User.active == True).scalar()

    either is fine. If you already have a query object and want a count without re-specifying filters, you can call .count() on it (note: .count() wraps your query in a SELECT COUNT(*) subquery, which handles joins correctly but can be less efficient than a hand-written count).

  44. Why does query.count() sometimes give a different number than len(query.all())? – If your query involves joins that produce duplicate rows, the two can diverge. query.count() issues an SQL COUNT over the joined rows, so in session.query(User).join(User.orders) a user with 3 orders contributes 3 to the count. The legacy 1.x Query, however, deduplicates full entities in memory when you call .all(), so len(query.all()) can come back smaller than .count() for the same query. To count distinct users in the database, use session.query(User).join(User.orders).distinct().count(). Also note that query.count() runs the COUNT in the database, whereas len(query.all()) pulls every object into Python and counts there – the latter is far less efficient.
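
    A short sketch of the discrepancy and the fix, assuming the User/Order models from earlier:

    q = session.query(User).join(User.orders)
    print(q.count())             # counts joined rows: a user with 3 orders adds 3
    print(len(q.all()))          # legacy Query deduplicates entities in memory
    print(q.distinct().count())  # counts distinct users in the database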

  45. How do I get only specific columns or a mix of columns from different tables? – Use entity selection or the Core. For example:

    session.query(User.name, User.email).filter(User.id == 5).first()

    returns a tuple like ('Alice', 'alice@example.com'). You can mix columns:

    session.query(User.name, Order.total).join(User.orders).filter(Order.id == 10).first()

    returns a (name, total) tuple for that join. In 2.0 style, you’d use select(User.name, Order.total).... If you want dictionary-like access, each result row exposes a dict-like view via ._mapping, or you can use Core with conn.execute(), which returns rows accessible by name. Also note: selecting a single column (like session.query(User.name)) yields one-element row tuples from .all(), not bare values – unpack them with e.g. [name for (name,) in results]. In 2.x, session.scalars(select(User.name)).all() gives you the list of names directly.

  46. How do I map a join query to objects? – If you query multiple models, e.g. session.query(User, Order).join(User.orders), you get each User and Order paired. If you want to have related objects loaded in parent (to avoid separate queries), use joinedload as earlier – that way you just query User and it pulls in Orders transparently, and you access via user.orders. The contains_eager option can be used if you manually joined and want to populate the relationship property. E.g.:

    from sqlalchemy.orm import contains_eager

    q = session.query(User).join(User.orders).options(contains_eager(User.orders)).filter(Order.total > 100)
    users = q.all()

    Each user in users will have an orders collection loaded with orders (only those meeting filter, which in a join might be a subset of all orders). This is more advanced and used if you manually control the join but still want ORM to assign the results to relationship.

  47. How do I use transactions explicitly? – The Session is transactional by default; everything before commit is in one transaction. If you need to control it:

    • You can use session.begin():

      session.begin()  # starts a transaction (if not already started)
      # ... do stuff
      session.commit()

      But normally you don’t need this: the Session begins a new transaction automatically the next time it is used after a commit.

    • For multiple transactions in one session, you could do:

      session.commit()  # commits the first transaction
      # the session is now ready for the next one
      session.add(obj2)
      session.commit()  # second commit

      Each commit/rollback ends the current transaction and you can continue using session.

    • If you want a sub-transaction or savepoint, you can use session.begin_nested() (common for test scenarios).

    • You can also manage transactions at the connection level using the engine.begin() context (especially in Core usage or when mixing multiple sessions/engines).

      E.g.,

      with engine.begin() as conn:
          conn.execute(text("UPDATE ..."))
          conn.execute(text("INSERT ..."))

      This will commit at context exit, or rollback on exception.

  48. How do I prevent SQL injection with SQLAlchemy? – SQLAlchemy by default uses parameter binding, which makes it safe from injection as long as you pass user inputs as parameters, not concatenated strings. For example, session.query(User).filter(User.name == user_input) will bind user_input as a parameter (safe). The dangerous scenario is if you were to use text() with untrusted string interpolation. For raw queries:

    session.execute(text("SELECT * FROM users WHERE name=:name"), {"name": user_input})

    is safe. But building SQL by string concatenation is not (e.g., text("... name='" + user_input + "'")). So the answer: just use the library’s parameter system (which you normally would). In short, SQLAlchemy, when used idiomatically, is safe from injection.

  49. Why is my query returning a Row or Tuple instead of model instances? – If you use session.execute(select(...)) in SQLAlchemy 2.0, it returns a Result of Row objects by default. If you selected model columns or used select(User) it will give Row[User] which you need to unpack. The Result.scalars() method is the key – calling .scalars() on the result tells SQLAlchemy to yield the model objects (or scalar values) instead of Row objects. For example:

    result = session.execute(select(User.name))
    for row in result:
        print(row[0])  # each row is a Row; index 0 is the name

    result = session.execute(select(User.name))
    for name in result.scalars():
        print(name)  # now we get the scalar value directly

    Similarly, session.execute(select(User)) – use .scalars() to get User instances. In ORM 1.x, using session.query(User) directly returns model instances, so this confusion is mostly in the newer 2.x usage. If you do session.query(User.id, User.name), you get a tuple of (id, name) per result.

  50. How do I handle composite primary keys or multiple-column foreign keys? – SQLAlchemy fully supports composite PKs and FKs. You define a composite primary key by marking multiple columns as primary_key=True:

    class Vote(Base):
        __tablename__ = 'votes'
        user_id = Column(Integer, primary_key=True)
        poll_id = Column(Integer, primary_key=True)
        choice = Column(String)

    This means (user_id, poll_id) together identify a Vote. To reference a composite primary key from another table, use a ForeignKeyConstraint spanning both columns in __table_args__ – two separate ForeignKey() markers would create two independent single-column constraints, which is not the same thing. You can also pass primaryjoin to the relationship if automatic deduction fails. For example:

    class PollVote(Base):
        __tablename__ = 'poll_votes'
        id = Column(Integer, primary_key=True)
        user_id = Column(Integer)
        poll_id = Column(Integer)
        __table_args__ = (
            ForeignKeyConstraint(
                ['user_id', 'poll_id'],
                ['votes.user_id', 'votes.poll_id'],
            ),
        )

    The ForeignKeyConstraint pairs the two columns as one composite foreign key pointing at Vote’s composite primary key, and relationship() can then determine the join automatically.

  51. Why is query.get() or Session.get() returning None for an existing row? – Possible reasons:

    • The identity map may already hold an instance with that primary key; .get() returns that in-memory instance without querying, which can look stale, but this would return the object rather than None.

    • More likely: you’re calling get with the wrong primary key type or value. For session.get(User, some_id), ensure some_id is the exact primary key (if composite PK, .get only works with the full identity tuple e.g., (user_id, poll_id)). If some_id is string but PK is int, it might not auto-cast.

    • If the object was marked deleted in the current session, .get() checks the identity map first and returns None, because the session knows it is pending deletion.

    • A brand-new object that hasn’t been flushed isn’t in the identity map yet; .get() will then query the database (and autoflush, if enabled, pushes the pending INSERT first, so it is usually found anyway).

      In short, check that the ID exists in the DB, that the session is fresh (or try session.expire_all() and then get), and that you’re using the correct key. Also note that .get() looks up by primary key only – for any other column, use a filter.

  52. What is the difference between session.query(User).get(id) and session.query(User).filter_by(id=id).one()? – query.get(id) (or in 2.x, Session.get(User, id)) is a shorthand that will return the object or None, without raising if not found. It also will not emit a SQL query if the object is already in the identity map; it will directly return it. filter_by(id=id).one() will always issue a query and will raise NoResultFound or MultipleResultsFound exceptions if criteria are not met exactly. .get is convenient and efficient for primary key lookup, whereas .filter_by(...).one() is more general (works for any filter, and ensures one result or an exception).

    Example:

    user = session.get(User, 5)
    # vs
    user = session.query(User).filter_by(id=5).one()

    The first returns None if not found, second raises if not found.

  53. How do I handle unique constraints or handle duplicates gracefully? – If you have a unique constraint (like a unique username) and you attempt to insert a duplicate, the database raises an integrity error, which SQLAlchemy surfaces as sqlalchemy.exc.IntegrityError. Catch it in a try/except around session.commit() and respond accordingly (e.g., inform the user that the name is taken), as sketched below. There’s no built-in high-level “upsert” in the ORM, though Core offers dialect-specific upserts (e.g. PostgreSQL’s insert(...).on_conflict_do_update()). For a graceful upsert, either catch the error and then update, or use the Core’s dialect-specific features.
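
    A sketch of catching the duplicate, assuming a unique constraint on User.name:

    from sqlalchemy.exc import IntegrityError

    session.add(User(name="alice"))
    try:
        session.commit()
    except IntegrityError:
        session.rollback()  # the session must be rolled back before reuse
        print("That username is already taken.")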

  54. How to do an “upsert” (insert or update on duplicate) with SQLAlchemy? – The ORM doesn’t have a direct upsert. But the Core can for certain DBs:

    • For PostgreSQL, you can use the postgresql.insert(...).on_conflict_do_update(...) method in SQLAlchemy Core.

    • For example:

      from sqlalchemy.dialects.postgresql import insert
      stmt = insert(User).values(id=1, name='x').on_conflict_do_update(
          index_elements=['id'],
          set_=dict(name='x'),
      )
      session.execute(stmt)
      session.commit()

    This will insert the row, or update name if id 1 already exists. For databases without native upsert, you can simulate one by attempting the insert and doing an update on IntegrityError, or just query whether the row exists and then update or insert accordingly (with transaction serialization to avoid race conditions as needed).

  55. How do I handle transactions that span multiple sessions or processes? – Typically, you don’t span a SQLAlchemy Session across processes – each process should have its own engine/Session. If you need a distributed transaction across different databases or systems, that’s complex (XA transactions, two-phase commit). SQLAlchemy does support two-phase commit via Connection.begin_twophase() or a Session created with twophase=True (where the backend and driver support it), but this is advanced. For multiple sessions on the same DB in threads, if you want atomicity, you can share a connection between them; it’s generally easier to just use one session. If you absolutely need to coordinate multiple sessions, you might manually control the transaction at the connection level. For instance:

    conn = engine.connect()
    trans = conn.begin()
    sessionA = Session(bind=conn)
    sessionB = Session(bind=conn)
    # ... do stuff with A and B
    sessionA.flush()
    sessionB.flush()
    trans.commit()

    But sharing a connection like that is unusual; it can work for splitting up work, but it’s better to reorganize around a single session if possible.

  56. Can I use SQLAlchemy with asyncio (async/await)? – Yes, as of SQLAlchemy 1.4+, there is support for asyncio via an AsyncEngine and AsyncSession. You use create_async_engine() with an asyncpg or other async driver. Then:

    async with AsyncSession(engine) as session:
        result = await session.execute(select(User).where(User.id == 1))
        user = result.scalar_one()

    The support is solid, but note: the ORM usage is essentially the same, just using await for DB calls. Not all dialects have async drivers (Postgres does, MySQL has aiomysql, etc.). There is also async_sessionmaker to create session factories for easier use. In summary, yes SQLAlchemy works with asyncio by providing an async interface to the core and session operations, making it suitable for frameworks like FastAPI or any async event loop environment.

  57. What is Flask-SQLAlchemy and do I need it? – Flask-SQLAlchemy is a Flask extension that wraps SQLAlchemy to integrate with Flask app context. It provides a db object (which includes a Model base class and db.session) so you don’t have to manually configure engine and session for Flask. It also handles context teardown (closing sessions after request). Underneath, it’s the same SQLAlchemy library. If you’re using Flask, Flask-SQLAlchemy is convenient. If not using Flask, you don’t need it. Everything it does can be done with plain SQLAlchemy by configuring sessionmaker and tying session lifecycle to your framework’s events.
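
    A minimal sketch of the Flask-SQLAlchemy pattern (the URI and model are illustrative; db.Model and db.session replace manual engine/sessionmaker setup):

    from flask import Flask
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
    db = SQLAlchemy(app)

    class User(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80), unique=True)

    with app.app_context():
        db.create_all()
        db.session.add(User(name="alice"))
        db.session.commit()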

  58. How do I integrate SQLAlchemy with FastAPI? – FastAPI is async, so you’d likely use SQLAlchemy’s async engine and session. The common pattern: create an async_sessionmaker(bind=engine, expire_on_commit=False) (expire_on_commit=False is often used to avoid issues with detached objects in FastAPI responses; you can leave it on if you manage session scope carefully). Then use FastAPI dependency injection to provide a session per request:

    async def get_session():
        async with async_session() as session:
            yield session

    And then in your path operation dependencies: session: AsyncSession = Depends(get_session). Within the endpoint, use await session.execute(...) etc. This ensures each request gets its own session that is closed at request end (thanks to the context manager). There are many FastAPI + SQLAlchemy examples in documentation and tutorials.
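
    A sketch of a corresponding endpoint (the route and response shape are illustrative):

    from fastapi import Depends, FastAPI, HTTPException
    from sqlalchemy.ext.asyncio import AsyncSession

    app = FastAPI()

    @app.get("/users/{user_id}")
    async def read_user(user_id: int, session: AsyncSession = Depends(get_session)):
        user = await session.get(User, user_id)  # AsyncSession.get is awaited
        if user is None:
            raise HTTPException(status_code=404, detail="User not found")
        return {"id": user.id, "name": user.name}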

  59. Why is my object’s repr or str showing <object at 0x...>? – If you don’t define __repr__ or __str__ on your ORM class, Python’s default object repr is used, which shows only the class and memory address – the declarative base does not add a friendlier one for you. If you want a readable string, implement def __repr__(self): return f"User(id={self.id}, name={self.name!r})". Defining __repr__ on your models is good practice for debugging.

  60. I’m getting “Class X is not mapped” error – what does that mean? – It means you tried to query or use a class that SQLAlchemy doesn’t know is an ORM model. Possibly you forgot to inherit from Base, or the module defining the class was never imported, so Base.metadata isn’t aware of it. Ensure your model class inherits from Base (the declarative base) and that you’ve imported its module before creating tables or using it. If you have multiple Base classes (separate registries), you might accidentally hand a session a class from the wrong one. Typically the fix is to verify the model class definition and that it’s registered (with classical/imperative mapping, ensure the mapping call has actually run). In summary, “not mapped” means SQLAlchemy has no mapping for that class – check for typos or missing Base inheritance.

  61. I changed my model class (e.g., added a column), how do I update the database? – Changing the Python model doesn’t automatically alter the database schema. You need to generate a migration. Use Alembic for this:

    • Initialize Alembic in your project (alembic init alembic).

    • Configure it to your SQLAlchemy Base (in env.py, import your Base and set target_metadata = Base.metadata – see the sketch after this list).

    • Run alembic revision --autogenerate -m "Added new column" – Alembic will compare the model (metadata) to DB and create a migration script if possible.

    • Run alembic upgrade head to apply it (this issues an ALTER TABLE to add the column, etc.).

      If you don’t use Alembic, you’d have to ALTER the table manually (e.g., with engine.begin() as conn: conn.execute(text("ALTER TABLE ..."))). It’s highly recommended to use Alembic for schema changes in production apps.
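
      The key env.py change is pointing target_metadata at your Base (a sketch; the myapp.models import path is illustrative):

      # in alembic/env.py
      from myapp.models import Base  # wherever your declarative Base lives

      target_metadata = Base.metadata  # lets --autogenerate diff models vs. the DB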

  62. What is the difference between declarative_base() and automap_base()? – declarative_base() is used to define models yourself. automap_base() is a feature that reflects an existing database schema into model classes automatically. With automap:

    Base = automap_base()
    Base.prepare(autoload_with=engine)  # in 1.4 this was spelled prepare(engine, reflect=True)
    User = Base.classes.users  # assumes there is a "users" table

    It generates classes on the fly with attributes for columns and relationships (if foreign keys are present, it can guess relationships). This is useful for quick and dynamic scenarios, or for interacting with a legacy database without writing models manually. However, for most structured development, you’ll use declarative_base and define classes explicitly.

  63. How can I reflect an existing database to SQLAlchemy models? – Use MetaData.reflect or automap. Example with reflection:

    metadata = MetaData()
    metadata.reflect(engine)  # reads all tables
    users_table = metadata.tables['users']

    Then you could use the Table objects directly, or automap to produce classes. Automap (above in 62) is the convenient way to get classes. Note that by default automap names each generated class after its table exactly (a table address_emails becomes Base.classes.address_emails); pass a classname_for_table function to prepare() if you want different naming. Automap also tries to guess relationships from foreign keys. This is a quick way to use SQLAlchemy on an existing schema without writing classes.

  64. How do I configure cascading deletes or behavior on relationships? – Use the cascade parameter on relationship. Common example:

    • cascade="all, delete-orphan" on a one-to-many relationship means if you delete the parent, all children are deleted as well, and if you remove a child from parent’s collection, that child will be deleted (assuming it’s not attached to another parent).

    • You can also do passive_deletes=True on relationship if the database will handle cascades (ON DELETE CASCADE in FK) and you don’t want SQLAlchemy to issue separate DELETEs for children.

    • By default, cascade="save-update, merge", which means adding a parent will also add its new children (save-update), but deleting a parent will not delete the children – SQLAlchemy will instead set the loaded children’s foreign keys to NULL, and you’ll get an IntegrityError if that column is NOT NULL.

      So, set cascade="all, delete-orphan" if you want full cascading in the ORM (see the sketch after this list). Otherwise, delete children manually or enable ON DELETE at the database level.
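
    A sketch of a fully cascading one-to-many (the Author/Book models are illustrative; assumes the usual Base, Column, ForeignKey, relationship imports):

    class Author(Base):
        __tablename__ = 'authors'
        id = Column(Integer, primary_key=True)
        books = relationship(
            "Book",
            back_populates="author",
            cascade="all, delete-orphan",  # children are deleted with the parent
        )

    class Book(Base):
        __tablename__ = 'books'
        id = Column(Integer, primary_key=True)
        author_id = Column(Integer, ForeignKey('authors.id'))
        author = relationship("Author", back_populates="books")

    # session.delete(author) now also deletes all of that author's books, and
    # author.books.remove(book) deletes the orphaned book at the next flush.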

  65. What does back_populates vs backref do in relationships? – Both create a bidirectional linkage. backref is a shortcut that adds the reverse relationship automatically on the other class. back_populates requires you to explicitly put relationship on both sides and name the connection. For example:

    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        children = relationship("Child", back_populates="parent")

    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        parent_id = Column(Integer, ForeignKey('parent.id'))
        parent = relationship("Parent", back_populates="children")

    vs using backref:

    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        children = relationship("Child", backref="parent")

    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        parent_id = Column(Integer, ForeignKey('parent.id'))
        # the "parent" attribute is added automatically by the backref on Parent

    In both cases, you can do child.parent and parent.children. back_populates is considered clearer in code (explicit), while backref saves a few lines. Functionally they are similar; with backref you can also specify backref(..., cascade="...") options in one place.

Resources

  1. SQLAlchemy documentation: (latest 2.x series) — The official docs are the canonical reference for Core, ORM, typing, async support, and migration topics. Start here for API details, patterns, and upgrade notes.

  2. SQLAlchemy unified tutorial: A step-by-step, 2.0-style tutorial that teaches both Core and ORM together with runnable examples and clear migration guidance from 1.x idioms.

  3. SQLAlchemy GitHub repository: Source code, issues, release notes, and contribution guidelines. Great for tracking changes, reading implementation details, and filing well-scoped bug reports.

  4. SQLAlchemy on PyPI: Package page with the current released version, wheels, and installation metadata. Use this to confirm the latest version before pinning in your project.

  5. Alembic documentation: Official migration tool for SQLAlchemy with tutorials and references for schema evolution, autogenerate, branching, and revision workflows.

  6. Stack Overflow (sqlalchemy tag): High-signal Q&A for real-world problems, error messages, and edge cases; search before posting and include a minimal reproducible example.
