FastAPI async SQLAlchemy pool_pre_ping pool_recycle production

The first time I shipped an async FastAPI service to a managed Postgres, the app looked clean for about thirty minutes. Then the 500s started — one a minute, always the same line: OperationalError: server closed the connection unexpectedly. SQLAlchemy had been quietly handing my code a pool that worked beautifully on my laptop and not at all in production. This article is the version of that fix I wish I'd had pinned in a tab. Five lessons, each one commit in the companion repo: a broken baseline, four keyword arguments added in the right order, and one fallback pool class for when the runtime is serverless. The order matters, and the order is the part nobody writes down.

Lesson 1: the pool you didn't know you had

The single line of code that ships your async FastAPI service to production is the same one that quietly breaks it thirty minutes later. It type-checks. The tests pass. Under sustained traffic it falls over.

import os
from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)

DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite+aiosqlite:///./app.db")

engine = create_async_engine(DATABASE_URL, echo=False)

AsyncSessionLocal: async_sessionmaker[AsyncSession] = async_sessionmaker(
    engine,
    expire_on_commit=False,
)

async def get_session() -> AsyncSession:
    async with AsyncSessionLocal() as session:
        yield session

What you actually get under that single-line create_async_engine call is AsyncAdaptedQueuePool with five permanent slots, ten overflow slots, no eviction, and no health check. Against aiosqlite it works forever because SQLite never disconnects. Against asyncpg to a real Postgres instance, the first idle period triggers ConnectionDoesNotExistError or InterfaceError: cannot perform operation: another operation is in progress on the next request that pulls from the pool.

Compared to the synchronous QueuePool used by Flask plus SQLAlchemy, AsyncAdaptedQueuePool wraps each checkout in the asyncio event loop. The exhaustion failure mode is identical. The recovery is not. A sync worker thread blocks until a connection frees up. An async FastAPI handler returns a 500 if pool_timeout elapses, because there is no thread to block waiting on a semaphore.

The pool the default gives you also opens connections lazily. The first five requests after process start each pay the full SSL plus auth handshake. That is roughly 5 to 15 milliseconds of cold tail latency per request until the pool warms. Acceptable on its own. Combined with the absence of pool_pre_ping and pool_recycle, it adds up: cold connection at startup, eventually stale connection at the four-hour mark, and the cycle repeats forever.

Commit: 6d3e507. The fix in subsequent commits adds four keyword arguments. Order matters.

Lesson 2: pool_pre_ping, the SELECT 1 tax worth paying

What if SQLAlchemy could ping every pooled connection with a lightweight SELECT 1 before handing it to your code? If the ping raises a disconnect-class exception, the pool invalidates that connection, discards it, and pulls or opens a fresh one. The user code sees nothing.

engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    pool_pre_ping=True,
)

Cost on a same-availability-zone managed Postgres: roughly 0.3 to 1.5 milliseconds per checkout. On a sustained 200 requests per second workload that puts the SELECT 1 in front of the actual query, the added latency at the median is under 0.1 percent. The alternative is 1 in 5 requests failing after a 30-minute lull, which is unacceptable.

What pre_ping does not save you from: the connection that is healthy at checkout but the server kills mid-transaction. That happens when a managed provider fires tcp_keepalive_idle mid-statement, when a maintenance window restarts the database while your transaction is open, or when an explicit pg_terminate_backend(pid) lands on the connection you are holding. Pre_ping has already let the connection through. You will see the disconnect inside session.execute(...). That is pool_recycle plus application-level retry territory, covered next.

A practical trade-off discussion: pre_ping versus a periodic background health-check task. The background-task approach (a coroutine that loops every 60 seconds, opens a session, runs SELECT 1, closes) sounds appealing because it moves the cost off the hot path. It does not work. The pool has many connections; your background task only touches one per tick; and the connection the next user request grabs is most likely a different one. Pre_ping is per-checkout. Per-checkout is the right granularity.

Commit: 9895ee6.

Lesson 3: pool_recycle, eviction before the server gives up

Here is the connection pool_pre_ping cannot save: healthy at checkout, dead three statements later when the provider's idle-timeout fires mid-transaction — which is exactly the gap pool_recycle=3600 was designed to close. Sets a hard ceiling on connection age regardless of recent activity.

engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    pool_pre_ping=True,
    pool_recycle=3600,
)

Pick the value against the shortest external timeout the connection touches:

MySQL wait_timeout default is 28800 seconds (8 hours). 3600 sits comfortably under.
Managed Postgres on RDS, Supabase, Neon, and similar tiers runs 5 to 30 minute idle timeouts. Set pool_recycle=1800 if you operate close to the edge.
PlanetScale caps connection age at about 28 hours. 3600 is conservative.
pgbouncer in transaction mode multiplexes upstream connections, so the app-side connection is short lived anyway and recycle matters less. Pgbouncer in session mode behaves like a direct connection and needs the same recycle setting.

The reason recycle is belt-and-suspenders with pre_ping: pre_ping covers checkout-time staleness, recycle covers in-flight TCP drops. Together they cover the two distinct failure modes. Apply both. Skipping recycle and relying on pre_ping alone is the single most common production mistake on this stack.

A recycle value that is too aggressive has its own cost. Setting pool_recycle=60 means every connection over a minute old gets thrown away, paying the 5 to 15 millisecond handshake more or less continuously. The pool never stabilizes; you are effectively running NullPool with extra steps. The 3600 baseline is the conservative sweet spot for almost every managed database, and the value to drop down from only when you measure idle-timeout failures still happening.

Commit: 356f30f.

Lesson 4: pool_size and max_overflow, bound the blast radius

The default pool_size=5, max_overflow=10 means each worker process can open up to fifteen connections. Four uvicorn workers per pod multiplied by three pods running is up to 180 connections. Managed Postgres free tiers cap at 60 to 100. You hit the ceiling under burst load and the error rate spikes.

engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    pool_pre_ping=True,
    pool_recycle=3600,
    pool_size=20,
    max_overflow=0,
    pool_timeout=30,
)

pool_size=20, max_overflow=0 means exactly twenty connections per process, no more. You plan capacity around that single number. Four workers per pod times two pods is 160 connections steady state. You know exactly what the database has to handle.

Why max_overflow=0 over max_overflow=10? Predictable failure beats unpredictable latency. Overflow connections are created lazily on demand and discarded immediately on return. They do not get reused. Under burst, your latency P99 quietly climbs because each overflow connection pays the full SSL plus auth handshake. With max_overflow=0, you get a clean queue: twenty in-flight at any moment, all reused; anyone past twenty waits in the pool_timeout queue; if they wait longer than thirty seconds, the request fails fast with TimeoutError. That signal is honest. You scale horizontally or raise pool_size. You do not silently degrade.

How to pick pool_size=20 specifically? Rule of thumb: pool_size approximately equals concurrent in-flight queries at peak. With FastAPI plus async, that approximates the number of concurrent requests holding a database session at any moment. If each request holds a session for around 50 milliseconds and the worker pushes 400 requests per second, you need about 20 in-flight at any tick. Validate empirically against pg_stat_activity. Tune up or down based on actual saturation, not on a guess.

pool_timeout=30 is the default and is set explicitly here for documentation. Lower it (5 to 10 seconds) if you sit behind a load balancer that times out faster than thirty seconds. Otherwise you stack queue depth and the load balancer cancels the request anyway, wasting both the queue slot and the eventual connection.

One more failure mode worth naming: long-held sessions. If a single endpoint holds a session open across a 5-second external HTTP call (because the developer did the I/O inside the session context manager), the pool runs out of slots not because of high traffic but because each request is holding its slot 100 times longer than it should. The fix is not to raise pool_size; the fix is to close the session before the external call and reopen after. Pool sizing only solves the steady-state capacity problem; it does not save you from architectural mistakes that hold connections during slow external work.

Commit: 7a484c9.

Lesson 5: NullPool, when the pool itself is the bug

Serverless runtimes do not keep workers warm. AWS Lambda freezes the runtime between invocations. Cloud Run scales pods up and down within seconds. A traditional pool that holds twenty connections on a Lambda is bad: concurrent Lambda count can spike to 200, each holding twenty connections, blowing past database limits within seconds.

from sqlalchemy.pool import NullPool

engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    poolclass=NullPool,
)

poolclass=NullPool instructs SQLAlchemy not to pool at all. Open a fresh connection per request. Close it on session exit. Trade-off: roughly 5 to 15 milliseconds of connect overhead per request, dominated by the SSL handshake plus authentication round-trip. Acceptable when request count per instance is low and connection count needs to stay bounded by the database's view.

Two production patterns combine NullPool with infrastructure-level pooling so you do not pay the handshake every request:

NullPool plus pgbouncer in transaction mode. Pgbouncer holds the upstream connections and multiplexes; your app opens cheap local pseudo-connections that pgbouncer terminates after each transaction.
NullPool plus RDS Proxy or Cloud SQL Proxy. The cloud proxy does the same multiplexing job at the network layer, and your Lambda just sees a fast local endpoint.

Do not use NullPool on a long-running worker. You will pay the handshake 200 times per second and saturate the database connect path before you saturate the query path. NullPool is the scale-to-zero specialization, not a general default.

Commit: 408ab84.

The order matters

Apply the four keyword arguments in this sequence:

pool_pre_ping=True first. Cheap, catches stale connections at checkout. Necessary.
pool_recycle=3600 second. Closes the gap pre_ping does not see, namely mid-transaction kills and provider idle-timeout enforcement.
pool_size=20, max_overflow=0, pool_timeout=30 third. Bounds resource consumption to a known steady-state number.
poolclass=NullPool only when serverless makes step 3 the wrong tool.

Reordering creates broken intermediate states. Setting pool_size=20 before adding pre_ping means your twenty connections eventually all go stale at the same time, and the recovery storm is twenty times worse than what you started with. Adding NullPool first on a long-running worker means you pay the connect handshake on every request, then watch the database connection rate climb without understanding why.

Verifying the changes locally

Point the engine at a real Postgres and force a disconnect to verify pre_ping picks it up. Open psql and run SELECT pid, application_name FROM pg_stat_activity WHERE application_name = 'your-app'. Pick a pid and run SELECT pg_terminate_backend(<pid>). The next request through that connection should silently swap in a fresh one when pool_pre_ping=True is active. Comment out pre_ping and the same test surfaces ConnectionDoesNotExistError on the next request. That is the production failure compressed into a five-second loop on your workstation.

For pool_recycle, the equivalent local test is harder to reproduce but easy to reason about: set pool_recycle=5 for the experiment, run a single request, wait six seconds, run another. Enable echo_pool=True on the engine and the pool log prints the eviction line so you can confirm SQLAlchemy actually rotated the connection rather than reusing it.

For pool_size and max_overflow, run wrk -c 50 -t 4 -d 30s http://localhost:8000/items and watch pg_stat_activity. With max_overflow=0 the connection count holds flat at 20 and the failed-request rate rises cleanly under saturation. With max_overflow=10 the connection count climbs and falls in a sawtooth as overflow slots open and close, masking the saturation point.

Repository

Full source at https://github.com/vytharion/async-sqlalchemy-pool-pre-ping-recycle. Clone it, check out any commit, and run uv sync && uv run pytest -q.

Lesson 1 → 6d3e507: baseline naive pool. Reproduces the bug.
Lesson 2 → 9895ee6: enable pool_pre_ping. Stale-checkout fix.
Lesson 3 → 356f30f: add pool_recycle=3600. Mid-flight eviction.
Lesson 4 → 7a484c9: bound the pool. pool_size=20, max_overflow=0, pool_timeout=30.
Lesson 5 → 408ab84: NullPool variant for serverless.

References

SQLAlchemy 2.0 pool documentation: https://docs.sqlalchemy.org/en/20/core/pooling.html
SQLAlchemy async ORM and engine: https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html
asyncpg connection internals: https://magicstack.github.io/asyncpg/current/

Next steps

Run git log --oneline in the cloned repo. Walk the commits in order. Modify DATABASE_URL to point at a real Postgres and watch how each keyword argument changes behavior under SELECT pg_terminate_backend(...). Once the pattern is in muscle memory, copy the final Lesson 4 engine definition into your service. Plan capacity from a single pool_size number. Add the NullPool branch only when you actually deploy to a scale-to-zero runtime, and pair it with pgbouncer or a cloud SQL proxy so you are not paying the SSL handshake on every request.