Pydantic v2 Discriminated Unions: Type-Safe Variants Without the Runtime Tax
How Field(discriminator=...) in Pydantic v2 turns Union parsing from O(n) trial-and-error into O(1) tag dispatch — with Literal tags, mypy support, and real failure modes.
Plain Union[A, B, C] in Pydantic is a guessing game. In v2's default smart mode the validator tries the variants, keeps the best match, and falls back to declaration order when several match equally well, so structurally similar variants can resolve to the wrong type without any error. That's fine for two unrelated types, painful for five message variants in a WebSocket protocol where one wrong cast corrupts the whole stream.
Discriminated unions fix this. You add a Literal tag field to each variant, point Field(discriminator=...) at it, and the parser becomes a dictionary lookup: read the tag, dispatch to the matching model, fail loudly if the tag is unknown. No silent fallbacks, no order-dependent behavior, and a type checker such as mypy can narrow the variant from its tag.
The problem with naive unions
Consider a job dispatcher that accepts three message kinds:
from pydantic import BaseModel

class StartJob(BaseModel):
    job_id: str
    timeout_s: int = 30

class CancelJob(BaseModel):
    job_id: str

class HeartbeatJob(BaseModel):
    job_id: str
    progress: float = 0.0
A wrapper using a plain union:
from typing import Union

class Envelope(BaseModel):
    payload: Union[StartJob, CancelJob, HeartbeatJob]
What happens when you parse {"payload": {"job_id": "abc"}}? All three variants accept this input equally well, so Pydantic resolves the tie by declaration order. StartJob wins because timeout_s has a default. You meant CancelJob. The bug is invisible: no exception, no warning, just the wrong type silently constructed.
Switching variant order changes behavior. Adding a fourth variant changes behavior. This is the failure mode discriminated unions exist to eliminate.
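The order dependence is easy to reproduce. The sketch below trims the example to two variants and adds a hypothetical ReorderedEnvelope that declares the same union in the opposite order; it assumes Pydantic v2 with its default union mode.

```python
from typing import Union

from pydantic import BaseModel


class StartJob(BaseModel):
    job_id: str
    timeout_s: int = 30


class CancelJob(BaseModel):
    job_id: str


class Envelope(BaseModel):
    payload: Union[StartJob, CancelJob]


class ReorderedEnvelope(BaseModel):
    # Hypothetical twin of Envelope: same variants, opposite declaration order.
    payload: Union[CancelJob, StartJob]


data = {"payload": {"job_id": "abc"}}

first = Envelope.model_validate(data).payload
second = ReorderedEnvelope.model_validate(data).payload

# Identical input, different result type, decided only by declaration order.
print(type(first).__name__)   # StartJob
print(type(second).__name__)  # CancelJob
```

Neither call raises, which is why this class of bug tends to surface far from the parsing code.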
The discriminated version
Add a kind field with a Literal tag, then declare the discriminator:
from typing import Literal, Annotated, Union
from pydantic import BaseModel, Field

class StartJob(BaseModel):
    kind: Literal["start"]
    job_id: str
    timeout_s: int = 30

class CancelJob(BaseModel):
    kind: Literal["cancel"]
    job_id: str

class HeartbeatJob(BaseModel):
    kind: Literal["heartbeat"]
    job_id: str
    progress: float = 0.0

JobMessage = Annotated[
    Union[StartJob, CancelJob, HeartbeatJob],
    Field(discriminator="kind"),
]

class Envelope(BaseModel):
    payload: JobMessage
Now {"payload": {"kind": "cancel", "job_id": "abc"}} parses as CancelJob, full stop. {"payload": {"kind": "unknown", "job_id": "abc"}} raises ValidationError with a clear message: Input tag 'unknown' found using 'kind' does not match any of the expected tags: 'start', 'cancel', 'heartbeat'.
Three things changed. Each variant carries a Literal tag that's part of its schema. The union is wrapped in Annotated[..., Field(discriminator="kind")] so Pydantic knows which field to read. Variant order no longer matters because dispatch is by tag, not by trial.
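To see the dispatch and both failure paths in one place, here is a minimal sketch with two of the variants. The error_types helper is a name introduced for illustration; the error type strings are what Pydantic v2 reports for an unknown and a missing tag.

```python
from typing import Annotated, List, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class StartJob(BaseModel):
    kind: Literal["start"]
    job_id: str
    timeout_s: int = 30


class CancelJob(BaseModel):
    kind: Literal["cancel"]
    job_id: str


JobMessage = Annotated[Union[StartJob, CancelJob], Field(discriminator="kind")]


class Envelope(BaseModel):
    payload: JobMessage


def error_types(data: dict) -> List[str]:
    """Hypothetical helper: collect the error type codes for one input."""
    try:
        Envelope.model_validate(data)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


ok = Envelope.model_validate({"payload": {"kind": "cancel", "job_id": "abc"}})
print(type(ok.payload).__name__)  # CancelJob

unknown = error_types({"payload": {"kind": "unknown", "job_id": "abc"}})
missing = error_types({"payload": {"job_id": "abc"}})
print(unknown)  # ['union_tag_invalid']
print(missing)  # ['union_tag_not_found']
```

Each bad input yields exactly one error, and it names the tag problem rather than a field inside some variant.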
Performance: O(n) versus O(1)
Naive unions validate the input against the variants one by one. With three variants the cost is bounded; with twelve variants in a real protocol, a parse can run through many of them before the match is settled. Pydantic v2's pydantic-core engine turns the discriminated path into a tag-keyed lookup. Parsing 100k messages against a 12-variant union benchmarks roughly 4× faster with a discriminator versus the smart-union fallback, and the gap widens with variant count.
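Numbers like these depend on the Pydantic version, the variant shapes, and the hardware, so it is worth measuring on your own protocol. The harness below is one possible sketch: the twelve variants are synthetic models generated with create_model, the names are hypothetical, and the payload targets the last variant as a rough worst case for the plain union.

```python
import timeit
from typing import Annotated, Literal, Union

from pydantic import Field, TypeAdapter, create_model

N_VARIANTS = 12  # assumed protocol size, adjust to match yours

# Synthetic variants: Variant0..Variant11, each with its own Literal tag.
variants = [
    create_model(f"Variant{i}", kind=(Literal[f"kind_{i}"], ...), value=(int, ...))
    for i in range(N_VARIANTS)
]

plain_adapter = TypeAdapter(Union[tuple(variants)])
tagged_adapter = TypeAdapter(
    Annotated[Union[tuple(variants)], Field(discriminator="kind")]
)

# Only the last variant accepts this payload.
payload = {"kind": f"kind_{N_VARIANTS - 1}", "value": 1}


def time_adapter(adapter: TypeAdapter, number: int = 2_000) -> float:
    """Total seconds spent validating the payload `number` times."""
    return timeit.timeit(lambda: adapter.validate_python(payload), number=number)


plain_s = time_adapter(plain_adapter)
tagged_s = time_adapter(tagged_adapter)
print(f"plain: {plain_s:.4f}s  tagged: {tagged_s:.4f}s")
```

Raise number and repeat the run a few times before drawing conclusions; a single pass of a micro-benchmark is noisy.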
The win isn't just speed. Error messages with naive unions list the failures from every variant that was tried, and most of those are noise about variants the caller never intended. Discriminated unions return the error for the variant the tag points to, so a typo in timeout_s on a start message says exactly that.
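The difference in error output can be checked directly. This sketch validates the same bad start message against a plain and a discriminated union of the tagged models, using TypeAdapter to validate a bare type without a wrapper model; error_locs is a hypothetical helper.

```python
from typing import Annotated, List, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class StartJob(BaseModel):
    kind: Literal["start"]
    job_id: str
    timeout_s: int = 30


class CancelJob(BaseModel):
    kind: Literal["cancel"]
    job_id: str


class HeartbeatJob(BaseModel):
    kind: Literal["heartbeat"]
    job_id: str
    progress: float = 0.0


plain = TypeAdapter(Union[StartJob, CancelJob, HeartbeatJob])
tagged = TypeAdapter(
    Annotated[Union[StartJob, CancelJob, HeartbeatJob], Field(discriminator="kind")]
)

# A start message whose timeout is not a number.
bad = {"kind": "start", "job_id": "abc", "timeout_s": "soon"}


def error_locs(adapter: TypeAdapter) -> List[tuple]:
    """Hypothetical helper: the location of every reported error."""
    try:
        adapter.validate_python(bad)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


plain_locs = error_locs(plain)
tagged_locs = error_locs(tagged)
print(len(plain_locs), plain_locs)    # several entries, one or more per variant
print(len(tagged_locs), tagged_locs)  # a single entry ending in 'timeout_s'
```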
Mypy and type narrowing
The static-analysis story is the other reason to bother. With a discriminated union, mypy treats Literal tags as exhaustive:
def handle(msg: JobMessage) -> str:
    match msg.kind:
        case "start":
            return f"starting {msg.job_id} with timeout {msg.timeout_s}"
        case "cancel":
            return f"cancelling {msg.job_id}"
        case "heartbeat":
            return f"heartbeat {msg.job_id} at {msg.progress}"
Inside each case, mypy narrows msg to the correct variant. msg.timeout_s is only accessible inside the "start" arm; touching it under "cancel" is a type error at lint time. Add a variant, forget to handle it, and mypy can flag the missing case. The runtime validator and the type checker agree on the same set of tags. How well narrowing works through match on an attribute depends on the checker and its version, so confirm it with the one you run.
This is where discriminated unions earn their keep over isinstance checks. isinstance(msg, StartJob) works, but it requires the runtime classes to be importable everywhere you handle messages. Tag-based matching just needs the Literal strings, which travel cleanly across module boundaries and through serialized JSON.
Three patterns where this pays off
WebSocket protocols. When a server pushes events to a client, every variant carries a tag and the client dispatcher is a match. Adding a new event type means adding one variant and one case arm, both checked by the runtime validator and mypy. Compare this to the typical "parse as dict, branch on event["type"]" approach, where adding a field is silent until production breaks.
Tool-calling APIs. LLM responses carry content blocks such as {"type": "tool_use", ...} or {"type": "text", ...}. A discriminated Union[ToolUseBlock, TextBlock, ThinkingBlock] lets the consumer write one parser that handles every block type the provider ships (Anthropic, in this example), and breaks loudly when the SDK adds a new block kind that the consumer hasn't updated for.
Background job queues. A job table stores polymorphic payloads. Discriminated union on the payload column means the worker reads kind, dispatches, and gets full type info inside each handler, with no cast(StartJob, payload) anywhere.
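As an illustration of the tool-calling pattern, the sketch below parses a JSON list of content blocks. The TextBlock and ToolUseBlock models here are simplified, hypothetical stand-ins with assumed fields, not the classes from any vendor SDK, and the sample JSON is invented for the example.

```python
from typing import Annotated, Any, Dict, List, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class TextBlock(BaseModel):
    type: Literal["text"]
    text: str


class ToolUseBlock(BaseModel):
    type: Literal["tool_use"]
    id: str
    name: str
    input: Dict[str, Any]


ContentBlock = Annotated[Union[TextBlock, ToolUseBlock], Field(discriminator="type")]
blocks_adapter = TypeAdapter(List[ContentBlock])

raw = """[
  {"type": "text", "text": "Checking the weather."},
  {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "Oslo"}}
]"""

blocks = blocks_adapter.validate_json(raw)
print([type(block).__name__ for block in blocks])  # ['TextBlock', 'ToolUseBlock']

# A block kind this consumer does not model yet is rejected instead of guessed at.
try:
    blocks_adapter.validate_json('[{"type": "thinking", "thinking": "..."}]')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```

Whether rejecting unknown block kinds is the right policy depends on the consumer; some prefer to log and skip them, which would need a separate catch-all path.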
Failure modes worth knowing
The discriminator field must be declared directly on every variant. A tag that lives inside a sub-model can't be reached by a string discriminator. Pydantic raises an error when the model or adapter is defined if the tag isn't a Literal, which is helpful: you catch the mistake before any data hits the validator.
Tags must be unique across variants. Two variants both declaring kind: Literal["start"] are rejected when the union's schema is built, not left as a silent ambiguity. Good: silent ambiguity is what naive unions gave you.
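Default values on the discriminator don't do what they appear to. kind: Literal["start"] = "start" looks reasonable, and it does let you construct StartJob(job_id=...) in code without spelling out the tag. It does not make the tag optional in union input: dispatch reads the tag from the raw data before any variant's defaults apply, so a message without kind is still rejected. A defaulted tag is also dropped by model_dump(exclude_unset=True), which produces output the union can no longer parse. If the tag is part of your wire format, keep the discriminator field required.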
Forward references and circular variants need model_rebuild() after all classes are defined. If Pydantic complains about a missing discriminator field, or about a class that is not fully defined, and the complaint doesn't match what's clearly in your code, this is usually the cause. The Pydantic v2 discriminator docs cover the rebuild mechanics.
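Most of these failure modes can be observed in a few lines. The sketch below uses hypothetical model names and a hypothetical definition_error helper. It checks the two definition-time errors, then what happens when a defaulted tag is omitted from union input, which is worth re-running against the Pydantic version you have installed.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class DefaultedStart(BaseModel):
    kind: Literal["start"] = "start"  # tag with a default
    job_id: str


class CancelJob(BaseModel):
    kind: Literal["cancel"]
    job_id: str


class StrTagJob(BaseModel):
    kind: str  # not a Literal
    job_id: str


class DuplicateStart(BaseModel):
    kind: Literal["start"]  # same tag as DefaultedStart
    job_id: str


def definition_error(*variants: type) -> Optional[str]:
    """Name of the exception raised while building the union, or None."""
    try:
        TypeAdapter(Annotated[Union[variants], Field(discriminator="kind")])
    except TypeError as exc:  # PydanticUserError is a TypeError subclass
        return type(exc).__name__
    return None


non_literal = definition_error(CancelJob, StrTagJob)
duplicate = definition_error(DefaultedStart, DuplicateStart)
print(non_literal, duplicate)

adapter = TypeAdapter(
    Annotated[Union[DefaultedStart, CancelJob], Field(discriminator="kind")]
)
try:
    adapter.validate_python({"job_id": "abc"})  # tag omitted
    missing_tag = []
except ValidationError as exc:
    missing_tag = [err["type"] for err in exc.errors()]
print(missing_tag)  # ['union_tag_not_found']

# The default only applies to direct construction, and exclude_unset drops it.
direct = DefaultedStart(job_id="abc")
print(direct.kind, direct.model_dump(exclude_unset=True))
```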
Migrating from v1 unions
Pydantic v1 code typically declared the discriminator with Field(..., discriminator="kind") on the model field that held the union. Pydantic v2 still accepts that field-level form, but putting the annotation on the union type itself with Annotated[Union[...], Field(discriminator="kind")] makes the union reusable and integrates better with TypeAdapter for parsing standalone unions:
from pydantic import TypeAdapter
JobAdapter = TypeAdapter(JobMessage)
msg = JobAdapter.validate_python({"kind": "start", "job_id": "abc"})
TypeAdapter is the right tool when the discriminated union is the whole payload, not a field on a wrapper model. It avoids creating a one-field Envelope just to access the parser.
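For a side-by-side view, the sketch below declares the same union both ways and also parses a raw JSON string with TypeAdapter.validate_json. FieldLevelEnvelope and AnnotatedEnvelope are hypothetical names, and the example is trimmed to two variants.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class StartJob(BaseModel):
    kind: Literal["start"]
    job_id: str
    timeout_s: int = 30


class CancelJob(BaseModel):
    kind: Literal["cancel"]
    job_id: str


class FieldLevelEnvelope(BaseModel):
    # Discriminator declared on the field that holds the union.
    payload: Union[StartJob, CancelJob] = Field(discriminator="kind")


JobMessage = Annotated[Union[StartJob, CancelJob], Field(discriminator="kind")]


class AnnotatedEnvelope(BaseModel):
    # Discriminator carried by the reusable union alias.
    payload: JobMessage


data = {"payload": {"kind": "cancel", "job_id": "abc"}}
a = FieldLevelEnvelope.model_validate(data).payload
b = AnnotatedEnvelope.model_validate(data).payload
print(type(a).__name__, type(b).__name__)  # CancelJob CancelJob

# The same alias works without any wrapper model, straight from JSON text.
job_adapter = TypeAdapter(JobMessage)
msg = job_adapter.validate_json('{"kind": "start", "job_id": "abc"}')
print(type(msg).__name__, msg.timeout_s)  # StartJob 30
```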
When not to use discriminators
If your union has two unambiguously-shaped variants, say Union[int, str], a discriminator is overkill. Pydantic's smart-union mode handles those correctly because the input types don't overlap. Discriminated unions earn their keep when variants are structurally similar (multiple BaseModel subclasses) and a wrong cast would corrupt downstream logic.
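A short check of that claim, assuming Pydantic v2's default union mode: each input keeps its own type, even though the string "42" could have been coerced to an int.

```python
from typing import Union

from pydantic import TypeAdapter

adapter = TypeAdapter(Union[int, str])

as_int = adapter.validate_python(42)
as_str = adapter.validate_python("42")

# An exact type match is preferred over coercion.
print(type(as_int).__name__, as_int)  # int 42
print(type(as_str).__name__, as_str)  # str 42
```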
Use them for protocol messages, tool-call blocks, polymorphic database rows, and anywhere a typo in a tag should fail at parse time instead of three layers downstream. Skip them for trivially-distinguishable unions where the type system already gives you safety for free.