Stop Parsing JSON by Hand: Structured LLM Outputs with Pydantic
Most LLM tutorials end the same way: you get a string back, you write a regex, and you pray.
We spent three months building production AI agents. The single change that eliminated the most bugs was not prompt engineering, not model upgrades, not retry logic. It was making every LLM call return a Pydantic model instead of raw text.
This article covers 4 working approaches to structured LLM outputs in Python — from direct SDK calls to framework-level abstractions. Every code example is verified against official documentation as of February 2026.
Here is what happens when you parse LLM output manually:
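A typical hand-rolled parser looks something like this sketch (the function name and the required `name` field are invented for illustration):

```python
import json


def parse_llm_reply(raw: str) -> dict:
    # Hope the first "{" and the last "}" bracket the answer...
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    # ...and that nothing between them breaks json.loads.
    data = json.loads(raw[start : end + 1])
    # Manual field checks: every one is a bug waiting to happen.
    if "name" not in data:
        raise ValueError("model omitted required field 'name'")
    return data
```

It works until the model wraps the JSON in commentary, nests an extra brace, or renames a field.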
The failure modes multiply: missing fields, wrong types, inconsistent formats across calls, and silent data corruption when the model rephrases its output.
Structured outputs solve this at the protocol level. The model is constrained to produce valid JSON matching your schema. No parsing. No regex. No prayer.
OpenAI's Python SDK (v1.x+) exposes a .parse() method on chat completions (client.beta.chat.completions.parse() in earlier 1.x releases) that accepts a Pydantic model directly and returns a typed object.
What happens under the hood: The SDK converts your Pydantic model to a JSON schema, sends it as response_format, and deserializes the response back into your model class. The model uses constrained decoding — at each step, any token that would violate your schema is masked out of the sampling distribution, so invalid JSON cannot be produced.
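A minimal sketch of that flow (the Invoice model and the prompt are invented for illustration; set OPENAI_API_KEY in your environment before running):

```python
from pydantic import BaseModel


class Invoice(BaseModel):
    vendor: str
    total_usd: float
    line_items: list[str]


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "user",
                "content": "Extract the invoice from: ACME Corp, $12.50, 1x widget",
            }
        ],
        response_format=Invoice,  # the SDK derives the JSON schema from this
    )
    # .parsed is an Invoice instance (or None if the model refused) -- no json.loads.
    invoice = completion.choices[0].message.parsed
    print(invoice)
```

Note there is no string handling anywhere: the typed object comes straight off the response.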
Compatibility note: Structured outputs work with gpt-4o-2024-08-06 and later models. The first request with a new schema has additional latency (typically under 10 seconds) while the schema is compiled. Subsequent requests use a cached grammar.
What to watch for: OpenAI's structured outputs support a subset of JSON Schema. Constraints like minimum, maximum, minLength, and maxLength are stripped before sending. The SDK adds these constraints to field descriptions instead, so the model sees them as instructions rather than hard constraints. Pydantic still validates them on the response.
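A short sketch of that round trip (the Rating model is illustrative): the ge/le and max_length constraints are dropped from the schema OpenAI receives, but Pydantic still enforces them when the response is deserialized:

```python
from pydantic import BaseModel, Field, ValidationError


class Rating(BaseModel):
    # ge/le and max_length are stripped from the transmitted JSON schema;
    # the SDK folds them into the field description as plain instructions.
    score: int = Field(ge=1, le=5, description="Star rating")
    summary: str = Field(max_length=200)


# Pydantic validates the constraints on the way back, so an out-of-range
# value from the model fails loudly instead of corrupting your data:
try:
    Rating.model_validate({"score": 7, "summary": "glowing"})
except ValidationError as exc:
    print(f"{exc.error_count()} constraint violation(s)")
```

This means numeric and length constraints are soft guidance for the model but hard guarantees for your application.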
Source: Dev.to