The Complete Guide to Using Pydantic for Validating LLM Outputs




In this article, you will learn how to turn free-form large language model (LLM) text into reliable, schema-validated Python objects with Pydantic.

Topics we will cover include:

  • Designing robust Pydantic models (including custom validators and nested schemas).
  • Parsing “messy” LLM outputs safely and surfacing precise validation errors.
  • Integrating validation with OpenAI, LangChain, and LlamaIndex plus retry strategies.

Let’s break it down.


Introduction

Large language models generate text, not structured data. Even when you prompt them to return structured data, they’re still generating text that looks like valid JSON. The output may have incorrect field names, missing required fields, wrong data types, or extra text wrapped around the actual data. Without validation, these inconsistencies cause runtime errors that are difficult to debug.

Pydantic helps you validate data at runtime using Python type hints. It checks that LLM outputs match your expected schema, converts types automatically where possible, and provides clear error messages when validation fails. This gives you a reliable contract between the LLM’s output and your application’s requirements.

This article shows you how to use Pydantic to validate LLM outputs. You’ll learn how to define validation schemas, handle malformed responses, work with nested data, integrate with LLM APIs, implement retry logic with validation feedback, and more. Let’s not waste any more time.

🔗 You can find the code on GitHub. Before you go ahead, install Pydantic version 2.x with the optional email dependencies: pip install "pydantic[email]" (the quotes keep shells like zsh from tripping over the brackets).

Getting Started

Let’s start with a simple example by building a tool that extracts contact information from text. The LLM reads unstructured text and returns structured data that we validate with Pydantic:
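A minimal sketch of such a model might look like this (the 7-to-15-digit phone check is an illustrative choice):

```python
from typing import Optional

from pydantic import BaseModel, EmailStr, field_validator


class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: Optional[str] = None  # may be missing or null

    @field_validator("phone")
    @classmethod
    def clean_phone(cls, v: Optional[str]) -> Optional[str]:
        if v is None:
            return v
        # Keep only the digits, then sanity-check the length
        digits = "".join(ch for ch in v if ch.isdigit())
        if not 7 <= len(digits) <= 15:  # illustrative bounds
            raise ValueError("phone number must contain 7 to 15 digits")
        return digits
```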

All Pydantic models inherit from BaseModel, which provides automatic validation. Type hints like name: str help Pydantic validate types at runtime. The EmailStr type validates email format without needing a custom regex. Fields marked with Optional[str] = None can be missing or null. The @field_validator decorator lets you add custom validation logic, like cleaning phone numbers and checking their length.

Here’s how to use the model to validate sample LLM output:
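For instance, with one well-formed response and one deliberately broken one (both strings are fabricated for the demo):

```python
import json

from pydantic import ValidationError

llm_output = '{"name": "Ada Lovelace", "email": "ada@example.com", "phone": "(555) 123-4567"}'

contact = ContactInfo(**json.loads(llm_output))
print(contact.phone)  # 5551234567 -- normalized by the validator

try:
    ContactInfo(name="Ada Lovelace", email="not-an-email")
except ValidationError as e:
    print(e)  # pinpoints the `email` field and explains the failure
```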

When you create a ContactInfo instance, Pydantic validates everything automatically. If validation fails, you get a clear error message telling you exactly what went wrong.

Parsing and Validating LLM Outputs

LLMs don’t always return perfect JSON. Sometimes they add markdown formatting, explanatory text, or mess up the structure. Here’s how to handle these cases:
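One way to implement this, sketched with an illustrative ProductReview schema (its fields are assumptions):

```python
import json
import re
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ProductReview(BaseModel):
    product_name: str
    rating: int = Field(..., ge=1, le=5)
    summary: str


def extract_json_from_llm_response(response: str) -> str:
    """Pull the first JSON object out of a response that may contain extra text."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in response")
    return match.group(0)


def parse_review(response: str) -> Optional[ProductReview]:
    try:
        data = json.loads(extract_json_from_llm_response(response))
        return ProductReview.model_validate(data)
    except json.JSONDecodeError as e:
        print(f"Malformed JSON: {e}")
    except ValidationError as e:
        print(f"Schema mismatch: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    return None
```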

This approach uses regex to find JSON within response text, handling cases where the LLM adds explanatory text before or after the data. We catch different exception types separately:

  • JSONDecodeError for malformed JSON,
  • ValidationError for data that doesn’t match the schema, and
  • General exceptions for unexpected issues.

The extract_json_from_llm_response function handles text cleanup while parse_review handles validation, keeping concerns separated. In production, you’d want to log these errors or retry the LLM call with an improved prompt.

This example shows an LLM response with extra text that our parser handles correctly:
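For example (the response string is fabricated):

```python
messy_response = """Sure! Here's the review you asked for:

{"product_name": "Acme Coffee Grinder", "rating": 4, "summary": "Solid build, a bit loud."}

Let me know if you need anything else."""

review = parse_review(messy_response)
print(review)
# product_name='Acme Coffee Grinder' rating=4 summary='Solid build, a bit loud.'
```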

The parser extracts the JSON block from the surrounding text and validates it against the ProductReview schema.

Working with Nested Models

Real-world data is rarely flat. Here’s how to handle nested structures like a product with multiple reviews and specifications:
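One possible set of models is sketched below; the specific fields and the 0.1 tolerance in the cross-field check are illustrative choices:

```python
from typing import List

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class Specification(BaseModel):
    name: str
    value: str


class Review(BaseModel):
    author: str
    rating: int = Field(..., ge=1, le=5)
    comment: str


class Product(BaseModel):
    name: str
    price: float = Field(..., gt=0)
    specifications: List[Specification]
    reviews: List[Review]
    average_rating: float = Field(..., ge=1, le=5)

    @field_validator("average_rating")
    @classmethod
    def check_average_matches_reviews(cls, v: float, info: ValidationInfo) -> float:
        # info.data holds the already-validated fields, including the reviews
        reviews = info.data.get("reviews")
        if reviews:
            expected = sum(r.rating for r in reviews) / len(reviews)
            if abs(v - expected) > 0.1:  # illustrative tolerance
                raise ValueError(
                    f"average_rating {v} does not match computed average {expected:.2f}"
                )
        return v
```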

The Product model contains lists of Specification and Review objects, and each nested model is validated independently. Using Field(..., ge=1, le=5) adds constraints directly in the field definition, where ge means “greater than or equal”, le means “less than or equal”, and gt means “greater than”.

The check_average_matches_reviews validator accesses other fields using info.data, allowing you to validate relationships between fields. When you pass nested dictionaries to Product(**data), Pydantic automatically creates the nested Specification and Review objects.

This structure ensures data integrity at every level. If a single review is malformed, you’ll know exactly which one and why.

This example shows how nested validation works with a complete product structure:
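Continuing with the models above (the data is fabricated):

```python
product_data = {
    "name": "Acme Coffee Grinder",
    "price": 79.99,
    "specifications": [
        {"name": "Weight", "value": "1.2 kg"},
        {"name": "Capacity", "value": "250 g"},
    ],
    "reviews": [
        {"author": "Sam", "rating": 5, "comment": "Excellent grind consistency."},
        {"author": "Priya", "rating": 4, "comment": "Great, but a little loud."},
    ],
    "average_rating": 4.5,  # (5 + 4) / 2 -- passes the cross-field check
}

product = Product(**product_data)
print(product.reviews[0].author)  # Sam -- the nested dicts became Review objects
```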

Pydantic validates the entire nested structure in one call, checking that specifications and reviews are properly formed and that the average rating matches the individual review ratings.

Using Pydantic with LLM APIs and Frameworks

So far, we’ve learned that we need a reliable way to convert free-form text into structured, validated data. Now let’s see how to use Pydantic validation with OpenAI’s API, as well as frameworks like LangChain and LlamaIndex. Be sure to install the required SDKs (for example: pip install openai langchain-openai llama-index).

Using Pydantic with OpenAI API

Here’s how to extract structured data from unstructured text using OpenAI’s API with Pydantic validation:
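A sketch along these lines; the BookSummary fields and the gpt-4o-mini model name are assumptions:

```python
from typing import List, Optional

from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class BookSummary(BaseModel):
    title: str
    author: str
    year_published: int
    genres: List[str]


EXTRACTION_PROMPT = """Extract book information from the text below.
Return ONLY a JSON object with this exact structure:
{{"title": "...", "author": "...", "year_published": 1999, "genres": ["..."]}}

Text: {text}"""


def extract_book_summary(text: str) -> Optional[BookSummary]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        temperature=0,        # deterministic output suits extraction
        messages=[
            {"role": "system", "content": "You are a precise data extractor. Respond with valid JSON only."},
            {"role": "user", "content": EXTRACTION_PROMPT.format(text=text)},
        ],
    )
    raw = response.choices[0].message.content or ""
    try:
        return BookSummary.model_validate_json(raw)
    except ValidationError as e:
        print(f"Validation failed: {e}")
        return None
```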

The prompt includes the exact JSON structure we expect, guiding the LLM to return data matching our Pydantic model. Setting temperature=0 makes the LLM more deterministic and less creative, which is what we want for structured data extraction. The system message primes the model to be a data extractor rather than a conversational assistant. Even with careful prompting, we still validate with Pydantic because you should never trust LLM output without verification.

This example extracts structured information from a book description:
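For instance:

```python
book_text = """Published in 1965, Frank Herbert's Dune is a landmark of science
fiction, blending politics, religion, and ecology on the desert planet Arrakis."""

summary = extract_book_summary(book_text)
if summary:
    print(f"{summary.title} by {summary.author} ({summary.year_published})")
```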

The function sends the unstructured text to the LLM with clear formatting instructions, then validates the response against the BookSummary schema.

Using LangChain with Pydantic

LangChain provides built-in support for structured output extraction with Pydantic models. There are two main approaches that handle the complexity of prompt engineering and parsing for you.

The first method uses PydanticOutputParser, which works with any LLM by using prompt engineering to guide the model’s output format. The parser automatically generates detailed format instructions from your Pydantic model:
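Roughly like so, assuming an illustrative MovieInfo schema:

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class MovieInfo(BaseModel):
    title: str = Field(description="The movie's title")
    director: str = Field(description="The movie's director")
    year: int = Field(description="Year of theatrical release")


parser = PydanticOutputParser(pydantic_object=MovieInfo)

# Inject the auto-generated format instructions into the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract movie information.\n{format_instructions}"),
    ("user", "{text}"),
]).partial(format_instructions=parser.get_format_instructions())

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model choice is an assumption

# Compose prompt -> model -> parser with the pipe (LCEL) syntax
parser_chain = prompt | llm | parser
```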

The PydanticOutputParser automatically generates format instructions from your Pydantic model, including field descriptions and type information. It works with any LLM that can follow instructions and doesn’t require function calling support. The chain syntax makes it easy to compose complex workflows.

The second method uses the native function calling capabilities of modern LLMs through the with_structured_output() method:
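With the same MovieInfo model, this collapses to a couple of lines:

```python
# Bind the schema to the model via its native tool/function calling
structured_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(MovieInfo)
```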

This method produces cleaner, more concise code and makes use of the model’s native function calling capabilities for more reliable extraction. You don’t need to manually create parsers or format instructions, and it’s generally more accurate than prompt-based approaches.

Here’s an example of how to use these functions:
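```python
text = "Denis Villeneuve's Dune: Part Two arrived in theaters in 2024."

movie = parser_chain.invoke({"text": text})  # prompt-engineering route
print(movie)

movie = structured_llm.invoke(text)          # function-calling route
print(movie)  # MovieInfo(title='Dune: Part Two', director='Denis Villeneuve', year=2024)
```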

Using LlamaIndex with Pydantic

LlamaIndex provides multiple approaches for structured extraction, with particularly strong integration for document-based workflows. It’s especially useful when you need to extract structured data from large document collections or build RAG systems.

The most straightforward approach in LlamaIndex is using LLMTextCompletionProgram, which requires minimal boilerplate code:
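A sketch, assuming an illustrative ProductInfo schema and the OpenAI LLM integration:

```python
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel


class ProductInfo(BaseModel):
    name: str
    price: float
    category: str


program = LLMTextCompletionProgram.from_defaults(
    output_cls=ProductInfo,  # validation against the Pydantic model is automatic
    prompt_template_str="Extract product information from this text: {text}",
    llm=OpenAI(model="gpt-4o-mini", temperature=0),  # model choice is an assumption
)
```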

The output_cls parameter automatically handles Pydantic validation. This works with any LLM through prompt engineering and is good for quick prototyping and simple extraction tasks.

For models that support function calling, you can use FunctionCallingProgram. And when you need explicit control over parsing behavior, you can use the PydanticOutputParser method:
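The explicit-parser variant of the same program might look like this:

```python
from llama_index.core.output_parsers import PydanticOutputParser

# Supplying the parser yourself gives you a hook for custom parsing logic
explicit_program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=ProductInfo),
    prompt_template_str="Extract product information from this text: {text}",
    llm=OpenAI(model="gpt-4o-mini", temperature=0),
)
```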

Here’s how you’d extract product information in practice:
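```python
result = program(text="The UltraBrew 3000 espresso machine sells for $449 in our kitchen range.")
print(result)        # a validated ProductInfo instance
print(result.price)  # 449.0
```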

Use explicit parsing when you need custom parsing logic, are working with models that don’t support function calling, or are debugging extraction issues.

Retrying LLM Calls with Better Prompts

When the LLM returns invalid data, you can retry with an improved prompt that includes the error message from the failed validation attempt:
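A generic helper for this pattern might look like the following; the extract_with_retries name and the llm_call_function signature are illustrative:

```python
from typing import Callable, Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


def extract_with_retries(
    llm_call_function: Callable[[str, Optional[str]], str],
    prompt: str,
    model_cls: Type[T],
    max_retries: int = 3,
) -> Optional[T]:
    error_message: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        # The callable receives the previous attempt's error, if any
        response = llm_call_function(prompt, error_message)
        try:
            # In Pydantic v2 this raises ValidationError for malformed JSON too
            return model_cls.model_validate_json(response)
        except ValidationError as e:
            error_message = str(e)
            print(f"Attempt {attempt} failed: {error_message}")
    return None
```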

Each retry includes the previous error message, helping the LLM understand what went wrong. After max_retries, the function returns None instead of crashing, allowing the calling code to handle the failure gracefully. Printing each attempt’s error makes it easy to debug why extraction is failing.

In a real application, your llm_call_function would construct a new prompt including the Pydantic error message, like "Previous attempt failed with error: {error}. Please fix and try again."

This example shows the retry pattern with a mock LLM function that progressively improves:
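A toy version with canned responses (all fabricated):

```python
from typing import List, Optional

from pydantic import BaseModel


class MeetingDetails(BaseModel):
    title: str
    attendees: List[str]


_canned = iter([
    '{"title": "Sprint Planning"}',                                 # missing attendees
    '{"title": "Sprint Planning", "attendees": "Alice, Bob"}',      # wrong type
    '{"title": "Sprint Planning", "attendees": ["Alice", "Bob"]}',  # valid
])


def mock_llm_call(prompt: str, error: Optional[str] = None) -> str:
    # A real implementation would fold `error` back into the prompt;
    # here we simply return progressively better canned responses.
    return next(_canned)


meeting = extract_with_retries(mock_llm_call, "Extract the meeting details.", MeetingDetails)
print(meeting)  # title='Sprint Planning' attendees=['Alice', 'Bob']
```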

The first attempt misses the required attendees field, the second attempt includes it but with the wrong type, and the third attempt gets everything correct. The retry mechanism handles these progressive improvements.

Conclusion

Pydantic helps you turn unreliable LLM outputs into validated, type-safe data structures. By combining clear schemas with robust error handling, you can build AI-powered applications that are both powerful and reliable.

Here are the key takeaways:

  • Define clear schemas that match your needs
  • Validate everything and handle errors gracefully with retries and fallbacks
  • Use type hints and validators to enforce data integrity
  • Include schemas in your prompts to guide the LLM

Start with simple models and add validation as you find edge cases in your LLM outputs. Happy exploring!
