Introduction to Pydantic and Data Validation
The Validation Problem
Imagine you're building an AI agent that accepts user data. A user registers with their name, email, and age. But what happens if someone submits:
{
"name": "Alice",
"email": "not-an-email",
"age": "twenty-five"
}
Your code might crash silently, store invalid data, or worse—send bad data to your AI, which then generates incorrect responses. The problem: Python's type hints only document what SHOULD be there, but don't enforce what IS actually there at runtime.
This is where Pydantic enters the game. Pydantic is a library that validates data at runtime—it checks that your data actually matches your requirements before your code uses it. Type hints say "this SHOULD be an int"; Pydantic makes it "this MUST be an int or validation fails."
Why This Matters for AI-Native Development
When Claude Code generates JSON for you, you need to validate it's correct BEFORE using it. When you build APIs with FastAPI, Pydantic automatically validates every request. When you load configuration files, Pydantic ensures they're valid. In production systems, validation is not optional—it's your safety net.
Section 1: Your First Pydantic Model
Installing Pydantic
Like any Python library, Pydantic needs to be installed first. You've already learned this pattern in Chapter 12 with uv:
uv add pydantic
This installs Pydantic V2 (the modern version). Pydantic V1 is deprecated—always use V2.
Creating Your First Model: A Book
Let's start simple. Imagine you're building a library application that stores books. Each book has:
- title (text, required)
- author (text, required)
- year (whole number, required, between 1000-2100)
- price (decimal number, required, must be >= 0)
- isbn (text, optional)
With Pydantic, you describe this structure in code:
from pydantic import BaseModel
class Book(BaseModel):
title: str
author: str
year: int
price: float
isbn: str | None = None # Optional field with default value
That's it. You've created a Pydantic model. Now let's use it.
Validation Happens Automatically
Creating a valid book works exactly as you'd expect:
# Valid data - no errors
book = Book(
title="Python Guide",
author="Jane Doe",
year=2024,
price=29.99
)
print(book)
# Output: Book(title='Python Guide', author='Jane Doe', year=2024, price=29.99, isbn=None)
# Access fields like normal attributes
print(book.title) # Output: Python Guide
print(book.price) # Output: 29.99
But try passing invalid data:
from pydantic import ValidationError
try:
bad_book = Book(
title="Test Book",
author="Author",
year="not a year", # ERROR: should be int, got str
price=-10 # ERROR: must be >= 0
)
except ValidationError as e:
print(e)
Output (showing what validation catches):
2 validation errors for Book
year
Input should be a valid integer [type=int_type, input_value='not a year', input_type=str]
price
Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-10, input_type=float]
Pydantic caught BOTH errors at once. This is powerful—you don't have to debug one error, fix it, then discover another. You see everything that's wrong.
💬 AI Colearning Prompt
"Ask your AI: What happens if I pass a string to an int field in Pydantic? Show me the validation error and explain what type coercion means."
Section 2: Understanding Validation Errors
Reading ValidationError Messages
Pydantic's error messages are designed to help you. Let's break down what you're seeing:
from pydantic import BaseModel, ValidationError
class User(BaseModel):
name: str
age: int
email: str
try:
user = User(
name="Bob",
age="thirty", # Error 1: not an int
email="bob@example" # Error 2: doesn't look like email
)
except ValidationError as e:
# Print full error details
print(e)
# Or access error details programmatically
for error in e.errors():
print(f"Field: {error['loc']}") # Which field?
print(f"Problem: {error['msg']}") # What's wrong?
print(f"Type: {error['type']}") # What type of error?
This gives you:
- loc (location): Which field has the problem?
- msg (message): What's wrong in plain English?
- type (type of error): Was it a type mismatch? A constraint violation? A format issue?
🎓 Instructor Commentary
Type hints are suggestions; Pydantic makes them laws. In AI-native development, you often receive data from external sources (APIs, AI agents, users). Pydantic ensures that data is valid BEFORE your code tries to use it. This defensive approach prevents subtle bugs that only show up later.
Multiple Errors at Once
One of Pydantic's superpowers is reporting ALL validation problems simultaneously. This saves debugging time:
try:
bad_user = User(
name=123, # Error: not a string
age="not a number", # Error: not an int
email="missing-at-sign" # Error: invalid format
)
except ValidationError as e:
# Shows all 3 errors at once
print(f"Found {len(e.errors())} validation errors")
for error in e.errors():
print(f" - {error['loc'][0]}: {error['msg']}")
✨ Teaching Tip
In production code, always wrap Pydantic operations in try/except blocks. Use the error information to provide helpful feedback to users rather than crashing silently. Example: Instead of "Error: validation failed", tell users "Invalid email format. Please use [email protected]".
Section 3: Nested Models
Real Data Is Complex
So far we've created flat models with simple fields. But real data is hierarchical. A Book might have an Author, and an Author has multiple attributes:
from pydantic import BaseModel
class Author(BaseModel):
name: str
bio: str
class Book(BaseModel):
title: str
authors: list[Author] # List of Author objects!
publication_date: str
Notice authors: list[Author]—this is a list of Author models. Pydantic validates each Author in the list.
Using Nested Models
Creating a book with authors:
# Method 1: Create Author objects first
author1 = Author(name="Alice Smith", bio="Python expert")
author2 = Author(name="Bob Johnson", bio="Data scientist")
book = Book(
title="Advanced Python",
authors=[author1, author2],
publication_date="2024-01-15"
)
# Method 2: Pass dictionaries - Pydantic converts them
book2 = Book(
title="Web Development",
authors=[
{"name": "Charlie Brown", "bio": "Full-stack developer"},
{"name": "Diana Prince", "bio": "Frontend specialist"}
],
publication_date="2024-03-20"
)
# Serialize back to dictionary for APIs or storage
print(book.model_dump())
# Output: {
# 'title': 'Advanced Python',
# 'authors': [
# {'name': 'Alice Smith', 'bio': 'Python expert'},
# {'name': 'Bob Johnson', 'bio': 'Data scientist'}
# ],
# 'publication_date': '2024-01-15'
# }
Validation happens at all levels. If an Author's name is missing, Pydantic catches it:
try:
bad_book = Book(
title="Test",
authors=[
{"name": "Valid Author", "bio": "Good"},
{"bio": "Missing name!"} # ERROR: name is required
],
publication_date="2024-01-01"
)
except ValidationError as e:
print(e)
# Shows error in nested structure:
# authors.1.name: Field required
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"Create an Author model with name and bio fields. Then create a Book model that contains a single author field (not a list—just one Author). Generate code that creates a Book with a nested Author and shows the validation error when author data is missing."
Expected Outcome: Working nested model structure, validation demonstrating that missing nested fields are caught.
Section 4: Common Mistakes
Mistake 1: Forgetting BaseModel
Pydantic models must inherit from BaseModel:
# WRONG - just a regular class, no validation
class Book: # Missing: BaseModel
title: str
author: str
book = Book(title="Test", author="Author")
# This works but does NO validation!
# CORRECT - inherits from BaseModel
from pydantic import BaseModel
class Book(BaseModel): # Inherits validation
title: str
author: str
book = Book(title="Test", author="Author")
# Now validation works
Mistake 2: Not Handling ValidationError
If you don't catch ValidationError, your program crashes:
# WRONG - will crash if data is invalid
book = Book(title="Test", author=123) # Crash!
# CORRECT - handle the error gracefully
try:
book = Book(title="Test", author=123)
except ValidationError as e:
print(f"Invalid data: {e}")
# Program continues, user sees helpful message
Mistake 3: Mixing Up Type Hints
Type hints must be precise. list is different from list[str]:
# Ambiguous - what's in the list?
tags: list # Could contain anything
# Precise - list of strings
tags: list[str] # Validates each item is a string
class Post(BaseModel):
title: str
tags: list[str] # Pydantic validates each tag
# Valid
post = Post(title="AI", tags=["python", "pydantic"])
# Invalid - number in a list that should contain strings
try:
post = Post(title="AI", tags=["python", 123]) # ERROR
except ValidationError as e:
print(e) # tags.1: Expected string, got int
Try With AI
Using your AI companion (Claude Code, Gemini CLI, or ChatGPT), practice Pydantic validation. These prompts progress from understanding concepts to creating production-ready validation.
Prompt 1: Understand Type Hints vs Validation
Ask your AI:
"Explain the difference between Python's built-in type hints and Pydantic's runtime validation. Give 2-3 concrete examples showing when type hints aren't enough and why Pydantic matters."
Expected Outcome: Your AI should explain that type hints are static (only visible to your IDE), while Pydantic enforces rules at runtime. Examples might include API requests with wrong data types, AI-generated JSON that doesn't match your schema, or configuration files with invalid values.
Prompt 2: Apply - Create Your First Model
Tell your AI:
"Create a Pydantic model for a Product with: name (required string), price (required float, must be >= 0), quantity (required integer), and description (optional string). Then write code that validates both valid and invalid product data, showing what validation errors look like."
Expected Outcome: A working Product model with proper field types, test cases showing successful validation, and examples of validation errors with explanations of what went wrong.
Prompt 3: Analyze - When to Use Pydantic
Ask your AI:
"When would you use Pydantic instead of just type hints? Give 3 real-world scenarios where runtime validation is critical. For each scenario, explain what could go wrong without validation."
Expected Outcome: Scenarios like API input validation (preventing bad data from reaching your code), LLM output validation (ensuring AI responses match your expected format), configuration file validation (catching typos in .env or YAML), or database schema enforcement. Each should explain the consequence of missing validation.
Prompt 4: Create - Nested Models with Validation
Tell your AI:
"Build a UserProfile model with a nested Address model. Include: UserProfile (name, email, age), Address (street, city, zip_code). Add validation that zip_code must be exactly 5 digits. Generate test data showing both valid and invalid zip codes, and demonstrate the validation errors."
Expected Outcome: Nested UserProfile and Address models with custom validation on zip_code, example data for valid cases (zip: "12345"), invalid cases (zip: "ABC"), and clear error messages showing which field failed validation and why.