Installing and Setting Up Pydantic
Pydantic has become one of the most popular data validation libraries in the Python ecosystem. Before we dive into its powerful features, let’s get it installed and ready to use.
Basic Installation
Installing Pydantic is straightforward using pip:
pip install pydantic
Version Considerations
As of this writing, Pydantic has two major versions available:
- Pydantic v1: The stable, widely-used version that most tutorials and libraries currently support
- Pydantic v2: The newer version with significant performance improvements and some API changes
To install a specific version:
# For v1 (legacy)
pip install "pydantic<2.0.0"
# For v2 (latest)
pip install "pydantic>=2.0.0"
Optional Dependencies
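Pydantic offers optional extras for extended functionality. Quote the requirement so that shells such as zsh do not try to expand the square brackets:
## For email validation (installs the email-validator package)
pip install "pydantic[email]"
URL types such as HttpUrl ship with Pydantic itself and need no extra. The set of extras varies between releases, so check the installation notes for the version you use.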
Verifying Your Installation
Let’s verify that Pydantic is installed correctly with a simple test:
from pydantic import BaseModel
class User(BaseModel):
    username: str
    email: str
    active: bool = True
## Create a user
user = User(username="john_doe", email="john@example.com")
print(user)
## Should output: username='john_doe' email='john@example.com' active=True
If you see the user information printed without errors, congratulations! Pydantic is installed correctly and you’re ready to start using it.
In the next section, we’ll explore the basics of creating and using Pydantic models.
Basic Usage and Typing with Pydantic
After installing Pydantic, it’s time to explore its core functionality. At its heart, Pydantic is all about creating models that enforce type hints at runtime, providing automatic validation and helpful error messages.
Creating Your First Pydantic Model
A Pydantic model is a class that inherits from BaseModel. Here’s a simple example:
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    is_active: bool = True  # Default value
    email: str | None = None  # Optional field (Python 3.10+ syntax)
This model defines a user with four fields:
- id: An integer (required)
- name: A string (required)
- is_active: A boolean with a default value of True
- email: An optional string field that defaults to None
Basic Field Types
Pydantic supports all the standard Python types:
from datetime import datetime
from typing import List, Dict, Set
from pydantic import BaseModel
class Product(BaseModel):
    id: int
    name: str
    price: float
    tags: List[str] = []  # A list of strings
    created_at: datetime
    metadata: Dict[str, str] = {}  # A dictionary with string keys and values
    related_ids: Set[int] = set()  # A set of integers
Type Validation at Runtime
The magic of Pydantic happens when you create an instance of your model. Let’s see it in action:
## Valid data - types match
user = User(id=1, name="Alice")
print(user.model_dump()) # In v2, use model_dump() instead of dict()
## Output: {'id': 1, 'name': 'Alice', 'is_active': True, 'email': None}
try:
    # Invalid data - 'id' should be an integer
    user = User(id="not-an-integer", name="Bob")
except Exception as e:
    print(f"Validation error: {e}")
    # Output will show detailed validation errors
Pydantic automatically converts data when possible (like strings to integers) and raises clear validation errors when conversion isn’t possible.
Type Coercion
One powerful feature of Pydantic is its ability to coerce types when it makes sense:
## Pydantic will convert "42" to the integer 42
user = User(id="42", name="Charlie")
print(user.id) # Output: 42 (as an integer, not a string)
print(type(user.id)) # Output: <class 'int'>
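Coercion has limits. The sketch below assumes Pydantic v2 in its default (lax) mode and redefines a trimmed-down User for illustration; it shows lossless conversions succeeding while lossy or impossible ones raise a ValidationError.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str


# Lossless conversions succeed in the default (lax) mode.
print(User(id="42", name="Charlie").id)  # numeric string
print(User(id=42.0, name="Charlie").id)  # float with no fractional part

# Inputs that cannot become an integer without losing information are rejected.
for bad_id in ("forty-two", 42.5):
    try:
        User(id=bad_id, name="Charlie")
    except ValidationError as exc:
        print(repr(bad_id), "->", exc.errors()[0]["type"])
```

The error type printed for each rejected value is a short machine-readable code, which is handy when you want to branch on the kind of failure.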
Field Validation
For more control over validation, use the Field function:
from pydantic import BaseModel, Field
class Product(BaseModel):
    id: int
    name: str = Field(..., min_length=3)  # Required, minimum 3 characters
    price: float = Field(gt=0)  # Must be greater than 0
    discount: float = Field(default=0, ge=0, le=1)  # Between 0 and 1
The ... in Field(...) indicates that the field is required but doesn’t have a default value.
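To see these constraints at work, here is a minimal sketch assuming Pydantic v2. The helper failed_fields is a hypothetical name introduced only for this illustration; it reports which fields were rejected.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    id: int
    name: str = Field(..., min_length=3)
    price: float = Field(gt=0)
    discount: float = Field(default=0, ge=0, le=1)


def failed_fields(**data) -> list[str]:
    """Return the names of the fields that failed validation (empty if valid)."""
    try:
        Product(**data)
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]
    return []


print(failed_fields(id=1, name="Desk", price=120.0))
print(failed_fields(id=1, name="ab", price=-5, discount=1.5))
```

Pydantic collects every failing field in a single ValidationError rather than stopping at the first problem, so the second call reports all three constrained fields.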
Accessing Model Data
Pydantic models behave like dataclasses with some additional methods:
user = User(id=1, name="Alice")
## Access fields as attributes
print(user.name) # Output: Alice
## Convert to dictionary
user_dict = user.model_dump() # In v1, use dict()
print(user_dict) # Output: {'id': 1, 'name': 'Alice', 'is_active': True, 'email': None}
## Convert to JSON
user_json = user.model_dump_json() # In v1, use json()
print(user_json) # Output: {"id":1,"name":"Alice","is_active":true,"email":null}
## Check if a field was explicitly set or uses default
print(user.model_fields_set) # Output: {'id', 'name'}
Model Methods and Properties
Pydantic models come with several useful methods:
## Create a copy
user2 = user.model_copy() # In v1, use copy()
## Create a copy with updates
user3 = user.model_copy(update={"name": "Alice Smith"}) # In v1, use copy()
## Get JSON schema
schema = User.model_json_schema() # In v1, use schema()
print(schema)
Type Annotations with Python’s Typing Module
Pydantic leverages Python’s typing module for more complex type definitions:
from typing import List, Dict, Optional, Union, Literal
from pydantic import BaseModel
class AdvancedUser(BaseModel):
    # Union type (either string or int)
    id: Union[str, int]  # In Python 3.10+: id: str | int
    # Optional is equivalent to Union[T, None]
    middle_name: Optional[str] = None  # In Python 3.10+: middle_name: str | None = None
    # List of specific objects
    tags: List[str] = []
    # Literal for specific allowed values
    status: Literal["active", "inactive", "pending"] = "active"
    # Dictionary with specific key and value types
    metadata: Dict[str, Union[str, int, bool]] = {}
This example demonstrates the basics of creating and using Pydantic models. The library’s power comes from combining Python’s type annotations with runtime validation, giving you both the benefits of static typing and the safety of runtime checks.
In the next section, we’ll explore how to configure your models with more advanced options using the Config class.
Pydantic Model Configuration
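One of Pydantic’s most powerful features is the ability to customize model behavior through the Config class. This inner class allows you to control validation, serialization, and many other aspects of how your models work. The option names used in this section (frozen, populate_by_name, and so on) are the Pydantic v2 names; in v2 the inner Config class still works but is deprecated in favor of a model_config attribute.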
The Config Class
Every Pydantic model can include a Config class that defines model-wide settings:
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    email: str
    class Config:
        # Configuration options go here
        title = "User Model"
        frozen = True  # Make instances immutable
Let’s explore the most useful configuration options.
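For readers on Pydantic v2, the same settings can be written with the model_config attribute and ConfigDict, which is the form v2 prefers over an inner class. This is a minimal sketch of that style, not a replacement for the examples in this section.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    # Equivalent to an inner Config class with title and frozen set.
    model_config = ConfigDict(title="User Model", frozen=True)

    id: int
    name: str
    email: str


user = User(id=1, name="Alice", email="alice@example.com")
try:
    user.name = "Bob"  # Rejected because the model is frozen
except ValidationError as exc:
    print(exc.errors()[0]["type"])
```

Every option shown with class Config in the rest of this section maps to a keyword argument of ConfigDict with the same name.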
Common Configuration Options
Controlling Extra Fields
By default, Pydantic will ignore extra fields that aren’t defined in your model. You can change this behavior:
class User(BaseModel):
    id: int
    name: str
    class Config:
        # Options: 'ignore', 'forbid', or 'allow'
        extra = "forbid"  # Raise an error if extra fields are provided
Options:
- "ignore" (default): Extra fields are ignored
- "forbid": Raises an error if extra fields are provided
- "allow": Extra fields are included in the model
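The three settings are easiest to compare side by side. The sketch below assumes Pydantic v2 and uses three illustrative model names (IgnoreExtra, AllowExtra, ForbidExtra) that differ only in their extra setting.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class IgnoreExtra(BaseModel):
    model_config = ConfigDict(extra="ignore")
    id: int


class AllowExtra(BaseModel):
    model_config = ConfigDict(extra="allow")
    id: int


class ForbidExtra(BaseModel):
    model_config = ConfigDict(extra="forbid")
    id: int


payload = {"id": 1, "nickname": "al"}

print(IgnoreExtra(**payload).model_dump())  # the unknown key is dropped
print(AllowExtra(**payload).model_dump())   # the unknown key is kept
try:
    ForbidExtra(**payload)
except ValidationError as exc:
    print(exc.errors()[0]["type"])          # the unknown key is an error
```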
Field Aliases
Sometimes your data source uses different field names than you want in your model:
from pydantic import BaseModel, Field
class User(BaseModel):
    user_id: int = Field(alias="id")
    full_name: str = Field(alias="name")
    class Config:
        # Allow populating by field name, not just by alias
        populate_by_name = True
With this configuration, your model will accept both {"id": 1, "name": "John"} and {"user_id": 1, "full_name": "John"}.
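Aliases also matter on the way out. As a sketch assuming Pydantic v2, the following shows both input shapes producing the same model, and how by_alias switches the keys used when dumping.

```python
from pydantic import BaseModel, ConfigDict, Field


class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    user_id: int = Field(alias="id")
    full_name: str = Field(alias="name")


from_alias = User.model_validate({"id": 1, "name": "John"})
from_name = User.model_validate({"user_id": 1, "full_name": "John"})

print(from_alias == from_name)                # both inputs give equal models
print(from_alias.model_dump())                # keys are the field names
print(from_alias.model_dump(by_alias=True))   # keys are the aliases
```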
Case Sensitivity
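Field names on a regular BaseModel are always matched case-sensitively; there is no model-level option to relax this. The case_sensitive option belongs to settings classes (BaseSettings, which lives in the separate pydantic-settings package in v2), where it controls how environment variable names are matched:
from pydantic_settings import BaseSettings, SettingsConfigDict
class AppSettings(BaseSettings):
    name: str
    age: int
    model_config = SettingsConfigDict(case_sensitive=False)
With this setting, an environment variable called NAME or name populates the name field. To accept differently cased keys on a plain model, declare aliases instead.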
Allowing Arbitrary Types
By default, Pydantic only works with types it knows how to validate. To use custom types:
from pydantic import BaseModel
from PIL import Image # A non-Pydantic type
class Profile(BaseModel):
    name: str
    avatar: Image.Image
    class Config:
        arbitrary_types_allowed = True
Immutable Models
You can make your models immutable (frozen):
class ImmutableUser(BaseModel):
    id: int
    name: str
    class Config:
        frozen = True  # In Pydantic v1, this was called "allow_mutation = False"
After creation, attempting to modify a field will raise an error:
user = ImmutableUser(id=1, name="Alice")
user.name = "Bob" # This will raise an error
Schema Customization
You can customize the JSON Schema generated for your model:
class User(BaseModel):
    """A model representing a user in our system"""  # The docstring becomes the schema description
    id: int
    name: str
    class Config:
        title = "User Information"
        json_schema_extra = {  # Called schema_extra in v1
            "examples": [
                {
                    "id": 1,
                    "name": "John Doe"
                }
            ]
        }
Validation Behavior
Control how validation works:
class StrictModel(BaseModel):
    id: int
    ratio: float
    class Config:
        validate_assignment = True  # Validate when attributes are set
        strict = True  # Disable automatic type conversion
With strict = True, providing a string like "42" for an integer field will raise an error instead of converting it.
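The two options can be exercised together. This sketch assumes Pydantic v2 and writes the configuration with ConfigDict; it shows a rejected construction and a rejected assignment.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True, validate_assignment=True)

    id: int
    ratio: float


item = StrictModel(id=42, ratio=0.5)

try:
    StrictModel(id="42", ratio=0.5)  # a string is no longer coerced to int
except ValidationError as exc:
    print("construction:", exc.errors()[0]["type"])

try:
    item.ratio = "high"  # assignments are validated too
except ValidationError as exc:
    print("assignment:", exc.errors()[0]["type"])

print(item.ratio)  # the failed assignment left the old value in place
```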
Config in Practice: Complete Example
Here’s a more comprehensive example showing several configuration options together:
from datetime import datetime
from pydantic import BaseModel, Field
class UserProfile(BaseModel):
    """Complete user profile information"""  # The docstring becomes the schema description
    user_id: int = Field(alias="id")
    name: str
    created_at: datetime
    last_login: datetime | None = None
    settings: dict = {}
    class Config:
        # Allow population by field name and alias
        populate_by_name = True
        # Validate when attributes are assigned
        validate_assignment = True
        # Forbid extra fields
        extra = "forbid"
        # Custom JSON schema metadata
        title = "User Profile"
        # JSON serialization options (deprecated in v2 in favor of custom serializers)
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
        }
        # Example for documentation (called schema_extra in v1)
        json_schema_extra = {
            "examples": [
                {
                    "id": 1,
                    "name": "Jane Doe",
                    "created_at": "2023-01-01T00:00:00",
                    "last_login": "2023-06-15T14:30:00",
                    "settings": {"theme": "dark", "notifications": True}
                }
            ]
        }
Version Differences (v1 vs v2)
Some configuration options have changed between Pydantic v1 and v2:
v1 Option | v2 Option | Description |
---|---|---|
allow_mutation | frozen | Controls whether model is immutable |
orm_mode | from_attributes | Enables loading data from ORM objects |
schema_extra | json_schema_extra | Adds extra info to schema |
allow_population_by_field_name | populate_by_name | Allows using field names alongside aliases |
Config Inheritance
Config settings are inherited when you subclass models:
class BaseConfig(BaseModel):
    class Config:
        extra = "forbid"
        frozen = True
class UserModel(BaseConfig):  # Inherits Config from BaseConfig
    name: str
    email: str
    class Config:
        # Override specific options while inheriting others
        frozen = False
The Config class is a powerful way to customize how your Pydantic models behave. By setting the right configuration options, you can make your models more flexible, strict, or tailored to your specific use case.
In the next section, we’ll explore Pydantic’s data validation capabilities in more depth, including custom validators and complex validation rules.
Data Validation with Pydantic
One of Pydantic’s most powerful features is its robust data validation system. While basic type checking happens automatically, Pydantic offers many ways to implement complex validation rules for your data.
Built-in Validators
Pydantic includes many built-in validators through the Field function:
from pydantic import BaseModel, Field
class Product(BaseModel):
    id: int
    name: str = Field(min_length=3, max_length=50)
    price: float = Field(gt=0)  # greater than 0
    discount: float = Field(ge=0, le=1)  # between 0 and 1 inclusive
    tags: list[str] = Field(min_length=1, max_length=10)  # between 1 and 10 items (v1: min_items, max_items)
    sku: str = Field(pattern=r'^[A-Z]{2}-\d{6}$')  # regex pattern validation (v1: regex=)
Common Field Constraints
Constraint | Type | Description |
---|---|---|
gt , ge | Numbers | Greater than (or equal) |
lt , le | Numbers | Less than (or equal) |
min_length , max_length | Strings, Lists | Min/max length (v1 used min_items , max_items for lists) |
pattern | Strings | Regular expression pattern (called regex in v1) |
max_digits , decimal_places | Decimal | Digit count constraints |
Custom Validators with @validator
For more complex validation logic, use the @validator decorator. This is the Pydantic v1 API; v2 still accepts it but emits a deprecation warning in favor of field_validator, covered later in this section:
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, validator
class User(BaseModel):
    id: int
    username: str
    password: str
    password_confirm: str
    birth_date: Optional[datetime] = None
    # Validate a single field
    @validator('username')
    def username_alphanumeric(cls, v):
        if not v.isalnum():
            raise ValueError('must be alphanumeric')
        return v
    # Validate password confirmation
    @validator('password_confirm')
    def passwords_match(cls, v, values):
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v
    # Validate birth_date is in the past
    @validator('birth_date')
    def birth_date_in_past(cls, v):
        if v and v > datetime.now():
            raise ValueError('birth date must be in the past')
        return v
The @validator decorator takes the field name(s) to validate and can access:
- The value being validated
- Previously validated values through the values parameter
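In Pydantic v2 the same pattern is written with field_validator, and previously validated values arrive through an info argument instead of values. The sketch below assumes v2; the model name Signup is chosen only for this illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Signup(BaseModel):
    username: str
    password: str
    password_confirm: str

    @field_validator("username")
    @classmethod
    def username_alphanumeric(cls, value):
        if not value.isalnum():
            raise ValueError("must be alphanumeric")
        return value

    @field_validator("password_confirm")
    @classmethod
    def passwords_match(cls, value, info):
        # info.data holds the fields declared earlier that already passed validation.
        if "password" in info.data and value != info.data["password"]:
            raise ValueError("passwords do not match")
        return value


try:
    Signup(username="alice", password="s3cret", password_confirm="secret")
except ValidationError as exc:
    print(exc.errors()[0]["loc"], exc.errors()[0]["msg"])
```

As with values in v1, only fields declared before the one being validated are available, so field order in the class matters.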
Validator Options
The @validator decorator accepts several options:
class Item(BaseModel):
    name: str
    quantity: int
    @validator('quantity', pre=True)  # Run before type conversion
    def check_quantity_positive(cls, v):
        if isinstance(v, str) and v.isdigit():
            v = int(v)
        if v <= 0:
            raise ValueError('must be positive')
        return v
    @validator('name', always=True)  # Run even if field is missing (has default)
    def check_name_not_empty(cls, v):
        if not v.strip():
            raise ValueError('cannot be empty')
        return v.strip()
Root Validators
For validations that depend on multiple fields, use @root_validator (also part of the v1 API):
from typing import Optional
from pydantic import BaseModel, root_validator
class Payment(BaseModel):
    amount: float
    discount: float = 0
    final_amount: Optional[float] = None
    @root_validator
    def calculate_final_amount(cls, values):
        amount = values.get('amount', 0)
        discount = values.get('discount', 0)
        if amount < 0:
            raise ValueError('amount must not be negative')
        if discount < 0 or discount > 1:
            raise ValueError('discount must be between 0 and 1')
        # Calculate the final amount after discount
        values['final_amount'] = amount * (1 - discount)
        return values
Root validators are executed after all field validation and can:
- Access all fields at once
- Implement cross-field validations
- Set derived fields based on other values
Field Validators (Pydantic v2)
In Pydantic v2, there’s a new way to define field-specific validators using the field_validator decorator:
from pydantic import BaseModel, field_validator
class User(BaseModel):
    username: str
    email: str
    @field_validator('username')
    @classmethod  # Required in v2
    def validate_username(cls, value):
        if len(value) < 3:
            raise ValueError('Username must be at least 3 characters')
        if not value.isalnum():
            raise ValueError('Username must be alphanumeric')
        return value
    @field_validator('email')
    @classmethod
    def validate_email(cls, value):
        if '@' not in value:
            raise ValueError('Invalid email format')
        return value.lower()  # Normalize emails to lowercase
Model Validators (Pydantic v2)
In Pydantic v2, root_validator is replaced with model_validator:
from pydantic import BaseModel, model_validator
class Order(BaseModel):
    item_count: int
    items: list[str]
    @model_validator(mode='after')
    def check_items_count(self):
        if len(self.items) != self.item_count:
            raise ValueError(f'Item count ({self.item_count}) does not match items list length ({len(self.items)})')
        return self
The mode parameter must be given explicitly and can be:
- 'before': Run before field validation, on the raw input (similar to pre=True in v1)
- 'after': Run after field validation, on the constructed model instance
- 'wrap': Run around field validation, receiving a handler that performs it
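The 'after' mode is shown above; a 'before' validator looks a little different because it receives the raw input rather than a model instance. In this sketch (assuming Pydantic v2), the validator name fill_item_count is illustrative and derives a missing item_count from the items list.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class Order(BaseModel):
    item_count: int
    items: list[str]

    @model_validator(mode="before")
    @classmethod
    def fill_item_count(cls, data: Any) -> Any:
        # In 'before' mode the validator sees the raw input, usually a dict.
        if isinstance(data, dict) and "item_count" not in data:
            data = {**data, "item_count": len(data.get("items", []))}
        return data


order = Order(items=["pen", "ink"])
print(order.item_count)
```

Because the input may be something other than a dict (for example another model instance), a 'before' validator should check the type before touching it and return anything it does not understand unchanged.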
Custom Error Messages
You can customize error messages for better user experience:
from pydantic import BaseModel, Field, validator
class User(BaseModel):
    username: str = Field(..., min_length=3, max_length=20)
    email: str
    @validator('email')
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Please provide a valid email address')
        return v
For more complex scenarios, you can raise ValueError with custom messages in your validators.
Error Handling
Pydantic provides detailed validation errors that you can handle in your application:
from pydantic import BaseModel, ValidationError
class User(BaseModel):
    username: str
    email: str
    age: int
try:
    user = User(username="john", email="not-an-email", age="twenty")
except ValidationError as e:
    print(f"Validation error: {e}")
    # Access structured error data
    errors = e.errors()
    for error in errors:
        print(f"Field: {error['loc'][0]}, Error: {error['msg']}")
    # Convert to JSON
    json_errors = e.json()
    print(f"JSON errors: {json_errors}")
Conditional Validation
Sometimes you need to validate fields based on conditions:
from typing import Optional
from pydantic import BaseModel, validator
class Product(BaseModel):
    name: str
    is_digital: bool = False
    shipping_weight: Optional[float] = None
    download_url: Optional[str] = None
    # always=True makes the check run even when the field is omitted and its default is used
    @validator('shipping_weight', always=True)
    def validate_shipping_weight(cls, v, values):
        is_digital = values.get('is_digital', False)
        if not is_digital and (v is None or v <= 0):
            raise ValueError('Physical products must have a shipping weight')
        return v
    @validator('download_url', always=True)
    def validate_download_url(cls, v, values):
        is_digital = values.get('is_digital', False)
        if is_digital and not v:
            raise ValueError('Digital products must have a download URL')
        return v
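Field-level validators are normally skipped for fields that fall back to their default, so a rule of the form "this field must be present when that flag is set" is often simpler as a model-level check. The following is a sketch of that alternative assuming Pydantic v2; check_delivery_details is an illustrative name.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class Product(BaseModel):
    name: str
    is_digital: bool = False
    shipping_weight: Optional[float] = None
    download_url: Optional[str] = None

    @model_validator(mode="after")
    def check_delivery_details(self):
        # Runs once per instance, whether or not the optional fields were supplied.
        if self.is_digital:
            if not self.download_url:
                raise ValueError("Digital products must have a download URL")
        elif self.shipping_weight is None or self.shipping_weight <= 0:
            raise ValueError("Physical products must have a shipping weight")
        return self


try:
    Product(name="Paperback")  # physical product without a weight
except ValidationError as exc:
    print(exc.errors()[0]["msg"])
```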
Pydantic’s validation system gives you the tools to ensure your data meets your application’s requirements. From simple type checks to complex cross-field validations, you can implement almost any validation logic while keeping your models clean and maintainable.
Working with Complex Data Types in Pydantic
Real-world data is rarely as simple as strings and integers. Pydantic excels at handling complex, nested data structures that better represent the relationships in your data. In this section, we’ll explore how to work with nested models, collections, and other complex data types.
Nested Models
One of Pydantic’s most powerful features is the ability to nest models within each other:
from pydantic import BaseModel
from typing import List
class Address(BaseModel):
    street: str
    city: str
    zip_code: str
    country: str
class User(BaseModel):
    name: str
    email: str
    address: Address  # Nested model
When creating a User instance, you can provide the address as a dictionary, and Pydantic will automatically convert it to an Address instance:
user = User(
    name="John Doe",
    email="john@example.com",
    address={
        "street": "123 Main St",
        "city": "Anytown",
        "zip_code": "12345",
        "country": "US"
    }
)
print(user.address)  # Output: street='123 Main St' city='Anytown' zip_code='12345' country='US'
print(type(user.address))  # Output: <class '__main__.Address'>
Lists, Sets, and Tuples
Pydantic supports various collection types with type validation for their contents:
from typing import List, Optional, Set, Tuple
from pydantic import BaseModel
class BlogPost(BaseModel):
    title: str
    content: str
    tags: List[str] = []  # A list of strings
    categories: Set[str] = set()  # A set of strings (no duplicates)
    related_posts: List[int] = []  # A list of post IDs
    coordinates: Optional[Tuple[float, float]] = None  # A tuple with two floats, or None
Usage example:
post = BlogPost(
    title="Working with Pydantic",
    content="Pydantic is awesome...",
    tags=["python", "pydantic", "validation"],
    categories={"tutorial", "programming"},
    related_posts=[1, 2, 3],
    coordinates=(40.7128, -74.0060)  # New York coordinates
)
Dictionaries
Dictionaries can have typed keys and values:
from typing import Dict, Any
from pydantic import BaseModel
class Configuration(BaseModel):
    # Dictionary with string keys and string values
    string_settings: Dict[str, str] = {}
    # Dictionary with string keys and any value type
    mixed_settings: Dict[str, Any] = {}
    # Dictionary with string keys and integer values
    numeric_settings: Dict[str, int] = {}
Usage:
config = Configuration(
    string_settings={"theme": "dark", "language": "en-US"},
    mixed_settings={
        "theme": "dark",
        "timeout": 30,
        "debug": True,
        "factors": [1.1, 2.2, 3.3]
    },
    numeric_settings={"timeout": 30, "max_retries": 5}
)
Union Types
Union types allow a field to accept multiple types:
from typing import Union, List
from pydantic import BaseModel
class Item(BaseModel):
    # Can be either an integer or a string
    id: Union[int, str]  # Python 3.10+: id: int | str
    # Can be a string or a list of strings
    tags: Union[str, List[str]] = []  # Python 3.10+: tags: str | list[str] = []
This allows flexibility in your data model:
## Both are valid
item1 = Item(id=1, tags=["electronics", "gadget"])
item2 = Item(id="ABC-123", tags="electronics")
## For item2, if a string is provided for tags, you might want to convert it to a list
if isinstance(item2.tags, str):
    tags_list = [item2.tags]
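Instead of converting after the fact, the normalization can live in the model itself so that tags is always a list. This is one possible approach, sketched for Pydantic v2 with an illustrative validator name (wrap_single_tag) that runs before type validation.

```python
from typing import List, Union

from pydantic import BaseModel, field_validator


class Item(BaseModel):
    id: Union[int, str]
    tags: List[str] = []

    @field_validator("tags", mode="before")
    @classmethod
    def wrap_single_tag(cls, value):
        # Accept a bare string for convenience, but always store a list.
        return [value] if isinstance(value, str) else value


print(Item(id=1, tags=["electronics", "gadget"]).tags)
print(Item(id="ABC-123", tags="electronics").tags)
```

Callers keep the flexible input, while code that reads the model no longer needs an isinstance check.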
Optional Fields
Optional fields can be defined in two ways:
from typing import Optional, Union
from pydantic import BaseModel
class User(BaseModel):
    # Using Optional from typing
    middle_name: Optional[str] = None  # Python 3.10+: middle_name: str | None = None
    # Using Union with None
    nickname: Union[str, None] = None  # Python 3.10+: nickname: str | None = None
    # A bare None default made the field implicitly optional in v1 only;
    # v2 keeps the type as str and rejects an explicit None, so avoid this form
    bio: str = None
Working with Datetime Objects
Pydantic has excellent support for date and time types:
from datetime import datetime, date, time, timedelta
from pydantic import BaseModel
class Event(BaseModel):
    name: str
    start_date: date
    end_date: date
    start_time: time
    duration: timedelta
    created_at: datetime
Pydantic can parse various string formats automatically:
event = Event(
    name="Conference",
    start_date="2023-09-15",  # ISO format string
    end_date=date(2023, 9, 17),  # Python date object
    start_time="09:00:00",  # Time string
    duration="PT3H30M",  # ISO 8601 duration string (3 hours 30 minutes)
    created_at="2023-06-01T12:30:45"  # ISO format datetime
)
print(event.start_date)  # Output: 2023-09-15
print(type(event.start_date))  # Output: <class 'datetime.date'>
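Durations deserve a closer look, because free-form strings such as "3h 30m" are not understood. The sketch below assumes Pydantic v2 and uses a small illustrative model (Session) to show three inputs that all describe three and a half hours.

```python
from datetime import timedelta

from pydantic import BaseModel


class Session(BaseModel):
    duration: timedelta


iso = Session(duration="PT3H30M")        # ISO 8601 duration string
seconds = Session(duration=12600)        # a number is read as seconds
native = Session(duration=timedelta(hours=3, minutes=30))

print(iso.duration, seconds.duration, native.duration)
```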
Enumerations
Pydantic works well with Python’s Enum class:
from enum import Enum, auto
from pydantic import BaseModel
class UserRole(str, Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"
class PaymentStatus(Enum):
    PENDING = auto()
    COMPLETED = auto()
    FAILED = auto()
class User(BaseModel):
    name: str
    role: UserRole = UserRole.VIEWER
class Payment(BaseModel):
    amount: float
    status: PaymentStatus = PaymentStatus.PENDING
Usage:
## Using string value for enum
user = User(name="Alice", role="admin") # Automatically converted to UserRole.ADMIN
print(user.role) # Output: UserRole.ADMIN
print(user.role == UserRole.ADMIN) # Output: True
## Using enum directly
user2 = User(name="Bob", role=UserRole.EDITOR)
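Enum fields also affect serialization. As a sketch assuming Pydantic v2, the following shows that a plain dump keeps the enum member while the JSON-oriented dumps emit its value.

```python
from enum import Enum

from pydantic import BaseModel


class UserRole(str, Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


class User(BaseModel):
    name: str
    role: UserRole = UserRole.VIEWER


user = User(name="Alice", role="admin")

print(user.model_dump())             # role is still a UserRole member
print(user.model_dump(mode="json"))  # role is the plain string value
print(user.model_dump_json())
```

If you would rather store the raw value on the model itself, the use_enum_values configuration option does that at validation time.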
Custom Data Types
You can create custom data types by implementing validation logic. In Pydantic v1, a type supplies its validators through __get_validators__:
from pydantic import BaseModel
import re
## Custom email type
class Email(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate
    @classmethod
    def validate(cls, v):
        if not isinstance(v, str):
            raise TypeError('string required')
        pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
        if not re.match(pattern, v):
            raise ValueError('invalid email format')
        return cls(v)
class User(BaseModel):
    name: str
    email: Email
In Pydantic v2, custom types are created differently:
from typing import Annotated
from pydantic import BaseModel, StringConstraints
## Using Annotated with constraints
Email = Annotated[str, StringConstraints(pattern=r'^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$')]
class User(BaseModel):
    name: str
    email: Email
Recursive Models
You can create recursive models for tree-like structures:
from typing import List, Optional
from pydantic import BaseModel, Field
class Comment(BaseModel):
    id: int
    text: str
    replies: List['Comment'] = []  # Self-reference
## Resolves the self-reference explicitly; v2 normally does this on its own (v1 used Comment.update_forward_refs())
Comment.model_rebuild()
## Create a nested comment structure
comment = Comment(
    id=1,
    text="Great article!",
    replies=[
        Comment(id=2, text="I agree!"),
        Comment(
            id=3,
            text="Thanks!",
            replies=[Comment(id=4, text="You're welcome!")]
        )
    ]
)
Forward References
When models reference each other, you can use string literals as forward references:
from typing import List, Optional
from pydantic import BaseModel
class User(BaseModel):
    name: str
    posts: List['Post'] = []  # Forward reference
class Post(BaseModel):
    title: str
    author: Optional['User'] = None  # Forward reference
## Update the models to resolve forward references
User.model_rebuild()
Post.model_rebuild()
Pydantic’s support for complex data types allows you to model even the most sophisticated data structures while maintaining type safety and validation. By combining nested models, collections, and custom types, you can create expressive, self-documenting data models that accurately represent your application’s domain.
Schema Generation with Pydantic
One of Pydantic’s most powerful features is its ability to automatically generate JSON Schema from your data models. This capability is especially valuable when building APIs or documenting data structures, as it provides a standardized way to describe the expected shape and constraints of your data.
What is JSON Schema?
JSON Schema is a vocabulary that allows you to annotate and validate JSON documents. It provides a contract for what JSON data is required for a given application and how that data should be structured. This is particularly useful for:
- Validating client-submitted data
- Generating documentation
- Creating mock data
- Enabling auto-completion in IDEs
- Supporting code generation
Basic Schema Generation
Every Pydantic model can generate its JSON Schema representation:
from pydantic import BaseModel, Field
from typing import List, Optional
import json
class User(BaseModel):
    id: int = Field(gt=0, description="The user ID")
    name: str = Field(min_length=2, description="The user's full name")
    email: str
    is_active: bool = True
    tags: List[str] = []
## Generate the JSON Schema
schema = User.model_json_schema() # In v1: schema()
## Pretty print the schema
print(json.dumps(schema, indent=2))
This produces a JSON Schema that describes the model:
{
  "title": "User",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "description": "The user ID",
      "exclusiveMinimum": 0,
      "type": "integer"
    },
    "name": {
      "title": "Name",
      "description": "The user's full name",
      "minLength": 2,
      "type": "string"
    },
    "email": {
      "title": "Email",
      "type": "string"
    },
    "is_active": {
      "title": "Is Active",
      "default": true,
      "type": "boolean"
    },
    "tags": {
      "title": "Tags",
      "default": [],
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "id",
    "name",
    "email"
  ]
}
Customizing Schema Generation
Pydantic provides several ways to customize the generated schema:
Field Customization
The Field function allows you to add metadata to your model fields:
from pydantic import BaseModel, Field
class Product(BaseModel):
    id: int = Field(
        ...,  # ... means required
        gt=0,
        description="Unique product identifier",
        examples=[1, 2, 3]
    )
    name: str = Field(
        ...,
        min_length=3,
        max_length=50,
        description="Product name",
        examples=["Smartphone", "Laptop"]
    )
    price: float = Field(
        ...,
        gt=0,
        description="Product price in USD",
        examples=[499.99, 1299.99]
    )
Schema Customization via Config
You can customize the schema at the model level using the Config class, with the class docstring supplying the description:
class Product(BaseModel):
    """Detailed information about a product in our catalog"""
    id: int
    name: str
    price: float
    class Config:
        title = "Product Information"
        json_schema_extra = {  # Called schema_extra in v1
            "examples": [
                {
                    "id": 1,
                    "name": "Smartphone",
                    "price": 699.99
                }
            ]
        }
Schema for Nested Models
Pydantic automatically handles nested models in schema generation:
from pydantic import BaseModel, Field
from typing import List, Optional
class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str
class User(BaseModel):
    id: int
    name: str
    addresses: List[Address]
## Get the schema with nested models
schema = User.model_json_schema()
The generated schema includes the full definition of the Address model in its $defs section (definitions in v1), and the addresses field refers to it with a $ref.
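To see where the nested definition ends up, the sketch below (assuming Pydantic v2, where definitions are collected under the $defs key) inspects the relevant parts of the generated schema.

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str


class User(BaseModel):
    id: int
    name: str
    addresses: List[Address]


schema = User.model_json_schema()

print(list(schema["$defs"]))                       # names of the nested definitions
print(schema["properties"]["addresses"]["items"])  # a $ref pointing at the definition
```

The Address definition is stored once and referenced from the field, so a model that uses Address in several places does not repeat it.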
Schema References with $ref
For more complex models with shared components, Pydantic can generate schemas with references:
from pydantic import BaseModel
from typing import List
class Tag(BaseModel):
    id: int
    name: str
class Category(BaseModel):
    id: int
    name: str
class Product(BaseModel):
    id: int
    name: str
    tags: List[Tag]
    category: Category
## Generate schema with references
schema = Product.model_json_schema(ref_template="#/components/schemas/{model}")
This produces a schema with references to component definitions.
OpenAPI Integration
Pydantic’s schema generation is particularly valuable when working with OpenAPI (formerly Swagger) for API documentation. Libraries like FastAPI use Pydantic’s schema generation to automatically create OpenAPI documentation:
from typing import Optional
from fastapi import FastAPI, Path
from pydantic import BaseModel
app = FastAPI()
class Item(BaseModel):
    id: int
    name: str
    description: Optional[str] = None
    price: float
    tax: Optional[float] = None
@app.post("/items/", response_model=Item)
async def create_item(item: Item):
    return item
@app.get("/items/{item_id}", response_model=Item)
async def read_item(item_id: int = Path(..., gt=0)):
    # Retrieve item from database
    return {"id": item_id, "name": "Example", "price": 9.99}
FastAPI uses the Pydantic models to:
- Validate request and response data
- Generate OpenAPI documentation
- Create automatic interactive documentation with Swagger UI
Schema Customization with Field Types
Pydantic provides specialized field types that affect schema generation:
from pydantic import BaseModel, Field, HttpUrl, EmailStr, constr, confloat
class User(BaseModel):
    id: int
    name: str
    email: EmailStr  # Specialized email string type
    website: HttpUrl  # URL type with validation
    username: constr(min_length=3, max_length=20, pattern=r'^[a-zA-Z0-9_-]+$')  # Constrained string
    rating: confloat(ge=0, le=5)  # Constrained float
These specialized types add appropriate validations and formats to the schema.
Version Differences (v1 vs v2)
Schema generation has some differences between Pydantic v1 and v2:
Version | Method | Notes |
---|---|---|
v1 | model.schema() | Original schema generation method |
v2 | model.model_json_schema() | Renamed method with improved functionality |
In v2, there are also changes to how schema customization works:
## Pydantic v1
class ModelV1(BaseModel):
    class Config:
        schema_extra = {"examples": [{"id": 1}]}
## Pydantic v2
class ModelV2(BaseModel):
    model_config = {
        "json_schema_extra": {"examples": [{"id": 1}]}
    }
Practical Applications
Generating Documentation
You can use the generated schema to create documentation for your data models:
import json
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    email: str
## Save schema to a file
with open('user_schema.json', 'w') as f:
    json.dump(User.model_json_schema(), f, indent=2)
Data Validation with JSON Schema
The generated schema can be used with JSON Schema validators:
import jsonschema
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    email: str
## Get the schema
schema = User.model_json_schema()
## Valid data
valid_data = {"id": 1, "name": "John", "email": "john@example.com"}
## Invalid data
invalid_data = {"id": "not an integer", "name": "John"}
## Validate
try:
    jsonschema.validate(instance=valid_data, schema=schema)
    print("Valid data validated successfully")
except jsonschema.exceptions.ValidationError as e:
    print(f"Validation error: {e}")
try:
    jsonschema.validate(instance=invalid_data, schema=schema)
    print("Invalid data validated successfully (shouldn't happen)")
except jsonschema.exceptions.ValidationError as e:
    print(f"Validation error (expected): {e}")
Mock Data Generation
You can use the schema to generate mock data for testing:
from pydantic import BaseModel
import json
import requests
class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool
## Generate schema
schema = User.model_json_schema()
## Use a service like json-schema-faker or mockend
response = requests.post(
    "https://some-mock-service.com/generate",
    json={"schema": schema, "count": 5}
)
mock_users = response.json()
print(json.dumps(mock_users, indent=2))
Pydantic’s schema generation capabilities provide a powerful way to document and validate your data models. By leveraging JSON Schema, you can create self-documenting code that integrates seamlessly with modern API frameworks, documentation tools, and validation libraries.
This feature is particularly valuable in larger projects where maintaining consistent data structures and clear documentation is essential. Whether you’re building APIs, processing complex data, or integrating with external systems, Pydantic’s schema generation helps ensure your data models are well-defined and properly validated.