Python Data Validation with Pydantic

Installing and Setting Up Pydantic

Pydantic has become one of the most popular data validation libraries in the Python ecosystem. Before we dive into its powerful features, let’s get it installed and ready to use.

Basic Installation

Installing Pydantic is straightforward using pip:

pip install pydantic

Version Considerations

As of this writing, Pydantic has two major versions in use:

  • Pydantic v1: The legacy version, still found in many older tutorials and in libraries that have not yet migrated
  • Pydantic v2: The current version, with a Rust-based validation core, significant performance improvements, and some API changes

To install a specific version:

# For v1 (legacy)
pip install "pydantic<2.0.0"

# For v2 (latest)
pip install "pydantic>=2.0.0"

Optional Dependencies

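Pydantic offers optional extras for extended functionality. The one you are most likely to need is email, which installs the email-validator package required by the EmailStr type:

# For email validation (EmailStr)
pip install "pydantic[email]"

URL types such as HttpUrl are built in and need no extra, and there is no catch-all "all" extra. Check the installation documentation for the extras available in your version.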

Verifying Your Installation

Let’s verify that Pydantic is installed correctly with a simple test:

from pydantic import BaseModel

class User(BaseModel):
    username: str
    email: str
    active: bool = True

# Create a user
user = User(username="john_doe", email="john@example.com")
print(user)
# Should output: username='john_doe' email='john@example.com' active=True

If you see the user information printed without errors, congratulations! Pydantic is installed correctly and you’re ready to start using it.
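
Because the API differs between major versions, it also helps to confirm which one you have. The check below is a minimal sketch that assumes the package exposes its version string as pydantic.VERSION, which holds for the 1.x and 2.x releases:

```python
import pydantic

# VERSION is a plain string such as "2.7.1"
version = pydantic.VERSION
major = int(version.split(".")[0])

print(f"Pydantic {version} is installed (major version {major})")
```

If the major version is 1, prefer the v1 method names mentioned in the comments throughout this guide.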

In the next section, we’ll explore the basics of creating and using Pydantic models.

Basic Usage and Typing with Pydantic

After installing Pydantic, it’s time to explore its core functionality. At its heart, Pydantic is all about creating models that enforce type hints at runtime, providing automatic validation and helpful error messages.

Creating Your First Pydantic Model

A Pydantic model is a class that inherits from BaseModel. Here’s a simple example:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    is_active: bool = True  # Default value
    email: str | None = None  # Optional field (Python 3.10+ syntax)

This model defines a user with four fields:

  • id: An integer (required)
  • name: A string (required)
  • is_active: A boolean with a default value of True
  • email: An optional string field that defaults to None

Basic Field Types

Pydantic supports all the standard Python types:

from datetime import datetime
from typing import List, Dict, Set
from pydantic import BaseModel

class Product(BaseModel):
    id: int
    name: str
    price: float
    tags: List[str] = []  # A list of strings
    created_at: datetime
    metadata: Dict[str, str] = {}  # A dictionary with string keys and values
    related_ids: Set[int] = set()  # A set of integers

Type Validation at Runtime

The magic of Pydantic happens when you create an instance of your model. Let’s see it in action:

## Valid data - types match
user = User(id=1, name="Alice")
print(user.model_dump())  # In v2, use model_dump() instead of dict()
## Output: {'id': 1, 'name': 'Alice', 'is_active': True, 'email': None}

from pydantic import ValidationError

try:
    # Invalid data - 'id' cannot be converted to an integer
    user = User(id="not-an-integer", name="Bob")
except ValidationError as e:
    print(f"Validation error: {e}")
    # Output will show detailed validation errors

Pydantic automatically converts data when possible (like strings to integers) and raises clear validation errors when conversion isn’t possible.

Type Coercion

One powerful feature of Pydantic is its ability to coerce types when it makes sense:

## Pydantic will convert "42" to the integer 42
user = User(id="42", name="Charlie")
print(user.id)  # Output: 42 (as an integer, not a string)
print(type(user.id))  # Output: <class 'int'>
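
Coercion has limits. The sketch below assumes Pydantic v2, where an int field accepts a numeric string but rejects text and floats with a fractional part (v1 would silently truncate 42.5 to 42). The User model is redefined so the snippet runs on its own:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str

# A numeric string is coerced to int
print(User(id="42", name="Charlie").id)

# Values that cannot be converted without guessing or losing data are rejected
rejected = []
for bad_id in ("forty-two", 42.5):
    try:
        User(id=bad_id, name="Charlie")
    except ValidationError as exc:
        rejected.append(bad_id)
        print(f"{bad_id!r} rejected: {exc.errors()[0]['msg']}")
```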

Field Validation

For more control over validation, use the Field function:

from pydantic import BaseModel, Field

class Product(BaseModel):
    id: int
    name: str = Field(..., min_length=3)  # Required, minimum 3 characters
    price: float = Field(gt=0)  # Must be greater than 0
    discount: float = Field(default=0, ge=0, le=1)  # Between 0 and 1

The ... in Field(...) indicates that the field is required but doesn’t have a default value.
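
To see these constraints at work, the following sketch passes values that break two of them and collects the names of the failing fields. It repeats the Product model so it can run standalone; the sample values are made up for illustration:

```python
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    id: int
    name: str = Field(..., min_length=3)
    price: float = Field(gt=0)
    discount: float = Field(default=0, ge=0, le=1)

failed_fields = set()
try:
    # name is too short and price is not greater than 0
    Product(id=1, name="TV", price=-5, discount=0.2)
except ValidationError as exc:
    for error in exc.errors():
        failed_fields.add(error["loc"][0])
        print(error["loc"], error["msg"])
```

All violations are reported together in a single ValidationError rather than one at a time.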

Accessing Model Data

Pydantic models behave like dataclasses with some additional methods:

user = User(id=1, name="Alice")

## Access fields as attributes
print(user.name)  # Output: Alice

## Convert to dictionary
user_dict = user.model_dump()  # In v1, use dict()
print(user_dict)  # Output: {'id': 1, 'name': 'Alice', 'is_active': True, 'email': None}

## Convert to JSON
user_json = user.model_dump_json()  # In v1, use json()
print(user_json)  # Output: {"id":1,"name":"Alice","is_active":true,"email":null}

## Check if a field was explicitly set or uses default
print(user.model_fields_set)  # Output: {'id', 'name'}

Model Methods and Properties

Pydantic models come with several useful methods:

## Create a copy
user2 = user.model_copy()  # In v1, use copy()

## Create a copy with updates
user3 = user.model_copy(update={"name": "Alice Smith"})  # In v1, use copy()

## Get JSON schema
schema = User.model_json_schema()  # In v1, use schema()
print(schema)

Type Annotations with Python’s Typing Module

Pydantic leverages Python’s typing module for more complex type definitions:

from typing import List, Dict, Optional, Union, Literal
from pydantic import BaseModel

class AdvancedUser(BaseModel):
    # Union type (either string or int)
    id: Union[str, int]  # In Python 3.10+: id: str | int
    
    # Optional is equivalent to Union[T, None]
    middle_name: Optional[str] = None  # In Python 3.10+: middle_name: str | None = None
    
    # List of specific objects
    tags: List[str] = []
    
    # Literal for specific allowed values
    status: Literal["active", "inactive", "pending"] = "active"
    
    # Dictionary with specific key and value types
    metadata: Dict[str, Union[str, int, bool]] = {}

This example demonstrates the basics of creating and using Pydantic models. The library’s power comes from combining Python’s type annotations with runtime validation, giving you both the benefits of static typing and the safety of runtime checks.

In the next section, we’ll explore how to configure your models with more advanced options using the Config class.

Pydantic Model Configuration

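One of Pydantic’s most powerful features is the ability to customize model behavior through model-level configuration. This section uses the inner Config class, which originated in v1 and still works in v2 with a deprecation warning; the form v2 prefers is a model_config attribute built with ConfigDict. Either way, these settings control validation, serialization, and many other aspects of how your models work.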

The Config Class

Every Pydantic model can include a Config class that defines model-wide settings:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    
    class Config:
        # Configuration options go here
        title = "User Model"
        frozen = True  # Make instances immutable
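
For comparison, here is a sketch of the same settings written in the v2 style, assuming Pydantic v2 is installed. ConfigDict and model_config are the v2 names; the sample user data is illustrative:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class User(BaseModel):
    # v2 equivalent of the inner Config class
    model_config = ConfigDict(title="User Model", frozen=True)

    id: int
    name: str
    email: str

user = User(id=1, name="Alice", email="alice@example.com")

error_type = None
try:
    user.name = "Bob"  # frozen models reject assignment
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
    print(error_type)
```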

Let’s explore the most useful configuration options.

Common Configuration Options

Controlling Extra Fields

By default, Pydantic will ignore extra fields that aren’t defined in your model. You can change this behavior:

class User(BaseModel):
    id: int
    name: str
    
    class Config:
        # Options: 'ignore', 'forbid', or 'allow'
        extra = "forbid"  # Raise an error if extra fields are provided

Options:

  • "ignore" (default): Extra fields are ignored
  • "forbid": Raises an error if extra fields are provided
  • "allow": Extra fields are included in the model
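
The three modes can be compared side by side. This sketch assumes Pydantic v2 and uses three throwaway model names (IgnoreExtra, AllowExtra, ForbidExtra) that exist only for the demonstration:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class IgnoreExtra(BaseModel):
    model_config = ConfigDict(extra="ignore")
    id: int

class AllowExtra(BaseModel):
    model_config = ConfigDict(extra="allow")
    id: int

class ForbidExtra(BaseModel):
    model_config = ConfigDict(extra="forbid")
    id: int

data = {"id": 1, "nickname": "al"}

print(IgnoreExtra(**data).model_dump())  # nickname is silently dropped
print(AllowExtra(**data).model_dump())   # nickname is kept on the instance

forbid_error = None
try:
    ForbidExtra(**data)
except ValidationError as exc:
    forbid_error = exc.errors()[0]["type"]
    print(forbid_error)
```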

Field Aliases

Sometimes your data source uses different field names than you want in your model:

from pydantic import BaseModel, Field

class User(BaseModel):
    user_id: int = Field(alias="id")
    full_name: str = Field(alias="name")
    
    class Config:
        # Allow populating by field name, not just by alias
        populate_by_name = True

With this configuration, your model will accept both {"id": 1, "name": "John"} and {"user_id": 1, "full_name": "John"}.
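
A short sketch of both input styles, written with the v2 ConfigDict form and assuming Pydantic v2. It also shows that serialization uses field names unless you ask for aliases:

```python
from pydantic import BaseModel, ConfigDict, Field

class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    user_id: int = Field(alias="id")
    full_name: str = Field(alias="name")

from_alias = User(**{"id": 1, "name": "John"})
from_name = User(user_id=1, full_name="John")

print(from_alias == from_name)
print(from_alias.model_dump())               # keyed by field name
print(from_alias.model_dump(by_alias=True))  # keyed by alias
```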

Case Sensitivity

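The case_sensitive option belongs to settings classes (BaseSettings, which lives in the separate pydantic-settings package in v2), where it controls how environment variable names are matched. It has no effect on a regular BaseModel, whose field names are always matched exactly:

from pydantic_settings import BaseSettings, SettingsConfigDict

class AppSettings(BaseSettings):
    name: str
    age: int

    # Match environment variables regardless of case (this is the default)
    model_config = SettingsConfigDict(case_sensitive=False)

With this setting, the environment variables NAME=John and AGE=30 populate the name and age fields. For a regular model, use aliases or a validator if you need to accept differently cased keys.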

Allowing Arbitrary Types

By default, Pydantic only works with types it knows how to validate. To use custom types:

from pydantic import BaseModel
from PIL import Image  # A non-Pydantic type

class Profile(BaseModel):
    name: str
    avatar: Image.Image
    
    class Config:
        arbitrary_types_allowed = True

Immutable Models

You can make your models immutable (frozen):

class ImmutableUser(BaseModel):
    id: int
    name: str
    
    class Config:
        frozen = True  # In Pydantic v1, this was called "allow_mutation = False"

After creation, attempting to modify a field will raise an error:

user = ImmutableUser(id=1, name="Alice")
user.name = "Bob"  # This will raise an error

Schema Customization

You can customize the JSON Schema generated for your model:

class User(BaseModel):
    """A model representing a user in our system"""  # The docstring becomes the schema description

    id: int
    name: str

    class Config:
        title = "User Information"
        json_schema_extra = {  # Named schema_extra in v1
            "examples": [
                {
                    "id": 1,
                    "name": "John Doe"
                }
            ]
        }

Validation Behavior

Control how validation works:

class StrictModel(BaseModel):
    id: int
    ratio: float
    
    class Config:
        validate_assignment = True  # Validate when attributes are set
        strict = True  # Disable automatic type conversion

With strict = True, providing a string like "42" for an integer field will raise an error instead of converting it.
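
Both options can be observed in a few lines. The sketch assumes Pydantic v2 (strict is a v2 setting) and uses the ConfigDict form; the values passed in are illustrative:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True, validate_assignment=True)

    id: int
    ratio: float

model = StrictModel(id=1, ratio=0.5)

errors = []
try:
    StrictModel(id="42", ratio=0.5)  # a string is rejected under strict mode
except ValidationError as exc:
    errors.append(exc.errors()[0]["loc"])

try:
    model.ratio = "high"  # assignments are validated as well
except ValidationError as exc:
    errors.append(exc.errors()[0]["loc"])

print(errors)
print(model.ratio)  # the failed assignment left the old value in place
```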

Config in Practice: Complete Example

Here’s a more comprehensive example showing several configuration options together:

from datetime import datetime
from pydantic import BaseModel, Field

class UserProfile(BaseModel):
    """Complete user profile information"""  # Used as the schema description

    user_id: int = Field(alias="id")
    name: str
    created_at: datetime
    last_login: datetime | None = None
    settings: dict = {}

    class Config:
        # Allow population by field name and alias
        populate_by_name = True

        # Validate when attributes are assigned
        validate_assignment = True

        # Forbid extra fields
        extra = "forbid"

        # Custom JSON schema title
        title = "User Profile"

        # JSON serialization options (deprecated in v2 in favor of custom serializers)
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
        }

        # Example for documentation (named schema_extra in v1)
        json_schema_extra = {
            "examples": [
                {
                    "id": 1,
                    "name": "Jane Doe",
                    "created_at": "2023-01-01T00:00:00",
                    "last_login": "2023-06-15T14:30:00",
                    "settings": {"theme": "dark", "notifications": True}
                }
            ]
        }

Version Differences (v1 vs v2)

Some configuration options have changed between Pydantic v1 and v2:

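v1 Option                        v2 Option           Description
-------------------------------  ------------------  ------------------------------------------------------------
allow_mutation                   frozen              Controls immutability (allow_mutation = False becomes frozen = True)
orm_mode                         from_attributes     Enables loading data from ORM objects
schema_extra                     json_schema_extra   Adds extra information to the generated schema
allow_population_by_field_name   populate_by_name    Allows using field names alongside aliases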

Config Inheritance

Config settings are inherited when you subclass models:

class BaseConfig(BaseModel):
    class Config:
        extra = "forbid"
        frozen = True

class UserModel(BaseConfig):  # Inherits Config from BaseConfig
    name: str
    email: str
    
    class Config:
        # Override specific options while inheriting others
        frozen = False
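
The same inheritance rule applies to the v2 model_config form, where a subclass's settings are merged into the parent's rather than replacing them. A minimal sketch, assuming Pydantic v2 and using StrictBase as an illustrative base class name:

```python
from pydantic import BaseModel, ConfigDict

class StrictBase(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)

class UserModel(StrictBase):
    # Only frozen is overridden; extra="forbid" is inherited from StrictBase
    model_config = ConfigDict(frozen=False)

    name: str
    email: str

print(UserModel.model_config)

user = UserModel(name="Alice", email="alice@example.com")
user.name = "Alicia"  # allowed because frozen was overridden
print(user.name)
```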

The Config class is a powerful way to customize how your Pydantic models behave. By setting the right configuration options, you can make your models more flexible, strict, or tailored to your specific use case.

In the next section, we’ll explore Pydantic’s data validation capabilities in more depth, including custom validators and complex validation rules.

Data Validation with Pydantic

One of Pydantic’s most powerful features is its robust data validation system. While basic type checking happens automatically, Pydantic offers many ways to implement complex validation rules for your data.

Built-in Validators

Pydantic includes many built-in validators through the Field function:

from pydantic import BaseModel, Field

class Product(BaseModel):
    id: int
    name: str = Field(min_length=3, max_length=50)
    price: float = Field(gt=0)  # greater than 0
    discount: float = Field(ge=0, le=1)  # between 0 and 1 inclusive
    tags: list[str] = Field(min_length=1, max_length=10)  # between 1 and 10 items (min_items/max_items in v1)
    sku: str = Field(pattern=r'^[A-Z]{2}-\d{6}$')  # regex pattern validation

Common Field Constraints

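Constraint                   Type             Description
---------------------------  ---------------  ------------------------------------------------------------
gt, ge                       Numbers          Greater than (or equal to)
lt, le                       Numbers          Less than (or equal to)
min_length, max_length       Strings, lists   Minimum/maximum length (lists used min_items/max_items in v1)
pattern                      Strings          Regular expression pattern (named regex in v1)
max_digits, decimal_places   Decimal          Total digit count and number of decimal places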

Custom Validators with @validator

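For more complex validation logic, Pydantic v1 provides the @validator decorator. It is deprecated in v2 in favor of @field_validator (covered below), but it still appears in a lot of existing code:

from datetime import datetime
from typing import Optional

from pydantic import BaseModel, validator

class User(BaseModel):
    id: int
    username: str
    password: str
    password_confirm: str
    birth_date: Optional[datetime] = None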
    
    # Validate a single field
    @validator('username')
    def username_alphanumeric(cls, v):
        if not v.isalnum():
            raise ValueError('must be alphanumeric')
        return v
    
    # Validate password confirmation
    @validator('password_confirm')
    def passwords_match(cls, v, values):
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v
    
    # Validate birth_date is in the past
    @validator('birth_date')
    def birth_date_in_past(cls, v):
        if v and v > datetime.now():
            raise ValueError('birth date must be in the past')
        return v

The @validator decorator takes the field name(s) to validate and can access:

  • The value being validated
  • Previously validated values through the values parameter

Validator Options

The @validator decorator accepts several options:

class Item(BaseModel):
    name: str
    quantity: int
    
    @validator('quantity', pre=True)  # Run before type conversion
    def check_quantity_positive(cls, v):
        if isinstance(v, str) and v.isdigit():
            v = int(v)
        if v <= 0:
            raise ValueError('must be positive')
        return v
    
    @validator('name', always=True)  # Run even if field is missing (has default)
    def check_name_not_empty(cls, v):
        if not v.strip():
            raise ValueError('cannot be empty')
        return v.strip()

Root Validators

For validations that depend on multiple fields, Pydantic v1 provides @root_validator (replaced by model_validator in v2, covered below):

from typing import Optional

from pydantic import BaseModel, root_validator

class Payment(BaseModel):
    amount: float
    discount: float = 0
    final_amount: Optional[float] = None

    @root_validator
    def calculate_final_amount(cls, values):
        amount = values.get('amount', 0)
        discount = values.get('discount', 0)

        if amount < 0:
            raise ValueError('amount must not be negative')
        
        if discount < 0 or discount > 1:
            raise ValueError('discount must be between 0 and 1')
        
        # Calculate the final amount after discount
        values['final_amount'] = amount * (1 - discount)
        
        return values

Root validators are executed after all field validation and can:

  • Access all fields at once
  • Implement cross-field validations
  • Set derived fields based on other values

Field Validators (Pydantic v2)

In Pydantic v2, there’s a new way to define field-specific validators using the field_validator decorator:

from pydantic import BaseModel, field_validator

class User(BaseModel):
    username: str
    email: str
    
    @field_validator('username')
    @classmethod  # Required in v2
    def validate_username(cls, value):
        if len(value) < 3:
            raise ValueError('Username must be at least 3 characters')
        if not value.isalnum():
            raise ValueError('Username must be alphanumeric')
        return value
    
    @field_validator('email')
    @classmethod
    def validate_email(cls, value):
        if '@' not in value:
            raise ValueError('Invalid email format')
        return value.lower()  # Normalize emails to lowercase

Model Validators (Pydantic v2)

In Pydantic v2, root_validator is replaced with model_validator:

from pydantic import BaseModel, model_validator

class Order(BaseModel):
    item_count: int
    items: list[str]
    
    @model_validator(mode='after')
    def check_items_count(self):
        if len(self.items) != self.item_count:
            raise ValueError(f'Item count ({self.item_count}) does not match items list length ({len(self.items)})')
        return self

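The mode parameter is required and can be:

  • 'before': Run on the raw input before field validation (similar to pre=True in v1)
  • 'after': Run on the model instance after field validation
  • 'wrap': Run around validation, receiving a handler that performs the inner validation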
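
A 'before' validator sees the raw input, so it is a natural place to fill in or reshape data before the fields are checked. The sketch below assumes Pydantic v2; deriving item_count from the list is a made-up rule for illustration and is not part of the Order example above:

```python
from typing import Any

from pydantic import BaseModel, model_validator

class Order(BaseModel):
    item_count: int
    items: list[str]

    @model_validator(mode="before")
    @classmethod
    def fill_item_count(cls, data: Any) -> Any:
        # data is the unvalidated input, usually a dict
        if isinstance(data, dict) and "item_count" not in data:
            data = {**data, "item_count": len(data.get("items", []))}
        return data

order = Order(items=["apple", "pear"])
print(order.item_count)  # 2
```

Note that a 'before' validator is a classmethod working on raw data, while an 'after' validator is an instance method working on the validated model.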

Custom Error Messages

You can customize error messages for better user experience:

from pydantic import BaseModel, Field, validator

class User(BaseModel):
    username: str = Field(..., min_length=3, max_length=20)
    email: str
    
    @validator('email')
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Please provide a valid email address')
        return v

For more complex scenarios, you can raise ValueError with custom messages in your validators.

Error Handling

Pydantic provides detailed validation errors that you can handle in your application:

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    username: str
    email: str
    age: int

try:
    # 'email' is a plain str, so only 'age' fails validation here
    user = User(username="john", email="not-an-email", age="twenty")
except ValidationError as e:
    print(f"Validation error: {e}")
    
    # Access structured error data
    errors = e.errors()
    for error in errors:
        print(f"Field: {error['loc'][0]}, Error: {error['msg']}")
    
    # Convert to JSON
    json_errors = e.json()
    print(f"JSON errors: {json_errors}")

Conditional Validation

Sometimes you need to validate fields based on conditions:

from typing import Optional

from pydantic import BaseModel, validator

class Product(BaseModel):
    name: str
    is_digital: bool = False
    shipping_weight: Optional[float] = None
    download_url: Optional[str] = None

    # always=True makes the validator run even when the field is omitted;
    # without it, a v1 validator is skipped for fields left at their default
    @validator('shipping_weight', always=True)
    def validate_shipping_weight(cls, v, values):
        is_digital = values.get('is_digital', False)

        if not is_digital and (v is None or v <= 0):
            raise ValueError('Physical products must have a shipping weight')

        return v

    @validator('download_url', always=True)
    def validate_download_url(cls, v, values):
        is_digital = values.get('is_digital', False)

        if is_digital and not v:
            raise ValueError('Digital products must have a download URL')

        return v
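
In Pydantic v2, the same conditional rules fit naturally into a single 'after' model validator, which sees every field at once. This is a sketch under the assumption that v2 is installed; the product names and URL are illustrative:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator

class Product(BaseModel):
    name: str
    is_digital: bool = False
    shipping_weight: Optional[float] = None
    download_url: Optional[str] = None

    @model_validator(mode="after")
    def check_delivery_details(self):
        if self.is_digital:
            if not self.download_url:
                raise ValueError("Digital products must have a download URL")
        elif self.shipping_weight is None or self.shipping_weight <= 0:
            raise ValueError("Physical products must have a shipping weight")
        return self

# Both of these satisfy the rules
Product(name="E-book", is_digital=True, download_url="https://example.com/book")
Product(name="Desk", shipping_weight=25.0)

message = None
try:
    Product(name="Chair")  # physical product without a weight
except ValidationError as exc:
    message = exc.errors()[0]["msg"]
    print(message)
```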

Pydantic’s validation system gives you the tools to ensure your data meets your application’s requirements. From simple type checks to complex cross-field validations, you can implement almost any validation logic while keeping your models clean and maintainable.

Working with Complex Data Types in Pydantic

Real-world data is rarely as simple as strings and integers. Pydantic excels at handling complex, nested data structures that better represent the relationships in your data. In this section, we’ll explore how to work with nested models, collections, and other complex data types.

Nested Models

One of Pydantic’s most powerful features is the ability to nest models within each other:

from pydantic import BaseModel
from typing import List

class Address(BaseModel):
    street: str
    city: str
    zip_code: str
    country: str

class User(BaseModel):
    name: str
    email: str
    address: Address  # Nested model

When creating a User instance, you can provide the address as a dictionary, and Pydantic will automatically convert it to an Address instance:

user = User(
    name="John Doe",
    email="john@example.com",
    address={
        "street": "123 Main St",
        "city": "Anytown",
        "zip_code": "12345",
        "country": "US"
    }
)

print(user.address)  # Output: street='123 Main St' city='Anytown' zip_code='12345' country='US'
print(type(user.address))  # Output: <class '__main__.Address'>
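
Validation also reaches into nested models, and the error location records the path to the offending field. A small sketch that repeats the two models and deliberately omits country from the address:

```python
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    city: str
    zip_code: str
    country: str

class User(BaseModel):
    name: str
    email: str
    address: Address

error_path = None
try:
    User(
        name="John Doe",
        email="john@example.com",
        address={"street": "123 Main St", "city": "Anytown", "zip_code": "12345"},
    )
except ValidationError as exc:
    error_path = exc.errors()[0]["loc"]
    print(error_path)  # ('address', 'country')
```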

Lists, Sets, and Tuples

Pydantic supports various collection types with type validation for their contents:

from typing import List, Optional, Set, Tuple
from pydantic import BaseModel

class BlogPost(BaseModel):
    title: str
    content: str
    tags: List[str] = []  # A list of strings
    categories: Set[str] = set()  # A set of strings (no duplicates)
    related_posts: List[int] = []  # A list of post IDs
    coordinates: Optional[Tuple[float, float]] = None  # An optional tuple with two floats

Usage example:

post = BlogPost(
    title="Working with Pydantic",
    content="Pydantic is awesome...",
    tags=["python", "pydantic", "validation"],
    categories={"tutorial", "programming"},
    related_posts=[1, 2, 3],
    coordinates=(40.7128, -74.0060)  # New York coordinates
)

Dictionaries

Dictionaries can have typed keys and values:

from typing import Dict, Any
from pydantic import BaseModel

class Configuration(BaseModel):
    # Dictionary with string keys and string values
    string_settings: Dict[str, str] = {}
    
    # Dictionary with string keys and any value type
    mixed_settings: Dict[str, Any] = {}
    
    # Dictionary with string keys and integer values
    numeric_settings: Dict[str, int] = {}

Usage:

config = Configuration(
    string_settings={"theme": "dark", "language": "en-US"},
    mixed_settings={
        "theme": "dark",
        "timeout": 30,
        "debug": True,
        "factors": [1.1, 2.2, 3.3]
    },
    numeric_settings={"timeout": 30, "max_retries": 5}
)

Union Types

Union types allow a field to accept multiple types:

from typing import Union, List
from pydantic import BaseModel

class Item(BaseModel):
    # Can be either an integer or a string
    id: Union[int, str]  # Python 3.10+: id: int | str
    
    # Can be a string or a list of strings
    tags: Union[str, List[str]] = []  # Python 3.10+: tags: str | list[str] = []

This allows flexibility in your data model:

## Both are valid
item1 = Item(id=1, tags=["electronics", "gadget"])
item2 = Item(id="ABC-123", tags="electronics")

## For item2, if a string is provided for tags, you might want to convert it to a list
if isinstance(item2.tags, str):
    tags_list = [item2.tags]
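
Instead of checking the type after the fact, the conversion can live in the model. The sketch below assumes Pydantic v2 and declares tags as a plain list, using a 'before' field validator (wrap_single_tag is an illustrative name) to wrap a lone string:

```python
from typing import List, Union

from pydantic import BaseModel, field_validator

class Item(BaseModel):
    id: Union[int, str]
    tags: List[str] = []

    @field_validator("tags", mode="before")
    @classmethod
    def wrap_single_tag(cls, value):
        # Accept a single string and turn it into a one-element list
        return [value] if isinstance(value, str) else value

item1 = Item(id=1, tags=["electronics", "gadget"])
item2 = Item(id="ABC-123", tags="electronics")

print(item1.tags)
print(item2.tags)  # ['electronics']
```

Callers can still pass either form, but code that reads tags always gets a list.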

Optional Fields

Optional fields can be declared in several ways:

from typing import Optional, Union
from pydantic import BaseModel

class User(BaseModel):
    # Using Optional from typing
    middle_name: Optional[str] = None  # Python 3.10+: middle_name: str | None = None

    # Using Union with None
    nickname: Union[str, None] = None  # Python 3.10+: nickname: str | None = None

    # Implicitly optional in v1 only; in v2 the type stays str and an explicit None is rejected
    bio: str = None

Working with Datetime Objects

Pydantic has excellent support for date and time types:

from datetime import datetime, date, time, timedelta
from pydantic import BaseModel

class Event(BaseModel):
    name: str
    start_date: date
    end_date: date
    start_time: time
    duration: timedelta
    created_at: datetime

Pydantic can parse various string formats automatically:

event = Event(
    name="Conference",
    start_date="2023-09-15",  # ISO format string
    end_date=date(2023, 9, 17),  # Python date object
    start_time="09:00:00",  # Time string
    duration="PT3H30M",  # ISO 8601 duration string (3 hours 30 minutes)
    created_at="2023-06-01T12:30:45"  # ISO format datetime
)

print(event.start_date)  # Output: 2023-09-15
print(type(event.start_date))  # Output: <class 'datetime.date'>
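
Durations deserve a closer look, since free-form text such as "3h 30m" is not understood. The sketch below assumes Pydantic v2 and shows three inputs that all produce the same timedelta; Session is a hypothetical model used only for this demonstration:

```python
from datetime import timedelta

from pydantic import BaseModel

class Session(BaseModel):
    duration: timedelta

from_iso = Session(duration="PT3H30M")      # ISO 8601 duration string
from_seconds = Session(duration=12600)      # a number is read as seconds
from_object = Session(duration=timedelta(hours=3, minutes=30))

print(from_iso.duration)  # 3:30:00
print(from_iso.duration == from_seconds.duration == from_object.duration)
```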

Enumerations

Pydantic works well with Python’s Enum class:

from enum import Enum, auto
from pydantic import BaseModel

class UserRole(str, Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"

class PaymentStatus(Enum):
    PENDING = auto()
    COMPLETED = auto()
    FAILED = auto()

class User(BaseModel):
    name: str
    role: UserRole = UserRole.VIEWER

class Payment(BaseModel):
    amount: float
    status: PaymentStatus = PaymentStatus.PENDING

Usage:

## Using string value for enum
user = User(name="Alice", role="admin")  # Automatically converted to UserRole.ADMIN
print(user.role)  # Output: UserRole.ADMIN
print(user.role == UserRole.ADMIN)  # Output: True

## Using enum directly
user2 = User(name="Bob", role=UserRole.EDITOR)

Custom Data Types

In Pydantic v1, you can create custom data types by implementing __get_validators__ on a class:

from pydantic import BaseModel
import re

## Custom email type
class Email(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate
    
    @classmethod
    def validate(cls, v):
        if not isinstance(v, str):
            raise TypeError('string required')
        
        pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
        if not re.match(pattern, v):
            raise ValueError('invalid email format')
        
        return cls(v)

class User(BaseModel):
    name: str
    email: Email

In Pydantic v2, custom types are created differently:

from typing import Annotated

from pydantic import BaseModel, StringConstraints

# Using Annotated with constraints
Email = Annotated[str, StringConstraints(pattern=r'^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$')]

class User(BaseModel):
    name: str
    email: Email
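
When a pattern is not enough, an Annotated type can also carry a function. The sketch below assumes Pydantic v2 and uses AfterValidator with a hypothetical helper, normalize_email, that both checks and normalizes the value; its rule is intentionally simplistic:

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError

def normalize_email(value: str) -> str:
    """Hypothetical helper: trim, lower-case, and require an '@'."""
    value = value.strip().lower()
    if "@" not in value:
        raise ValueError("invalid email format")
    return value

# Reusable type: a str that is passed through normalize_email after validation
Email = Annotated[str, AfterValidator(normalize_email)]

class User(BaseModel):
    name: str
    email: Email

user = User(name="Alice", email="  Alice@Example.COM ")
print(user.email)  # alice@example.com

rejected = False
try:
    User(name="Bob", email="not-an-email")
except ValidationError:
    rejected = True
```

Because Email is an ordinary type alias, it can be reused across any number of models.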

Recursive Models

You can create recursive models for tree-like structures:

from typing import List, Optional
from pydantic import BaseModel, Field

class Comment(BaseModel):
    id: int
    text: str
    replies: List['Comment'] = []  # Self-reference

# v2 usually resolves the self-reference on its own; model_rebuild() forces it
# if resolution was deferred (v1 used Comment.update_forward_refs())
Comment.model_rebuild()

## Create a nested comment structure
comment = Comment(
    id=1,
    text="Great article!",
    replies=[
        Comment(id=2, text="I agree!"),
        Comment(
            id=3,
            text="Thanks!",
            replies=[Comment(id=4, text="You're welcome!")]
        )
    ]
)
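
Recursive models pair well with recursive functions. The following sketch assumes Pydantic v2, builds the same thread from plain dictionaries with model_validate, and walks it with count_comments, a helper written for this illustration:

```python
from typing import List

from pydantic import BaseModel

class Comment(BaseModel):
    id: int
    text: str
    replies: List["Comment"] = []  # Self-reference

def count_comments(comment: Comment) -> int:
    """Count a comment plus all of its nested replies."""
    return 1 + sum(count_comments(reply) for reply in comment.replies)

thread = Comment.model_validate({
    "id": 1,
    "text": "Great article!",
    "replies": [
        {"id": 2, "text": "I agree!"},
        {"id": 3, "text": "Thanks!", "replies": [{"id": 4, "text": "You're welcome!"}]},
    ],
})

print(count_comments(thread))  # 4
print(type(thread.replies[1].replies[0]).__name__)  # Comment
```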

Forward References

When models reference each other, you can use string literals as forward references:

from typing import List, Optional
from pydantic import BaseModel

class User(BaseModel):
    name: str
    posts: List['Post'] = []  # Forward reference

class Post(BaseModel):
    title: str
    author: Optional['User'] = None  # Forward reference

# Resolve the forward references now that both classes exist (update_forward_refs() in v1)
User.model_rebuild()
Post.model_rebuild()

Pydantic’s support for complex data types allows you to model even the most sophisticated data structures while maintaining type safety and validation. By combining nested models, collections, and custom types, you can create expressive, self-documenting data models that accurately represent your application’s domain.

Schema Generation with Pydantic

One of Pydantic’s most powerful features is its ability to automatically generate JSON Schema from your data models. This capability is especially valuable when building APIs or documenting data structures, as it provides a standardized way to describe the expected shape and constraints of your data.

What is JSON Schema?

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents. It provides a contract for what JSON data is required for a given application and how that data should be structured. This is particularly useful for:

  • Validating client-submitted data
  • Generating documentation
  • Creating mock data
  • Enabling auto-completion in IDEs
  • Supporting code generation

Basic Schema Generation

Every Pydantic model can generate its JSON Schema representation:

from pydantic import BaseModel, Field
from typing import List, Optional
import json

class User(BaseModel):
    id: int = Field(gt=0, description="The user ID")
    name: str = Field(min_length=2, description="The user's full name")
    email: str
    is_active: bool = True
    tags: List[str] = []

## Generate the JSON Schema
schema = User.model_json_schema()  # In v1: schema()

## Pretty print the schema
print(json.dumps(schema, indent=2))

This produces a JSON Schema along the following lines (key order and minor details vary between Pydantic versions):

{
  "title": "User",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "description": "The user ID",
      "exclusiveMinimum": 0,
      "type": "integer"
    },
    "name": {
      "title": "Name",
      "description": "The user's full name",
      "minLength": 2,
      "type": "string"
    },
    "email": {
      "title": "Email",
      "type": "string"
    },
    "is_active": {
      "title": "Is Active",
      "default": true,
      "type": "boolean"
    },
    "tags": {
      "title": "Tags",
      "default": [],
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "id",
    "name",
    "email"
  ]
}

Customizing Schema Generation

Pydantic provides several ways to customize the generated schema:

Field Customization

The Field function allows you to add metadata to your model fields:

from pydantic import BaseModel, Field

class Product(BaseModel):
    id: int = Field(
        ...,  # ... means required
        gt=0,
        description="Unique product identifier",
        examples=[1, 2, 3]
    )
    name: str = Field(
        ...,
        min_length=3,
        max_length=50,
        description="Product name",
        examples=["Smartphone", "Laptop"]
    )
    price: float = Field(
        ...,
        gt=0,
        description="Product price in USD",
        examples=[499.99, 1299.99]
    )

Schema Customization via Config

You can customize the schema at the model level using the Config class:

class Product(BaseModel):
    """Detailed information about a product in our catalog"""  # Used as the schema description

    id: int
    name: str
    price: float

    class Config:
        title = "Product Information"
        json_schema_extra = {  # Named schema_extra in v1
            "examples": [
                {
                    "id": 1,
                    "name": "Smartphone",
                    "price": 699.99
                }
            ]
        }

Schema for Nested Models

Pydantic automatically handles nested models in schema generation:

from pydantic import BaseModel, Field
from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class User(BaseModel):
    id: int
    name: str
    addresses: List[Address]

## Get the schema with nested models
schema = User.model_json_schema()

The generated schema places the full definition of the Address model under $defs (definitions in v1) and points to it from the addresses property with a $ref.
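
To inspect that structure directly, the sketch below (assuming Pydantic v2 and a trimmed-down Address model) prints the addresses property and the names of the shared definitions:

```python
import json
from typing import List

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class User(BaseModel):
    id: int
    addresses: List[Address]

schema = User.model_json_schema()

# The property only holds a reference to the shared definition
print(json.dumps(schema["properties"]["addresses"], indent=2))

# The definition itself lives under $defs
print(sorted(schema["$defs"]))  # ['Address']
```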

Schema References with $ref

For more complex models with shared components, Pydantic can generate schemas with references:

from pydantic import BaseModel
from typing import List

class Tag(BaseModel):
    id: int
    name: str

class Category(BaseModel):
    id: int
    name: str

class Product(BaseModel):
    id: int
    name: str
    tags: List[Tag]
    category: Category

## Generate schema with references
schema = Product.model_json_schema(ref_template="#/components/schemas/{model}")

With this ref_template, each $ref points to #/components/schemas/<ModelName>, the layout OpenAPI documents use, instead of the default #/$defs/<ModelName>.
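
The effect of ref_template is easiest to see by comparing it with the default. A minimal sketch, assuming Pydantic v2 and a reduced version of the models above:

```python
from typing import List

from pydantic import BaseModel

class Tag(BaseModel):
    id: int
    name: str

class Product(BaseModel):
    id: int
    tags: List[Tag]

default_schema = Product.model_json_schema()
openapi_schema = Product.model_json_schema(ref_template="#/components/schemas/{model}")

default_ref = default_schema["properties"]["tags"]["items"]["$ref"]
openapi_ref = openapi_schema["properties"]["tags"]["items"]["$ref"]

print(default_ref)  # #/$defs/Tag
print(openapi_ref)  # #/components/schemas/Tag
```

Only the reference strings change; placing the definitions at the matching location in a larger OpenAPI document is left to the caller or to a framework such as FastAPI.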

OpenAPI Integration

Pydantic’s schema generation is particularly valuable when working with OpenAPI (formerly Swagger) for API documentation. Libraries like FastAPI use Pydantic’s schema generation to automatically create OpenAPI documentation:

from typing import Optional

from fastapi import FastAPI, Path
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    id: int
    name: str
    description: Optional[str] = None  # None is only valid if the type allows it
    price: float
    tax: Optional[float] = None

@app.post("/items/", response_model=Item)
async def create_item(item: Item):
    return item

@app.get("/items/{item_id}", response_model=Item)
async def read_item(item_id: int = Path(..., gt=0)):
    # Retrieve item from database
    return {"id": item_id, "name": "Example", "price": 9.99}

FastAPI uses the Pydantic models to:

  1. Validate request and response data
  2. Generate OpenAPI documentation
  3. Create automatic interactive documentation with Swagger UI

Schema Customization with Field Types

Pydantic provides specialized field types that affect schema generation:

from pydantic import BaseModel, HttpUrl, EmailStr, constr, confloat

class User(BaseModel):
    id: int
    name: str
    email: EmailStr  # Specialized email string type (requires the pydantic[email] extra)
    website: HttpUrl  # URL type with validation
    username: constr(min_length=3, max_length=20, pattern=r'^[a-zA-Z0-9_-]+$')  # Constrained string
    rating: confloat(ge=0, le=5)  # Constrained float

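These specialized types add matching keywords to the schema: EmailStr and HttpUrl set format to email and uri, constr maps to minLength, maxLength, and pattern, and confloat maps to minimum and maximum.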
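The effect of constrained types on the schema can be checked by printing the relevant properties. The sketch below assumes Pydantic v2 and Python 3.9 or later. It uses a hypothetical Profile model and expresses the constraints with Annotated and Field, an alternative spelling of the constr and confloat calls above.

```python
from typing import Annotated

from pydantic import BaseModel, Field, HttpUrl


class Profile(BaseModel):  # hypothetical model for illustration
    website: HttpUrl
    username: Annotated[
        str, Field(min_length=3, max_length=20, pattern=r"^[a-zA-Z0-9_-]+$")
    ]
    rating: Annotated[float, Field(ge=0, le=5)]


props = Profile.model_json_schema()["properties"]

print(props["website"]["format"])  # URL types carry a "format" keyword
print(props["username"])           # length and pattern constraints
print(props["rating"])             # numeric bounds
```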

Version Differences (v1 vs v2)

Schema generation has some differences between Pydantic v1 and v2:

Version   Method                      Notes
v1        model.schema()              Original schema generation method
v2        model.model_json_schema()   Renamed method with improved functionality

In v2, there are also changes to how schema customization works:

## Pydantic v1
class ModelV1(BaseModel):
    class Config:
        schema_extra = {"examples": [{"id": 1}]}

## Pydantic v2
class ModelV2(BaseModel):
    model_config = {
        "json_schema_extra": {"examples": [{"id": 1}]}
    }
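Code that has to run against both major versions can hide the method rename behind a small helper. This is a sketch only; get_json_schema is a hypothetical name, not part of Pydantic, and it simply checks which method the model class provides.

```python
from pydantic import BaseModel


def get_json_schema(model_cls):
    """Return the JSON Schema for a model class on either major version."""
    if hasattr(model_cls, "model_json_schema"):  # Pydantic v2
        return model_cls.model_json_schema()
    return model_cls.schema()  # Pydantic v1


class User(BaseModel):
    id: int
    name: str


schema = get_json_schema(User)
print(schema["title"], sorted(schema["properties"]))
```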

Practical Applications

Generating Documentation

You can use the generated schema to create documentation for your data models:

import json
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

## Save schema to a file
with open('user_schema.json', 'w') as f:
    json.dump(User.model_json_schema(), f, indent=2)
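Beyond saving the raw JSON, the schema can be turned into a human-readable field reference. The sketch below assumes Pydantic v2; describe_fields is a hypothetical helper and the field descriptions are illustrative.

```python
from typing import Optional

from pydantic import BaseModel, Field


class User(BaseModel):
    id: int = Field(description="Unique identifier")
    name: str = Field(description="Display name")
    nickname: Optional[str] = Field(default=None, description="Optional alias")


def describe_fields(schema: dict) -> list:
    """Return one 'name (required|optional): description' line per property."""
    required = set(schema.get("required", []))
    lines = []
    for name, prop in schema.get("properties", {}).items():
        status = "required" if name in required else "optional"
        lines.append(f"{name} ({status}): {prop.get('description', '')}")
    return lines


lines = describe_fields(User.model_json_schema())
print("\n".join(lines))
```

Because the lines are derived from the schema, the reference stays in sync with the model as long as it is regenerated whenever the model changes.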

Data Validation with JSON Schema

The generated schema can be used with any JSON Schema validator. The example below uses the jsonschema package (pip install jsonschema):

import jsonschema
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

## Get the schema
schema = User.model_json_schema()

## Valid data
valid_data = {"id": 1, "name": "John", "email": "john@example.com"}

## Invalid data
invalid_data = {"id": "not an integer", "name": "John"}

## Validate
try:
    jsonschema.validate(instance=valid_data, schema=schema)
    print("Valid data validated successfully")
except jsonschema.exceptions.ValidationError as e:
    print(f"Validation error: {e}")

try:
    jsonschema.validate(instance=invalid_data, schema=schema)
    print("Invalid data validated successfully (shouldn't happen)")
except jsonschema.exceptions.ValidationError as e:
    print(f"Validation error (expected): {e}")
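One caveat when validating with the schema alone: a JSON Schema validator and Pydantic do not always agree. The sketch below assumes Pydantic v2 and illustrates the difference for a numeric string, which the schema's integer type rejects but Pydantic's default mode converts.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


payload = {"id": "42", "name": "John", "email": "john@example.com"}

# Default (lax) mode: the numeric string is converted to an int.
user = User.model_validate(payload)
print(user.id, type(user.id).__name__)

# Strict mode turns that conversion off, so the same payload is rejected.
try:
    User.model_validate(payload, strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False
print(strict_ok)
```

If external consumers validate against the published schema, enabling strict mode is one way to keep the two behaviours closer together.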

Mock Data Generation

You can use the schema to generate mock data for testing:

from pydantic import BaseModel
import json
import requests

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool

## Generate schema
schema = User.model_json_schema()

## Send the schema to a mock-data tool or service (e.g. json-schema-faker, mockend).
## The URL below is a placeholder, not a real endpoint.
response = requests.post(
    "https://some-mock-service.com/generate",
    json={"schema": schema, "count": 5}
)

mock_users = response.json()
print(json.dumps(mock_users, indent=2))
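For simple models, mock data can also be produced locally without a network call. The following is a deliberately naive sketch assuming Pydantic v2: fake_value and fake_record are hypothetical helpers that only understand the basic integer, number, boolean, and string types, and the value ranges are arbitrary.

```python
import random
import string

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool


def fake_value(prop: dict, rng: random.Random):
    """Produce a random value for a few basic JSON Schema types (sketch only)."""
    kind = prop.get("type")
    if kind == "integer":
        return rng.randint(1, 1000)
    if kind == "number":
        return round(rng.uniform(0, 1000), 2)
    if kind == "boolean":
        return rng.choice([True, False])
    if kind == "string":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    raise ValueError(f"unsupported schema type: {kind!r}")


def fake_record(schema: dict, rng: random.Random) -> dict:
    """Build one record with a random value for every property in the schema."""
    return {name: fake_value(prop, rng) for name, prop in schema["properties"].items()}


rng = random.Random(0)  # fixed seed so the test data is reproducible
schema = User.model_json_schema()

# Validating each record through the model confirms it matches the definition.
mock_users = [User.model_validate(fake_record(schema, rng)) for _ in range(5)]
print(len(mock_users))
```

Nested models, formats, and constraints are not handled here; a dedicated generator is a better fit once the models grow beyond flat records.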

Pydantic’s schema generation lets you document and validate your data models from a single definition. Because the output is standard JSON Schema, it works with API frameworks, documentation tools, and validation libraries that understand that format.

This feature is particularly valuable in larger projects where maintaining consistent data structures and clear documentation is essential. Whether you’re building APIs, processing complex data, or integrating with external systems, Pydantic’s schema generation helps ensure your data models are well-defined and properly validated.