
BaseModel vs TypedDict in LangGraph Agent State Management

When developing agents with LangGraph, one of the fundamental decisions developers face is choosing between Pydantic’s BaseModel and Python’s TypedDict for state management. Let’s explore these options to help you make the right choice for your agent implementation. For more information on Pydantic, read our Pydantic data validation blog post.

Understanding the Fundamentals

BaseModel

Pydantic’s BaseModel offers a robust, class-based approach to data modeling with built-in validation. It’s like having a strict but helpful guardian for your agent’s state, ensuring that data remains consistent and valid throughout the agent’s lifecycle.

from pydantic import BaseModel

class AgentState(BaseModel):
    current_step: str
    memory: list[str]
    context: dict

TypedDict

TypedDict, on the other hand, provides a lighter, more streamlined approach. It’s Python’s native way of adding type hints to dictionaries, offering static type checking without the runtime overhead.

from typing import TypedDict

class AgentState(TypedDict):
    current_step: str
    memory: list[str]
    context: dict

Key Differences and Trade-offs

Validation and Type Checking

  • BaseModel performs runtime validation and type checking, catching errors as they happen
  • TypedDict only provides static type hints, checked by tools like mypy but not at runtime (see the sketch below)
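
A minimal sketch of the contrast, assuming Pydantic v2 (which will not silently accept an int for a str field); the PydanticState and DictState names here are ours for illustration:

from typing import TypedDict

from pydantic import BaseModel, ValidationError

class PydanticState(BaseModel):
    current_step: str

class DictState(TypedDict):
    current_step: str

# Pydantic catches the bad type the moment the state is created
try:
    PydanticState(current_step=123)
except ValidationError as e:
    print(e)  # current_step: Input should be a valid string

# The same mistake in a TypedDict runs without complaint;
# only a static checker like mypy would flag this line
bad_state: DictState = {"current_step": 123}  # type: ignore[typeddict-item]
print(bad_state["current_step"])  # prints 123, no error raised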

Feature Set

  • BaseModel offers:
    • Automatic type coercion (sketched after this list)
    • Nested model support
    • Rich validation rules
    • Detailed error messages
  • TypedDict provides:
    • Lightweight type definitions
    • Native Python integration
    • Minimal overhead
    • Basic type checking
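
To make the BaseModel column concrete, here is a small sketch of coercion and nesting; DemoState and Memory are illustrative models, not part of LangGraph:

from pydantic import BaseModel

class Memory(BaseModel):
    entries: list[str] = []

class DemoState(BaseModel):
    step_count: int
    memory: Memory

# "3" is coerced to the int 3, and the plain dict is validated
# and promoted to a nested Memory instance
state = DemoState(step_count="3", memory={"entries": ["greeted user"]})
print(state.step_count + 1)     # 4
print(state.memory.entries[0])  # greeted user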

When to Use Each Approach

Choose BaseModel When:

  1. Your agent has complex state requirements
  2. You need runtime validation
  3. You’re working with external APIs or untrusted data (see the validation sketch after this list)
  4. You want rich error messages and debugging support
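
For the untrusted-data case in particular, a BaseModel can validate a raw API payload in a single call. A sketch assuming Pydantic v2’s model_validate_json; ToolResult is a hypothetical payload model:

from pydantic import BaseModel, ValidationError

class ToolResult(BaseModel):
    status: str
    score: float

raw = '{"status": "ok", "score": "not-a-number"}'  # untrusted payload

try:
    ToolResult.model_validate_json(raw)
except ValidationError as e:
    print(e)  # score: unable to parse string as a number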

Choose TypedDict When:

  1. Performance is a priority
  2. Your state structure is simple
  3. Static type checking is sufficient

Performance Considerations

Performance differences become apparent in larger applications (a quick benchmark sketch follows this list):

  • BaseModel:
    • Higher memory usage
    • Additional validation overhead
    • Better for complex data structures
  • TypedDict:
    • Minimal memory footprint
    • Fast instantiation
    • Ideal for simple data structures
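
These claims are easy to check with timeit. A rough sketch (absolute numbers vary by machine and Pydantic version; a plain dict literal stands in for TypedDict, which is an ordinary dict at runtime):

import timeit

from pydantic import BaseModel

class ModelState(BaseModel):
    current_step: str
    memory: list[str]

# TypedDict adds no runtime machinery, so building one is just building a dict
dict_time = timeit.timeit(
    lambda: {"current_step": "start", "memory": []}, number=100_000
)
model_time = timeit.timeit(
    lambda: ModelState(current_step="start", memory=[]), number=100_000
)
print(f"dict/TypedDict: {dict_time:.3f}s, BaseModel: {model_time:.3f}s")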

Implementation Examples

Let’s walk through complete agent implementations using both approaches to demonstrate the practical differences.

BaseModel Implementation

from pydantic import BaseModel, Field
from typing import List, Dict, Literal
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import AIMessage, HumanMessage

# Define state with BaseModel
class AgentState(BaseModel):
    conversation_history: List[HumanMessage | AIMessage] = Field(default_factory=list)
    research_findings: Dict[str, str] = Field(default_factory=dict)
    current_task: str = ""
    status: Literal["researching", "answering", "complete"] = "researching"
    
    # Pydantic validation ensures these fields maintain correct types
    # and we can use default_factory to initialize empty collections

# Define our nodes (agent components)
def researcher(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    # We can access state attributes directly as properties
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a research assistant. Find information about: {task}"),
        ("human", "I need information about {task}. Provide key facts.")
    ])
    
    chain = prompt | llm
    response = chain.invoke({"task": state.current_task})
    
    # Update state using model methods
    updated_state = state.model_copy(deep=True)
    updated_state.research_findings[state.current_task] = response.content
    updated_state.conversation_history.append(HumanMessage(content=f"Research: {state.current_task}"))
    updated_state.conversation_history.append(AIMessage(content=response.content))
    updated_state.status = "answering"
    
    return updated_state

def answerer(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    research = state.research_findings.get(state.current_task, "No research found")
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the research to answer the question."),
        ("human", "Question: {task}\n\nResearch: {research}")
    ])
    
    chain = prompt | llm
    response = chain.invoke({"task": state.current_task, "research": research})
    
    # Create updated state with validation
    updated_state = state.model_copy(deep=True)
    updated_state.conversation_history.append(AIMessage(content=response.content))
    updated_state.status = "complete"
    
    return updated_state

# Define conditional edges. This linear graph wires its edges directly,
# so the router below is illustrative; in a branching graph it would be
# attached with workflow.add_conditional_edges("researcher", router, ...)
def router(state: AgentState) -> str:
    return state.status

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher)
workflow.add_node("answerer", answerer)

# Add edges
workflow.add_edge("researcher", "answerer")
workflow.add_edge("answerer", END)
workflow.set_entry_point("researcher")

# Compile the graph
agent = workflow.compile()

# Run the agent
result = agent.invoke({
    "current_task": "quantum computing basics",
    "status": "researching"
})

# Correct access for the result
# The result is returned as a dict-like object, not directly as our BaseModel
final_answer = result["conversation_history"][-1].content
print(final_answer)

TypedDict Implementation

from typing import TypedDict, List, Dict, Literal, Union
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import AIMessage, HumanMessage

# Define state with TypedDict
class AgentState(TypedDict, total=False):
    conversation_history: List[Union[HumanMessage, AIMessage]]
    research_findings: Dict[str, str]
    current_task: str
    status: Literal["researching", "answering", "complete"]

# Define our nodes (agent components)
def researcher(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    # With TypedDict, we access via dictionary style
    task = state.get("current_task", "")
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a research assistant. Find information about: {task}"),
        ("human", "I need information about {task}. Provide key facts.")
    ])
    
    chain = prompt | llm
    response = chain.invoke({"task": task})
    
    # Need to create a new dictionary for the updated state
    # No validation happens here - we must be careful with types
    conversation_history = state.get("conversation_history", [])
    research_findings = state.get("research_findings", {})
    
    return {
        "conversation_history": conversation_history + [
            HumanMessage(content=f"Research: {task}"),
            AIMessage(content=response.content)
        ],
        "research_findings": {
            **research_findings,
            task: response.content
        },
        "current_task": task,
        "status": "answering"
    }

def answerer(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    task = state.get("current_task", "")
    research_findings = state.get("research_findings", {})
    research = research_findings.get(task, "No research found")
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the research to answer the question."),
        ("human", "Question: {task}\n\nResearch: {research}")
    ])
    
    chain = prompt | llm
    response = chain.invoke({"task": task, "research": research})
    
    # Create updated state (manual copy required)
    conversation_history = state.get("conversation_history", [])
    
    return {
        "conversation_history": conversation_history + [
            AIMessage(content=response.content)
        ],
        "research_findings": research_findings,
        "current_task": task, 
        "status": "complete"
    }

# Define conditional edges. As in the BaseModel version, the router is
# illustrative here; a branching graph would attach it with
# workflow.add_conditional_edges
def router(state: AgentState) -> str:
    return state.get("status", "researching")

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher)
workflow.add_node("answerer", answerer)

# Add edges
workflow.add_edge("researcher", "answerer")
workflow.add_edge("answerer", END)
workflow.set_entry_point("researcher")

# Compile the graph
agent = workflow.compile()

# Run the agent
result = agent.invoke({
    "current_task": "quantum computing basics",
    "status": "researching",
    "conversation_history": [],
    "research_findings": {}
})

# Access results with dictionary syntax
final_answer = result["conversation_history"][-1].content
print(final_answer)

Notice the key implementation differences:

  1. State Access and Modification: BaseModel uses attribute access and structured copying, while TypedDict uses dictionary-style access
  2. Default Values: BaseModel handles defaults elegantly with Field(default_factory=list), while TypedDict requires manual defaults with .get()
  3. Validation: BaseModel enforces types at runtime, while TypedDict won’t raise errors for mismatched types
  4. State Updates: BaseModel uses .model_copy() for proper state updates, while TypedDict requires manual dictionary construction

Best Practices and Recommendations

  1. Start Simple: Begin with TypedDict if your agent state is straightforward
  2. Scale Up: Migrate to BaseModel when you need more robust validation (see the migration sketch after this list)
  3. Consider Context: Match your choice to your use case requirements
  4. Balance Features: Weigh validation needs against performance requirements
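
Migrating is usually mechanical because the field names can stay the same. A sketch of the before and after, modeled on the AgentState from the examples above:

from typing import TypedDict

from pydantic import BaseModel, Field

# Before: TypedDict state, read as state["current_task"]
class AgentStateDict(TypedDict):
    current_task: str
    research_findings: dict[str, str]

# After: the same shape as a BaseModel, read as state.current_task,
# now with defaults and runtime validation
class AgentStateModel(BaseModel):
    current_task: str = ""
    research_findings: dict[str, str] = Field(default_factory=dict)

# The graph wiring (add_node, add_edge, compile) stays identical;
# only the node bodies switch from dictionary access to attribute access.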

Conclusion

Both BaseModel and TypedDict are valid choices for LangGraph agent state management. BaseModel offers robust validation and rich features at the cost of performance, while TypedDict provides lightweight, efficient state management with static type checking. Choose based on your specific needs for validation, performance, and complexity.
