Test-driven development (TDD) is a powerful approach for building robust and maintainable LangGraph agents. This guide will walk you through the process of implementing TDD practices using pytest while developing LangGraph-based agent systems.
For more information on creating LangGraph agents, read our article LangGraph Basics: Building Advanced AI Agents with Graph Architecture.
Getting Started
Before diving into TDD, let’s set up our development environment. LangGraph agents require several key packages:
# Run these commands in your project root directory
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install langgraph langchain pytest pytest-asyncio pytest-cov faker langgraph-cli langchain-openai langchain-community
These commands create a virtual environment and install all the necessary packages for developing and testing LangGraph agents, including pytest and its coverage extension.
Project Structure
A well-organized project structure is crucial for maintainable tests. Here’s a recommended layout:
project/
├── src/
│   ├── __init__.py
│   └── agents/              # Agent implementations
│       └── __init__.py
├── tests/
│   └── test_agents/         # Agent-specific tests
├── conftest.py              # Shared pytest fixtures
├── pyproject.toml
└── pytest.ini               # pytest configuration
This structure separates your agent implementation code from test code, making the codebase easier to navigate and maintain. The empty __init__.py files mark src and src/agents as regular packages so that setuptools and your tests can import them.
Testing Foundation
Setting Up Fixtures
Create reusable test components using pytest fixtures in conftest.py:
# filepath: conftest.py
import pytest
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import SystemMessage


@pytest.fixture
def mock_llm():
    return ChatOpenAI(temperature=0)


@pytest.fixture
def base_system_message():
    return SystemMessage(content="You are a helpful assistant.")
These fixtures provide common components that can be reused across multiple tests. The mock_llm fixture provides a ChatOpenAI instance pinned to temperature 0 for predictable outputs, while base_system_message provides a standard system message for agent testing.
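To see how these fixtures are consumed, a test simply lists them as parameters and pytest injects them by name. Here is a minimal sketch (the file name is hypothetical, and constructing ChatOpenAI requires an OPENAI_API_KEY in your environment, even a dummy value, because the client validates the key when it is created):

# filepath: tests/test_agents/test_fixtures.py (hypothetical example)
def test_fixtures_are_injected(mock_llm, base_system_message):
    # pytest resolves these parameters against the fixtures defined in conftest.py
    assert mock_llm.temperature == 0
    assert base_system_message.content == "You are a helpful assistant."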
Writing Your First Agent Test
Start with a basic agent test following TDD principles:
# filepath: tests/test_agents/test_base_agent.py
import pytest

from src.agents.base_agent import BaseAgent


def test_agent_initialization():
    agent = BaseAgent()
    assert agent.state is None
    assert agent.name == "base_agent"


@pytest.mark.asyncio
async def test_agent_process():
    agent = BaseAgent()
    result = await agent.process({"input": "test"})
    assert "output" in result
This test file verifies that our agent initializes correctly with default values and can process input as expected. The @pytest.mark.asyncio decorator enables testing of asynchronous functions.
Create a pyproject.toml so the package can be installed in development (editable) mode:
# filepath: pyproject.toml
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "langgraph-agents"
version = "0.1.0"
description = "LangGraph agent testing examples"
readme = "README.md"
authors = [{ name = "Your Name", email = "your.email@example.com" }]
requires-python = ">=3.9"
dependencies = [
    "langgraph",
    "langchain",
    "langchain-community",
    "langchain-openai",
    "langgraph-cli",
]

[project.optional-dependencies]
dev = ["pytest", "pytest-asyncio", "pytest-cov", "faker"]

[tool.setuptools.packages.find]
include = ["src*"]
Then install it in development mode:
pip install -e .
This installs your package in editable mode, making the src package importable from your tests and from anywhere in the virtual environment.
Running Your First Tests
Once you’ve created your first agent and test files, run the tests from your project root:
# Run all tests
pytest
# Run a specific test file
pytest tests/test_agents/test_base_agent.py
# Run tests with detailed output
pytest -v
# Run tests and show print statements
pytest -v -s
If your tests pass, you should see output similar to:
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-7.3.1, pluggy-1.0.0
rootdir: /your/project/path
collected 2 items
tests/test_agents/test_base_agent.py .. [100%]
============================== 2 passed in 0.12s ==============================
At this point our tests will fail, because we haven't written the agent they import yet. Let's do that.
Basic Agent for Our Test
# filepath: src/agents/base_agent.py
from typing import Dict, Any

from langgraph.graph import StateGraph, END


class BaseAgent:
    """
    A basic LangGraph agent implementation for testing purposes.
    """

    def __init__(self):
        """
        Initialize the base agent with default parameters.
        """
        self.name = "base_agent"
        self.state = None
        self.graph = self._build_graph()

    def _build_graph(self):
        """
        Build and compile a simple state graph for the agent.
        """
        # Define a simple dict-based state type
        class State(dict):
            pass

        # Create a simple graph
        graph = StateGraph(State)

        # Define a single processing node that transforms the input text
        def process_node(state: Dict[str, Any]) -> Dict[str, Any]:
            input_text = state.get("input", "")
            return {"input": input_text, "output": f"Processed: {input_text}"}

        # Add the node and wire up the graph: entry -> process -> END
        graph.add_node("process", process_node)
        graph.set_entry_point("process")
        graph.add_edge("process", END)

        return graph.compile()

    async def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Process input through the agent workflow.

        Args:
            input_data: Dictionary containing input data with at least an "input" key

        Returns:
            Dictionary containing the processed output
        """
        try:
            # invoke() runs synchronously even though this method is async
            result = self.graph.invoke(input_data)
            self.state = result

            # Provide a fallback if the graph didn't return a result
            if result is None:
                return {
                    "input": input_data.get("input", ""),
                    "output": f"Processed: {input_data.get('input', '')}",
                }
            return result
        except Exception as e:
            # Handle any exceptions raised during graph execution
            print(f"Error in agent processing: {e}")
            return {
                "input": input_data.get("input", ""),
                "output": "Error processing input",
                "error": str(e),
            }

    def analyze_input(self, text: str) -> Dict[str, Any]:
        """
        Analyze input text to extract basic information.

        Args:
            text: Input text string

        Returns:
            Dictionary with analysis results
        """
        # Simple keyword-based intent detection for testing
        intent = "greeting" if any(word in text.lower() for word in ["hi", "hello", "hey"]) else "query"
        return {
            "intent": intent,
            "text": text,
            "length": len(text),
        }
This BaseAgent class implements a minimal LangGraph agent with a simple state graph. The graph contains just one processing node that transforms input text. The agent provides process() and analyze_input() methods that serve as the foundation for our testing examples.
Now when we run pytest again, the tests should pass. In the next section we'll add more tests for our agent.
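If you want to poke at the agent outside of pytest, a quick script like the following shows how the async process() method is driven with asyncio.run. This is a hypothetical scratch file, not part of the test suite:

# filepath: scratch/run_base_agent.py (hypothetical scratch script)
import asyncio

from src.agents.base_agent import BaseAgent


async def main():
    agent = BaseAgent()
    # process() runs the compiled graph and stores the final state on the agent
    result = await agent.process({"input": "hello world"})
    print(result)       # expected shape: {"input": "...", "output": "Processed: ..."}
    print(agent.state)  # the agent keeps the last graph state


if __name__ == "__main__":
    asyncio.run(main())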
Testing Patterns for LangGraph
1. Unit Testing Agents
Test individual agent behaviors in isolation:
# filepath: tests/test_agents/test_base_agent.py
def test_agent_behavior():
    agent = BaseAgent()
    # Test specific agent capabilities
    response = agent.analyze_input("Hello")
    assert isinstance(response, dict)
    assert "intent" in response
    assert response["intent"] == "greeting"  # Should detect "Hello" as a greeting
This test focuses on a single method of our agent, verifying that the analyze_input() function correctly identifies greeting intents and returns the expected data structure.
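Once a behavior like intent detection has more than one interesting input, pytest.mark.parametrize keeps the tests compact. Here is a brief sketch; the specific cases are illustrative, chosen to match the keyword list in analyze_input:

import pytest

from src.agents.base_agent import BaseAgent


@pytest.mark.parametrize(
    "text, expected_intent",
    [
        ("Hello", "greeting"),
        ("hey there", "greeting"),
        ("What is LangGraph?", "query"),
    ],
)
def test_analyze_input_intents(text, expected_intent):
    agent = BaseAgent()
    result = agent.analyze_input(text)
    assert result["intent"] == expected_intent
    assert result["length"] == len(text)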
Running Focused Unit Tests
To run only your unit tests, use:
# Run all unit tests for base_agent
pytest tests/test_agents/test_base_agent.py -v
# Run a specific test function
pytest tests/test_agents/test_base_agent.py::test_agent_behavior -v
# Run tests matching a pattern
pytest -k "agent_behavior" -v
The -k
option is particularly useful as your test suite grows, allowing you to run tests matching specific patterns without having to specify exact paths.
2. Testing Graph Flows
Verify complete workflow executions:
# filepath: tests/test_agents/test_agent_workflow.py
from src.agents.agent_workflow import AgentWorkflow


def test_graph_execution():
    graph = AgentWorkflow()
    initial_state = {"message": "Hello"}
    result = graph.invoke(initial_state)
    assert "response" in result
    assert result["step_count"] > 0
This test verifies that a complete agent workflow runs as expected, checking that the state is properly updated throughout the execution and that the output contains necessary fields.
Now let's implement the AgentWorkflow class:
# filepath: src/agents/agent_workflow.py
from typing import Dict, Any

from langgraph.graph import StateGraph, END


class AgentWorkflow:
    """
    A simple workflow that manages state across multiple steps.
    """

    def __init__(self):
        self.graph = self._build_graph()

    def _build_graph(self):
        """
        Build a workflow graph with multiple steps and state tracking.
        """
        # Define a simple dict-based state type
        class State(dict):
            pass

        # Create a graph
        graph = StateGraph(State)

        # Step 1: process the incoming message
        def process_input(state: Dict[str, Any]) -> Dict[str, Any]:
            message = state.get("message", "")
            # Update state with processed content and bookkeeping fields
            return {
                "message": message,
                "processed_input": message.lower(),
                "step_count": state.get("step_count", 0) + 1,
                "data": state.get("data", []) + ["input processed"],
            }

        # Step 2: generate a response from the processed input
        def generate_response(state: Dict[str, Any]) -> Dict[str, Any]:
            processed = state.get("processed_input", "")
            # Carry the existing state forward and add the response
            return {
                **state,
                "response": f"Response to: {processed}",
                "step_count": state.get("step_count", 0) + 1,
                "data": state.get("data", []) + ["response generated"],
            }

        # Add nodes
        graph.add_node("process_input", process_input)
        graph.add_node("generate_response", generate_response)

        # Define edges: entry -> process_input -> generate_response -> END
        graph.set_entry_point("process_input")
        graph.add_edge("process_input", "generate_response")
        graph.add_edge("generate_response", END)

        return graph.compile()

    def invoke(self, initial_state: Dict[str, Any]) -> Dict[str, Any]:
        """
        Execute the workflow with the given initial state.

        Args:
            initial_state: Dictionary with starting state values

        Returns:
            Final state after workflow execution
        """
        # Initialize the step counter if not present
        if "step_count" not in initial_state:
            initial_state = {**initial_state, "step_count": 0}

        # Initialize the data collection if not present
        if "data" not in initial_state:
            initial_state = {**initial_state, "data": []}

        # Execute the graph
        result = self.graph.invoke(initial_state)

        # If the graph returns None, fall back to a minimal valid result
        if result is None:
            return {
                "message": initial_state.get("message", ""),
                "response": f"Response to: {initial_state.get('message', '').lower()}",
                "step_count": 1,
                "data": ["fallback response generated"],
            }
        return result
This implementation creates a simple two-step workflow that:
- Processes the input message
- Generates a response based on the processed input
- Tracks state changes through the workflow with a step counter and data collection
Now when we run pytest again, the tests should pass.
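A follow-up test can pin down the bookkeeping as the state moves through both nodes. This is a sketch that assumes the final state carries the accumulated data list and incremented step counter exactly as implemented above:

# filepath: tests/test_agents/test_agent_workflow.py (additional test, illustrative)
def test_workflow_tracks_steps():
    graph = AgentWorkflow()
    result = graph.invoke({"message": "Hello"})

    # Both nodes ran, so both bookkeeping entries should be present
    assert "input processed" in result["data"]
    assert "response generated" in result["data"]
    assert result["step_count"] == 2
    assert result["response"] == "Response to: hello"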
3. Mocking External Dependencies
Handle external dependencies using pytest’s mocking capabilities:
# filepath: tests/test_agents/test_llm_integration.py
import pytest
from unittest.mock import MagicMock

from langchain_core.messages import AIMessage

from src.agents.llm_agent import LLMAgent


@pytest.fixture
def mock_llm_response(monkeypatch):
    """
    Mock the LLM invoke method to return a predictable response.
    """
    # Provide a dummy key so ChatOpenAI can be constructed without a real credential
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")

    # Create a mock that returns an AIMessage
    mock_response = AIMessage(content="Mocked response")

    # Create a mock for the invoke method
    mock_invoke = MagicMock(return_value=mock_response)

    # Replace the actual method with our mock
    monkeypatch.setattr(
        "langchain_community.chat_models.ChatOpenAI.invoke",
        mock_invoke,
    )
    return mock_invoke


def test_agent_with_mocked_llm(mock_llm_response):
    """
    Test that our agent works correctly with a mocked LLM.
    """
    # Create agent with the mocked LLM
    agent = LLMAgent()

    # Generate a response
    result = agent.generate_response("Test prompt")

    # Verify the mock was called once
    mock_llm_response.assert_called_once()

    # Verify we got the expected response
    assert result == "Mocked response"

    # Check that our mocked method received the correct arguments
    args, _ = mock_llm_response.call_args
    messages = args[0]

    # There should be two messages (system + human)
    assert len(messages) == 2
    assert messages[1].content == "Test prompt"
This test demonstrates how to mock external LLM calls to ensure consistent test behavior. By replacing the actual LLM call with a mock function, tests become deterministic and independent of external services.
Now, following the TDD practice of writing tests before code, let's implement the LLMAgent that satisfies the test above:
# filepath: src/agents/llm_agent.py
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage


class LLMAgent:
    """
    An agent that uses an external LLM for generating responses.
    """

    def __init__(self, model_name="gpt-3.5-turbo", temperature=0):
        """
        Initialize the LLM agent with a language model.
        """
        self.name = "llm_agent"
        self.llm = ChatOpenAI(model_name=model_name, temperature=temperature)
        self.system_message = SystemMessage(content="You are a helpful assistant.")

    def generate_response(self, prompt: str) -> str:
        """
        Generate a response using the LLM.

        Args:
            prompt: The text prompt to send to the LLM

        Returns:
            The generated response as a string
        """
        # Create a conversation with system and user messages
        messages = [
            self.system_message,
            HumanMessage(content=prompt),
        ]

        # Generate a response from the LLM
        response = self.llm.invoke(messages)

        # Extract and return the content
        return response.content

    async def generate_response_async(self, prompt: str) -> str:
        """
        Asynchronously generate a response using the LLM.

        Args:
            prompt: The text prompt to send to the LLM

        Returns:
            The generated response as a string
        """
        # Create a conversation with system and user messages
        messages = [
            self.system_message,
            HumanMessage(content=prompt),
        ]

        # Generate a response from the LLM asynchronously
        response = await self.llm.ainvoke(messages)

        # Extract and return the content
        return response.content
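The async path can be tested the same way; the main difference is mocking ainvoke with an AsyncMock so the await resolves to our canned message. This is a sketch under the same assumptions as the fixture above (a dummy API key set via monkeypatch):

# filepath: tests/test_agents/test_llm_integration.py (additional test, illustrative)
import pytest
from unittest.mock import AsyncMock

from langchain_core.messages import AIMessage

from src.agents.llm_agent import LLMAgent


@pytest.mark.asyncio
async def test_agent_with_mocked_llm_async(monkeypatch):
    # Dummy key so ChatOpenAI can be constructed without a real credential
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")

    # Awaiting the AsyncMock resolves to the canned AIMessage
    mock_ainvoke = AsyncMock(return_value=AIMessage(content="Mocked async response"))
    monkeypatch.setattr(
        "langchain_community.chat_models.ChatOpenAI.ainvoke",
        mock_ainvoke,
    )

    agent = LLMAgent()
    result = await agent.generate_response_async("Test prompt")

    mock_ainvoke.assert_awaited_once()
    assert result == "Mocked async response"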
Running Tests with Different Categories
As your test suite grows, you’ll want to organize and run tests by category:
Create a pytest.ini file:
# filepath: pytest.ini
[pytest]
markers =
    unit: unit tests
    integration: integration tests
    slow: tests that take longer to run
    llm: tests that involve language models
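Markers only take effect once they are applied to tests. A short illustrative sketch (the test names, file name, and marker choices here are hypothetical) showing how the decorators look:

# filepath: tests/test_agents/test_markers.py (hypothetical example)
import pytest

from src.agents.agent_workflow import AgentWorkflow
from src.agents.base_agent import BaseAgent


@pytest.mark.unit
def test_intent_detection_marked():
    agent = BaseAgent()
    assert agent.analyze_input("Hello")["intent"] == "greeting"


@pytest.mark.integration
@pytest.mark.slow
def test_full_workflow_marked():
    result = AgentWorkflow().invoke({"message": "Hello"})
    assert "response" in result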
You can run specific test categories:
# Run only unit tests
pytest -m unit
# Run all tests except slow ones
pytest -m "not slow"
Test Coverage
To generate coverage reports, make sure you’ve installed the pytest-cov extension:
# Install pytest-cov if not already installed
pip install pytest-cov
# Run tests with coverage reporting
pytest --cov=src
# Generate detailed HTML coverage report
pytest --cov=src --cov-report=html
# View the report
# On Linux
xdg-open htmlcov/index.html
# On macOS
open htmlcov/index.html
# On Windows
start htmlcov/index.html
The coverage report will show you which parts of your code are being tested and which parts need more test coverage. Aim for at least 80% coverage for critical agent components.
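If you want the suite to fail whenever coverage drops below that bar, pytest-cov's --cov-fail-under flag can be baked into the pytest configuration. One possible addition to the pytest.ini created earlier (the 80% threshold simply mirrors the target suggested above):

# filepath: pytest.ini (possible addition under the existing [pytest] section)
addopts = --cov=src --cov-report=term-missing --cov-fail-under=80

Keep in mind that putting coverage flags in addopts means every pytest run now collects coverage, which adds a little overhead.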
Testing State Management
Verify proper state transitions in your agent workflows:
# filepath: tests/test_agents/test_state_management.py
from src.agents.agent_workflow import AgentWorkflow


def test_state_transitions():
    graph = AgentWorkflow()

    # Initialize state with just the message;
    # let the workflow handle initialization of the other fields
    state = {"message": "Test"}

    # Execute the workflow
    result = graph.invoke(state)

    # Verify state changes
    assert result["step_count"] > 0

    # Check that data exists before asserting its length
    assert "data" in result, "Data key missing in result"
    assert len(result["data"]) > 0
This test ensures that an agent workflow correctly updates state during execution, validating that step counters increment and data collections grow as expected during processing.
Wrapping Up
Test-driven development provides a solid foundation for building robust LangGraph agents that can handle complex interactions reliably. The practices outlined in this guide give you a workflow you can keep building on.
Key Takeaways
- Start with Tests: Writing tests before implementation helps clarify requirements and ensures your agents behave as expected from the beginning.
- Build Incrementally: Add features one at a time with corresponding tests, ensuring each component works correctly before moving to the next.
- Mock External Dependencies: Use pytest’s powerful mocking capabilities to create deterministic tests that don’t rely on external services like OpenAI.
- Monitor Coverage: Regularly check test coverage to identify untested code paths and potential vulnerabilities.
- Structure Tests Logically: Organize tests by functionality and complexity—unit tests for individual components, integration tests for workflows.
Next Steps
To continue improving your LangGraph agent testing:
- Expand Test Cases: Add edge cases and stress tests to ensure your agents handle unexpected inputs gracefully.
- Performance Profiling: Establish benchmarks for agent performance and monitor them over time.
- Use Property-Based Testing: Consider tools like Hypothesis to generate inputs that might uncover unexpected bugs; a brief sketch follows this list.
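As a taste of what that looks like, here is a minimal Hypothesis sketch against the BaseAgent.analyze_input() method from earlier (the file name is hypothetical and the properties checked are illustrative, not exhaustive):

# filepath: tests/test_agents/test_properties.py (hypothetical example)
from hypothesis import given, strategies as st

from src.agents.base_agent import BaseAgent


@given(st.text())
def test_analyze_input_properties(text):
    agent = BaseAgent()
    result = agent.analyze_input(text)

    # The reported length should always match the input
    assert result["length"] == len(text)

    # Intent is always one of the two known categories
    assert result["intent"] in {"greeting", "query"}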
By embracing test-driven development for LangGraph agents, you’re not just building more reliable systems—you’re creating a development workflow that adapts well to the rapidly evolving landscape of AI agents. Tests serve as living documentation of your agent’s capabilities and constraints, making it easier for teams to collaborate and maintain complex agent systems over time.