
Using Playwright in LangGraph Agents for Automated Website Login

Introduction

LangGraph provides a powerful framework for building agent-based applications powered by language models. When these applications need to interact with the web, Playwright offers a robust solution for browser automation. In this blog post, we’ll explore how to combine LangGraph and Playwright to create an AI agent that can automatically log into websites.

Why Combine LangGraph Agents with Playwright?

LangGraph agents excel at reasoning and decision-making, while Playwright provides precise control over web browsers. Together, they create a powerful solution for:

  • Automating repetitive login workflows
  • Testing authentication systems
  • Building web automation assistants
  • Creating agents that can access web-based information behind login screens

Project Setup

Let’s walk through building a LangGraph ReAct agent with Playwright integration:

Environment Setup

Create a new directory to work from:

mkdir langgraph-playwright-login
cd langgraph-playwright-login

Our project’s dependencies are defined in pyproject.toml:

[build-system]
requires = ["setuptools>=73.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "react_agent"
version = "0.1.0"
description = "ReAct Agent with LangGraph"
readme = "README.md"
requires-python = ">=3.11,<4.0"
license = {text = "MIT"}
authors = [
    { name = "You", email = "your@email.com" },
]
dependencies = [
    "langgraph>=0.2.6",
    "langchain-openai>=0.1.22",
    "langchain-anthropic>=0.1.23",
    "langchain>=0.2.14",
    "langchain-fireworks>=0.1.7",
    "python-dotenv>=1.0.1",
    "langchain-tavily>=0.1",
    "playwright",
]

[project.optional-dependencies]
dev = [
    "mypy>=1.11.1",
    "ruff>=0.6.1",
    "pytest>=8.3.5",
    "black",
    "isort",
]

We’ll use uv for Python environment management:

# Create and activate a virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install the project in development mode
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"

After installation, make sure to install the Playwright browsers:

playwright install
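Before moving on, it can help to confirm that the key packages can be imported from the active virtual environment. Below is a minimal stdlib-only sketch. The module list is an assumption based on the pyproject.toml above. Import names differ from PyPI names (python-dotenv imports as dotenv), and the script does not check whether the browser binaries were downloaded.

```python
"""Check that the project's key packages can be imported (a sketch)."""
import importlib.util
from typing import Iterable, List

# Import names, not PyPI names (python-dotenv imports as "dotenv"); adjust as needed
REQUIRED_MODULES = ["langgraph", "langchain_openai", "playwright", "dotenv"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print(f"Missing modules: {', '.join(missing)}")
    else:
        print("All key modules found. Browser binaries come from `playwright install`.")
```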

Configuration

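Our agent needs login credentials. To keep them out of the source code, we'll store them in environment variables. Create a .env file in the project root and add it to .gitignore. python-dotenv, which is already listed in our dependencies, can load it into the environment: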

# Add required credentials to .env
LOGIN_URL='https://example.com/login'
LOGIN_USERNAME='your_username'
LOGIN_PASSWORD='your_password'
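python-dotenv's load_dotenv() is the usual way to read this file. The simplified, stdlib-only sketch below shows roughly what that involves. load_env_file is a hypothetical helper, and it skips features python-dotenv supports, such as export prefixes, multiline values and variable expansion. Like load_dotenv's default, it does not overwrite variables that are already set in the environment.

```python
"""A simplified, stdlib-only sketch of loading a .env file into os.environ."""
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse KEY=value lines, export them to os.environ, and return what was parsed.

    Deliberately minimal: no `export` prefix, multiline values, or ${VAR} expansion.
    """
    parsed: Dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, e.g. 'your_password'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        parsed[key] = value
        # Like load_dotenv()'s default, keep values already present in the environment
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Call load_env_file() once at startup, before Configuration() is created. That makes LOGIN_URL and the credentials visible to os.environ.get().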

We’ll create a src/react_agent/configuration.py file to handle these settings:

"""Configuration for the ReAct agent."""

import os
from dataclasses import dataclass, field, fields
from typing import Optional

from dotenv import load_dotenv
from langchain_core.runnables import RunnableConfig, ensure_config

# Load variables from .env into os.environ (existing variables are not overwritten)
load_dotenv()


@dataclass
class Configuration:
    """Configuration for the ReAct agent."""

    # The model to use, in the format "provider/model"
    model: str = "openai/gpt-4o"

    # Maximum number of search results to return
    max_search_results: int = 3

    # Login URL from environment variable
    login_url: str = field(default_factory=lambda: os.environ.get("LOGIN_URL", ""))

    # Login credentials from environment variables
    username: str = field(default_factory=lambda: os.environ.get("LOGIN_USERNAME", ""))
    password: str = field(default_factory=lambda: os.environ.get("LOGIN_PASSWORD", ""))

    # System prompt template
    system_prompt: str = """You are a helpful AI assistant. The current time is {system_time}.

When users ask questions, try to answer them as helpfully as possible. You have access to tools
that can help you find information or perform actions. Use them when necessary.

If you need to log into a website, you can use the login tool with the URL provided in your configuration.
"""

    @classmethod
    def from_context(cls, config: Optional[RunnableConfig] = None) -> "Configuration":
        """Build the configuration for the current run.

        Values under config["configurable"] override the defaults. When called inside a
        graph node or tool without an argument, the run's active config is used.
        Keys that are not fields of this class are ignored.
        """
        config = ensure_config(config)
        configurable = config.get("configurable") or {}
        init_fields = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in configurable.items() if k in init_fields})
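Values passed to a LangGraph run travel in the config dictionary under the "configurable" key. The example later in this post passes {"configurable": {"model": ...}}. A common way to turn that dictionary into a dataclass instance is to keep only the keys that match the dataclass's fields. Unrelated entries such as a thread_id are then dropped instead of raising a TypeError. Here is a stdlib-only sketch of that pattern; DemoConfiguration and from_configurable are hypothetical names.

```python
"""Sketch: build a dataclass config from a LangGraph-style `configurable` dict."""
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class DemoConfiguration:
    """Hypothetical stand-in for Configuration with two of its fields."""

    model: str = "openai/gpt-4o"
    max_search_results: int = 3


def from_configurable(cls: Type[T], config: Optional[Dict[str, Any]] = None) -> T:
    """Create `cls`, overriding defaults with matching keys from config["configurable"]."""
    configurable = (config or {}).get("configurable") or {}
    init_fields = {f.name for f in fields(cls) if f.init}
    # Unrelated keys such as "thread_id" are dropped instead of raising TypeError
    return cls(**{k: v for k, v in configurable.items() if k in init_fields})


if __name__ == "__main__":
    cfg = from_configurable(
        DemoConfiguration,
        {"configurable": {"model": "anthropic/claude-3-5-sonnet-latest", "thread_id": "1"}},
    )
    print(cfg)
```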

Building the Playwright Login Tool

The heart of our implementation is a Playwright-based login tool that can:

  1. Navigate to login pages
  2. Identify common login form fields
  3. Input credentials
  4. Submit the form
  5. Capture a screenshot for verification

Here’s how we implement this in our src/react_agent/tools.py:

"""This module provides the Playwright-based login tool used by the ReAct agent."""

import asyncio
from typing import Any, Callable, Dict, List

# Import Playwright with better error handling
try:
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError
    from playwright.async_api import async_playwright
    PLAYWRIGHT_AVAILABLE = True
except ImportError:
    PLAYWRIGHT_AVAILABLE = False
    async_playwright = None
    PlaywrightTimeoutError = TimeoutError  # Placeholder; login() returns early without Playwright

from react_agent.configuration import Configuration


async def login() -> Dict[str, Any]:
    """Log into a website using Playwright.
    
    Uses the login URL, username, and password from the configuration.
    Returns information about the login attempt.
    """
    if not PLAYWRIGHT_AVAILABLE:
        return {
            "success": False, 
            "error": "Playwright is not installed. Please install it with: pip install playwright && python -m playwright install"
        }
        
    configuration = Configuration.from_context()
    
    if not configuration.login_url:
        return {"success": False, "error": "No login URL provided in environment variables"}
    
    if not configuration.username or not configuration.password:
        return {"success": False, "error": "Missing username or password in environment variables"}
    
    try:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            
            # Navigate to the login page
            await page.goto(configuration.login_url)
            
            # Wait for the page to load
            await page.wait_for_load_state("networkidle")
            
            # Look for common username/email input fields
            username_selectors = [
                'input[type="email"]', 
                'input[name="email"]',
                'input[name="username"]', 
                'input[id="email"]',
                'input[id="username"]'
            ]
            
            # Look for common password input fields
            password_selectors = [
                'input[type="password"]',
                'input[name="password"]',
                'input[id="password"]'
            ]
            
            # Try to find and fill username field
            username_filled = False
            for selector in username_selectors:
                if await page.query_selector(selector):
                    await page.fill(selector, configuration.username)
                    username_filled = True
                    break
            
            if not username_filled:
                return {"success": False, "error": "Could not find username/email field"}
            
            # Try to find and fill password field
            password_filled = False
            for selector in password_selectors:
                if await page.query_selector(selector):
                    await page.fill(selector, configuration.password)
                    password_filled = True
                    break
            
            if not password_filled:
                return {"success": False, "error": "Could not find password field"}
            
            # Look for common submit buttons
            submit_selectors = [
                'button[type="submit"]',
                'input[type="submit"]',
                'button:has-text("Login")',
                'button:has-text("Sign in")',
                'button:has-text("Log in")'
            ]
            
            # Find the submit button before clicking it
            submit_selector = None
            for selector in submit_selectors:
                if await page.query_selector(selector):
                    submit_selector = selector
                    break

            if submit_selector is None:
                return {"success": False, "error": "Could not find submit button"}

            # Click inside expect_navigation() so a navigation triggered by the click is not missed
            try:
                async with page.expect_navigation(timeout=10000):
                    await page.click(submit_selector)
            except PlaywrightTimeoutError:
                # If no navigation occurs (e.g. single-page apps), continue anyway
                pass

            # Wait for the page to settle; some pages never reach "networkidle"
            try:
                await page.wait_for_load_state("domcontentloaded", timeout=10000)
                await page.wait_for_load_state("networkidle", timeout=10000)
            except PlaywrightTimeoutError:
                pass

            # Add a small delay to ensure the page is fully rendered
            await asyncio.sleep(2)
            
            # Get the page title and URL for better login verification
            page_title = await page.title()
            current_url = page.url
            page_content = await page.content()
            
            # Take a screenshot after login
            await page.screenshot(path="login_result.png")
            
            # Close the browser
            await browser.close()
            
            # More comprehensive login success detection
            login_failed_indicators = [
                "incorrect password",
                "login failed",
                "invalid credentials",
                "wrong username",
                "wrong password",
                "authentication failed",
                "sign in to your account"
            ]
            
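            # "account" is deliberately not listed: it also matches the failure
            # phrase "sign in to your account" and common login-page text
            login_success_indicators = [
                "welcome",
                "dashboard",
                "profile",
                "logged in",
                "sign out",
                "logout"
            ]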
            
            # Check for error messages
            has_error = any(indicator in page_content.lower() for indicator in login_failed_indicators)
            
            # Check for success indicators
            has_success = any(indicator in page_content.lower() for indicator in login_success_indicators)
            
            # Check if URL changed from login page
            url_changed = configuration.login_url.lower() not in current_url.lower()
            
            if has_error and not has_success:
                return {
                    "success": False, 
                    "error": "Login appears to have failed based on page content",
                    "current_url": current_url,
                    "page_title": page_title
                }
            
            return {
                "success": url_changed or has_success,
                "message": "Login attempt completed",
                "current_url": current_url,
                "page_title": page_title,
                "screenshot_path": "login_result.png"
            }
            
    except Exception as e:
        return {"success": False, "error": f"Login failed with error: {str(e)}"}


TOOLS: List[Callable[..., Any]] = [login]

Integrating with the LangGraph ReAct Agent

For our LangGraph agent to use the login tool, we need to define the agent’s state and workflow. First, let’s create a simple state definition in src/react_agent/state.py:

"""Define the state structures for the agent."""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Sequence

from langchain_core.messages import AnyMessage
from langgraph.graph import add_messages
from langgraph.managed import IsLastStep
from typing_extensions import Annotated


@dataclass
class InputState:
    """Defines the input state for the agent, representing a narrower interface to the outside world."""

    messages: Annotated[Sequence[AnyMessage], add_messages] = field(
        default_factory=list
    )
    """
    Messages tracking the primary execution state of the agent.
    """


@dataclass
class State(InputState):
    """Represents the complete state of the agent, extending InputState with additional attributes."""

    is_last_step: IsLastStep = field(default=False)
    """
    Indicates whether the current step is the last one before the graph raises an error.
    """
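The Annotated[..., add_messages] annotation is what makes a node's {"messages": [...]} return value extend the conversation instead of replacing it. add_messages appends new messages and replaces any message whose ID matches an existing one. The sketch below imitates that behaviour on plain dicts so it runs without LangGraph. merge_messages is a hypothetical name, and the real reducer does more; for example, it converts message formats and supports RemoveMessage.

```python
"""Simplified sketch of an add_messages-style reducer on plain dicts (not LangGraph's code)."""
import uuid
from typing import Dict, List


def merge_messages(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Append messages from `right`; one whose id already exists replaces the original."""
    merged = [dict(m) for m in left]
    position = {m["id"]: i for i, m in enumerate(merged) if "id" in m}
    for message in right:
        message = dict(message)
        # Give id-less messages a fresh id, as the real reducer does
        message.setdefault("id", str(uuid.uuid4()))
        if message["id"] in position:
            merged[position[message["id"]]] = message  # replace in place
        else:
            position[message["id"]] = len(merged)
            merged.append(message)  # append new message
    return merged
```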

We also need a utility module in src/react_agent/utils.py to handle model loading and message processing:

"""Utility & helper functions."""

from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage


def get_message_text(msg: BaseMessage) -> str:
    """Get the text content of a message."""
    content = msg.content
    if isinstance(content, str):
        return content
    elif isinstance(content, dict):
        return content.get("text", "")
    else:
        txts = [c if isinstance(c, str) else (c.get("text") or "") for c in content]
        return "".join(txts).strip()


def load_chat_model(fully_specified_name: str) -> BaseChatModel:
    """Load a chat model from a fully specified name.

    Args:
        fully_specified_name (str): String in the format 'provider/model'.
    """
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return init_chat_model(model, model_provider=provider)

The utils.py module provides two key functions:

  • get_message_text(): Extracts plain text content from message objects, handling various content formats
  • load_chat_model(): Creates the appropriate language model instance based on the provider/model format
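Both functions are easy to exercise in isolation. The stdlib-only sketch below copies their parsing logic into two hypothetical helpers, split_model_name and content_to_text, so you can check the behaviour without installing LangChain. Splitting with maxsplit=1 matters because some model identifiers contain slashes of their own, such as Fireworks' accounts/fireworks/models/... names.

```python
"""Stdlib sketch mirroring the parsing logic in utils.py, runnable without LangChain."""
from typing import Any, Tuple


def split_model_name(fully_specified_name: str) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only, as load_chat_model does."""
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return provider, model


def content_to_text(content: Any) -> str:
    """Flatten message content (str, dict, or list of str/dict blocks) to text."""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        return content.get("text", "")
    # Non-text blocks (e.g. tool-use blocks) have no "text" key and contribute ""
    parts = [c if isinstance(c, str) else (c.get("text") or "") for c in content]
    return "".join(parts).strip()
```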

Now, let’s set up the full agent workflow in src/react_agent/graph.py:

"""Define a custom Reasoning and Action agent.

Works with a chat model with tool calling support.
"""

from datetime import UTC, datetime
from typing import Dict, List, Literal, cast

from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode

from react_agent.configuration import Configuration
from react_agent.state import InputState, State
from react_agent.tools import TOOLS
from react_agent.utils import load_chat_model


async def call_model(state: State) -> Dict[str, List[AIMessage]]:
    """Call the LLM powering our "agent"."""
    configuration = Configuration.from_context()

    # Initialize the model with tool binding
    model = load_chat_model(configuration.model).bind_tools(TOOLS)

    # Format the system prompt
    system_message = configuration.system_prompt.format(
        system_time=datetime.now(tz=UTC).isoformat()
    )

    # Get the model's response
    response = cast(
        AIMessage,
        await model.ainvoke(
            [{"role": "system", "content": system_message}, *state.messages]
        ),
    )

    # Handle the case when it's the last step and the model still wants to use a tool
    if state.is_last_step and response.tool_calls:
        return {
            "messages": [
                AIMessage(
                    id=response.id,
                    content="Sorry, I could not find an answer to your question in the specified number of steps.",
                )
            ]
        }

    # Return the model's response as a list to be added to existing messages
    return {"messages": [response]}


# Define a new graph
builder = StateGraph(State, input=InputState, config_schema=Configuration)

# Define the two nodes we will cycle between
builder.add_node(call_model)
builder.add_node("tools", ToolNode(TOOLS))

# Set the entrypoint as `call_model`
builder.add_edge("__start__", "call_model")


def route_model_output(state: State) -> Literal["__end__", "tools"]:
    """Determine the next node based on the model's output."""
    last_message = state.messages[-1]
    if not isinstance(last_message, AIMessage):
        raise ValueError(
            f"Expected AIMessage in output edges, but got {type(last_message).__name__}"
        )
    # If there is no tool call, then we finish
    if not last_message.tool_calls:
        return "__end__"
    # Otherwise we execute the requested actions
    return "tools"


# Add conditional edges to determine the next step after `call_model`
builder.add_conditional_edges(
    "call_model",
    route_model_output,
)

# Add a normal edge from `tools` to `call_model`
builder.add_edge("tools", "call_model")

# Compile the builder into an executable graph
graph = builder.compile(name="ReAct Agent")
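Stripped of LangGraph, the graph above is a loop. It calls the model and stops if the model requested no tools. Otherwise it runs the tools and calls the model again. is_last_step forces a final answer when the step budget runs out. The plain-Python sketch below illustrates that control flow. FakeAIMessage and run_react_loop are hypothetical, and counting steps as model calls is a simplification: LangGraph's recursion limit counts graph steps, including tool-node executions.

```python
"""Plain-Python sketch of the call_model -> tools -> call_model loop (no LangGraph)."""
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for AIMessage: text plus any requested tool calls."""

    content: str
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def run_react_loop(
    call_model: Callable[[List[Any]], FakeAIMessage],
    tools: Dict[str, Callable[..., Any]],
    messages: List[Any],
    max_steps: int = 10,
) -> List[Any]:
    """Alternate model and tool steps until the model stops requesting tools."""
    for step in range(max_steps):
        response = call_model(messages)
        # Like call_model(): on the last step, a pending tool call becomes an apology
        if step == max_steps - 1 and response.tool_calls:
            messages.append(FakeAIMessage("Sorry, I ran out of steps."))
            break
        messages.append(response)
        # Like route_model_output(): no tool calls means the run ends
        if not response.tool_calls:
            break
        # Like the "tools" node: execute each requested call and record its result
        for call in response.tool_calls:
            messages.append(tools[call["name"]](**call["args"]))
    return messages
```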

Running the Login Example

Let’s create an example script to demonstrate the login functionality in action. We’ll create a file called examples/login_example.py:

"""Example demonstrating the ReAct agent with Playwright for website login.

This example shows how to:
1. Configure the agent with login credentials
2. Run the agent to perform a login operation
3. Handle the agent's response
4. Click the login button and take a screenshot of the resulting page

Requirements:
- Playwright installed: pip install playwright
- Playwright browsers installed: playwright install
- Environment variables set for LOGIN_URL, LOGIN_USERNAME, and LOGIN_PASSWORD
"""

import asyncio
import os

from langchain_core.messages import HumanMessage

from react_agent.configuration import Configuration
from react_agent.graph import graph
from react_agent.state import InputState


async def run_login_example():
    """Run the ReAct agent to perform a website login."""
    # Check if required environment variables are set
    required_vars = ["LOGIN_URL", "LOGIN_USERNAME", "LOGIN_PASSWORD"]
    missing_vars = [var for var in required_vars if not os.environ.get(var)]
    
    if missing_vars:
        print(f"Error: Missing required environment variables: {', '.join(missing_vars)}")
        print("Please set these variables before running the example.")
        print("Example:")
        print("  export LOGIN_URL='https://example.com/login'")
        print("  export LOGIN_USERNAME='your_username'")
        print("  export LOGIN_PASSWORD='your_password'")
        return
    
    # Create configuration with login details
    config = Configuration()
    
    # Print configuration details (excluding password)
    print(f"Login URL: {config.login_url}")
    print(f"Username: {config.username}")
    print(f"Model: {config.model}")
    print(f"Max search results: {config.max_search_results}")
    
    # Create input state with a message asking to log in, click the login button, and take a screenshot
    input_state = InputState(
        messages=[
            HumanMessage(content="""
Please log into the website using my credentials. Follow these steps:
1. Navigate to the login page
2. Enter my username and password
3. Click the login button
4. Take a screenshot of the page after login
5. Tell me if the login was successful based on the page content
""")
        ]
    )
    
    # Run the agent
    print("\nRunning the agent to perform login...")
    result = await graph.ainvoke(input_state, {"configurable": {"model": config.model}})
    
    # Extract and print the conversation
    print("\nAgent conversation:")
    if "messages" in result:
        # Map LangChain message types to readable role labels
        role_labels = {"human": "User", "ai": "Assistant", "tool": "Tool"}
        for message in result["messages"]:
            role = role_labels.get(message.type, message.type)
            print(f"\n{role}: {message.content}")

            # AI messages may request tool calls; print their names and arguments
            for tool_call in getattr(message, "tool_calls", None) or []:
                print(f"  Tool Call: {tool_call['name']}")
                print(f"  Arguments: {tool_call['args']}")
    else:
        print("Result format:")
        for key in result:
            print(f"- {key}")
    
    print("\nLogin process completed.")
    
    # Check if a screenshot was taken and inform the user
    if os.path.exists("login_result.png"):
        print("\nA screenshot of the login result was saved as 'login_result.png'")


if __name__ == "__main__":
    asyncio.run(run_login_example())

To run this example:

# First set the required environment variables
export LOGIN_URL='https://example.com/login'
export LOGIN_USERNAME='your_username'
export LOGIN_PASSWORD='your_password'

# Then run the example
python examples/login_example.py

When executed, the script will:

  1. Verify that all required environment variables are set
  2. Load the configuration (login URL, username, and model) from the environment
  3. Send instructions to log in to the specified website
  4. Let the agent run the login tool, which drives the browser with Playwright
  5. Display the conversation between the user and the agent
  6. Show the results of the login attempt
  7. Report whether the login tool saved a screenshot of the final page as login_result.png

Making It Work: The Complete Flow

When a user asks the agent to log into a website, the workflow proceeds as follows:

  1. The agent receives the user query asking to log in
  2. The LLM processes this query and determines it needs to use the login tool
  3. The agent selects the login tool, which:
    • Launches a headless browser with Playwright
    • Navigates to the configured URL
    • Identifies login form elements using selector heuristics
    • Enters the username and password
    • Submits the form
    • Takes a screenshot for verification
    • Analyzes the response to determine success
  4. The login tool returns results to the agent
  5. The agent processes these results and reports back to the user

Challenges and Solutions

Challenge: Identifying Login Form Elements

Login forms vary greatly across websites, making identification challenging.

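Solution: We use a list of common selectors for the username, password, and submit elements, trying each until we find a match. This approach works for many standard login forms but may need customization for unusual interfaces. It also assumes the username and password fields appear on the same page, so two-step flows (email first, password on a second screen) need extra handling.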
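The tool repeats the same try-each-selector loop for the username and password fields. That loop could be factored into one helper, sketched below. fill_first_match is a hypothetical name. The helper assumes a Playwright async Page but only uses query_selector and fill, so it can be unit-tested with a small fake page object.

```python
"""Sketch: one reusable helper for the try-each-selector pattern used in login()."""
from typing import Any, Optional, Sequence


async def fill_first_match(page: Any, selectors: Sequence[str], value: str) -> Optional[str]:
    """Fill the first selector that matches an element and return it; None if none match.

    `page` is assumed to be a Playwright async Page; only query_selector() and fill() are used.
    """
    for selector in selectors:
        if await page.query_selector(selector):
            await page.fill(selector, value)
            return selector
    return None
```

Inside login(), the username step would then reduce to calling fill_first_match(page, username_selectors, configuration.username) and returning an error if it yields None.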

Challenge: Detecting Login Success

It’s not always obvious when a login succeeds or fails.

Solution: We use multiple signals:

  1. URL changes (redirects away from login page)
  2. Page content analysis (checking for success and error indicators)
  3. Screenshot capture for visual verification

For production systems, you might add more site-specific checks.
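Keeping the text and URL checks in a pure function makes them testable without launching a browser. The sketch below reproduces the tool's decision rule. Explicit failure phrases without any success marker mean failure; otherwise, a URL change or a success marker means success. assess_login is a hypothetical name. The success list leaves out the generic word "account", because it also matches the failure phrase "sign in to your account" and would make that failure look like a success.

```python
"""Sketch: the login tool's success heuristic as a pure, browser-free function."""
from typing import Dict

FAILURE_INDICATORS = (
    "incorrect password", "login failed", "invalid credentials", "wrong username",
    "wrong password", "authentication failed", "sign in to your account",
)
# "account" is left out on purpose: it is a substring of "sign in to your account"
SUCCESS_INDICATORS = ("welcome", "dashboard", "profile", "logged in", "sign out", "logout")


def assess_login(page_text: str, login_url: str, current_url: str) -> Dict[str, bool]:
    """Combine page-text markers and the URL change into a success verdict."""
    text = page_text.lower()
    has_error = any(marker in text for marker in FAILURE_INDICATORS)
    has_success = any(marker in text for marker in SUCCESS_INDICATORS)
    url_changed = login_url.lower() not in current_url.lower()
    if has_error and not has_success:
        success = False  # explicit failure message with no success marker
    else:
        success = url_changed or has_success
    return {"success": success, "has_error": has_error,
            "has_success": has_success, "url_changed": url_changed}
```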

Challenge: Handling Navigation Events

Different websites handle login flows differently, with varying redirects and page loads.

Solution: We implement robust waiting strategies:

  1. Wait for navigation events with generous timeouts
  2. Wait for both DOM content load and network idle states
  3. Add a small delay to ensure page rendering is complete
  4. Handle cases where no navigation occurs
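Points 1 and 4 come down to waiting for something that may never happen and carrying on if it doesn't. The login tool does this with a try/except around its waits. As a Playwright-free illustration, here is the same idea in plain asyncio, using a hypothetical wait_optional helper. Note that asyncio.wait_for cancels the awaited coroutine when the timeout expires.

```python
"""Sketch: wait for something, but carry on if it never happens (plain asyncio)."""
import asyncio
from typing import Awaitable


async def wait_optional(awaitable: Awaitable, timeout: float) -> bool:
    """Return True if `awaitable` completes within `timeout` seconds, else False."""
    try:
        await asyncio.wait_for(awaitable, timeout)
        return True
    except asyncio.TimeoutError:
        # Nothing happened in time (e.g. a login that stays on the same page): move on
        return False


async def demo() -> None:
    navigated = await wait_optional(asyncio.sleep(0.01), timeout=1.0)
    stayed = await wait_optional(asyncio.sleep(5), timeout=0.05)
    print(navigated, stayed)


if __name__ == "__main__":
    asyncio.run(demo())  # prints: True False
```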

Challenge: Security Considerations

Working with login credentials requires careful security practices.

Solution:

  1. Store credentials as environment variables, never in code
  2. Use .env files that are excluded from version control
  3. Consider using a secrets manager for production deployments
  4. Implement secure session handling if maintaining logged-in state
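One more defensive habit: tool results are sent back to the model provider and are often logged. It can therefore be worth scrubbing known secret values from any text before it leaves the tool. Below is a minimal sketch with a hypothetical mask_secrets helper; in the login tool, you would pass it [configuration.password].

```python
"""Sketch: mask known secret values in text before logging it or returning it to the LLM."""
from typing import Iterable


def mask_secrets(text: str, secrets: Iterable[str], mask: str = "***") -> str:
    """Replace every occurrence of each non-empty secret in `text` with `mask`."""
    # Longest first, so a secret that contains another secret is masked completely
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text
```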

Conclusion

By combining LangGraph’s reasoning capabilities with Playwright’s browser automation, we’ve created an agent that can handle the complex task of website login. This approach opens up possibilities for agents that can interact with web applications requiring authentication.

The solution we’ve built demonstrates several powerful concepts:

  1. Using LLMs to interpret user requests for web interaction
  2. Leveraging Playwright’s browser automation to handle real-world websites
  3. Creating flexible tools that can adapt to various login form designs
  4. Building a complete agent workflow with LangGraph’s state management

This implementation is just the beginning. As you extend this pattern, you could add support for:

  • Two-factor authentication handling
  • Captcha solving
  • Session management
  • Interacting with complex web applications post-login

With LangGraph and Playwright, your agents can become truly capable web citizens, able to access and interact with the vast number of web applications that require authentication.
