Building a Multi-Step LangGraph Agent That Handles Orders Autonomously

A D2C brand needed their support bot to actually do things — look up orders, process returns, check shipping status. Here's how I built a LangGraph agent with tool calls and safe-action gating.

Most AI chatbots are retrieval machines — they search a knowledge base and summarise the results. The D2C brand that came to me needed something different. Their customers weren't asking "what's your return policy?" — they were saying "I want to return the blue jacket from order #4521 and get a refund to my original payment method."

That's not a search query. That's a task. It requires looking up the order, verifying the item, checking the return window, creating a return request, and confirming the refund method. Multiple steps, multiple API calls, and a destructive operation (the refund) that needs a safety gate.

I built it with LangGraph. The agent sustained 500+ concurrent sessions on AWS autoscaling workers while keeping p95 response times under 3 seconds.

Why LangGraph, Not a Simple Chain

LangChain's sequential chains work fine for "retrieve → generate" patterns. But multi-step task execution needs:

  • Conditional branching — different tools depending on what the customer asks
  • State management — tracking what's been looked up, what's pending, what's confirmed
  • Human-in-the-loop gates — pausing before destructive operations
  • Retry and fallback — handling API failures gracefully mid-conversation

LangGraph models this as a state graph: nodes are actions, edges are decisions, and state flows through the graph as the conversation progresses.
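To make "nodes are actions, edges are decisions" concrete, here is a dependency-free toy in plain Python. It is not how LangGraph is implemented. The `run_graph` loop, the `route` function and the two nodes are hypothetical and only show state flowing through nodes until a router returns END.

```python
from typing import Callable

END = "__end__"
Node = Callable[[dict], dict]


def run_graph(nodes: dict[str, Node], route: Callable[[str, dict], str],
              start: str, state: dict, max_steps: int = 20) -> dict:
    """Run one node at a time, merging each node's partial update into the state."""
    current = start
    for _ in range(max_steps):
        update = nodes[current](state)
        state = {**state, **update}       # node returns only what it changed
        current = route(current, state)   # the "edge": pick the next node from state
        if current == END:
            return state
    raise RuntimeError("step limit reached without hitting END")


# Hypothetical two-node example: look up an order, then answer.
def lookup(state: dict) -> dict:
    return {"order": {"id": state["order_id"], "status": "shipped"}}


def answer(state: dict) -> dict:
    return {"reply": f"Order {state['order']['id']} is {state['order']['status']}."}


def route(node: str, state: dict) -> str:
    return "answer" if node == "lookup" else END


final = run_graph({"lookup": lookup, "answer": answer}, route, "lookup",
                  {"order_id": "4521"})
print(final["reply"])  # Order 4521 is shipped.
```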

The Agent's State

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
 
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    order: dict | None
    return_request: dict | None
    pending_action: str | None
    confirmed: bool

The state tracks:

  • messages — the full conversation history
  • order — the currently loaded order (after lookup)
  • return_request — a staged return request awaiting confirmation
  • pending_action — what destructive action is queued ("process_return", "cancel_order")
  • confirmed — whether the customer has explicitly confirmed the pending action
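The `Annotated[list, add_messages]` annotation is what lets a node return only its new messages: that key is merged through a reducer, while plain keys such as `order` are simply overwritten. A plain-Python sketch of the merge rule follows. The `apply_update` helper is hypothetical, and `operator.add` stands in for `add_messages`, which additionally matches messages by ID so that re-sending an existing message replaces it instead of duplicating it.

```python
import operator
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def apply_update(state: dict, update: dict, reducers: dict[str, Reducer]) -> dict:
    """Merge a node's partial update: reducer keys are combined, others replaced."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for add_messages: plain list concatenation (no ID matching).
reducers = {"messages": operator.add}

state = {"messages": ["user: where is #4521?"], "order": None, "confirmed": False}
update = {"messages": ["tool: order found"], "order": {"id": "4521"}}

state = apply_update(state, update, reducers)
print(state["messages"])  # ['user: where is #4521?', 'tool: order found']
print(state["order"])     # {'id': '4521'}
```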

The Tools

Each tool maps to an internal API endpoint. The agent decides which tool to call based on the conversation:

import os

import requests
from langchain_core.tools import tool

# Assumed to come from the environment; the source of these values is illustrative
API_BASE = os.environ["API_BASE"]
API_KEY = os.environ["API_KEY"]
SHIPPING_API = os.environ["SHIPPING_API"]
SHIPPING_KEY = os.environ["SHIPPING_KEY"]
TIMEOUT = 10  # seconds; without a timeout a hung API call stalls the conversation
 
@tool
def lookup_order(order_id: str) -> dict:
    """Look up an order by ID. Returns order details including items, status, and shipping info."""
    response = requests.get(
        f"{API_BASE}/orders/{order_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=TIMEOUT,
    )
    if response.status_code == 404:
        return {"error": f"Order {order_id} not found"}
    return response.json()
 
@tool
def check_shipping_status(tracking_number: str) -> dict:
    """Check real-time shipping status for a tracking number."""
    response = requests.get(
        f"{SHIPPING_API}/track/{tracking_number}",
        headers={"Authorization": f"Bearer {SHIPPING_KEY}"},
        timeout=TIMEOUT,
    )
    return response.json()
 
@tool
def get_return_eligibility(order_id: str, item_id: str) -> dict:
    """Check if an item is eligible for return (within return window, not final sale, etc.)."""
    response = requests.get(
        f"{API_BASE}/orders/{order_id}/items/{item_id}/return-eligibility",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=TIMEOUT,
    )
    return response.json()
 
@tool
def create_return_request(order_id: str, item_id: str, reason: str) -> dict:
    """Create a return request for an item. This is a DESTRUCTIVE operation: requires customer confirmation first."""
    response = requests.post(
        f"{API_BASE}/returns",
        json={"order_id": order_id, "item_id": item_id, "reason": reason},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=TIMEOUT,
    )
    return response.json()
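These tools make a single request each, while the requirements list above calls for retry and fallback. One way to add that without changing the tool signatures is a small wrapper. `call_with_retry`, its parameters and the backoff schedule are assumptions for illustration; the fallback mirrors the `{"error": ...}` shape `lookup_order` already returns for a 404.

```python
import time
from typing import Callable


def call_with_retry(fn: Callable[[], dict], attempts: int = 3,
                    base_delay: float = 0.5,
                    retry_on: tuple = (ConnectionError, TimeoutError)) -> dict:
    """Call fn, retrying transient failures with exponential backoff.

    On success returns fn's result. After the final failed attempt, returns an
    {"error": ...} dict so the agent can explain the problem to the customer
    instead of the graph run crashing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                return {"error": f"Service unavailable after {attempts} attempts: {exc}"}
            time.sleep(base_delay * 2 ** (attempt - 1))  # base_delay, 2x, 4x, ...
    return {"error": "attempts must be at least 1"}
```

Note that `requests` raises its own exception types, so inside the tools you would pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, e.g. `call_with_retry(lambda: requests.get(url, timeout=10).json(), retry_on=...)`.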

The create_return_request tool is the dangerous one. It triggers actual business logic — refund processing, inventory updates, email notifications. This is where safe-action gating matters.

Safe-Action Gating

The agent never calls a destructive tool directly. Instead, it stages the action and asks for confirmation:

from langchain_core.messages import ToolMessage
 
GATED_TOOLS = {"create_return_request", "cancel_order", "update_shipping_address"}
 
def should_gate(tool_name: str) -> bool:
    """Returns True for tools that modify state and need customer confirmation."""
    return tool_name in GATED_TOOLS
 
def gate_node(state: AgentState) -> dict:
    """Intercept destructive tool calls: stage them instead of executing."""
    last_message = state["messages"][-1]
 
    for tool_call in last_message.tool_calls:
        if should_gate(tool_call["name"]):
            return {
                "pending_action": tool_call["name"],
                "return_request": tool_call["args"],
                "confirmed": False,
                # Answer the tool call with the confirmation prompt; the agent
                # relays it to the customer. (A bare AIMessage here would leave
                # the tool call unanswered, which OpenAI-style chat APIs reject.)
                "messages": [
                    ToolMessage(
                        content=format_confirmation_prompt(
                            tool_call["name"],
                            tool_call["args"],
                        ),
                        tool_call_id=tool_call["id"],
                    )
                ],
            }
 
    # Nothing to gate: no state change (safe tools are routed to "tools")
    return {}
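The graph below routes on a `route_agent` function that the post doesn't show. A minimal sketch, assuming tool calls are dicts with a `"name"` key (as `gate_node` reads them) and that any gated call in a message sends the whole message to the gate. `FakeMessage` is a hypothetical stand-in for LangChain's `AIMessage` so the example runs on its own.

```python
from dataclasses import dataclass, field

GATED_TOOLS = {"create_return_request", "cancel_order", "update_shipping_address"}


@dataclass
class FakeMessage:
    """Stand-in for AIMessage: only content and tool_calls matter here."""
    content: str = ""
    tool_calls: list[dict] = field(default_factory=list)


def route_agent(state: dict) -> str:
    """Pick the next node after the LLM has responded."""
    last = state["messages"][-1]
    calls = getattr(last, "tool_calls", None) or []
    if not calls:
        return "end"    # plain reply: finish this turn
    if any(c["name"] in GATED_TOOLS for c in calls):
        return "gate"   # at least one destructive call
    return "tools"      # only safe, read-only calls


safe = FakeMessage(tool_calls=[{"name": "lookup_order", "args": {"order_id": "4521"}}])
risky = FakeMessage(tool_calls=[{"name": "create_return_request", "args": {}}])
print(route_agent({"messages": [safe]}))               # tools
print(route_agent({"messages": [risky]}))              # gate
print(route_agent({"messages": [FakeMessage("Hi")]}))  # end
```

Sending a mixed message (safe and gated calls together) to the gate is the conservative choice: nothing in it runs until the customer confirms.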

The confirmation prompt is explicit:

I can process a return for:
- Order: #4521
- Item: Blue Denim Jacket (Size M)
- Reason: Doesn't fit
- Refund to: Original payment method (Visa ending 4242)

Would you like me to go ahead? (yes/no)
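`gate_node` calls a `format_confirmation_prompt` helper that isn't shown. A sketch of one that reproduces the prompt above, assuming the tool arguments can be enriched with display fields (item name, refund method) taken from the already-loaded order. The `details` parameter, the `ACTION_TEXT` wording and the `FIELD_LABELS` table are hypothetical.

```python
from typing import Optional

# Hypothetical wording per gated tool.
ACTION_TEXT = {
    "create_return_request": "process a return for",
    "cancel_order": "cancel",
    "update_shipping_address": "update the shipping address for",
}

# (key, label) pairs in display order; keys missing from the data are skipped.
FIELD_LABELS = [
    ("order_id", "Order"),
    ("item", "Item"),
    ("reason", "Reason"),
    ("refund_to", "Refund to"),
]


def format_confirmation_prompt(tool_name: str, args: dict,
                               details: Optional[dict] = None) -> str:
    """Render a staged action as an explicit yes/no question."""
    fields = {**args, **(details or {})}
    lines = [f"I can {ACTION_TEXT.get(tool_name, 'run ' + tool_name)}:"]
    for key, label in FIELD_LABELS:
        if key in fields:
            value = f"#{fields[key]}" if key == "order_id" else fields[key]
            lines.append(f"- {label}: {value}")
    lines += ["", "Would you like me to go ahead? (yes/no)"]
    return "\n".join(lines)


print(format_confirmation_prompt(
    "create_return_request",
    {"order_id": "4521", "item_id": "sku-123", "reason": "Doesn't fit"},
    details={"item": "Blue Denim Jacket (Size M)",
             "refund_to": "Original payment method (Visa ending 4242)"},
))
```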

Only after the customer says "yes" does the agent call the actual API. This pattern prevented every accidental return and cancellation during the first 3 months of operation.
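Turning the customer's reply into that decision is the job of `check_confirmation`, whose body the post doesn't show. A deliberately strict sketch: anything other than a clear yes counts as not confirmed. The word lists are hypothetical and would need tuning against real replies; the return values match the `"confirmed"`/`"denied"` branch names used in the graph.

```python
import re

# Hypothetical vocabularies; tune against real customer replies.
YES_WORDS = {"yes", "y", "yep", "yeah", "sure", "ok", "okay", "confirm", "confirmed"}
HEDGE_WORDS = {"no", "not", "don't", "dont", "wait"}


def is_clear_yes(reply: str) -> bool:
    """True only if the first word is affirmative and no hedging word appears."""
    words = re.findall(r"[a-z']+", reply.lower())
    if not words or words[0] not in YES_WORDS:
        return False
    return not any(w in HEDGE_WORDS for w in words)


def check_confirmation(state: dict) -> str:
    """Return "confirmed" only when an action is staged and the reply is a clear yes."""
    if not state.get("pending_action"):
        return "denied"  # nothing staged: normal agent turn
    last = state["messages"][-1]
    # Accepts a message object, or a plain string for testing
    text = last if isinstance(last, str) else getattr(last, "content", "")
    return "confirmed" if is_clear_yes(text) else "denied"
```

Erring toward "denied" is the point: a misread "no" costs one extra question, while a misread "yes" costs a refund.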

The Graph

from langgraph.graph import StateGraph, START, END
 
graph = StateGraph(AgentState)
 
# Nodes
graph.add_node("agent", agent_node)           # LLM decides what to do
graph.add_node("tools", tool_node)             # Execute safe tools
graph.add_node("gate", gate_node)              # Intercept destructive tools
graph.add_node("execute_gated", execute_gated) # Execute after confirmation
 
# Edges
# The customer answers the confirmation prompt in their *next* message, so the
# yes/no check runs at the start of each invocation. (Attaching a second router
# to "agent" would make both routers fire after every agent step.)
graph.add_conditional_edges(START, check_confirmation, {
    "confirmed": "execute_gated",
    "denied": "agent",                 # Customer said no (or nothing pending), agent responds
})
 
graph.add_conditional_edges("agent", route_agent, {
    "tools": "tools",
    "gate": "gate",
    "end": END,
})
 
graph.add_edge("tools", "agent")          # After tool result, back to agent
graph.add_edge("gate", "agent")           # After staging, agent presents the confirmation prompt
graph.add_edge("execute_gated", "agent")  # Agent reports the outcome
 
app = graph.compile()

The flow for a return request:

  1. Customer: "I want to return the blue jacket from order #4521"
  2. agent → decides to call lookup_order
  3. tools → executes lookup_order, returns order details
  4. agent → decides to call get_return_eligibility
  5. tools → executes eligibility check, confirms item is returnable
  6. agent → decides to call create_return_request
  7. gate → intercepts! Stages the return, asks for confirmation
  8. agent → presents confirmation prompt to customer
  9. Customer: "Yes, go ahead"
  10. execute_gated → calls create_return_request for real
  11. agent → "Done! Your return for the Blue Denim Jacket has been created. You'll receive a prepaid shipping label at your email within 24 hours."
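Step 10 is the only place a gated tool actually runs. A sketch of `execute_gated`, assuming the staged arguments live in `return_request` (where `gate_node` stores them) and that gated tools are looked up in a name-to-callable registry. `TOOL_REGISTRY`, the stub tool and the `action_result` key are hypothetical; in the real graph the result would go into `messages`, and `action_result` would have to be added to `AgentState`.

```python
from typing import Callable


def create_return_request_stub(order_id: str, item_id: str, reason: str) -> dict:
    """Hypothetical stand-in for the real create_return_request API call."""
    return {"return_id": "R-1", "order_id": order_id, "item_id": item_id,
            "reason": reason, "status": "created"}


# Hypothetical name -> callable registry for the gated tools.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {
    "create_return_request": create_return_request_stub,
}


def execute_gated(state: dict) -> dict:
    """Run the staged action once, then clear the staging fields."""
    action = state.get("pending_action")
    args = state.get("return_request") or {}
    if action in TOOL_REGISTRY:
        result = TOOL_REGISTRY[action](**args)
    else:
        result = {"error": f"No executable action staged ({action!r})"}
    return {
        "pending_action": None,   # cleared so a later "yes" can't re-run it
        "return_request": None,
        "confirmed": False,
        "action_result": result,  # hypothetical key, not in AgentState
    }
```

Clearing the staging fields in the same update that records the result is what makes the action run at most once per confirmation.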

Scaling to 500+ Concurrent Sessions

Each customer conversation is a separate graph execution with its own state. The scaling challenge isn't the LLM calls — it's managing hundreds of concurrent stateful sessions.

The setup:

  • FastAPI with async handlers for the chat endpoint
  • Redis for session state persistence (the AgentState dict, serialised as JSON)
  • AWS ECS with autoscaling based on active session count
  • Connection pooling for the internal API calls (the order/shipping/returns APIs)
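
import json

import redis.asyncio as aioredis
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage, messages_from_dict, messages_to_dict
from pydantic import BaseModel

api = FastAPI()  # `app` is already taken by the compiled graph
redis = aioredis.from_url("redis://localhost:6379")  # URL is illustrative

class ChatRequest(BaseModel):
    session_id: str
    message: str

@api.post("/chat")
async def chat(request: ChatRequest):
    # Load or create session state from Redis
    raw = await redis.get(f"session:{request.session_id}")
    if raw:
        state = json.loads(raw)
        # Messages were stored as plain dicts; rebuild the message objects
        state["messages"] = messages_from_dict(state["messages"])
    else:
        state = {"messages": [], "order": None, "return_request": None,
                 "pending_action": None, "confirmed": False}
 
    # Add new message
    state["messages"].append(HumanMessage(content=request.message))
 
    # Run the graph
    result = await app.ainvoke(state)
 
    # Persist updated state (default=str alone would turn messages into unparseable repr strings)
    to_store = {**result, "messages": messages_to_dict(result["messages"])}
    await redis.set(
        f"session:{request.session_id}",
        json.dumps(to_store, default=str),
        ex=3600,  # 1-hour TTL
    )
 
    # Stream the response (stream_response is a helper not shown here)
    return StreamingResponse(
        stream_response(result["messages"][-1].content),
        media_type="text/event-stream",
    )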
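The setup list mentions connection pooling, but the tool code calls `requests.get`, which builds a throwaway session (and connection pool) on every call. A sketch of one shared `requests.Session` with a sized pool; `make_api_session` and the pool sizes are illustrative assumptions, not the production values.

```python
from typing import Optional

import requests
from requests.adapters import HTTPAdapter


def make_api_session(auth_token: Optional[str] = None,
                     pool_maxsize: int = 50) -> requests.Session:
    """Build one Session to share across tool calls so connections are reused."""
    session = requests.Session()
    # pool_connections: how many per-host pools to cache;
    # pool_maxsize: how many connections each per-host pool keeps open.
    adapter = HTTPAdapter(pool_connections=10, pool_maxsize=pool_maxsize)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    if auth_token:
        session.headers["Authorization"] = f"Bearer {auth_token}"
    return session


# Created once per worker process and reused inside the tools, e.g.:
# orders_api = make_api_session(auth_token=API_KEY)
# orders_api.get(f"{API_BASE}/orders/{order_id}", timeout=10)
```

Sharing a `Session` across threads is common practice, though `requests` does not formally document it as thread-safe.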

The autoscaling policy: scale up when average active sessions per instance exceeds 50, scale down when it drops below 20. This kept response times under 3 seconds at p95 during peak hours (500+ concurrent sessions).
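A plain-Python model of those thresholds shows what they imply at peak: with 500 sessions, scale-ups stop once the average reaches 50 per instance, so the fleet settles at 10 instances. The one-step-at-a-time model is an assumption about the policy's shape, not AWS's actual evaluation logic (which also involves alarms, cooldowns and metric periods).

```python
def desired_instances(active_sessions: int, instances: int,
                      scale_up_above: float = 50, scale_down_below: float = 20,
                      min_instances: int = 1) -> int:
    """One step of threshold scaling on average active sessions per instance."""
    avg = active_sessions / instances
    if avg > scale_up_above:
        return instances + 1
    if avg < scale_down_below and instances > min_instances:
        return instances - 1
    return instances


def settle(active_sessions: int, instances: int = 1) -> int:
    """Apply desired_instances repeatedly until the count stops changing."""
    while (nxt := desired_instances(active_sessions, instances)) != instances:
        instances = nxt
    return instances


print(settle(500))  # 10: at 10 instances the average is exactly 50, not above it
```

The gap between 20 and 50 is what prevents flapping: right after a scale-up the average is still well above the scale-down threshold.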

What I'd Do Differently

Use LangGraph's built-in persistence instead of manual Redis serialisation. LangGraph now has checkpoint savers (PostgreSQL, Redis) that handle state persistence natively. I built the manual version before this feature existed.

Add a supervisor node for complex multi-tool sequences. The current graph lets the agent chain tools freely, which occasionally leads to unnecessary API calls (looking up shipping status for a cancelled order). A supervisor that validates the tool sequence before execution would catch these.
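The check such a supervisor would run can be small. A sketch, assuming the loaded order carries a `status` field; the rule table and `validate_tool_call` are hypothetical, and a rejected call would go back to the agent with the reason instead of executing.

```python
from typing import Optional

# Hypothetical rules: tools that make no sense for an order in a given status.
DISALLOWED_BY_STATUS = {
    "cancelled": {"check_shipping_status", "get_return_eligibility",
                  "create_return_request"},
}


def validate_tool_call(tool_name: str, order: Optional[dict]) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call, given the loaded order."""
    if order is None:
        return True, "ok"  # nothing loaded yet, so nothing to check against
    status = order.get("status", "")
    if tool_name in DISALLOWED_BY_STATUS.get(status, set()):
        return False, f"{tool_name} is not useful for a {status} order"
    return True, "ok"
```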

Instrument everything. I added LangSmith tracing after launch, but I should have started with it. Debugging a multi-step agent from logs alone is painful — you need to see the full state graph execution, including which branch was taken and why.
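Alongside LangSmith, even a tiny wrapper that logs which node ran and which state keys it changed makes the path through the graph visible in plain logs. A sketch using the standard `logging` module; `traced` and `demo_gate` are hypothetical, and this records what each node returned, not why a router chose a branch.

```python
import functools
import logging
import time
from typing import Callable

logger = logging.getLogger("agent.trace")


def traced(node_name: str) -> Callable:
    """Wrap a node function and log its name, duration, and updated keys."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            start = time.perf_counter()
            update = fn(state)
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("node=%s keys=%s took=%.1fms",
                        node_name, sorted(update or {}), elapsed_ms)
            return update
        return wrapper
    return decorator


@traced("gate")
def demo_gate(state: dict) -> dict:
    return {"pending_action": "create_return_request", "confirmed": False}
```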


The key insight from this project: the hard part of building an AI agent isn't the LLM or the tools; it's the safety layer. An agent that can look up orders is useful. An agent that can process refunds without confirmation is dangerous. The gate pattern is simple to implement but essential to deploying an agent safely.

LangGraph made this possible by treating the conversation as a graph traversal problem instead of a linear chain. The same pattern applies to any agent that needs to take real-world actions: medical appointment booking, financial transactions, infrastructure changes. If the action is irreversible, gate it.