A D2C brand needed their support bot to actually do things — look up orders, process returns, check shipping status. Here's how I built a LangGraph agent with tool calls and safe-action gating.
Most AI chatbots are retrieval machines — they search a knowledge base and summarise the results. The D2C brand that came to me needed something different. Their customers weren't asking "what's your return policy?" — they were saying "I want to return the blue jacket from order #4521 and get a refund to my original payment method."
That's not a search query. That's a task. It requires looking up the order, verifying the item, checking the return window, creating a return request, and confirming the refund method. Multiple steps, multiple API calls, and a destructive operation (the refund) that needs a safety gate.
I built it with LangGraph. The agent sustained 500+ concurrent sessions on AWS autoscaling workers while keeping p95 response times under 3 seconds.
Why LangGraph, Not a Simple Chain
LangChain's sequential chains work fine for "retrieve → generate" patterns. But multi-step task execution needs:
- Conditional branching — different tools depending on what the customer asks
- State management — tracking what's been looked up, what's pending, what's confirmed
- Human-in-the-loop gates — pausing before destructive operations
- Retry and fallback — handling API failures gracefully mid-conversation
LangGraph models this as a state graph: nodes are actions, edges are decisions, and state flows through the graph as the conversation progresses.
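To make the "nodes, edges, state" idea concrete, here is a library-free sketch of the same shape. It is not LangGraph's API: the node names, the `route` function and the fake order data are all made up for illustration.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]

def lookup(state: State) -> State:
    # Pretend to fetch an order; a real node would call an API.
    return {**state, "order": {"id": state["order_id"], "status": "delivered"}}

def reply(state: State) -> State:
    order = state["order"]
    return {**state, "reply": f"Order {order['id']} is {order['status']}."}

def route(state: State) -> str:
    # The "edge": choose the next node by inspecting the current state.
    if "order" not in state:
        return "lookup"
    if "reply" not in state:
        return "reply"
    return "end"

NODES = {"lookup": lookup, "reply": reply}

def run(state: State, max_steps: int = 10) -> State:
    # Walk the graph until the router says "end" (with a step cap as a safety net).
    for _ in range(max_steps):
        next_node = route(state)
        if next_node == "end":
            return state
        state = NODES[next_node](state)
    raise RuntimeError("graph did not terminate")

final = run({"order_id": "4521"})
print(final["reply"])
```

The point of the shape is that control flow lives in `route`, not inside the nodes, which is what makes it possible to slot a gate in later without touching the tools.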
The Agent's State
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    order: dict | None
    return_request: dict | None
    pending_action: str | None
    confirmed: bool

The state tracks:
- messages — the full conversation history
- order — the currently loaded order (after lookup)
- return_request — a staged return request awaiting confirmation
- pending_action — what destructive action is queued ("process_return", "cancel_order")
- confirmed — whether the customer has explicitly confirmed the pending action
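The `Annotated[list, add_messages]` field means node updates to `messages` are merged into the history rather than replacing it, while the other fields are simply overwritten. A simplified stand-in for that behaviour might look like the sketch below. `apply_update` is a hypothetical helper, not a LangGraph function, and it only appends; the real `add_messages` reducer also replaces messages whose IDs match.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state.

    Simplified reducer semantics: "messages" is appended to,
    every other key is overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":
            merged["messages"] = state.get("messages", []) + list(value)
        else:
            merged[key] = value
    return merged

state = {"messages": ["hi"], "order": None, "confirmed": False}
state = apply_update(
    state,
    {"messages": ["looking up 4521"], "order": {"id": "4521"}},
)
print(state)
```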
The Tools
Each tool maps to an internal API endpoint. The agent decides which tool to call based on the conversation:
import requests
from langchain_core.tools import tool

# API_BASE, API_KEY, SHIPPING_API and SHIPPING_KEY are configuration
# constants defined elsewhere (for example, loaded from the environment).

@tool
def lookup_order(order_id: str) -> dict:
    """Look up an order by ID. Returns order details including items, status, and shipping info."""
    response = requests.get(
        f"{API_BASE}/orders/{order_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,  # never let a slow API hang the conversation
    )
    if response.status_code == 404:
        return {"error": f"Order {order_id} not found"}
    return response.json()

@tool
def check_shipping_status(tracking_number: str) -> dict:
    """Check real-time shipping status for a tracking number."""
    response = requests.get(
        f"{SHIPPING_API}/track/{tracking_number}",
        headers={"Authorization": f"Bearer {SHIPPING_KEY}"},
        timeout=10,
    )
    return response.json()

@tool
def get_return_eligibility(order_id: str, item_id: str) -> dict:
    """Check if an item is eligible for return (within return window, not final sale, etc.)."""
    response = requests.get(
        f"{API_BASE}/orders/{order_id}/items/{item_id}/return-eligibility",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    return response.json()

@tool
def create_return_request(order_id: str, item_id: str, reason: str) -> dict:
    """Create a return request for an item. This is a DESTRUCTIVE operation: it requires customer confirmation first."""
    response = requests.post(
        f"{API_BASE}/returns",
        json={"order_id": order_id, "item_id": item_id, "reason": reason},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    return response.json()

The create_return_request tool is the dangerous one. It triggers actual business logic: refund processing, inventory updates, email notifications. This is where safe-action gating matters.
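The read-only tools are also where the "retry and fallback" requirement from earlier applies. One way to handle it, sketched here with a hypothetical `call_with_retry` wrapper and a fake flaky API (neither is from the production code), is to retry with exponential backoff and hand the agent a structured error instead of an exception. I would only wrap the read-only tools this way: retrying a POST such as `create_return_request` risks creating duplicates unless the API is idempotent.

```python
import time
from typing import Callable

def call_with_retry(
    fn: Callable[[], dict],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fn, retrying with exponential backoff; never raise to the agent."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in practice, catch only the HTTP client's errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Fallback: a structured error the LLM can explain to the customer
    return {"error": f"service unavailable after {attempts} attempts: {last_error}"}

# Demo with a fake API that fails twice, then succeeds
calls = {"n": 0}

def flaky_lookup() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("timeout")
    return {"order_id": "4521", "status": "shipped"}

result = call_with_retry(flaky_lookup, sleep=lambda _: None)
print(result)
```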
Safe-Action Gating
The agent never calls a destructive tool directly. Instead, it stages the action and asks for confirmation:
from langchain_core.messages import AIMessage

def should_gate(tool_name: str) -> bool:
    """Returns True for tools that modify state and need customer confirmation."""
    GATED_TOOLS = {"create_return_request", "cancel_order", "update_shipping_address"}
    return tool_name in GATED_TOOLS

def gate_node(state: AgentState) -> AgentState:
    """Intercept destructive tool calls: stage them instead of executing."""
    last_message = state["messages"][-1]
    for tool_call in last_message.tool_calls:
        if should_gate(tool_call["name"]):
            # format_confirmation_prompt is a helper defined elsewhere
            return {
                **state,
                "pending_action": tool_call["name"],
                "return_request": tool_call["args"],
                "confirmed": False,
                "messages": [
                    AIMessage(content=format_confirmation_prompt(
                        tool_call["name"],
                        tool_call["args"],
                    ))
                ],
            }
    # Non-gated tools execute immediately
    return state

The confirmation prompt is explicit:
I can process a return for:
- Order: #4521
- Item: Blue Denim Jacket (Size M)
- Reason: Doesn't fit
- Refund to: Original payment method (Visa ending 4242)
Would you like me to go ahead? (yes/no)
Only after the customer says "yes" does the agent call the actual API. This pattern prevented every accidental return and cancellation during the first 3 months of operation.
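The gate relies on two small pieces that are easy to get subtly wrong: rendering the staged action, and deciding what counts as a "yes". The sketch below is a guess at their shape, not the production code. `format_confirmation_prompt` appears in the gate node but its body is not shown in this post, and `parse_confirmation` and both word lists are hypothetical. The design choice worth copying is the default: anything that is not an unambiguous yes is treated as a no.

```python
def format_confirmation_prompt(tool_name: str, args: dict) -> str:
    """Render a staged action as an explicit yes/no question."""
    titles = {
        "create_return_request": "I can process a return for:",
        "cancel_order": "I can cancel this order:",
    }
    lines = [titles.get(tool_name, f"I can run {tool_name} with:")]
    for key, value in args.items():
        label = key.replace("_", " ").capitalize()
        lines.append(f"- {label}: {value}")
    lines.append("Would you like me to go ahead? (yes/no)")
    return "\n".join(lines)

def parse_confirmation(reply: str) -> bool:
    """Treat only an unambiguous yes as consent; anything else is a no."""
    cleaned = reply.lower()
    for char in ",.!?":
        cleaned = cleaned.replace(char, " ")
    words = cleaned.split()
    affirmative = {"yes", "yep", "yeah", "confirm", "confirmed"}
    negative = {"no", "not", "don't", "dont", "cancel", "wait", "stop"}
    if not words or words[0] not in affirmative:
        return False
    # "yes, but not yet" must not count as consent
    return not (negative & set(words))

prompt = format_confirmation_prompt(
    "create_return_request",
    {"order_id": "4521", "reason": "Doesn't fit"},
)
print(prompt)
print(parse_confirmation("Yes, go ahead"))
```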
The Graph
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)

# Nodes
graph.add_node("agent", agent_node)             # LLM decides what to do
graph.add_node("tools", tool_node)              # Execute safe tools
graph.add_node("gate", gate_node)               # Intercept destructive tools
graph.add_node("execute_gated", execute_gated)  # Execute after confirmation

# Edges
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", route_agent, {
    "tools": "tools",
    "gate": "gate",
    "end": END,
})
graph.add_edge("tools", "agent")  # After tool result, back to agent
graph.add_edge("gate", "agent")   # After staging, back to agent for confirmation
graph.add_conditional_edges("agent", check_confirmation, {
    "confirmed": "execute_gated",
    "denied": "agent",  # Customer said no, agent responds
})
graph.add_edge("execute_gated", "agent")

app = graph.compile()

The flow for a return request:
- Customer: "I want to return the blue jacket from order #4521"
- agent → decides to call lookup_order
- tools → executes lookup_order, returns order details
- agent → decides to call get_return_eligibility
- tools → executes eligibility check, confirms item is returnable
- agent → decides to call create_return_request
- gate → intercepts! Stages the return, asks for confirmation
- agent → presents confirmation prompt to customer
- Customer: "Yes, go ahead"
- execute_gated → calls create_return_request for real
- agent → "Done! Your return for the Blue Denim Jacket has been created. You'll receive a prepaid shipping label at your email within 24 hours."
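The routing decisions in that walkthrough can be sketched as a single pure function. This is an illustration under assumptions, not the code that ran in production: the bodies of `route_agent` and `check_confirmation` are not shown in this post, so the sketch folds both into one hypothetical `route` function, and it uses plain dicts in place of LangChain message objects so it runs without any dependencies.

```python
GATED_TOOLS = {"create_return_request", "cancel_order", "update_shipping_address"}

def route(state: dict) -> str:
    """Pick the next node from the latest message and the staged action."""
    # A staged action plus an explicit confirmation goes straight to execution
    if state.get("pending_action") and state.get("confirmed"):
        return "execute_gated"
    last = state["messages"][-1]
    tool_calls = last.get("tool_calls", [])
    if not tool_calls:
        return "end"  # plain reply: hand the turn back to the customer
    # One gated call in the batch is enough to send it to the gate
    if any(call["name"] in GATED_TOOLS for call in tool_calls):
        return "gate"
    return "tools"

lookup_turn = {"messages": [{"tool_calls": [{"name": "lookup_order"}]}]}
return_turn = {"messages": [{"tool_calls": [{"name": "create_return_request"}]}]}
print(route(lookup_turn), route(return_turn))
```

Keeping the decision in one function makes the precedence explicit: a confirmed pending action always wins, and a gated tool can never slip through to the plain tools node.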
Scaling to 500+ Concurrent Sessions
Each customer conversation is a separate graph execution with its own state. The scaling challenge isn't the LLM calls — it's managing hundreds of concurrent stateful sessions.
The setup:
- FastAPI with async handlers for the chat endpoint
- Redis for session state persistence (the AgentState dict, serialised as JSON)
- AWS ECS with autoscaling based on active session count
- Connection pooling for the internal API calls (the order/shipping/returns APIs)
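import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage, messages_from_dict, messages_to_dict

# Named `api` so it does not shadow `app`, the compiled graph.
# `redis` is an async Redis client; ChatRequest and stream_response
# are defined elsewhere.
api = FastAPI()

@api.post("/chat")
async def chat(request: ChatRequest):
    # Load or create session state from Redis
    raw = await redis.get(f"session:{request.session_id}")
    if raw:
        state = json.loads(raw)
        # Rebuild message objects from their serialised dict form
        state["messages"] = messages_from_dict(state["messages"])
    else:
        state = {"messages": [], "order": None, "return_request": None,
                 "pending_action": None, "confirmed": False}

    # Add new message
    state["messages"].append(HumanMessage(content=request.message))

    # Run the graph
    result = await app.ainvoke(state)

    # Persist updated state. Messages are converted to plain dicts first,
    # because default=str alone would flatten them into strings that
    # cannot be loaded back.
    to_store = {**result, "messages": messages_to_dict(result["messages"])}
    await redis.set(
        f"session:{request.session_id}",
        json.dumps(to_store, default=str),
        ex=3600,  # 1-hour TTL
    )

    # Stream the response
    return StreamingResponse(
        stream_response(result["messages"][-1].content),
        media_type="text/event-stream",
    )

The autoscaling policy: scale up when average active sessions per instance exceeds 50, scale down when it drops below 20. This kept response times under 3 seconds at p95 during peak hours (500+ concurrent sessions).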
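As a rough illustration of that policy, the decision can be written as a small function. The 50 and 20 thresholds come from the policy above. The target of 35 sessions per instance and the `desired_instances` name are assumptions made for this sketch, and in practice the autoscaler's own configuration does this job rather than application code.

```python
import math

def desired_instances(
    active_sessions: int,
    current: int,
    scale_up_at: int = 50,
    scale_down_at: int = 20,
    target: int = 35,  # assumed midpoint to resize towards
    min_instances: int = 1,
) -> int:
    """Return the instance count the policy would aim for."""
    average = active_sessions / current
    if average > scale_up_at or average < scale_down_at:
        # Outside the band: resize so the average lands near the target
        return max(min_instances, math.ceil(active_sessions / target))
    # Inside the band: leave the fleet alone to avoid flapping
    return current

print(desired_instances(500, 8))
print(desired_instances(500, 12))
```

The gap between the two thresholds is what stops the fleet from oscillating: a small dip in traffic after a scale-up does not immediately trigger a scale-down.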
What I'd Do Differently
Use LangGraph's built-in persistence instead of manual Redis serialisation. LangGraph now has checkpoint savers (PostgreSQL, Redis) that handle state persistence natively. I built the manual version before this feature existed.
Add a supervisor node for complex multi-tool sequences. The current graph lets the agent chain tools freely, which occasionally leads to unnecessary API calls (looking up shipping status for a cancelled order). A supervisor that validates the tool sequence before execution would catch these.
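A first version of that supervisor does not need an LLM at all. A handful of precondition checks against the loaded order would catch the cases I saw. The sketch below is hypothetical: `validate_tool_call`, the specific rules and the `"cancelled"` / `"delivered"` status values are assumptions, not the brand's actual order model.

```python
from typing import Optional, Tuple

def validate_tool_call(tool_name: str, order: Optional[dict]) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call given the loaded order."""
    if tool_name == "lookup_order":
        return True, "ok"
    if order is None:
        return False, "look up the order first"
    status = order.get("status")
    if tool_name == "check_shipping_status" and status == "cancelled":
        return False, "cancelled orders have no shipment to track"
    if tool_name == "create_return_request" and status != "delivered":
        return False, "only delivered orders can be returned"
    return True, "ok"

print(validate_tool_call("check_shipping_status", {"status": "cancelled"}))
```

Returning a reason alongside the verdict matters: the reason can be fed back to the agent as a tool result, so it corrects course instead of retrying the same call.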
Instrument everything. I added LangSmith tracing after launch, but I should have started with it. Debugging a multi-step agent from logs alone is painful — you need to see the full state graph execution, including which branch was taken and why.
The key insight from this project: the hard part of building an AI agent isn't the LLM or the tools, it's the safety layer. An agent that can look up orders is useful. An agent that can process refunds without confirmation is dangerous. The gate pattern is simple to implement, and an agent that can change real data should not be deployed without it.
LangGraph made this possible by treating the conversation as a graph traversal problem instead of a linear chain. The same pattern applies to any agent that needs to take real-world actions: medical appointment booking, financial transactions, infrastructure changes. If the action is irreversible, gate it.