AI agents are moving from experimental tools to production infrastructure, and this week's writing explores the architectural patterns, cost implications, and evaluation strategies that come with that shift. Here's what developers shipping agentic systems need to know.
Deep Agents v0.5
LangChain's async subagent architecture solves a critical bottleneck: supervisor agents no longer block while waiting for long-running tasks. Each subagent returns a task ID immediately and runs independently, enabling true parallel execution across heterogeneous infrastructure. The Agent Protocol standardization means you can orchestrate subagents running different models on different hardware without tight coupling, which directly impacts both cost optimization and execution speed.
// Async subagent returns immediately with task ID
const taskId = await subagent.executeAsync({
  task: "analyze_video",
  fileUrl: "s3://bucket/video.mp4"
});
// Poll or subscribe to task completion
const result = await subagent.getTaskResult(taskId);
Agentic Infrastructure
Vercel reports that 30% of their deployments now come from agents, representing a 1000% increase in six months. That's not a forecast; it's current production traffic, and it changes infrastructure requirements fundamentally. The three-layer model (agent-friendly deployment APIs, unified tooling for building agents, and self-healing autonomous infrastructure) maps directly to where engineering teams should invest if they're building agent-driven workflows. Preview URLs with programmatic access and deployment APIs are table stakes now.
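As a rough sketch of what "agent-friendly deployment APIs" implies in practice, here's what an agent-driven deploy might look like against a generic REST endpoint. The URL, payload shape, and response fields are illustrative assumptions, not Vercel's actual API.
// Hypothetical deployment endpoint; the agent deploys and gets a preview URL back.
// Endpoint, payload, and field names are assumptions for illustration only.
const DEPLOY_API_URL = "https://api.example-host.com/v1/deployments";
interface DeploymentResponse {
  id: string;
  previewUrl: string;
  status: "queued" | "building" | "ready" | "error";
}
async function createDeployment(projectId: string, gitRef: string): Promise<DeploymentResponse> {
  const res = await fetch(DEPLOY_API_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.DEPLOY_TOKEN}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ projectId, gitRef })
  });
  if (!res.ok) throw new Error(`Deployment failed: ${res.status}`);
  return res.json();
}
// The agent can then fetch the preview URL programmatically to verify its own change
const deployment = await createDeployment("my-project", "agent/fix-checkout-flow");
console.log(`Preview ready: ${deployment.previewUrl}`);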
Human judgment in the agent improvement loop
The best-performing agent teams don't manually review outputs at scale. They translate expert judgment into automated evaluations using LLM-as-judge systems aligned with human feedback, then run those evaluations against every production case. LangSmith's Align Evaluator captures this pattern: record expert assessments on a small set, train the evaluator, then apply it broadly while routing only ambiguous cases back to humans. This approach scales expert knowledge without drowning subject matter experts in review queues.
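To make the routing concrete, here's a minimal sketch of the confidence-gated escalation step. The judge() stub and its verdict shape are hypothetical stand-ins, not LangSmith's actual API.
// Verdict from an LLM-as-judge aligned with expert labels
interface Verdict {
  pass: boolean;
  confidence: number; // 0..1, the judge's self-reported certainty
}
// Stub: in practice this calls an LLM whose rubric was aligned on a small expert-labeled set
async function judge(output: string, rubric: string): Promise<Verdict> {
  return { pass: output.length > 0, confidence: 0.5 };
}
const CONFIDENCE_THRESHOLD = 0.8; // below this, route the case to a human
async function evaluate(output: string, rubric: string): Promise<"pass" | "fail" | "needs_human_review"> {
  const verdict = await judge(output, rubric);
  // Clear cases are auto-labeled; only ambiguous ones reach the review queue
  if (verdict.confidence < CONFIDENCE_THRESHOLD) return "needs_human_review";
  return verdict.pass ? "pass" : "fail";
}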
State of Play: AI Coding Assistants
Böckeler's presentation highlights "context engineering" as the emerging discipline for agentic coding tools. The shift from autocomplete to autonomous agents means developers now curate what information gets loaded into context windows, using progressive loading and specialized subagents rather than dumping entire codebases into prompts. This directly affects token costs and response quality. Proper sandboxing and evaluation frameworks aren't optional extras; they're requirements for agents that modify production code.
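Here's a minimal sketch of progressive loading under a token budget. All names (buildContext, loadSummary, loadFull, needsFullFile) are illustrative assumptions, not any particular tool's API.
// Load lightweight summaries first; escalate to full file contents only where the task demands it
interface ContextItem {
  path: string;
  content: string;
  tokens: number;
}
const TOKEN_BUDGET = 8_000; // keep the working set well under the context window
async function buildContext(
  relevantPaths: string[],
  loadSummary: (path: string) => Promise<ContextItem>,
  loadFull: (path: string) => Promise<ContextItem>,
  needsFullFile: (path: string) => boolean
): Promise<ContextItem[]> {
  const context: ContextItem[] = [];
  let used = 0;
  for (const path of relevantPaths) {
    const item = needsFullFile(path) ? await loadFull(path) : await loadSummary(path);
    if (used + item.tokens > TOKEN_BUDGET) break; // stop before blowing the budget
    context.push(item);
    used += item.tokens;
  }
  return context;
}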
Building Hierarchical Agentic RAG Systems
Protocol-H's supervisor-worker architecture with autonomous error recovery achieved 84.5% accuracy versus 62.8% for flat-agent approaches on enterprise benchmarks. The reflective retry mechanism catches SQL syntax errors and schema mismatches before they propagate as hallucinations, reducing hallucination rates by 60%. For teams building RAG systems that span structured and unstructured data, this architecture pattern provides a concrete implementation reference with measured performance improvements.
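To make the retry mechanism concrete, here's a minimal sketch of a reflective retry loop for the SQL worker; executeSql and repairQuery are hypothetical helpers, not Protocol-H's published code. The topology config below then wires this in as the retry policy.
// On failure, feed the error back to the model to repair the query before re-executing,
// so syntax and schema errors surface as corrected queries instead of hallucinated answers.
// executeSql and repairQuery are assumed helpers for illustration.
async function runWithReflectiveRetry(
  query: string,
  executeSql: (sql: string) => Promise<unknown>,
  repairQuery: (sql: string, error: string) => Promise<string>,
  maxRetries = 3
): Promise<unknown> {
  let sql = query;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await executeSql(sql);
    } catch (err) {
      if (attempt === maxRetries) throw err; // exhausted: propagate instead of guessing
      sql = await repairQuery(sql, String(err));
    }
  }
  throw new Error("unreachable");
}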
# Hierarchical agent topology
supervisor:
  role: query_router
  workers:
    - sql_agent: handles structured queries
    - vector_agent: handles document retrieval
  retry_policy: reflective_error_recovery
The common thread: production agent systems require architectural patterns (async execution, hierarchical coordination), infrastructure designed for programmatic access, and evaluation systems that scale expert judgment. These aren't future considerations; they're operational requirements today. 🚀