Agentic workflows are moving from proof-of-concept to production, and the infrastructure around them is evolving fast. This week's reads cover feedback loops that make agents actually learn, compliance-grade architectures for regulated industries, and the resilience patterns required when AI systems touch critical paths.
Agent observability needs feedback to power learning
Traces alone don't improve agent performance. This post argues that observability platforms must capture feedback signals (user ratings, behavioral data, LLM-as-judge evals, or deterministic failure rules) and store them alongside traces to enable systematic learning loops. The feedback layer is what lets teams tune model weights, refine harness logic, or adjust context retrieval strategies based on real-world outcomes. Without it, you're just collecting logs.
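The feedback layer described above can be sketched in a few lines of TypeScript. The record shape and field names here are illustrative, not any particular platform's schema; the key idea is that every feedback signal carries a trace ID so it can be joined back against execution data later.

```typescript
// A feedback record keyed to a trace, so evals and user ratings
// can be joined against the captured execution data. All names
// below are hypothetical, for illustration only.
type FeedbackSource =
  | "user_rating"
  | "behavioral"
  | "llm_judge"
  | "deterministic_rule";

interface FeedbackRecord {
  traceId: string;   // joins back to the stored trace
  source: FeedbackSource;
  score: number;     // normalized to [0, 1]
  comment?: string;
  createdAt: string; // ISO-8601 timestamp
}

// A learning loop then aggregates feedback per trace to decide
// what to tune: prompts, retrieval strategy, or the harness itself.
function meanScore(records: FeedbackRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((sum, r) => sum + r.score, 0) / records.length;
}
```

Aggregates like `meanScore` are the simplest possible signal; in practice teams slice by feedback source before tuning, since an LLM-as-judge score and a user rating disagreeing is itself informative.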
Building a company due diligence agent with Deep Agents, LangSmith and Parallel
This walkthrough shows how to build a multi-agent due diligence system that meets FSI compliance requirements. Five specialized subagents investigate different company dimensions, then dynamically spawn parallel subagents to analyze competitors. The architecture prioritizes auditability: every finding includes citations, confidence scores, and a persistent trace that can be reconstructed months later for regulatory review. If you're building agents for regulated environments, this is the reference implementation.
```javascript
// Deep Agents pattern: spawn subagents based on runtime data.
// The profile subagent first discovers which competitors exist;
// that runtime result drives the fan-out below.
const competitors = await profileAgent.getCompetitors(companyName);

// Launch one competitive-analysis subagent per competitor and
// run them concurrently, collecting all results.
const competitorAnalyses = await Promise.all(
  competitors.map((comp) =>
    spawnSubagent("competitive-analysis", { target: comp })
  )
);
```

Code Orange: Fail Small is complete. The result is a stronger Cloudflare network
Cloudflare's response to two global outages in late 2025 offers lessons for anyone running production agents. Two internal systems anchor the fix: Snapstone, which gates progressive rollouts on health signals and rolls back automatically, and the Codex, which uses AI to enforce engineering standards at merge time, blocking dangerous code patterns before they ship. For agentic CI/CD, the takeaway is clear: guardrails should activate at commit time, not after deployment.
```yaml
# Health-mediated deployment pattern
deploy:
  strategy: progressive
  health_checks:
    - metric: error_rate
      threshold: 0.01
      window: 5m
  rollback: automatic
```

Granite 4.1 3B SVG Pelican Gallery
IBM's Apache 2.0 licensed Granite 4.1 models (3B, 8B, and 30B) are now available in 21 GGUF quantized variants. Simon Willison tested all 21 quantizations of the 3B model on SVG generation and found no clear correlation between quantization size and output quality for this task. The experiment underscores a broader point: smaller, cheaper models can sometimes match larger ones on specific tasks, which matters when optimizing token costs across thousands of agent runs.
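To make the cost argument concrete, here is a back-of-the-envelope comparison in TypeScript. The per-token prices and model names are illustrative placeholders, not real rates for Granite or any other model:

```typescript
// Estimate total fleet cost under different model choices.
// Prices per million tokens are hypothetical, for illustration.
interface ModelCost {
  name: string;
  pricePerMTokens: number; // USD per 1M tokens (assumed, not real)
}

function fleetCost(
  model: ModelCost,
  runs: number,
  tokensPerRun: number
): number {
  return (runs * tokensPerRun * model.pricePerMTokens) / 1_000_000;
}

const small: ModelCost = { name: "small-quantized", pricePerMTokens: 0.1 };
const large: ModelCost = { name: "frontier", pricePerMTokens: 5.0 };

// 10,000 agent runs at 50k tokens each: the gap scales linearly
// with fleet size, so task-level quality parity translates
// directly into a 50x cost difference at these assumed prices.
const cheap = fleetCost(small, 10_000, 50_000);
const pricey = fleetCost(large, 10_000, 50_000);
```

The arithmetic is trivial, but that is the point: once an eval shows a small model is good enough for a specific task, the savings compound across every run without any further engineering.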
The common thread: production agent systems require feedback loops, compliance-grade traceability, progressive deployment strategies, and cost-aware model selection. Observability platforms that ignore any of these dimensions won't survive contact with real workloads.