Building AI agents that actually work
Most 'agents' are brittle pipelines dressed up with AI. Here's the anatomy of a reliable agent, three patterns that hold up in production, and when to use n8n vs code.
Everyone is building agents. Most of them break on the third run.
The failure mode is always the same: the AI is asked to do too much, in the wrong place. The agent fails not because AI is unreliable, but because the architecture made reliability impossible.
Here's what I've learned building agents that actually run week after week.
The anatomy of a reliable agent
Every agent that works has three deterministic parts wrapped around a single AI call.
Trigger — something deterministic starts the agent. A schedule, a webhook, a file drop, a form submission. Not "whenever the AI decides it's time."
Context — the agent gathers what it needs before calling the AI. The document, the data, the user preferences, the constraints. The AI call happens with full context, not halfway through discovery.
Action — after the AI produces output, something deterministic happens with it. It gets saved, sent, logged, or published. The action step doesn't retry or improvise — it executes.
Agents break when the AI is responsible for the trigger, or when the action step expects the AI to handle errors it can't handle.
Three patterns that hold up
Pattern 1 — Linear chain
Trigger → fetch context → AI call → format → save/send
The simplest pattern. Each step has one job. The AI call is step 3, not step 1.
My content repurposing agent follows this: a new LinkedIn post (trigger) → fetch the post text (context) → ask Claude to rewrite it as a Twitter thread + newsletter paragraph (AI call) → save drafts to Notion (save).
It has run every weekday for three months without intervention.
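Here is a minimal sketch of the linear chain in Python. It is not the author's actual agent: `call_model` and `save` are hypothetical callables standing in for the Claude call and the Notion write, and the `---` separator between the two drafts is an assumed output convention.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Drafts:
    thread: str
    newsletter: str


def fetch_context(post_id: str, posts: Dict[str, str]) -> str:
    """Step 2: gather everything the model needs before calling it."""
    return posts[post_id]  # a missing post raises KeyError, loudly


def format_output(raw: str) -> Drafts:
    """Step 4: deterministic parsing of the model's reply."""
    thread, sep, newsletter = raw.partition("\n---\n")
    if not sep or not thread.strip() or not newsletter.strip():
        raise ValueError("model reply is missing one of the two drafts")
    return Drafts(thread.strip(), newsletter.strip())


def run_chain(
    post_id: str,
    posts: Dict[str, str],
    call_model: Callable[[str], str],  # hypothetical wrapper around the AI call
    save: Callable[[Drafts], None],    # hypothetical wrapper around the save step
) -> Drafts:
    text = fetch_context(post_id, posts)
    prompt = (
        "Rewrite this post as a Twitter thread, then a newsletter paragraph, "
        "separated by a line containing only ---\n\n" + text
    )
    raw = call_model(prompt)     # step 3: the only non-deterministic step
    drafts = format_output(raw)  # step 4
    save(drafts)                 # step 5
    return drafts
```

Because the model and the save step are passed in, the chain can be exercised with a fake model that returns a fixed string, which makes the deterministic steps easy to test on their own.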
Pattern 2 — Conditional routing
Trigger → fetch context → AI call (classify) → route to A or B
The AI classifies or makes a decision, then deterministic logic handles each branch. The branches themselves don't use AI — they use the AI's output as a signal.
Example: inbound email → extract intent → if "support", create Linear ticket; if "sales", add to CRM; if "newsletter", forward to Notion. The AI does one thing (classify). The routing is code.
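The routing half of that example could look roughly like this. The three handlers are hypothetical stubs (a real version would call Linear, the CRM, and Notion), and `classify` stands in for the single AI call. The point of the sketch is that an unexpected label raises instead of being guessed at.

```python
from typing import Callable, Dict


# Hypothetical handlers: each returns a string describing what it did.
def create_ticket(email: dict) -> str:
    return f"ticket:{email['id']}"


def add_to_crm(email: dict) -> str:
    return f"crm:{email['id']}"


def forward_to_notion(email: dict) -> str:
    return f"notion:{email['id']}"


# The routing table is plain data: no AI involved in choosing a branch.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "support": create_ticket,
    "sales": add_to_crm,
    "newsletter": forward_to_notion,
}


def route_email(email: dict, classify: Callable[[str], str]) -> str:
    """Use the model's label as a signal, then dispatch deterministically."""
    label = classify(email["body"]).strip().lower()
    handler = ROUTES.get(label)
    if handler is None:
        # The model returned something outside the allowed set: fail loudly.
        raise ValueError(f"unexpected label {label!r}")
    return handler(email)
```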
Pattern 3 — Feedback loop
Trigger → fetch context → AI call → verify → (retry or complete)
The agent checks its own output before considering the task done. The verify step is rule-based, not AI-based: does the output have the right format? Does it pass a schema check? Is the required field present?
Retry at most once. If it fails twice, log and alert — don't loop.
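As a sketch, assuming the model is asked for a JSON object: the verify step below is pure rules, and the loop makes at most two attempts before logging and raising. `REQUIRED_FIELDS`, `VerificationError`, and `call_model` are hypothetical names chosen for illustration.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent")

REQUIRED_FIELDS = ("title", "summary")  # hypothetical schema for this sketch


class VerificationError(RuntimeError):
    """Raised when the output fails the rule-based check twice."""


def verify(raw: str) -> Optional[dict]:
    """Rule-based check: valid JSON object with non-empty required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    return data


def run_with_one_retry(prompt: str, call_model: Callable[[str], str]) -> dict:
    """One attempt plus at most one retry, then log and raise."""
    for attempt in (1, 2):
        data = verify(call_model(prompt))
        if data is not None:
            return data
        logger.warning("attempt %d failed verification", attempt)
    logger.error("output failed verification twice, giving up")
    raise VerificationError("model output failed verification twice")
```

The fixed two-element loop is deliberate: there is no condition under which the agent can keep calling the model.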
Common mistakes
Chaining AI calls without checkpoints. If step 2 produces garbage, step 3 will process garbage and produce more garbage. Add a validation gate between AI calls.
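A validation gate can be very small. In this sketch, `summarize` and `translate` are hypothetical stand-ins for two chained AI calls, and the checks (non-empty, summary no longer than the input) are example rules rather than a recommended set.

```python
from typing import Callable


def gate(name: str, value: str, check: Callable[[str], bool]) -> str:
    """Stop the chain if an intermediate result fails its check."""
    if not check(value):
        raise ValueError(f"{name} failed validation, stopping the chain")
    return value


def summarize_then_translate(
    text: str,
    summarize: Callable[[str], str],  # hypothetical first AI call
    translate: Callable[[str], str],  # hypothetical second AI call
) -> str:
    summary = gate(
        "summary",
        summarize(text),
        lambda s: 0 < len(s.strip()) <= len(text),
    )
    # The second call only ever sees output that passed the gate.
    return gate("translation", translate(summary), lambda s: bool(s.strip()))
```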
Using AI for extraction when regex works. Extracting an email address, a price, a date — these are pattern matching problems. AI adds latency, cost, and inconsistency for zero benefit.
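For illustration, here is what that looks like with Python's `re` module. The patterns are deliberately simple assumptions (US-style dollar prices, ISO `YYYY-MM-DD` dates, a pragmatic rather than RFC-complete email pattern) and would need adjusting for other formats.

```python
import re
from datetime import date
from typing import List

# Simplified patterns, chosen for this sketch only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PRICE_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_emails(text: str) -> List[str]:
    return EMAIL_RE.findall(text)


def extract_prices(text: str) -> List[float]:
    # Drop thousands separators before converting.
    return [float(m.replace(",", "")) for m in PRICE_RE.findall(text)]


def extract_dates(text: str) -> List[date]:
    found = []
    for year, month, day in DATE_RE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # looks like a date but is not one, e.g. 2025-13-40
    return found
```

The same input always yields the same result, with no network call and no token cost.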
No idempotency. If the agent runs twice, does it create two records? Send two emails? Make every action idempotent: check before insert, use unique keys, log completions.
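One way to get "check before insert" and "log completions" in a single step is a ledger table with a unique key. This is a sketch using SQLite from the standard library; the `completed` table and the `run_once` helper are hypothetical names, and the key would be whatever uniquely identifies a unit of work (for example, a post ID plus the action name).

```python
import sqlite3
from typing import Callable


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS completed (key TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def run_once(conn: sqlite3.Connection, key: str, action: Callable[[], None]) -> bool:
    """Run `action` only if `key` has not been recorded. Returns True if it ran."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute("INSERT INTO completed (key) VALUES (?)", (key,))
            action()
    except sqlite3.IntegrityError:
        return False  # key already recorded: a previous run did this work
    return True
```

If the action raises, the key is rolled back so a later run can try again. A crash between the action and the commit can still repeat the action, so unique keys on the receiving side remain worth having.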
Handling errors inside the agent. Beyond the single bounded retry in Pattern 3, agents shouldn't catch their own errors and try to recover. They should fail fast, log clearly, and let a human look. Silent failures are worse than loud ones.
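In code, failing loudly can be as simple as logging with context and re-raising. A minimal sketch, where `run_agent` and the `(name, step)` pairs are hypothetical and alerting is assumed to hang off the logger or the process exit code:

```python
import logging
from typing import Callable, Iterable, Tuple

logger = logging.getLogger("agent")


def run_agent(run_id: str, steps: Iterable[Tuple[str, Callable[[], None]]]) -> None:
    """Run steps in order. On the first failure, log it and stop."""
    for name, step in steps:
        try:
            step()
        except Exception:
            # Record which run and which step failed, with the traceback,
            # then re-raise so the failure is visible to whatever runs the agent.
            logger.exception("run %s failed at step %r", run_id, name)
            raise
```

Nothing after the failing step runs, and nothing is swallowed.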
n8n vs code
Use n8n when:
- The steps are mostly third-party integrations (Slack, Notion, Gmail, Linear)
- You want to see the flow visually
- Non-developers need to modify it
- You're prototyping and want to skip the infrastructure
Use code when:
- The logic is complex enough that a visual graph becomes unreadable
- You need custom error handling or retries
- The agent is critical path and needs to be tested like code
- You're already running a backend that can host it
I use n8n for content operations (social, newsletter, CRM). I use code for anything that touches the product directly. Either way, the agent needs access to your tools to gather context — connecting Claude to Notion, Gmail, and Drive via MCP is how I wire that up for the code path.
The test
Before shipping an agent, I ask: what happens when the AI returns nothing useful? What happens when the third-party API is down? What happens when it runs twice?
If the answer to any of these is "I don't know" or "it crashes", the agent isn't ready. These edge cases happen in the first week. Design for them upfront.
Reliable agents are boring. They run, do the thing, and you forget they exist. That's the goal.
Not sure whether you're building a prompt, a workflow, or an agent? The distinction matters more than the label.
Building agents inside your own codebase? A good CLAUDE.md is what keeps Claude generating agent code that follows your conventions instead of generic boilerplate.