Charly

Building AI agents that actually work

Most 'agents' are brittle pipelines dressed up with AI. Here's the anatomy of a reliable agent, three patterns that hold up in production, and when to use n8n vs code.

Everyone is building agents. Most of them break on the third run.

The failure mode is always the same: the AI is asked to do too much, in the wrong place. The agent fails not because AI is unreliable but because the architecture made reliability impossible.

Here's what I've learned building agents that actually run week after week.

The anatomy of a reliable agent

Every agent that works has three deterministic parts wrapped around a single AI call.

Trigger — something deterministic starts the agent. A schedule, a webhook, a file drop, a form submission. Not "whenever the AI decides it's time."

Context — the agent gathers what it needs before calling the AI. The document, the data, the user preferences, the constraints. The AI call happens with full context, not halfway through discovery.

Action — after the AI produces output, something deterministic happens with it. It gets saved, sent, logged, or published. The action step doesn't retry or improvise — it executes.

Agents break when the AI is responsible for the trigger, or when the action step expects the AI to handle errors it can't handle.
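
A minimal sketch of that shape in Python, assuming the trigger hands the agent an event dict. The names here (Context, run_agent, fake_fetch, fake_model) are hypothetical and only illustrate where the AI call sits; the model is a stand-in function, not a real client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Context:
    """Everything the model needs, gathered before the AI call."""
    document: str
    preferences: dict


def run_agent(
    event: dict,
    fetch_context: Callable[[dict], Context],
    call_model: Callable[[Context], str],
    act: Callable[[str], None],
) -> str:
    """Run once per trigger event. Only call_model involves AI."""
    context = fetch_context(event)  # deterministic: gather everything first
    output = call_model(context)    # the single AI step
    act(output)                     # deterministic: save, send, log, or publish
    return output


# Stand-ins so the sketch runs without any external service.
def fake_fetch(event: dict) -> Context:
    return Context(document=event["text"], preferences={"tone": "casual"})


def fake_model(context: Context) -> str:
    return f"[{context.preferences['tone']}] {context.document.upper()}"


saved: list[str] = []
result = run_agent({"text": "hello"}, fake_fetch, fake_model, saved.append)
```

Passing the three steps in as functions keeps the AI call swappable for a fake, which is what makes the rest of the agent testable.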

Three patterns that hold up

Pattern 1 — Linear chain

Trigger → fetch context → AI call → format → save/send

The simplest pattern. Each step has one job. The AI call is step 3, not step 1.

My content repurposing agent follows this: a new LinkedIn post (trigger) → fetch the post text (context) → ask Claude to rewrite it as a Twitter thread + newsletter paragraph (AI call) → save drafts to Notion (save).

It has run every weekday for three months without intervention.

Pattern 2 — Conditional routing

Trigger → fetch context → AI call (classify) → route to A or B

The AI classifies or makes a decision, then deterministic logic handles each branch. The branches themselves don't use AI — they use the AI's output as a signal.

Example: inbound email → extract intent → if "support", create Linear ticket; if "sales", add to CRM; if "newsletter", forward to Notion. The AI does one thing (classify). The routing is code.
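
The email example could be sketched like this. The handlers are hypothetical stand-ins for the Linear, CRM, and Notion integrations, and classify is assumed to be whatever function wraps the model call and returns a label as text.

```python
from typing import Callable


# Hypothetical stand-ins for the real integrations (Linear, CRM, Notion).
def create_ticket(email: dict) -> str:
    return f"ticket: {email['subject']}"


def add_to_crm(email: dict) -> str:
    return f"crm: {email['sender']}"


def forward_to_notion(email: dict) -> str:
    return f"notion: {email['subject']}"


# The routing table is plain code; the model only picks the key.
ROUTES: dict[str, Callable[[dict], str]] = {
    "support": create_ticket,
    "sales": add_to_crm,
    "newsletter": forward_to_notion,
}


def route_email(email: dict, classify: Callable[[str], str]) -> str:
    """Classify once with the model, then dispatch deterministically."""
    intent = classify(email["body"]).strip().lower()
    handler = ROUTES.get(intent)
    if handler is None:
        # Anything outside the known labels is an error, not a fourth branch.
        raise ValueError(f"unexpected intent from classifier: {intent!r}")
    return handler(email)


# A fake classifier so the sketch runs without a model behind it.
def fake_classify(body: str) -> str:
    return " Support\n" if "broken" in body else "sales"


email = {
    "sender": "ana@example.com",
    "subject": "Login broken",
    "body": "The login page is broken.",
}
outcome = route_email(email, fake_classify)
```

Normalizing the label and rejecting unknown values means a chatty or off-script model response fails loudly instead of silently falling into a default branch.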

Pattern 3 — Feedback loop

Trigger → fetch context → AI call → verify → (retry or complete)

The agent checks its own output before considering the task done. The verify step is rule-based, not AI-based: does the output have the right format? Does it pass a schema check? Is the required field present?

Retry at most once. If it fails twice, log and alert — don't loop.
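
One way to sketch this, assuming the model is asked for a JSON object. The required fields (title, body) and the function names are made up for illustration; the verify step uses only the standard library and never calls the model.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent")

REQUIRED_FIELDS = ("title", "body")  # hypothetical schema for this sketch


def verify(raw: str) -> Optional[dict]:
    """Rule-based check: a JSON object with non-empty string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    return data


def generate_verified(call_model: Callable[[], str], max_attempts: int = 2) -> dict:
    """Call the model, verify, and retry at most once (two attempts total)."""
    for attempt in range(1, max_attempts + 1):
        data = verify(call_model())
        if data is not None:
            return data
        logger.warning("attempt %d failed verification", attempt)
    # No loop beyond this point: log, raise, and let alerting pick it up.
    logger.error("giving up after %d attempts", max_attempts)
    raise RuntimeError(f"output failed verification after {max_attempts} attempts")


# Fake model: garbage on the first call, valid JSON on the second.
responses = iter([
    "Sure! Here is your draft...",
    '{"title": "Launch", "body": "We shipped."}',
])
draft = generate_verified(lambda: next(responses))
```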

Common mistakes

Chaining AI calls without checkpoints. If step 2 produces garbage, step 3 will process garbage and produce more garbage. Add a validation gate between AI calls.
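
A small sketch of such a gate, assuming a two-step chain that summarizes and then translates. The checks (non-empty, shorter than the source) and every name here are illustrative assumptions; a real gate would test whatever the next step actually depends on.

```python
from typing import Callable


class GateError(Exception):
    """Raised when an intermediate output is not fit to pass downstream."""


def summary_gate(summary: str, source: str) -> str:
    """Deterministic checkpoint between the two AI calls."""
    text = summary.strip()
    if not text:
        raise GateError("summary is empty")
    if len(text) >= len(source):
        raise GateError("summary is not shorter than the source")
    return text


def summarize_then_translate(
    source: str,
    summarize: Callable[[str], str],
    translate: Callable[[str], str],
) -> str:
    # The second AI call only runs if the first output passes the gate.
    summary = summary_gate(summarize(source), source)
    return translate(summary)


# Fake steps so the sketch runs without a model behind it.
source = "A long source document about agents that keeps going for a while."
result = summarize_then_translate(
    source,
    lambda text: "Agents, briefly.",
    lambda text: text.upper(),
)
```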

Using AI for extraction when regex works. Extracting an email address, a price, a date — these are pattern matching problems. AI adds latency, cost, and inconsistency for zero benefit.
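
For instance, the three fields mentioned above can be pulled out with Python's re module. These patterns are deliberately simplified assumptions (no full email validation, only ISO-style dates, only a few currency symbols), so treat them as a starting point rather than a complete parser.

```python
import re

# Simplified patterns for illustration, not exhaustive validators.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_fields(text: str) -> dict:
    """Deterministic extraction: same input, same output, no model call."""
    return {
        "emails": EMAIL_RE.findall(text),
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


fields = extract_fields("Invoice for jane@example.com: $49.99 due 2024-03-15.")
```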

No idempotency. If the agent runs twice, does it create two records? Send two emails? Make every action idempotent: check before insert, use unique keys, log completions.
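
A minimal sketch of the unique-key approach using SQLite from the standard library. The table name, the run_key format, and the deliver_once helper are assumptions for illustration; the same idea applies to any store that enforces uniqueness.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS completions ("
        " run_key TEXT PRIMARY KEY,"
        " output TEXT NOT NULL)"
    )
    return conn


def record_once(conn: sqlite3.Connection, run_key: str, output: str) -> bool:
    """Insert unless run_key already exists. True means this call inserted."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO completions (run_key, output) VALUES (?, ?)",
            (run_key, output),
        )
    return cursor.rowcount == 1


def deliver_once(
    conn: sqlite3.Connection,
    run_key: str,
    output: str,
    send: Callable[[str], None],
) -> bool:
    """Send only if this run_key has not been completed before."""
    # Recording first means a crash during send can skip a delivery but
    # never duplicate one. Swap the order if a duplicate is the lesser evil.
    if not record_once(conn, run_key, output):
        return False
    send(output)
    return True


conn = open_store()
sent: list[str] = []
first = deliver_once(conn, "post-123:2024-03-15", "draft", sent.append)
second = deliver_once(conn, "post-123:2024-03-15", "draft", sent.append)
```

Deriving run_key from the trigger (here a hypothetical post ID plus date) is what lets a second run recognize work that is already done.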

Handling errors inside the agent. Apart from a single rule-based retry like the one in Pattern 3, agents shouldn't catch their own errors and try to recover. They should fail fast, log clearly, and let a human look. Silent failures are worse than loud ones.

n8n vs code

Use n8n when:

  • The steps are mostly third-party integrations (Slack, Notion, Gmail, Linear)
  • You want to see the flow visually
  • Non-developers need to modify it
  • You're prototyping and want to skip the infrastructure

Use code when:

  • The logic is complex enough that a visual graph becomes unreadable
  • You need custom error handling or retries
  • The agent is critical path and needs to be tested like code
  • You're already running a backend that can host it

I use n8n for content operations (social, newsletter, CRM). I use code for anything that touches the product directly. Either way, the agent needs access to your tools to gather context — connecting Claude to Notion, Gmail, and Drive via MCP is how I wire that up for the code path.

The test

Before shipping an agent, I ask: what happens when the AI returns nothing useful? What happens when the third-party API is down? What happens when it runs twice?

If the answer to any of these is "I don't know" or "it crashes", the agent isn't ready. These edge cases happen in the first week. Design for them upfront.
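
Those three questions translate fairly directly into tests. The sketch below assumes a tiny agent (run_once) with a dict standing in for the datastore; all names are hypothetical, and the fakes replace the model and the third-party API.

```python
from typing import Callable


class EmptyModelOutput(Exception):
    """The model returned nothing usable."""


def run_once(
    post_id: str,
    fetch: Callable[[str], str],
    call_model: Callable[[str], str],
    store: dict,
) -> str:
    if post_id in store:
        return "skipped"                 # second run: no duplicate record
    text = fetch(post_id)                # an outage raises here and propagates
    draft = call_model(text).strip()
    if not draft:
        raise EmptyModelOutput(post_id)  # loud failure, nothing saved
    store[post_id] = draft
    return "saved"


def api_down(post_id: str) -> str:
    raise ConnectionError("third-party API is down")


def test_model_returns_nothing() -> None:
    store: dict = {}
    try:
        run_once("p1", lambda _id: "post text", lambda _text: "   ", store)
    except EmptyModelOutput:
        pass
    else:
        raise AssertionError("empty output should fail loudly")
    assert store == {}


def test_api_is_down() -> None:
    store: dict = {}
    try:
        run_once("p1", api_down, lambda _text: "draft", store)
    except ConnectionError:
        pass
    else:
        raise AssertionError("an outage should propagate, not be swallowed")
    assert store == {}


def test_runs_twice() -> None:
    store: dict = {}
    args = ("p1", lambda _id: "post text", lambda _text: "draft", store)
    assert run_once(*args) == "saved"
    assert run_once(*args) == "skipped"
    assert store == {"p1": "draft"}


for check in (test_model_returns_nothing, test_api_is_down, test_runs_twice):
    check()
```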

Reliable agents are boring. They run, do the thing, and you forget they exist. That's the goal.


Not sure whether you're building a prompt, a workflow, or an agent? The distinction matters more than the label.

Building agents inside your own codebase? A good CLAUDE.md is what keeps Claude generating agent code that follows your conventions instead of generic boilerplate.
