Building AI agents that actually work

Everyone is building agents. Most of them break on the third run.

The failure mode is always the same: the AI is asked to do too much, in the wrong place. The agent fails not because AI is unreliable, but because the architecture made reliability impossible.

Here's what I've learned building agents that actually run week after week.

The anatomy of a reliable agent

Every agent that works has three deterministic parts wrapped around a single AI call. The AI sits between the second and third, and nowhere else.

Trigger - something deterministic starts the agent. A schedule, a webhook, a file drop, a form submission. Not "whenever the AI decides it's time."

Context - the agent gathers what it needs before calling the AI. The document, the data, the user preferences, the constraints. The AI call happens with full context, not halfway through discovery.

Action - after the AI produces output, something deterministic happens with it. It gets saved, sent, logged, or published. The action step doesn't retry or improvise - it executes.

Agents break when the AI is responsible for the trigger, or when the action step expects the AI to handle errors it can't handle.

Three patterns that hold up

Pattern 1 - Linear chain

Trigger → fetch context → AI call → format → save/send

The simplest pattern. Each step has one job. The AI call is step 3, not step 1.

My content repurposing agent follows this: a new LinkedIn post (trigger) → fetch the post text (context) → ask Claude to rewrite it as a Twitter thread + newsletter paragraph (AI call) → save drafts to Notion (save).

It has run every weekday for three months without intervention.
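A minimal Python sketch of the linear chain, under stated assumptions: the function names (fetch_context, format_output, run_chain), the event shape, and the "---" separator are hypothetical and not taken from the agent described above. call_model stands in for whatever model client you use, and save for the Notion write.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    thread: str
    newsletter: str


def fetch_context(event: dict) -> str:
    """Step 2: gather everything the model needs before calling it."""
    text = event.get("post_text", "").strip()
    if not text:
        raise ValueError("trigger event has no post text")
    return text


def format_output(raw: str) -> Draft:
    """Step 4: deterministic parsing of the model's reply."""
    thread, sep, newsletter = raw.partition("\n---\n")
    if not sep:
        raise ValueError("model reply is missing the '---' separator")
    return Draft(thread=thread.strip(), newsletter=newsletter.strip())


def run_chain(
    event: dict,
    call_model: Callable[[str], str],  # assumed: prompt in, text out
    save: Callable[[Draft], None],     # assumed: e.g. a Notion write
) -> Draft:
    context = fetch_context(event)                       # step 2
    prompt = (
        "Rewrite the post below as a Twitter thread and a newsletter "
        "paragraph, separated by a line containing only ---.\n\n" + context
    )
    raw = call_model(prompt)                             # step 3: the only AI step
    draft = format_output(raw)                           # step 4
    save(draft)                                          # step 5
    return draft
```

Because the model and the save step are passed in as plain callables, the chain can be exercised with a fake model and a list's append method before it ever touches a real API.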

Pattern 2 - Conditional routing

Trigger → fetch context → AI call (classify) → route to A or B

The AI classifies or makes a decision, then deterministic logic handles each branch. The branches themselves don't use AI - they use the AI's output as a signal.

Example: inbound email → extract intent → if "support", create Linear ticket; if "sales", add to CRM; if "newsletter", forward to Notion. The AI does one thing (classify). The routing is code.
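The email example could be sketched like this in Python. The handler functions are placeholders that return strings instead of calling Linear, a CRM, or Notion, and classify is an assumed callable wrapping the model; all names here are hypothetical.

```python
from typing import Callable, Dict


def create_ticket(email: dict) -> str:
    return f"ticket:{email['id']}"   # placeholder for a Linear API call


def add_to_crm(email: dict) -> str:
    return f"crm:{email['id']}"      # placeholder for a CRM API call


def forward_to_notion(email: dict) -> str:
    return f"notion:{email['id']}"   # placeholder for a Notion API call


# The routing table is plain code: one label, one handler.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "support": create_ticket,
    "sales": add_to_crm,
    "newsletter": forward_to_notion,
}


def route_email(email: dict, classify: Callable[[str], str]) -> str:
    # The AI does one thing: return a label.
    label = classify(email["body"]).strip().lower()
    handler = ROUTES.get(label)
    if handler is None:
        # An unknown label is a loud failure, not a guess.
        raise ValueError(f"unexpected intent label: {label!r}")
    return handler(email)
```

Normalizing the label and rejecting anything outside the table means a chatty or malformed model reply cannot silently pick a branch.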

Pattern 3 - Feedback loop

Trigger → fetch context → AI call → verify → (retry or complete)

The agent checks its own output before considering the task done. The verify step is rule-based, not AI-based: does the output have the right format? Does it pass a schema check? Is the required field present?

Retry at most once. If it fails twice, log and alert - don't loop.
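A sketch of the verify-and-retry step, assuming the model is asked to reply with a JSON object. The required field names ("title", "summary") and the function names are made up for illustration, and call_model is again an assumed callable.

```python
import json
import logging
from typing import Callable, Optional

log = logging.getLogger("agent")

REQUIRED_FIELDS = ("title", "summary")  # hypothetical schema
MAX_ATTEMPTS = 2                        # first try plus one retry


def verify(raw: str) -> Optional[dict]:
    """Rule-based check: a JSON object with non-empty string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    return data


def run_with_verify(prompt: str, call_model: Callable[[str], str]) -> dict:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = verify(call_model(prompt))
        if result is not None:
            return result
        log.warning("attempt %d failed verification", attempt)
    # Two failures: stop, and let the caller alert a human.
    raise RuntimeError("output failed verification twice")
```

The loop bound is a constant, so the worst case is two model calls and one exception, never an open-ended retry.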

Common mistakes

Chaining AI calls without checkpoints. If step 2 produces garbage, step 3 will process garbage and produce more garbage. Add a validation gate between AI calls.

Using AI for extraction when regex works. Extracting an email address, a price, a date - these are pattern matching problems. AI adds latency, cost, and inconsistency for zero benefit.
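For reference, this is roughly what the regex route looks like with Python's standard library. The patterns are pragmatic assumptions (a common email shape, US-style dollar prices, ISO dates), not complete validators, so adjust them to the formats your inputs actually use.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Pragmatic patterns, not full validators (e.g. not all of RFC 5322).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PRICE_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")
ISO_DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_email(text: str) -> Optional[str]:
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


def extract_price(text: str) -> Optional[Decimal]:
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def extract_date(text: str) -> Optional[date]:
    match = ISO_DATE_RE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # looked like a date, but e.g. month 13


if __name__ == "__main__":
    line = "Invoice for jane.doe@example.com: $1,299.00 due 2024-03-15."
    print(extract_email(line), extract_price(line), extract_date(line))
```

Each function returns the same value for the same input every time, which is the property an AI call cannot give you.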

No idempotency. If the agent runs twice, does it create two records? Send two emails? Make every action idempotent: check before insert, use unique keys, log completions.
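One way to get "check before insert, use unique keys, log completions" in a single step is a completions table with a primary key. The sketch below uses SQLite from the standard library; the table name, the key format, and run_once are assumptions for illustration.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS completions ("
        " idempotency_key TEXT PRIMARY KEY,"
        " completed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def run_once(conn: sqlite3.Connection, key: str, action: Callable[[], None]) -> bool:
    """Run action only if key is new. Returns True if the action ran."""
    with conn:  # commits on success, rolls back if action raises
        cursor = conn.execute(
            "INSERT OR IGNORE INTO completions (idempotency_key) VALUES (?)",
            (key,),
        )
        if cursor.rowcount == 0:
            return False  # key already recorded: a previous run did this
        action()
    return True


if __name__ == "__main__":
    store = open_store()
    sent = []
    run_once(store, "post-123:email", lambda: sent.append("email"))
    run_once(store, "post-123:email", lambda: sent.append("email"))
    print(len(sent))  # prints 1
```

If the action raises, the insert is rolled back, so a later run can try again with the same key. The external side effect itself is not part of the database transaction, so this is a guard against duplicate runs, not a guarantee of exactly-once delivery.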

Handling errors inside the agent. Beyond a single bounded retry like the one in Pattern 3, agents shouldn't catch their own errors and try to recover. They should fail fast, log clearly, and let a human look. Silent failures are worse than loud ones.

n8n vs code

Use n8n when:

The steps are mostly third-party integrations (Slack, Notion, Gmail, Linear)
You want to see the flow visually
Non-developers need to modify it
You're prototyping and want to skip the infrastructure

Use code when:

The logic is complex enough that a visual graph becomes unreadable
You need custom error handling or retries
The agent is critical path and needs to be tested like code
You're already running a backend that can host it

I use n8n for content operations (social, newsletter, CRM). I use code for anything that touches the product directly. Either way, the agent needs access to your tools to gather context - connecting Claude to Notion, Gmail, and Drive via MCP is how I wire that up for the code path.

The test

Before shipping an agent, I ask: what happens when the AI returns nothing useful? What happens when the third-party API is down? What happens when it runs twice?

If the answer to any of these is "I don't know" or "it crashes", the agent isn't ready. These edge cases happen in the first week. Design for them upfront.
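Those three questions translate directly into checks you can run before shipping. Below is a toy agent plus the three checks, as a sketch: run_agent, AgentError, and the in-memory seen set are hypothetical stand-ins for your real agent and its completion store.

```python
from typing import Callable, Set


class AgentError(Exception):
    """Raised when the model output is unusable."""


def run_agent(
    event_id: str,
    context: str,
    call_model: Callable[[str], str],
    send: Callable[[str], None],
    seen: Set[str],
) -> str:
    if event_id in seen:
        return "skipped"                 # ran before: do nothing
    reply = call_model(context).strip()
    if not reply:
        raise AgentError("model returned nothing useful")
    send(reply)                          # raises if the third-party API is down
    seen.add(event_id)                   # mark complete only after success
    return "done"


def check_agent() -> None:
    seen: Set[str] = set()
    sent = []

    # 1. What happens when the AI returns nothing useful?
    try:
        run_agent("e1", "ctx", lambda c: "   ", sent.append, seen)
        raise AssertionError("empty reply should fail loudly")
    except AgentError:
        pass

    # 2. What happens when the third-party API is down?
    def api_down(message: str) -> None:
        raise ConnectionError("API down")

    try:
        run_agent("e2", "ctx", lambda c: "hello", api_down, seen)
        raise AssertionError("API failure should propagate")
    except ConnectionError:
        pass
    assert "e2" not in seen              # not marked complete, safe to rerun

    # 3. What happens when it runs twice?
    assert run_agent("e3", "ctx", lambda c: "hello", sent.append, seen) == "done"
    assert run_agent("e3", "ctx", lambda c: "hello", sent.append, seen) == "skipped"
    assert sent == ["hello"]


if __name__ == "__main__":
    check_agent()
```

None of the checks needs a real model or a real API, so they can run on every change.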

Reliable agents are boring. They run, do the thing, and you forget they exist. That's the goal.

Not sure whether you're building a prompt, a workflow, or an agent? The distinction matters more than the label.

Building agents inside your own codebase? A good CLAUDE.md is what keeps Claude generating agent code that follows your conventions instead of generic boilerplate.
