Claude Agent Architecture: How to Build Autonomous AI Agents That Don't Go Rogue
The promise of autonomous AI agents is obvious: a Claude instance that wakes up every morning, researches your market, drafts your content, and books your leads — without you lifting a finger. The problem is equally obvious: an agent with access to email, Twitter, and your invoicing system can cause a lot of damage before you notice.
The question isn't whether to give agents real capabilities. It's how to design the control layer that makes those capabilities safe to delegate. Here's the exact architecture I built for Eliane — an autonomous agent platform for solo founders — and why each piece exists.
The Agentic Loop
Eliane's agents run on Claude's tool-use API in a tight loop. The pattern is:
- Receive a task description as the initial user message
- Call client.messages.create with the system prompt, available tools, and message history
- If Claude responds with tool_use, execute the requested tool and feed the result back
- Repeat until Claude returns end_turn or calls task_complete
The code is deliberately simple:
for (let i = 0; i < MAX_AGENT_ITERATIONS; i++) {
  const response = await client.messages.create({
    model,
    max_tokens: MAX_TOKENS,
    system: systemPrompt,
    tools,
    messages,
  });
  messages.push({ role: "assistant", content: response.content });
  if (response.stop_reason === "end_turn") break;
  if (response.stop_reason !== "tool_use") break;
  const { toolResults, completed } = await executeTools(response.content, toolCtx);
  messages.push({ role: "user", content: toolResults });
  if (completed) break;
}
MAX_AGENT_ITERATIONS is 10. This is the first safety layer. An agent that spins forever has no upper bound on cost or side effects. A hard cap means a runaway agent burns at most 10 API calls before the loop exits. In practice, well-scoped tasks complete in 3-5 iterations. The cap only triggers on confused or looping behavior — which is exactly when you want a ceiling.
The task_complete tool is the clean exit. When Claude is satisfied it has finished the job, it calls task_complete with a summary and a quality score (1-5). The loop exits cleanly, results are recorded, and the task is marked done. This is important: agents don't wander around doing extra work after finishing. They self-terminate.
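A minimal sketch of what the task_complete tool and its handler might look like. The names TaskResult, ToolContext, and handleTaskComplete are illustrative, not from the actual Eliane codebase; the key mechanic is that the handler flips a flag the loop checks after each iteration:

```typescript
// Hypothetical sketch: the tool schema Claude sees, plus the handler
// that sets the loop's exit flag. Names are illustrative.
interface TaskResult {
  summary: string;
  score: number; // 1-5 quality self-assessment
}

const taskCompleteTool = {
  name: "task_complete",
  description: "Call when the task is finished. Ends the agent loop.",
  input_schema: {
    type: "object" as const,
    properties: {
      summary: { type: "string", description: "What was accomplished" },
      score: { type: "number", description: "Self-assessed quality, 1-5" },
    },
    required: ["summary", "score"],
  },
};

// Shared per-task context; executeTools sets `completed` when
// task_complete fires, and the loop breaks on it.
interface ToolContext {
  completed: boolean;
  result?: TaskResult;
}

function handleTaskComplete(input: TaskResult, ctx: ToolContext): string {
  ctx.completed = true;
  ctx.result = input;
  return JSON.stringify({ status: "done" });
}
```

The summary and score land in the task record, which makes post-hoc review of agent runs cheap: you read the self-assessment first and only dig into the transcript when the score looks off.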
Tool Scope Is the Real Safety Layer
An autonomous AI agent is only as dangerous as its tools. The design choice with the highest leverage isn't the prompt — it's which tools you give the agent access to, and when.
Eliane's tools split into three categories:
Read-only (safe to run autonomously):
- web_search — Brave Search API
- search_memory — semantic search over the agent's knowledge base
- search_email — Gmail inbox search
- search_tweets — X/Twitter search
- search_notion — Notion workspace search
- list_payments — Stripe payment listing
Write operations (no external side effects):
- save_insight — writes a note to the memory store
- draft_content — prepares text for review
- create_subtask — decomposes work into child tasks
- task_complete — marks the current task as done
High-stakes actions (require explicit human approval):
- post_tweet — creates a pending approval; does not post
- send_email — creates a pending approval; does not send
- create_invoice — creates a pending approval; does not charge
Agents can run reads and internal writes autonomously all day. The moment they touch the external world in a way that's irreversible — sending a message someone will receive, posting publicly, charging a customer — they stop and wait.
This isn't a prompt instruction. It's structural. The post_tweet tool implementation does not call the Twitter API. It writes a row to the approvals table and returns { status: "pending_approval", approval_id: "..." } to Claude. No user approval, no tweet. The approval gate is built into the tool itself, not the system prompt.
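A sketch of that structural gate, with an in-memory array standing in for the approvals table (the Approval shape and id generation here are assumptions, not Eliane's actual schema):

```typescript
// Hypothetical approval gate. The tool handler records intent;
// it never touches the Twitter API. `db` stands in for the
// approvals table in the real persistence layer.
interface Approval {
  id: string;
  type: "tweet" | "email" | "invoice";
  payload: unknown;
  status: "pending" | "approved" | "cancelled";
  createdAt: number;
}

const db: Approval[] = []; // stand-in for the approvals table

function postTweet(content: string): string {
  const approval: Approval = {
    id: Math.random().toString(36).slice(2, 10),
    type: "tweet",
    payload: { content },
    status: "pending",
    createdAt: Date.now(),
  };
  db.push(approval);
  // What Claude sees: a pending status, not a posted tweet.
  return JSON.stringify({ status: "pending_approval", approval_id: approval.id });
}
```

Because the only code path is "write a row and return pending," there is no prompt injection, confused reasoning, or model regression that can make the tool post directly.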
Human-in-the-Loop Approvals
The approval flow works through Telegram. When an agent creates a pending approval, the system sends the user a message with the full content: "Your content-writer agent wants to post this tweet. /approve abc123 to confirm."
The user reviews and either approves or ignores. Approvals expire after 24 hours. After expiry, the pending action is automatically cancelled — the agent doesn't retry forever.
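The expiry rule can be sketched as a periodic sweep over pending approvals; the 24-hour TTL is from the text above, while the function and field names are illustrative:

```typescript
// Hypothetical expiry sweep: pending approvals older than 24 hours
// are cancelled rather than retried.
const APPROVAL_TTL_MS = 24 * 60 * 60 * 1000;

interface PendingApproval {
  id: string;
  status: "pending" | "approved" | "cancelled";
  createdAt: number;
}

function expireStaleApprovals(approvals: PendingApproval[], now: number): number {
  let cancelled = 0;
  for (const a of approvals) {
    if (a.status === "pending" && now - a.createdAt > APPROVAL_TTL_MS) {
      a.status = "cancelled";
      cancelled++;
    }
  }
  return cancelled;
}
```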
This creates a natural human review cadence without requiring the user to babysit the system. Agents run autonomously during research and drafting phases — the work that has no downside if it's wrong. The user only gets pulled in at the moment of external action — the work that matters.
A concrete example of the flow for a content-writer agent:
- Agent calls web_search("French indie hacker community trends Feb 2026") → gets results
- Agent calls search_memory("best-performing tweet formats") → retrieves past insights
- Agent calls draft_content("tweet about AI agents") → drafts tweet text
- Agent calls post_tweet({ content: "drafted text" }) → creates pending approval, returns pending status
- Agent calls task_complete({ summary: "Draft ready for approval", score: 4 }) → loop exits
- User receives Telegram message with tweet content for review
The agent did everything it could autonomously. The human made the one decision that matters.
Task Decomposition
Agents can spawn child tasks via the create_subtask tool. A lead-finder agent, for example, might find 10 prospects and create one subtask per prospect for personalized outreach drafting.
When an agent creates subtasks, the parent task enters waiting status. The scheduler runs child tasks independently — potentially in parallel, depending on queue depth — and aggregates results when all children complete. The parent auto-resolves.
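The parent auto-resolution check might look like the following. The Task shape and the idea that the scheduler runs this after every child completes are assumptions for the sketch:

```typescript
// Hypothetical parent auto-resolution: a "waiting" parent flips to
// "done" once every child task has completed.
type TaskStatus = "pending" | "running" | "waiting" | "done";

interface Task {
  id: string;
  status: TaskStatus;
  parentId?: string;
}

function maybeResolveParent(tasks: Task[], parentId: string): boolean {
  const parent = tasks.find((t) => t.id === parentId);
  if (!parent || parent.status !== "waiting") return false;
  const children = tasks.filter((t) => t.parentId === parentId);
  if (children.length > 0 && children.every((c) => c.status === "done")) {
    parent.status = "done"; // child results would be aggregated here
    return true;
  }
  return false;
}
```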
This matters for scope management. An autonomous AI agent that runs a complex multi-step workflow as one monolithic task has a context window problem: the message history grows with each tool call. Task decomposition breaks the workflow into bounded units, each running in a fresh context. The total work is the same; the per-agent context is manageable.
Semantic Memory
Every tool call produces output that Claude uses within the current task. But what about across tasks? A lead-finder that ran on Monday found that cold tweets with specific hooks got replies. A content-writer running on Friday should know that.
Eliane's memory layer uses pgvector to store embeddings of every insight an agent saves via save_insight. When an agent calls search_memory("twitter outreach hooks that worked"), the system runs a cosine similarity search against the vector store and returns the top semantically similar entries — regardless of when they were created.
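To make the retrieval mechanics concrete, here is an in-memory equivalent of the top-k cosine similarity query (in Postgres with pgvector this is roughly `ORDER BY embedding <=> $1 LIMIT $2`; the toy embeddings and function names below are illustrative only):

```typescript
// Illustrative in-memory equivalent of the pgvector top-k retrieval.
// The real system runs this as a SQL query against the vector store.
interface Insight {
  content: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchMemory(query: number[], store: Insight[], k: number): string[] {
  return [...store]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, k)
    .map((i) => i.content);
}
```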
This gives agents contextual memory without forcing all past work into the current context window. The agent retrieves specifically what's relevant, not a raw transcript of everything.
Department Model
Each agent operates within a department: Growth, Sales, or Strategy. The department determines the system prompt. A Growth agent has a system prompt focused on content, distribution, and audience building. A Sales agent is focused on leads, outreach, and pipeline.
Keeping agents department-scoped has two effects. First, prompts stay focused — a Growth agent isn't tempted to opine on financial strategy. Second, tool access can be restricted per department. The Starter plan restricts all agents to Haiku; the Pro plan unlocks Sonnet for higher-stakes work.
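Department scoping can be expressed as a static config mapping each department to its prompt and tool subset. The prompt texts and tool assignments below are illustrative guesses, not Eliane's actual configuration:

```typescript
// Hypothetical department scoping: each department gets its own
// system prompt and a restricted tool subset. Contents illustrative.
type Department = "growth" | "sales" | "strategy";

const departmentConfig: Record<Department, { prompt: string; tools: string[] }> = {
  growth: {
    prompt:
      "You are a growth agent focused on content, distribution, and audience building.",
    tools: ["web_search", "search_tweets", "draft_content", "post_tweet", "task_complete"],
  },
  sales: {
    prompt: "You are a sales agent focused on leads, outreach, and pipeline.",
    tools: ["web_search", "search_email", "send_email", "create_subtask", "task_complete"],
  },
  strategy: {
    prompt: "You are a strategy agent focused on market analysis and planning.",
    tools: ["web_search", "search_memory", "save_insight", "task_complete"],
  },
};
```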
The model routing is explicit:
function selectModel(requestedModel: string, plan: PlanId): string {
  if (plan === "starter") return "claude-haiku-4-5-20251001";
  return requestedModel; // pro plan respects the requested model
}
Cost control through model routing, not prompting. A Starter customer can't accidentally burn Opus tokens on a competitor research task.
What Makes It Actually Safe
The honest answer: structure, not trust.
A system prompt that says "don't send emails without permission" is wishful thinking. An approval gate that requires a database write before any email is sent is a guarantee. The architecture earns autonomy through constraints — not by hoping Claude follows instructions, but by making the wrong action structurally impossible.
The control layer isn't an afterthought bolted on for marketing. It's the first design decision. Every tool that touches the external world runs through an approval gate. Every agent has a fixed iteration cap. Every high-stakes action creates a human checkpoint.
That's what makes the system deployable for real founders with real businesses. Not because Claude is always right. Because the structure makes mistakes catchable before they're irreversible.
Building Eliane and Kite in public. Technical posts at kiteaiagent.com/blog.