Autonomous AI Agent Architecture: The Stack That Makes Agents Self-Running
Most AI agents are reactive. They wait for a human to type something, respond, and forget everything the moment the conversation ends. That's not an agent — it's a very fast chatbot.
A genuinely autonomous AI agent architecture has three properties that distinguish it from a chat interface: it acts without being asked, it remembers across sessions, and it improves over time from its own observations.
This post is a technical breakdown of the architecture behind Kite — an AI agent that runs continuously, researches independently, and surfaces actionable proposals without being prompted. The patterns here apply broadly to any system where you want Claude (or any LLM) to operate with real autonomy.
Why Most Agent Architectures Fail at Autonomy
The default Claude Code session is ephemeral. It knows everything about the current conversation and nothing about yesterday's. When you close the terminal, the context is gone.
Most attempts to solve this reach for the wrong tools:
- Long system prompts: You can cram context into the system prompt, but this doesn't scale. 200KB of project context burns tokens and buries the signal in noise.
- Vector databases: Semantic retrieval works for Q&A but fails for "what decisions did we make last week and why."
- Git history: Useful context, but unstructured for agent consumption.
The correct abstraction is a memory vault — a structured, agent-writable knowledge base that persists decisions, patterns, and observations in a format that's cheap to inject and easy to retrieve.
Layer 1: The Memory Vault
The memory vault is a directory of markdown files organized by category. Each note is a standalone knowledge unit that captures something worth remembering across sessions.
memory/
├── topics/      # technical patterns, architecture decisions
├── decisions/   # "why we chose X over Y"
├── projects/    # per-project context (goals, stack, constraints)
├── people/      # contacts, stakeholders, context
└── sessions/    # session journals (what happened, what was decided)
Notes use [[wikilinks]] to reference each other. This isn't cosmetic — the link graph is what makes retrieval useful. When you pull a note about "auth architecture," it references [[JWT tokens]] and [[session management]], which the retrieval system can follow.
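To make the link-graph idea concrete, here is a minimal sketch of one-hop wikilink expansion. The `Note` shape, `extractLinks`, and `expandOneHop` are illustrative names, not Kite's actual retrieval code:

```typescript
// Illustrative Note shape; the real vault stores markdown files on disk
type Note = { title: string; content: string }

const WIKILINK = /\[\[([^\]]+)\]\]/g

function extractLinks(note: Note): string[] {
  return [...note.content.matchAll(WIKILINK)].map((m) => m[1])
}

// Given a retrieved note, also pull in the notes it links to (one hop)
function expandOneHop(hit: Note, vault: Map<string, Note>): Note[] {
  const expanded = [hit]
  for (const title of extractLinks(hit)) {
    const linked = vault.get(title)
    if (linked) expanded.push(linked) // skip links to notes that don't exist yet
  }
  return expanded
}
```

Retrieval that follows links returns a small neighborhood of related notes rather than one isolated hit, which is what makes the graph structure pay off.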
The vault is written by the agent itself. After every significant session, a background process extracts key decisions and findings from the conversation logs and writes them to the appropriate vault section:
# kite-memory.sh — runs every 5 minutes via launchd
# 1. Poll ~/.claude/projects/*.jsonl for new conversations
# 2. Send recent messages to Claude Haiku with extraction prompt
# 3. Write extracted notes to memory vault
extract_and_store() {
  local context="$1"
  local response
  response=$(claude -p \
    --model claude-haiku-4-5-20251001 \
    "Extract technical decisions, architectural patterns, and key findings from this conversation. Write as vault notes with [[wikilinks]]. Return JSON array of {category, title, content} objects." \
    <<< "$context")

  # One note per JSON object: derive the vault path, then write the content
  echo "$response" | jq -c '.[]' | while IFS= read -r note; do
    path=$(jq -r '"\(.category)/\(.title).md"' <<< "$note")
    content=$(jq -r '.content' <<< "$note")
    write_vault_note "$path" "$content"
  done
}
}
The critical design constraint: extraction uses Haiku, not Sonnet or Opus. Memory operations run constantly in the background — they need to be cheap. Haiku at ~$0.001/1K tokens makes this economically sustainable.
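At that price the background loop is pocket change. A quick back-of-envelope check, using the ~$0.001/1K figure above (the tokens-per-cycle number is an assumption, not a measurement):

```typescript
// Back-of-envelope on extraction cost for the always-on memory loop
const cyclesPerDay = (24 * 60) / 5              // 288 runs at a 5-minute interval
const tokensPerCycle = 3_000                    // assumed prompt + extraction output
const costPerDay = ((cyclesPerDay * tokensPerCycle) / 1_000) * 0.001
// roughly $0.86/day at these assumptions
```

Run the same loop through a frontier model and the daily cost jumps by an order of magnitude or more, which is why the extraction tier is pinned to the cheapest capable model.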
Layer 2: The Heartbeat Loop
A heartbeat is the mechanism that makes an agent proactive. Instead of waiting to be asked, the agent wakes up on a schedule, picks a research topic, runs a Claude session, and saves the findings.
// heartbeat/sprint.ts — runs on interval (default: 60 min)
async function runHeartbeat() {
  const topic = await pickNextTopic() // from research queue
  if (!topic) return

  const proposal = await runResearchSession(topic)
  await saveProposal(proposal) // heartbeat/proposals/*.json
  await notifyTelegram(`New proposal ready: ${proposal.title}`)
}

async function runResearchSession(topic: string): Promise<Proposal> {
  // Spawn a headless Claude session with the research prompt
  const result = await spawnClaude({
    prompt: buildResearchPrompt(topic),
    model: "claude-sonnet-4-6", // never opus — cost
    dangerouslySkipPermissions: true,
    maxTurns: 10,
  })
  return parseProposal(result.output)
}
The heartbeat creates a flywheel: the more the system runs, the more proposals accumulate, the more the human can review and approve, the better the system gets at prioritizing valuable research.
Design constraints for heartbeat loops:
- One cycle at a time (tracked via lock file — no parallel cycles)
- Proposals are saved before implementation — always reversible
- Claude never commits changes autonomously — human approval required
- Cost ceiling: set a daily token budget and abort if exceeded
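The first and last constraints can be sketched as a single guard around each cycle. The lock path, budget figure, and function names here are illustrative, not Kite's actual implementation:

```typescript
import { existsSync, writeFileSync, unlinkSync } from "node:fs"

const LOCK = "/tmp/kite-heartbeat.lock"   // illustrative lock-file path
const DAILY_BUDGET_TOKENS = 2_000_000     // illustrative daily ceiling

// Runs fn only if no other cycle holds the lock and the budget allows it
function withHeartbeatLock(fn: () => void, usedToday: number): boolean {
  if (existsSync(LOCK)) return false                 // another cycle is running
  if (usedToday >= DAILY_BUDGET_TOKENS) return false // budget exhausted: abort
  writeFileSync(LOCK, String(process.pid))
  try {
    fn()
    return true
  } finally {
    unlinkSync(LOCK) // always release, even if the cycle throws
  }
}
```

Releasing the lock in `finally` matters: a crashed cycle that leaves a stale lock behind silently halts the entire heartbeat.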
Layer 3: The Coordination Socket
When multiple agent sessions run concurrently, they need a shared communication bus. A Unix domain socket server handles this:
Session A ──┐
Session B ──┼── Unix socket ── kite-socket.ts ──┬── task queue
Session C ──┘                                   ├── file lock map
                                                ├── context store
                                                └── session registry
The socket server is the source of truth for:
- Which sessions are alive (with heartbeat timeout)
- Which files are currently being edited (file lock protocol)
- What tasks are pending, active, or done
- Shared context passed between sessions
All communication is newline-delimited JSON. The server is single-threaded (Bun's event loop handles concurrency); message processing is synchronous with no I/O in the hot path.
// Message handler — synchronous, in-memory, <1ms per operation
server.on("connection", (socket) => {
  let buffer = ""
  socket.on("data", (chunk) => {
    buffer += chunk.toString()
    const lines = buffer.split("\n")
    buffer = lines.pop() ?? "" // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue
      let msg
      try {
        msg = JSON.parse(line)
      } catch {
        socket.write(JSON.stringify({ error: "invalid JSON" }) + "\n")
        continue // a malformed message must not crash the server
      }
      const response = dispatch(msg) // synchronous dispatch
      socket.write(JSON.stringify(response) + "\n")
    }
  })
})
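The `dispatch` call above is where protocol state lives. A minimal synchronous version of the file-lock portion might look like this; the message shapes are assumptions for illustration, not Kite's actual wire format:

```typescript
type Msg =
  | { type: "lock"; session: string; file: string }
  | { type: "unlock"; session: string; file: string }

const locks = new Map<string, string>() // file path -> owning session id

function dispatch(msg: Msg): { ok: boolean; holder?: string } {
  switch (msg.type) {
    case "lock": {
      const holder = locks.get(msg.file)
      // Refuse if another session holds the lock; re-locking your own is fine
      if (holder && holder !== msg.session) return { ok: false, holder }
      locks.set(msg.file, msg.session)
      return { ok: true }
    }
    case "unlock": {
      // Only the owner can release
      if (locks.get(msg.file) === msg.session) locks.delete(msg.file)
      return { ok: true }
    }
  }
}
```

Because the event loop delivers messages one at a time and `dispatch` never awaits, there is no interleaving: each lock check-and-set is atomic without any explicit mutex.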
Wiring Memory Into Every Session
The memory vault only matters if agents actually use it. The injection happens automatically via a UserPromptSubmit hook that fires before every Claude message:
# kite-memory-hook.sh — fires on every UserPromptSubmit
search_vault() {
  local query="$1"
  ~/.kite/daemon/kite-memory.sh search "$query" --top 2 \
    | jq -r '.[] | "## \(.title)\n\(.content)\n"'
}

# Extract key terms from the submitted prompt
# Search vault for relevant notes
# Inject as system context before the prompt reaches Claude
inject_context() {
  local relevant
  relevant=$(search_vault "$HOOK_PROMPT_TEXT")
  if [[ -n "$relevant" ]]; then
    echo "<memory-context>"
    echo "$relevant"
    echo "</memory-context>"
  fi
}
Two notes injected per prompt. More than that buries the signal. The search uses TF-IDF over the vault index — fast enough to run synchronously before each message.
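For intuition, here is a toy TF-IDF ranker over in-memory notes. It is a sketch of the retrieval idea only; the vault's actual index format and scoring details may differ:

```typescript
type Doc = { title: string; text: string }

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z]+/g) ?? []

function rank(query: string, docs: Doc[], top = 2): Doc[] {
  const n = docs.length
  const tokens = docs.map((d) => tokenize(d.text))

  // Document frequency: how many notes contain each term
  const df = new Map<string, number>()
  for (const toks of tokens) {
    for (const t of new Set(toks)) df.set(t, (df.get(t) ?? 0) + 1)
  }

  const score = (i: number): number => {
    if (tokens[i].length === 0) return 0
    let s = 0
    for (const q of tokenize(query)) {
      const tf = tokens[i].filter((t) => t === q).length / tokens[i].length
      const idf = Math.log((n + 1) / ((df.get(q) ?? 0) + 1)) // smoothed IDF
      s += tf * idf
    }
    return s
  }

  return docs
    .map((d, i) => [d, score(i)] as const)
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([d]) => d)
}
```

Lexical scoring like this is deliberately dumb: it is deterministic, needs no embedding calls, and is fast enough to sit in the synchronous hook path.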
The Observable Layer: Telegram Integration
An autonomous agent that works silently is fine for a daemon. But for an agent co-founder, observability matters. The Telegram bot bridges the agent's internal state to the human's phone:
/status        → socket status, active sessions, heartbeat state
/tasks         → pending task queue
/vault kite    → search memory vault
/approve       → list pending proposals, apply with confirmation
/spawn <task>  → trigger headless Claude session
/pause         → stop heartbeat timer
This creates the right human-agent relationship: the agent operates autonomously, surfaces its work as proposals, and the human approves or rejects from anywhere without needing to open a terminal.
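The bot's command handling reduces to a small router. This sketch mirrors the commands above with placeholder handlers; the real ones would query the socket server:

```typescript
type Handler = (args: string) => string

// Placeholder handlers keyed by command (bodies are illustrative)
const handlers = new Map<string, Handler>([
  ["/status", () => "heartbeat: running, 2 sessions active"],
  ["/pause", () => "heartbeat paused"],
])

function route(text: string): string {
  const [cmd, ...rest] = text.trim().split(/\s+/)
  const handler = handlers.get(cmd)
  return handler ? handler(rest.join(" ")) : `unknown command: ${cmd}`
}
```

Keeping the router a pure function of the incoming text makes it trivial to test without touching Telegram at all.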
The Full Stack
Telegram (observability, approval)
              │
              ▼
kite-pulse.ts (Bun process, always running)
├── Heartbeat: 60min cycle → Claude Sonnet → proposal
├── Bot: poll Telegram commands → dispatch
└── Coordinator: kite-socket.ts → session registry, task queue

kite-memory.sh (5min cycle)
├── Poll conversation logs
├── Extract with Claude Haiku
└── Write to memory vault

Claude Code Sessions (multiple, coordinated)
├── Register with socket on start
├── Receive injected vault context on every prompt
├── Report file locks on Edit/Write
└── Deregister on stop
The key insight is that autonomy is an emergent property of the stack, not a property of any single component. The heartbeat alone is just a cron job. The memory alone is just a knowledge base. Together, they produce a system that learns from its own operation and acts without being asked.
What This Unlocks in Practice
With this architecture running, a typical day looks like:
- Wake up to 3–5 Telegram notifications of research proposals
- Review and approve two of them with /approve
- The approved proposals are implemented during the day (human in the loop for commit)
- Memory vault accumulates context from all sessions
- Next day's research is better-targeted because of yesterday's vault entries
The agent works while you sleep. The human stays in control of what ships.
Building autonomous AI systems? I consult on this architecture at $150–250/hr. DM me on X or visit kiteaiagent.com.