Claude Code Multi-Agent Systems: Coordinating AI Without the Chaos
Running a single Claude Code session is easy. Running five of them on the same codebase without stepping on each other's toes is where most teams hit a wall.
The promise of multi-agent AI development is compelling: parallel workstreams, faster iteration, one agent researching while another implements. But the naive approach — opening multiple Claude Code windows and hoping for the best — produces merge conflicts, overwritten work, and agents that contradict each other without knowing it.
This is a technical walkthrough of how to build Claude Code multi-agent coordination that actually works. Real code. Real architecture. No magic.
The Core Problem: Agents Are Isolated by Design
Claude Code sessions have zero built-in awareness of each other. Each session lives in its own context window. The filesystem is shared; the awareness is not. Two sessions editing auth.ts simultaneously means one write silently overwrites the other. Two agents picking up the same task from a shared backlog means duplicated effort. One agent refactoring an API while another writes tests against the old signature means a broken build that neither agent sees coming.
This isn't a Claude limitation — it's a fundamental property of any stateless, isolated agent. The fix isn't smarter prompting. It's external coordination infrastructure.
Three failure modes to design against:
- File conflicts: Two agents write to the same file concurrently. Last write wins. Work is silently lost.
- Context drift: Agent A restructures the data model while Agent B writes features against the old schema.
- Task duplication: Two agents both start the same work because nothing owns the assignment.
All three are solvable with one primitive: a central socket server that all sessions talk to.
The Coordination Architecture: Unix Domain Sockets
The coordination layer in Kite is a TypeScript server running on Bun that listens on a Unix domain socket at ~/.kite/kite.sock. Every Claude Code session registers on startup, heartbeats while alive, and deregisters on exit. Claude Code hooks handle this transparently — the developer never types socket commands manually.
Unix domain sockets were the right choice here over TCP for three reasons: lower latency (no network stack), automatic cleanup when the process dies (the socket file is deleted), and the natural single-machine constraint matches the use case (all sessions share one filesystem).
The wire protocol is intentionally simple: newline-delimited JSON. One message per line. No headers, no length prefixes, no framing ceremony.
// Server-side framing — accumulate partial reads across chunks
const buffers = new WeakMap<object, string>();

data(socket, data) {
  const buf = (buffers.get(socket) ?? "") + data.toString();
  const lines = buf.split("\n");
  buffers.set(socket, lines.pop()!); // trailing partial → buffer
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      handle(socket, JSON.parse(line));
    } catch {
      reply(socket, { error: "invalid json" });
    }
  }
}
The WeakMap buffer trick handles stream fragmentation cleanly: if a large message arrives split across two data events, the partial is buffered until the newline delimiter arrives. No message is ever lost or truncated.
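The same framing logic works on the client side. A minimal sketch (makeFramer is a hypothetical helper for illustration, not part of Kite):

```typescript
// Accumulates raw chunks and yields complete newline-delimited JSON messages.
function makeFramer() {
  let partial = "";
  return function feed(chunk: string): unknown[] {
    const lines = (partial + chunk).split("\n");
    partial = lines.pop()!; // keep the trailing partial for the next chunk
    const messages: unknown[] = [];
    for (const line of lines) {
      if (!line.trim()) continue;
      messages.push(JSON.parse(line));
    }
    return messages;
  };
}

const feed = makeFramer();
// A message split across two chunks is reassembled at the newline boundary.
feed('{"type":"regi');      // returns []
feed('ster","id":"a1"}\n'); // returns [{ type: "register", id: "a1" }]
```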
Every session is modeled as:
interface Session {
  id: string;
  pid: number;
  project: string;
  cwd: string;
  files: string[];
  status: "active" | "idle";
  registeredAt: number;
  lastHeartbeat: number;
  socket: any;
}
A reaper runs every 60 seconds and evicts sessions that haven't heartbeated in 5 minutes. This handles the case where a terminal is killed without triggering the SessionStop hook.
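The reaper itself is a few lines of map iteration. A sketch under the timings described here (the map shapes follow the Session interface; names like HEARTBEAT_TIMEOUT are assumptions, not Kite's actual code):

```typescript
// Evicts sessions silent for 5+ minutes and releases locks held 10+ minutes.
interface ReapableSession { id: string; lastHeartbeat: number }
interface HeldLock { sessionId: string; lockedAt: number }

const sessions = new Map<string, ReapableSession>();
const fileLocks = new Map<string, HeldLock>();

const HEARTBEAT_TIMEOUT = 5 * 60_000; // evict after 5 minutes of silence
const LOCK_TTL = 10 * 60_000;         // locks auto-expire after 10 minutes

function reap(now = Date.now()) {
  for (const [id, s] of sessions) {
    if (now - s.lastHeartbeat > HEARTBEAT_TIMEOUT) sessions.delete(id);
  }
  for (const [file, lock] of fileLocks) {
    if (now - lock.lockedAt > LOCK_TTL) fileLocks.delete(file);
  }
}

// In the server this would run on a timer: setInterval(reap, 60_000)
```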
File Locking: The Hard Part
File locking in a multi-agent system sounds simple — "first one in wins" — but the naive implementation breaks under realistic conditions. Consider: Agent A holds a lock on api.ts. Agent B requests the lock. What happens?
- Option 1: Reject immediately. Agent B errors out, and the user has to retry manually.
- Option 2: Queue the request. Agent B waits; when A releases, B gets granted automatically.
The queue is the right answer. Here's the full implementation from kite-socket.ts:
interface LockQueueEntry {
  sessionId: string;
  socket: any;
  queuedAt: number;
  ttlTimer: ReturnType<typeof setTimeout>;
}

const lockQueue = new Map<string, LockQueueEntry[]>();
const LOCK_QUEUE_MAX_DEPTH = 3;
const LOCK_QUEUE_TTL = 30_000; // 30s

case "file_lock": {
  const { id, file } = msg;
  const existing = fileLocks.get(file);
  if (existing && existing.sessionId !== id) {
    const queue = lockQueue.get(file) ?? [];
    if (queue.length >= LOCK_QUEUE_MAX_DEPTH) {
      return reply(socket, {
        error: `file locked by ${existing.sessionId.slice(0, 8)}, queue full`,
        locked_by: existing.sessionId,
        locked_ago: Math.round((Date.now() - existing.lockedAt) / 1000),
        queued: false,
      });
    }
    if (queue.some(e => e.sessionId === id)) {
      return reply(socket, { error: "already queued for this file" });
    }
    const ttlTimer = setTimeout(() => {
      // Evict from queue if still waiting after 30s
      const q = lockQueue.get(file);
      if (!q) return;
      const idx = q.findIndex(e => e.sessionId === id && e.socket === socket);
      if (idx === -1) return;
      q.splice(idx, 1);
      if (q.length === 0) lockQueue.delete(file);
      try { reply(socket, { error: "lock queue timeout (30s)", queued: false }); } catch {}
    }, LOCK_QUEUE_TTL);
    queue.push({ sessionId: id, socket, queuedAt: Date.now(), ttlTimer });
    lockQueue.set(file, queue);
    return; // reply is deferred — fires on grant or timeout
  }
  fileLocks.set(file, { sessionId: id ?? "unknown", lockedAt: Date.now() });
  reply(socket, { ok: true, file });
  break;
}
The deferred reply pattern is key: when a request is queued, the server does not respond immediately. The connection stays open. When the lock becomes available, the server grants it directly over the same socket connection. From the agent's perspective: it sent a lock request and eventually got { ok: true }. It has no idea it waited in a queue.
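The same semantics can be shown in miniature, with Promises standing in for the open socket connection (LockTable is illustrative, not Kite's implementation):

```typescript
type Grant = { ok: true; file: string };

class LockTable {
  private holders = new Map<string, string>();
  private waiters = new Map<string, Array<{ sessionId: string; resolve: (g: Grant) => void }>>();

  lock(file: string, sessionId: string): Promise<Grant> {
    if (!this.holders.has(file) || this.holders.get(file) === sessionId) {
      this.holders.set(file, sessionId);
      return Promise.resolve({ ok: true, file }); // granted immediately
    }
    // Deferred grant: the promise stays pending until release() fires it.
    return new Promise((resolve) => {
      const q = this.waiters.get(file) ?? [];
      q.push({ sessionId, resolve });
      this.waiters.set(file, q);
    });
  }

  release(file: string) {
    this.holders.delete(file);
    const next = this.waiters.get(file)?.shift();
    if (next) {
      this.holders.set(file, next.sessionId);
      next.resolve({ ok: true, file }); // the waiter's lock() promise resolves now
    }
  }
}
```

The pending Promise plays the role of the unanswered socket request: the caller simply awaits, with no polling and no visible queue.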
The queue grant function fires on every lock release:
function grantNextWaiter(file: string) {
  const queue = lockQueue.get(file);
  if (!queue || queue.length === 0) { lockQueue.delete(file); return; }
  const next = queue.shift()!;
  if (queue.length === 0) lockQueue.delete(file);
  clearTimeout(next.ttlTimer);
  fileLocks.set(file, { sessionId: next.sessionId, lockedAt: Date.now() });
  try {
    reply(next.socket, { ok: true, file, granted_from_queue: true });
  } catch {}
}
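The file_unlock handler itself might look like this. A sketch consistent with the shapes above, written as a standalone function with a stubbed grantNextWaiter so it is self-contained (this is not Kite's actual code):

```typescript
interface HeldLock { sessionId: string; lockedAt: number }
const fileLocks = new Map<string, HeldLock>();
const grants: string[] = []; // records grantNextWaiter calls, for illustration

function grantNextWaiter(file: string) { grants.push(file); } // stub

// Only the current holder may release; every release drains the queue.
function handleFileUnlock(msg: { id: string; file: string }) {
  const lock = fileLocks.get(msg.file);
  if (lock && lock.sessionId === msg.id) {
    fileLocks.delete(msg.file);
    grantNextWaiter(msg.file);
  }
  return { ok: true, file: msg.file };
}
```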
Locks also auto-expire after 10 minutes in the reaper loop — protection against a session that dies mid-edit without sending file_unlock. When a session deregisters cleanly, all its locks are released immediately and the next queue waiter is granted:
case "deregister": {
  if (sessions.delete(msg.id)) {
    for (const [file, lock] of fileLocks) {
      if (lock.sessionId === msg.id) {
        fileLocks.delete(file);
        grantNextWaiter(file);
      }
    }
  }
  break;
}
Task Queue: Preventing Duplicate Work
File locking solves filesystem conflicts. A task queue solves planning conflicts — two agents starting the same work because nothing owns the assignment.
The task schema:
interface Task {
  id: string;
  description: string;
  project: string;
  priority: "low" | "normal" | "high";
  status: "pending" | "active" | "done" | "failed";
  assignee: string | null;
  dependsOn: string[];
  createdAt: number;
  updatedAt: number;
  result: string | null;
}
Task IDs use a collision-resistant format: t-<base36 timestamp>-<base36 counter>. They can be resolved by prefix (first 8 chars) — so MCP tool calls can use short IDs without needing the full string.
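A sketch of the ID scheme and prefix lookup (helper names are assumptions; the base36 format is from the description above):

```typescript
let idCounter = 0;

// t-<base36 timestamp>-<base36 counter>, e.g. "t-m3ab12cd-0"
function makeTaskId(now = Date.now()): string {
  return `t-${now.toString(36)}-${(idCounter++).toString(36)}`;
}

// Resolve a task by exact ID or by prefix (e.g. the first 8 characters).
function resolveTask<T>(tasks: Map<string, T>, idOrPrefix: string): T | undefined {
  const exact = tasks.get(idOrPrefix);
  if (exact !== undefined) return exact;
  for (const [key, task] of tasks) {
    if (key.startsWith(idOrPrefix)) return task; // short-ID lookup
  }
  return undefined;
}
```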
The dependency enforcement is in the task_update handler. A task cannot transition to active if any of its dependencies are still pending or active:
case "task_update": {
  if (msg.status === "active" && task.dependsOn.length > 0) {
    const blockers = task.dependsOn.filter((dep) => {
      let depTask: Task | undefined = tasks.get(dep);
      if (!depTask) {
        for (const [k, t] of tasks) {
          if (k.startsWith(dep)) { depTask = t; break; }
        }
      }
      return !depTask || depTask.status !== "done";
    });
    if (blockers.length > 0) {
      return reply(socket, {
        error: `blocked by unfinished dependencies: ${blockers.join(", ")}`
      });
    }
  }
  // ... proceed with status update
}
Tasks persist to disk (socket/tasks.json) on every mutation. When the socket server restarts, tasks are reloaded and the ID counter is restored from the highest existing task. No task is ever lost due to a server restart.
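A hypothetical persist-and-reload sketch. The tasks.json location and the counter-restore rule come from the description above; the function names and exact file format are assumptions:

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

interface PersistedTask { id: string; status: string }

// Called on every mutation: dump the full task list to disk.
function saveTasks(path: string, tasks: Map<string, PersistedTask>) {
  writeFileSync(path, JSON.stringify([...tasks.values()]));
}

// Called on server start: rebuild the in-memory map from disk.
function loadTasks(path: string): Map<string, PersistedTask> {
  if (!existsSync(path)) return new Map();
  const list: PersistedTask[] = JSON.parse(readFileSync(path, "utf8"));
  return new Map(list.map((t) => [t.id, t]));
}

// Restore the ID counter from the highest existing t-<ts>-<n> suffix.
function restoreCounter(tasks: Map<string, PersistedTask>): number {
  let max = -1;
  for (const id of tasks.keys()) {
    const n = parseInt(id.split("-")[2] ?? "", 36);
    if (!Number.isNaN(n) && n > max) max = n;
  }
  return max + 1;
}
```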
For orchestration, you can query which tasks are ready (all dependencies done):
query target=task_ready
// Returns pending tasks with no unfinished deps — safe to activate
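One way to compute that ready set on the server side (readyTasks is a hypothetical name; the "pending with all dependencies done" rule and the prefix resolution mirror the handler above):

```typescript
interface QueryTask { id: string; status: string; dependsOn: string[] }

function readyTasks(tasks: Map<string, QueryTask>): QueryTask[] {
  const isDone = (dep: string): boolean => {
    // Exact ID first, then prefix, matching the dependency check.
    const d = tasks.get(dep) ?? [...tasks.entries()].find(([k]) => k.startsWith(dep))?.[1];
    return d?.status === "done";
  };
  return [...tasks.values()].filter(
    (t) => t.status === "pending" && t.dependsOn.every(isDone)
  );
}
```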
Shared Context: Passing Results Between Agents
After Agent A finishes a task, Agent B needs to know what was built. Git commits are too heavy. Files on disk work but require coordination on paths. The socket server provides a shared key-value store:
// Agent A sets context after completing its work
{ type: "context_set", key: "auth-schema", value: { tables: ["users", "sessions"], jwtExpiry: "15m" } }
// Agent B reads before starting dependent work
{ type: "context_get", key: "auth-schema" }
// → { ok: true, value: { tables: ["users", "sessions"], jwtExpiry: "15m" }, from: "agent-a-id", at: 1709900000000 }
Context is in-memory only and evaporates on server restart — by design. It's for ephemeral coordination data (API contracts, build artifact paths, shared flags), not permanent storage.
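A store along these lines fits in a few lines of server state. A sketch matching the wire messages above (field names beyond those messages are assumptions):

```typescript
interface ContextEntry { value: unknown; from: string; at: number }
const contextStore = new Map<string, ContextEntry>();

// context_set: record the value with provenance (who set it, when).
function contextSet(key: string, value: unknown, sessionId: string) {
  contextStore.set(key, { value, from: sessionId, at: Date.now() });
  return { ok: true };
}

// context_get: return the value plus provenance, or an error for unknown keys.
function contextGet(key: string) {
  const entry = contextStore.get(key);
  return entry
    ? { ok: true, value: entry.value, from: entry.from, at: entry.at }
    : { ok: false, error: `no context for key "${key}"` };
}
```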
Conflict Detection Before It Happens
The file_touch message fires on every file open/close and runs a conflict scan across all active sessions:
case "file_touch": {
  const { id, file } = msg;
  for (const [otherId, other] of sessions) {
    if (otherId !== id && other.files.includes(file)) {
      const warn = { type: "event", event: "file_conflict", file, sessions: [id, otherId] };
      reply(socket, warn);
      try { other.socket.write(JSON.stringify(warn) + "\n"); } catch {}
    }
  }
  break;
}
Both sessions are notified simultaneously. The hook can escalate this to a Telegram message, a terminal bell, or a broadcast that tells all agents to hold off on a specific file.
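On the receiving end, a hook might filter pushed events and turn conflicts into something visible. A hypothetical sketch (the event shape matches the warn object above; the escalation itself is up to you):

```typescript
interface ServerEvent { type: string; event?: string; file?: string; sessions?: string[] }

// Returns an alert string for file_conflict events, null for everything else.
function escalate(msg: ServerEvent): string | null {
  if (msg.type !== "event" || msg.event !== "file_conflict") return null;
  // "\u0007" is the terminal bell; swap in Telegram/webhook delivery as needed.
  return `\u0007CONFLICT ${msg.file}: held by ${msg.sessions?.join(", ")}`;
}
```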
Putting It Together: The Swarm Protocol
In practice, the orchestration flow for a multi-agent workstream looks like this:
- Check status — kite_status confirms the socket is alive and lists active sessions
- Create tasks — kite_task_create for each independent chunk of work, with depends_on chains where ordering matters
- Spawn agents — kite_spawn with worktree: true for full filesystem isolation on a separate git branch
- Don't wait — each spawn is async. Continue with independent work (docs, tests, config) while agents run
- Monitor — kite_sessions shows who's active, what files they hold
- Broadcast if needed — kite_broadcast with a hold message if a shared dependency changes unexpectedly
- Complete — kite_task_complete with result summaries, which unblocks downstream tasks
The critical discipline: never spawn-and-wait. If you spawn three agents and then sit idle watching logs, you planned wrong. The orchestrator should always have parallel work to do while agents run.
Why Unix Sockets Over a Database
The alternative architecture — agents writing state to SQLite or Postgres instead of a socket — has obvious appeal. It's persistent, queryable, and familiar.
But it fails on the core use case: deferred lock grants. A database poll-based approach would require agents to loop, checking the database every N seconds to see if a lock became available. That's latency, jitter, and polling overhead.
With a persistent socket connection, the server pushes the grant the moment the lock releases. No polling. Sub-millisecond grant delivery. And the connection itself acts as a liveness signal — if the socket closes, the server knows the agent is gone and cleans up immediately.
Getting Started
Kite is open source. Install it, configure the Claude Code hooks (one settings.local.json change), and kite up starts the socket server. The MCP tools expose everything — spawn, task management, broadcast, context — from within any Claude Code session.
Multi-agent Claude Code without coordination is noise. With the right infrastructure, it's a genuine force multiplier.
Building multi-agent AI systems? Follow the build — shipping in public.