
MCP Server Orchestration: Building Claude Tool Networks That Scale

A deep dive into Model Context Protocol server architecture — how to design, compose, and orchestrate MCP tools for production Claude agent systems.

Model Context Protocol (MCP) is the API layer that makes Claude useful beyond chat. It's how you give Claude tools — the ability to read files, query databases, call external APIs, spawn processes, or coordinate with other agents. Understanding how to design and orchestrate MCP servers is the difference between a toy agent and a production system.

This post covers MCP server architecture from first principles, with real TypeScript code from production systems.

What MCP Actually Is

MCP is a standardized protocol for connecting LLMs to tools and context sources. When Claude calls a tool, it sends a structured request to an MCP server, which executes the operation and returns a result. The LLM never executes code directly — the MCP server is the execution boundary.
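
To make that request and response concrete, here is a minimal sketch of the JSON-RPC messages behind one tool call. The `tools/call` method and the `content` array follow the MCP specification; the type names, helper functions, and argument values are assumptions made for illustration.

```typescript
// Illustrative types for one MCP tool call. The type names are this sketch's own.
type ToolCallRequest = {
  jsonrpc: "2.0"
  id: number
  method: "tools/call"
  params: { name: string; arguments: Record<string, unknown> }
}

type ToolCallResponse = {
  jsonrpc: "2.0"
  id: number
  result: { content: { type: "text"; text: string }[]; isError?: boolean }
}

// Build the request a client sends when the model decides to call a tool.
function buildToolCall(id: number, tool: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } }
}

// Build the reply the server sends back; the shared id ties it to the request.
function buildToolResult(req: ToolCallRequest, text: string): ToolCallResponse {
  return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } }
}

const callRequest = buildToolCall(1, "kite_task_create", { title: "Fix login bug", project: "/repo" })
const callResponse = buildToolResult(callRequest, "Task created")
```

The model only ever produces the `name` and `arguments` values; everything that touches the file system or network happens on the server side of this exchange.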

The protocol supports three primitives:

  • Tools: Callable functions (create_task, search_vault, deploy_branch)
  • Resources: Readable context (resource://tasks/pending, resource://sessions/active)
  • Prompts: Reusable prompt templates with parameter injection

For most orchestration use cases, you'll primarily use tools. Resources become important when you want Claude to subscribe to live state changes (e.g., a task queue that updates in real-time).

Server Transport: stdio vs HTTP

MCP servers run over two transport types:

stdio (recommended for local tools):

# Claude Code invokes the server as a child process
# Communication via stdin/stdout, newline-delimited JSON-RPC
$ bun run mcp/kite-mcp.ts

HTTP (for remote or shared tools):

# Persistent server, Claude connects via HTTP (Streamable HTTP in current spec revisions, HTTP+SSE in older ones)
$ bun run serve.ts --port 3001

For Claude Code integration, stdio is almost always the right choice. It's simpler, has no network dependencies, and respects the same permissions as the host process. The server starts on demand, handles one session's requests, and exits cleanly.

// .mcp.json in the project root (Claude Code's project-scoped MCP config)
{
  "mcpServers": {
    "kite": {
      "command": "bun",
      "args": ["run", "/Users/you/.kite/mcp/kite-mcp.ts"],
      "type": "stdio"
    }
  }
}
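
The newline-delimited framing used by the stdio transport is handled for you by the SDK's `StdioServerTransport`, but a rough sketch shows what it involves. `splitMessages` is a hypothetical helper written for this post, not an SDK function.

```typescript
// Split buffered stdin data into complete newline-delimited JSON messages.
// Returns the parsed messages plus any trailing partial line to keep buffering.
function splitMessages(buffer: string): { messages: unknown[]; rest: string } {
  const parts = buffer.split("\n")
  // The last element is either "" (buffer ended on a newline) or a partial line.
  const rest = parts.pop() ?? ""
  const messages = parts
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown)
  return { messages, rest }
}

// Two complete messages followed by the start of a third.
const chunk =
  '{"jsonrpc":"2.0","id":1,"method":"ping"}\n' +
  '{"jsonrpc":"2.0","id":2,"method":"tools/list"}\n' +
  '{"jsonrpc":"2.0","id":3'
const framed = splitMessages(chunk)
```

Because every line on stdout is parsed as a protocol message, anything else a stdio server prints there ends up in the stream the client is trying to parse.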

Building a Production MCP Server

The minimal viable MCP server structure in TypeScript with the official SDK:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"
import { z } from "zod"

const server = new McpServer({
  name: "kite",
  version: "1.0.0"
})

// Define a tool with Zod schema validation
server.tool(
  "kite_task_create",
  "Create a task in the Kite queue",
  {
    title: z.string().describe("Task description"),
    priority: z.enum(["low", "normal", "high"]).default("normal"),
    project: z.string().describe("Project path"),
    depends_on: z.array(z.string()).optional().describe("Task IDs that must complete first")
  },
  async ({ title, priority, project, depends_on }) => {
    const task = await createTask({ title, priority, project, depends_on })
    return {
      content: [{
        type: "text",
        text: `Task created: ${task.id.slice(0, 8)} — ${task.title}`
      }]
    }
  }
)

// Connect via stdio
const transport = new StdioServerTransport()
await server.connect(transport)

Three things to get right in every tool definition:

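  1. Description matters more than the name. Claude reads the description to decide when to call a tool. "Create a task in the Kite queue" is unambiguous. The name "kite_task_create" on its own is not.
  2. Use Zod for validation. Input validation at the MCP layer catches malformed arguments (a missing field, a wrong type, a priority outside the enum) before they reach your execution code. It cannot catch a well-formed value that is simply wrong, such as a task ID that does not exist, so the handler still has to check those.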
  3. Return structured, actionable text. Claude uses the tool result to decide its next action. Return enough context to make the next decision obvious.
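
To show what validation at the tool boundary buys you, here is a dependency-free sketch that mirrors part of the `kite_task_create` schema by hand. It is a stand-in for Zod, not how Zod works internally; `parseTaskInput` and the `Parsed` type are hypothetical names.

```typescript
type Priority = "low" | "normal" | "high"
type TaskInput = { title: string; priority: Priority; project: string }
type Parsed = { ok: true; value: TaskInput } | { ok: false; error: string }

const PRIORITIES: Priority[] = ["low", "normal", "high"]

function isPriority(value: unknown): value is Priority {
  return typeof value === "string" && (PRIORITIES as string[]).indexOf(value) !== -1
}

// Validate raw tool arguments before they reach execution code.
function parseTaskInput(raw: Record<string, unknown>): Parsed {
  const title = raw["title"]
  const project = raw["project"]
  // Mirror the schema's default("normal") when priority is omitted.
  const priority = raw["priority"] === undefined ? "normal" : raw["priority"]
  if (typeof title !== "string") return { ok: false, error: "title must be a string" }
  if (typeof project !== "string") return { ok: false, error: "project must be a string" }
  if (!isPriority(priority)) {
    return { ok: false, error: `priority must be one of: ${PRIORITIES.join(", ")}` }
  }
  return { ok: true, value: { title, priority, project } }
}

const accepted = parseTaskInput({ title: "Fix login bug", project: "/repo" })
const rejected = parseTaskInput({ title: "Fix login bug", project: "/repo", priority: "urgent" })
```

The rejected call never reaches `createTask`, and the error text lists the allowed values so the next attempt can correct itself.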

Tool Design Patterns

Pattern 1: Narrow tools, wide composition

Prefer many small tools over a few large ones. Claude is better at composing simple tools than inferring complex parameters for a single tool.

// Good: separate tools for each operation
server.tool("kite_sessions", "List active Claude Code sessions", {}, handler)
server.tool("kite_task_list", "List tasks by status", { status: z.enum([...]) }, handler)
server.tool("kite_spawn", "Spawn a new Claude session", { task: z.string(), ... }, handler)

// Avoid: one "do everything" tool
server.tool("kite_action", "Perform a Kite action", { action: z.string(), params: z.any() }, handler)

Pattern 2: Idempotent writes

Tools that create or modify state should be idempotent where possible. Claude may retry on timeout or ambiguity.

server.tool("kite_file_lock", "Acquire exclusive lock on a file", {
  file: z.string(),
  session_id: z.string()
}, async ({ file, session_id }) => {
  // Acquiring the same lock twice (same session) is a no-op, not an error
  const existing = locks.get(file)
  if (existing?.id === session_id) {
    return { content: [{ type: "text", text: `Lock already held: ${file}` }] }
  }
  // ...
})
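
The elided part of that handler could look something like the following self-contained sketch. The in-memory `lockTable`, the `acquireLock` name, and the result shape are assumptions; a real deployment would presumably keep locks in shared state rather than in the tool process.

```typescript
type Lock = { sessionId: string; lockedAt: string }
type LockResult = { acquired: boolean; message: string }

// In-memory lock table keyed by file path (a stand-in for shared state).
const lockTable = new Map<string, Lock>()

function acquireLock(file: string, sessionId: string, now: Date = new Date()): LockResult {
  const existing = lockTable.get(file)
  if (existing === undefined) {
    lockTable.set(file, { sessionId, lockedAt: now.toISOString() })
    return { acquired: true, message: `Lock acquired: ${file}` }
  }
  if (existing.sessionId === sessionId) {
    // Same session asking again (for example after a retry): succeed without changing state.
    return { acquired: true, message: `Lock already held: ${file}` }
  }
  return { acquired: false, message: `File locked by another session since ${existing.lockedAt}` }
}

const firstAttempt = acquireLock("src/app.ts", "session-a")
const retryAttempt = acquireLock("src/app.ts", "session-a")
const otherSession = acquireLock("src/app.ts", "session-b")
```

The retry succeeds and leaves exactly one lock in the table, which is the property that makes a blind retry safe.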

Pattern 3: Error messages that guide recovery

Tool errors are read by Claude to decide what to do next. Vague errors lead to retry loops. Actionable errors lead to correct recovery behavior.

// Vague — Claude will retry blindly
throw new Error("Lock failed")

// Actionable — Claude knows to wait or choose a different file
throw new Error(
  `File locked by session ${holderId.slice(0, 8)} since ${lockedAt}. ` +
  `Use kite_sessions to check if that session is still active.`
)
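
An alternative to throwing is to return the failure as a tool result. MCP tool results carry an optional `isError` flag for this purpose. The sketch below assumes a hypothetical `lockConflictResult` helper, and the session ID and timestamp are made-up sample values.

```typescript
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean }

// Build an error result that names the cause and the next step to take.
function lockConflictResult(file: string, holderId: string, lockedAt: string): ToolResult {
  return {
    isError: true,
    content: [{
      type: "text",
      text:
        `Cannot lock ${file}: held by session ${holderId.slice(0, 8)} since ${lockedAt}. ` +
        `Use kite_sessions to check if that session is still active.`,
    }],
  }
}

const conflict = lockConflictResult("src/app.ts", "3f9a1c2e7b6d4a10", "2025-01-01T12:00:00Z")
```

Either way, the text is what Claude reads, so the same rule applies: say what failed and what to try next.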

Cross-Server Orchestration

The most powerful pattern: MCP servers that coordinate with each other through a shared state layer.

Kite's architecture has three MCP-adjacent layers that all talk to the same Unix socket server:

Claude Code session
    │
    ├── kite-mcp.ts (MCP server via stdio)
    │     └── tools: spawn, dispatch, task_create, review
    │
    └── hooks (shell scripts)
          ├── kite-socket-hook.sh  (register, heartbeat, file_touch)
          └── kite-memory-hook.sh  (vault injection)

All communicate via: ~/.kite/kite.sock (Unix domain socket)

The MCP server and the hook scripts don't talk to each other directly. They both talk to the socket server, which maintains the authoritative state. Because every state change passes through that one process, concurrent writers cannot end up holding conflicting copies, and the system stays debuggable: at any point, you can inspect the socket server's state to understand what's happening.

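// kite-client.ts: shared client used by both MCP tools and hooks
import net from "node:net"
import os from "node:os"
import path from "node:path"

const SOCKET_PATH = path.join(os.homedir(), ".kite", "kite.sock")

export async function sendToSocket(message: object): Promise<object> {
  return new Promise((resolve, reject) => {
    const socket = net.createConnection(SOCKET_PATH)
    let response = ""

    // Decode as UTF-8 so multi-byte characters split across chunks stay intact
    socket.setEncoding("utf8")
    socket.on("data", (chunk) => { response += chunk })
    socket.on("end", () => {
      // Parse inside try/catch so a malformed reply rejects instead of throwing
      try { resolve(JSON.parse(response)) } catch (err) { reject(err) }
    })
    socket.on("error", reject)

    socket.write(JSON.stringify(message) + "\n")
    // Don't close here: this relies on the server closing the connection after it replies
  })
}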
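
One way to picture the socket server's role as the single source of truth is a function that applies each incoming message to one state object. The message types `register`, `heartbeat`, and `file_touch` come from the hook listed above; every field name and the staleness rule are assumptions made for this sketch, not Kite's actual wire format.

```typescript
type Session = { id: string; lastSeen: number; files: string[] }
type State = { sessions: Map<string, Session> }

type Message =
  | { type: "register"; sessionId: string; at: number }
  | { type: "heartbeat"; sessionId: string; at: number }
  | { type: "file_touch"; sessionId: string; file: string; at: number }

// Apply one message to the shared state. All writers go through this function,
// so there is exactly one place where state changes.
function applyMessage(state: State, msg: Message): void {
  if (msg.type === "register") {
    state.sessions.set(msg.sessionId, { id: msg.sessionId, lastSeen: msg.at, files: [] })
    return
  }
  const session = state.sessions.get(msg.sessionId)
  if (session === undefined) return // ignore messages from unregistered sessions
  session.lastSeen = msg.at
  if (msg.type === "file_touch" && session.files.indexOf(msg.file) === -1) {
    session.files.push(msg.file)
  }
}

// Sessions with no message inside the window are treated as stale.
function staleSessions(state: State, now: number, windowMs: number): string[] {
  const stale: string[] = []
  state.sessions.forEach((s) => { if (now - s.lastSeen > windowMs) stale.push(s.id) })
  return stale
}

const kiteState: State = { sessions: new Map() }
applyMessage(kiteState, { type: "register", sessionId: "a", at: 0 })
applyMessage(kiteState, { type: "register", sessionId: "b", at: 0 })
applyMessage(kiteState, { type: "file_touch", sessionId: "a", file: "src/app.ts", at: 50000 })
const staleIds = staleSessions(kiteState, 60000, 30000)
```

Session `b` has sent nothing since registering, so it is the only one reported as stale. Dumping `kiteState` at any moment gives the full picture that the MCP tools and hooks are both working from.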

MCP Sampling: Tools That Call Claude

MCP also defines sampling: the ability for an MCP server to request an LLM completion through the client it is connected to. This enables tools that do non-trivial reasoning, not just data retrieval.

The kite_review tool uses sampling to spawn an independent reviewer:

server.tool(
  "kite_review",
  "Have an independent Claude instance review recent changes",
  {
    task_id: z.string(),
    focus: z.string().describe("What aspect to review")
  },
  async ({ task_id, focus }) => {
    // Use MCP sampling instead of spawning a subprocess.
    // createMessage lives on the low-level Server, which McpServer exposes as server.server.
    const review = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: buildReviewPrompt(task_id, focus)
        }
      }],
      maxTokens: 2000,
      modelPreferences: {
        hints: [{ name: "claude-sonnet-4-6" }]
      }
    })

    return {
      content: [{
        type: "text",
        text: review.content.type === "text" ? review.content.text : "Review failed"
      }]
    }
  }
)

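Sampling avoids spawning a subprocess and leaves model selection to the host application. The model hint is advisory, so a production host can route review calls to a cheaper model without changing the tool code. It only works when the connected client advertises the sampling capability, so check for that before relying on it.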
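
Beyond name hints, the sampling request can carry numeric priorities that tell the client what the server cares about. The field names below (`costPriority`, `speedPriority`, `intelligencePriority`, each from 0 to 1) follow the MCP specification; the two profiles, their numbers, and the `buildReviewRequest` helper are illustrative assumptions.

```typescript
type ModelPreferences = {
  hints?: { name?: string }[]
  costPriority?: number         // 0 to 1, higher means cost matters more
  speedPriority?: number        // 0 to 1, higher means latency matters more
  intelligencePriority?: number // 0 to 1, higher means capability matters more
}

type SamplingParams = {
  messages: { role: "user" | "assistant"; content: { type: "text"; text: string } }[]
  maxTokens: number
  modelPreferences?: ModelPreferences
}

// Hypothetical profiles: the numbers are illustrative, not tuned values.
const PROFILES: Record<"quick" | "thorough", ModelPreferences> = {
  quick: { costPriority: 0.8, speedPriority: 0.7, intelligencePriority: 0.3 },
  thorough: { costPriority: 0.2, speedPriority: 0.2, intelligencePriority: 0.9 },
}

function buildReviewRequest(prompt: string, profile: "quick" | "thorough"): SamplingParams {
  return {
    messages: [{ role: "user", content: { type: "text", text: prompt } }],
    maxTokens: 2000,
    modelPreferences: PROFILES[profile],
  }
}

const quickReview = buildReviewRequest("Review the latest changes for error handling", "quick")
```

These values are preferences, not commands. The client makes the final model choice, which is what lets the host swap models without touching the tool.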

Resources: Live State Subscriptions

For agent coordination, resources let Claude observe state that changes over time:

// Expose the task queue as a readable resource: (name, URI, metadata, reader)
server.resource(
  "tasks",
  "kite://tasks",
  { description: "Active Kite task queue" },
  async (uri) => {
    const tasks = await querySocket({ type: "task_list", status: "all" })
    return {
      contents: [{
        uri: uri.toString(),
        mimeType: "application/json",
        text: JSON.stringify(tasks, null, 2)
      }]
    }
  }
)

// Expose active sessions
server.resource(
  "sessions",
  "kite://sessions",
  { description: "Active Claude Code sessions" },
  async (uri) => {
    const sessions = await querySocket({ type: "query", target: "sessions" })
    return {
      contents: [{
        uri: uri.toString(),
        mimeType: "application/json",
        text: JSON.stringify(sessions, null, 2)
      }]
    }
  }
)

Claude can now read kite://tasks to understand the current workload before deciding whether to create new tasks or pick up existing ones. This is meaningfully different from a tool call — it's ambient context rather than an explicit operation.
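
Reading a resource gives a snapshot. For live updates, the MCP specification lets a client subscribe to a URI with `resources/subscribe`, after which the server sends `notifications/resources/updated` when it changes. The sketch below covers only the bookkeeping; `SubscriptionTracker` is a hypothetical class, the SDK wiring is omitted, and whether a given client subscribes at all is up to that client.

```typescript
type UpdatedNotification = {
  jsonrpc: "2.0"
  method: "notifications/resources/updated"
  params: { uri: string }
}

// Tracks which resource URIs the client has subscribed to.
class SubscriptionTracker {
  private subscribed = new Set<string>()

  subscribe(uri: string): void { this.subscribed.add(uri) }
  unsubscribe(uri: string): void { this.subscribed.delete(uri) }

  // Returns the notification to send when a resource changes,
  // or null when nobody is subscribed to that URI.
  onChange(uri: string): UpdatedNotification | null {
    if (!this.subscribed.has(uri)) return null
    return { jsonrpc: "2.0", method: "notifications/resources/updated", params: { uri } }
  }
}

const tracker = new SubscriptionTracker()
tracker.subscribe("kite://tasks")
const sent = tracker.onChange("kite://tasks")
const skipped = tracker.onChange("kite://sessions")
```

The notification carries only the URI. The client is expected to read the resource again to get the new contents.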

Production Considerations

Logging: Log every tool call with inputs and outputs. MCP servers run as black boxes from Claude's perspective — without logging, debugging failures is nearly impossible.

const originalTool = server.tool.bind(server)
server.tool = (name, description, schema, handler) => {
  return originalTool(name, description, schema, async (args, context) => {
    console.error(`[${new Date().toISOString()}] tool:${name}`, JSON.stringify(args))
    const result = await handler(args, context)
    console.error(`[${new Date().toISOString()}] result:${name}`, JSON.stringify(result))
    return result
  })
}

Note: use console.error not console.log in stdio servers — stdout is reserved for the MCP protocol.
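
If patching `server.tool` feels too invasive, a per-handler wrapper is one alternative. The sketch below is dependency-free and also logs failures, which a wrapper that only logs after a successful `await` would miss. `withLogging` and its log line format are assumptions, not part of the SDK.

```typescript
type Handler<A, R> = (args: A) => Promise<R>

// Wrap a handler so inputs, outputs, and failures are all written to a log sink.
// In a stdio server the sink should write to stderr (for example console.error).
function withLogging<A, R>(
  toolName: string,
  handler: Handler<A, R>,
  log: (line: string) => void
): Handler<A, R> {
  return async (args: A) => {
    log(`tool:${toolName} ${JSON.stringify(args)}`)
    try {
      const result = await handler(args)
      log(`result:${toolName} ${JSON.stringify(result)}`)
      return result
    } catch (err) {
      // Log the failure, then rethrow so the caller still sees it.
      log(`error:${toolName} ${err instanceof Error ? err.message : String(err)}`)
      throw err
    }
  }
}

// Usage: collect log lines in an array instead of writing to stderr.
const logLines: string[] = []
const echo = withLogging(
  "echo",
  async (args: { text: string }) => args.text.toUpperCase(),
  (line) => logLines.push(line)
)
```

Passing the sink as a parameter keeps the wrapper testable and makes it hard to write to stdout by accident.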

Graceful shutdown: Register a SIGINT handler to clean up any open connections before exit. Abruptly killed servers leave orphaned locks and stale session registrations.
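
A small cleanup registry is one way to structure this. The sketch keeps the signal wiring out of the code so it stays runtime-agnostic; `onShutdown` and `runCleanups` are hypothetical names, and the registered steps are placeholders for real work such as releasing locks.

```typescript
type Cleanup = () => Promise<void> | void

const cleanups: Cleanup[] = []

// Register a step to run at shutdown.
function onShutdown(fn: Cleanup): void {
  cleanups.push(fn)
}

// Run cleanups in reverse registration order. One failing step must not
// block the rest, so errors are collected and returned instead of thrown.
async function runCleanups(): Promise<string[]> {
  const errors: string[] = []
  for (const fn of cleanups.slice().reverse()) {
    try {
      await fn()
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err))
    }
  }
  return errors
}

// Placeholder steps that record the order they ran in.
const shutdownOrder: string[] = []
onShutdown(() => { shutdownOrder.push("close socket") })
onShutdown(() => { throw new Error("release failed") })
onShutdown(async () => { shutdownOrder.push("release locks") })
```

In Node or Bun this would typically be wired up with `process.once("SIGINT", () => { runCleanups().then(() => process.exit(0)) })`, and the same for `SIGTERM`, since process managers usually send that signal rather than `SIGINT`.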

Schema versions: Include a _version field in your tool schemas. When you change a tool's interface, bump the version so you can detect which sessions are running old code.


The MCP server is the substrate of every capable Claude agent. Design the tools well, return actionable errors, and use the coordination patterns above — the agent behavior that emerges is qualitatively different from anything achievable through prompt engineering alone.

Need help building production MCP servers? I do this work professionally at kiteaiagent.com. Reach out on X.