Debugging MCP Servers: Runtime Snapshots and Memory Leak Detection

If you've ever stared at an MCP server that's slowly eating RAM or responding with exponentially growing latency, you know the frustration: there's nothing to look at. No visible state, no call stack, just a black box running tool handlers. This post walks through how I added a /debug/snapshot endpoint to an MCP server, built a flamegraph pipeline on top of it, and caught a memory leak of 16 MB per session that would have been invisible with conventional logging.

The Problem: MCP State Is Invisible by Default

MCP servers carry state. Every tool call appends to a handler's context. Every session forks a new handler branch. The runtime knows exactly what's happening internally — but there's no standard interface to ask it.

This becomes a real problem at scale. Running a four-node Mac Mini cluster with Ollama and MCP, I measured average context accumulation at roughly 12 KB per tool call. Past 50 concurrent sessions, response latency stopped scaling linearly and started curving upward exponentially. That inflection point is invisible unless you can observe the heap at runtime — not after the fact, not from logs, but right now while it's happening.

[Figure: latency growth pattern without and with snapshot visibility]

The Node.js ecosystem has heap dumps. V8 has the inspector protocol. MCP servers need the equivalent: a structured, queryable view of runtime state that you can hit on demand.

Section 1: Designing the Snapshot Endpoint

The core idea is a single HTTP endpoint that captures a point-in-time dump of everything the MCP server knows about itself: active sessions, the current tool call stack, per-handler queue depths, and heap usage. Keep it simple and always-on.

// Internal state snapshot handler
app.get('/debug/snapshot', (req, res) => {
  const snapshot = {
    timestamp: Date.now(),
    activeSessions: sessionRegistry.size,
    toolCallStack: toolTracer.getStack(),   // tools currently executing
    handlerQueue: queueMonitor.dump(),       // per-handler wait queue depth
    heapUsedMB: process.memoryUsage().heapUsed / 1024 / 1024
  };
  res.json(snapshot);
});

Three supporting objects to wire up:

| Object | What it tracks | Implementation hint |
| --- | --- | --- |
| `sessionRegistry` | Active session map | A `Map<string, Session>` maintained by your session lifecycle hooks |
| `toolTracer` | In-flight tool calls | Push on call start, pop on resolve/reject; include call ID + start timestamp |
| `queueMonitor` | Handler backpressure | Track pending requests per handler type; expose queue length |
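As a rough illustration of the toolTracer row, here is a minimal sketch of a tracer that records a call on start and removes it on resolve or reject. The class name, method names, and the `InFlightCall` shape are assumptions made for this example, not the implementation used in the post, and it leaves out per-session cleanup such as clearSession().

```typescript
// Hypothetical shape of one in-flight tool call. The id/name/durationMs
// fields mirror the toolCallStack entries used elsewhere in this post.
interface InFlightCall {
  id: string;
  name: string;
  startedAt: number; // epoch milliseconds
}

class ToolTracer {
  private readonly inFlight = new Map<string, InFlightCall>();

  // Record a call when its handler starts.
  start(id: string, name: string, now: number = Date.now()): void {
    this.inFlight.set(id, { id, name, startedAt: now });
  }

  // Remove a call once it has resolved or rejected.
  end(id: string): void {
    this.inFlight.delete(id);
  }

  // Wrap a handler so the entry is removed on success and on failure.
  async trace<T>(id: string, name: string, fn: () => Promise<T>): Promise<T> {
    this.start(id, name);
    try {
      return await fn();
    } finally {
      this.end(id);
    }
  }

  // Snapshot view: elapsed time is computed when the stack is read.
  getStack(now: number = Date.now()): Array<{ id: string; name: string; durationMs: number }> {
    return Array.from(this.inFlight.values(), (call) => ({
      id: call.id,
      name: call.name,
      durationMs: now - call.startedAt,
    }));
  }
}
```

Computing durationMs at read time keeps the write path to a single Map operation per call, which matters if the tracer sits in front of every tool handler.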

The endpoint stays open in production. Access control is handled at the network layer — internal IP allowlist only, never exposed to the public interface. No auth overhead in the hot path.
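If the network layer is not the only line of defense you want, the same allowlist idea can be checked in-process. The sketch below is an assumption-laden example, not what the post's server does: `isInternalAddress` is a hypothetical helper that accepts loopback and RFC 1918 private IPv4 ranges, including the IPv4-mapped IPv6 form that Node commonly reports.

```typescript
// Returns true for loopback and RFC 1918 private IPv4 addresses.
// Anything that does not parse as a dotted-quad IPv4 address (apart from
// the IPv6 loopback "::1") is treated as external.
function isInternalAddress(remoteAddress: string): boolean {
  if (remoteAddress === "::1") return true; // IPv6 loopback

  // Node often reports IPv4 peers as "::ffff:a.b.c.d" on dual-stack sockets.
  const mappedPrefix = "::ffff:";
  const ipv4 = remoteAddress.startsWith(mappedPrefix)
    ? remoteAddress.slice(mappedPrefix.length)
    : remoteAddress;

  const parts = ipv4.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;

  const octets = parts.map(Number);
  if (octets.some((o) => o > 255)) return false;

  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}
```

In an Express-style app this could run in a small middleware registered ahead of /debug/snapshot, passing in req.socket.remoteAddress and answering 403 when the check fails. Behind a reverse proxy the socket address is the proxy's, so the check would need to happen at the proxy instead.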

One gotcha: toolTracer.getStack() has to return a consistent view when tool calls overlap, whether that overlap comes from worker threads or from ordinary async concurrency. I use a simple AsyncLocalStorage-scoped map that gets merged on snapshot request. If you forget this and snapshot during heavy concurrent load, you'll get partial stacks.
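One way the AsyncLocalStorage approach could look is sketched below. All names here (`runSession`, `startCall`, `endCall`, `getMergedStack`) are hypothetical, and the sketch only covers async concurrency inside a single thread; worker threads have separate heaps, so they would have to report their stacks to the main thread by message passing.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface TracedCall {
  id: string;
  name: string;
  startedAt: number; // epoch milliseconds
}

// One map per session, reachable from every async continuation of that session.
const sessionScope = new AsyncLocalStorage<Map<string, TracedCall>>();
// Registry of all live per-session maps so a snapshot can merge them.
const liveScopes = new Set<Map<string, TracedCall>>();

// Run a session body with its own call map; the map is unregistered when
// the body settles, whether it resolves or rejects.
async function runSession<T>(fn: () => Promise<T>): Promise<T> {
  const calls = new Map<string, TracedCall>();
  liveScopes.add(calls);
  try {
    return await sessionScope.run(calls, fn);
  } finally {
    liveScopes.delete(calls);
  }
}

function startCall(id: string, name: string, now: number = Date.now()): void {
  const calls = sessionScope.getStore();
  if (!calls) throw new Error("startCall must run inside runSession");
  calls.set(id, { id, name, startedAt: now });
}

function endCall(id: string): void {
  sessionScope.getStore()?.delete(id);
}

// Merge every live session map into one flat list for the snapshot.
function getMergedStack(now: number = Date.now()): Array<{ id: string; name: string; durationMs: number }> {
  const merged: Array<{ id: string; name: string; durationMs: number }> = [];
  liveScopes.forEach((calls) => {
    calls.forEach((call) => {
      merged.push({ id: call.id, name: call.name, durationMs: now - call.startedAt });
    });
  });
  return merged;
}
```

Because each session writes only to its own map, handlers never need to know which session they belong to, and the merge happens only on the comparatively rare snapshot request.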

Section 2: Turning Snapshots into a Flamegraph

Raw JSON from a single snapshot tells you the current state. What you actually want is the trajectory — how the tool call stack evolves over time. Flamegraphs are the right visualization for this. Here's the pipeline I use:

[Figure: snapshot collection to flamegraph pipeline]

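# Collect snapshots every second, extract tool name + duration, append to trace log.
# jq -r prints raw text instead of JSON-quoted strings, so each line is "name count",
# which is the line format flamegraph.pl parses.
watch -n 1 'curl -s http://localhost:3100/debug/snapshot \
  | jq -r ".toolCallStack[] | .name + \" \" + (.durationMs | tostring)" \
  >> /tmp/mcp_trace.log'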

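# Once you have enough data, render the flamegraph.
# flamegraph.pl is executable and is found via PATH (see the setup below);
# "perl flamegraph.pl" would only work from inside the FlameGraph directory.
flamegraph.pl /tmp/mcp_trace.log > mcp_flamegraph.svg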

You need Brendan Gregg's flamegraph.pl for the second step. On Mac:

brew install perl
git clone https://github.com/brendangregg/FlameGraph.git
export PATH="$PATH:$(pwd)/FlameGraph"
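For reference, flamegraph.pl reads one stack per line: semicolon-separated frames, then a space and a numeric count. The sketch below shows how a snapshot's toolCallStack could be turned into such lines in TypeScript instead of jq. `toFoldedLines` is a hypothetical helper, and using elapsed milliseconds as the count is an assumption carried over from the shell pipeline above.

```typescript
interface StackEntry {
  name: string;
  durationMs: number;
}

// Convert tool call entries into "frame count" lines for flamegraph.pl.
// Each tool becomes a single-frame stack weighted by its elapsed milliseconds.
function toFoldedLines(stack: StackEntry[]): string[] {
  return stack.map((entry) => {
    // ";" separates frames in the folded format, and whitespace separates the
    // stack from the count, so both are replaced inside tool names.
    const frame = entry.name.replace(/;/g, "_").replace(/\s+/g, "_");
    const weight = Math.max(0, Math.round(entry.durationMs));
    return `${frame} ${weight}`;
  });
}
```

Appending the joined lines to /tmp/mcp_trace.log on each poll gives the same input file the shell loop builds.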

What this surfaced for me: when Draw Things image generation requests came in through MCP, one particular tool was consistently sitting at 500ms+. The flamegraph made it immediately visible as a wide, flat plateau in the stack — something I'd never have identified by scrubbing through timestamped log lines. Bottleneck identification time dropped by roughly 90% compared to conventional log analysis.

One variation worth knowing: if you're running Docker or a remote node, adjust the curl target:

# Docker: forward the port in your compose file
# docker-compose.yml
services:
  mcp-server:
    ports:
      - "3100:3100"

# Linux: same curl command, no changes needed
# Mac (local dev): curl http://localhost:3100/debug/snapshot
# Remote node: curl http://192.168.1.42:3100/debug/snapshot

Section 3: Snapshot Diffing for Memory Leak Detection

A single snapshot shows you state. Two snapshots diff'd against each other show you movement. This is where memory leaks become catchable.

[Figure: before and after snapshot diff revealing heap growth]

interface Snapshot {
  timestamp: number;
  activeSessions: number;
  toolCallStack: Array<{ id: string; name: string; durationMs: number }>;
  heapUsedMB: number;
}

function diffSnapshot(before: Snapshot, after: Snapshot) {
  return {
    heapDeltaMB: after.heapUsedMB - before.heapUsedMB,
    sessionDelta: after.activeSessions - before.activeSessions,
    newToolCalls: after.toolCallStack.filter(
      c => !before.toolCallStack.find(b => b.id === c.id)
    )
  };
}

The diagnostic logic is straightforward:

const snap1 = await fetch('http://localhost:3100/debug/snapshot').then(r => r.json());
// ... wait for sessions to close and GC to run ...
await new Promise(resolve => setTimeout(resolve, 10_000));
const snap2 = await fetch('http://localhost:3100/debug/snapshot').then(r => r.json());

const diff = diffSnapshot(snap1, snap2);

if (diff.heapDeltaMB > 0 && diff.sessionDelta <= 0) {
  console.warn(`Memory leak suspected: +${diff.heapDeltaMB.toFixed(1)} MB after session close`);
}

If heapDeltaMB stays positive after sessions have closed and GC has had time to run, something is holding a reference it shouldn't be. This is how I found a bug in n8n 2.8.4's MCP node: after 10 minutes of continuous execution, one handler was retaining 16 MB of context per session after the session closed. The diff pointed directly at the handler. The root cause turned out to be a toolTracer.clearSession() call placed after an await that sometimes short-circuited on error, so the clear never ran on the failure path.

[Figure: buggy vs fixed session cleanup flow]

The fix:

// Before (buggy): cleanup skipped on error
async function handleSession(sessionId: string) {
  try {
    await processTools(sessionId);
    toolTracer.clearSession(sessionId); // never reached on throw
  } catch (e) {
    return;
  }
}

// After (fixed): cleanup always runs
async function handleSession(sessionId: string) {
  try {
    await processTools(sessionId);
  } catch (e) {
    logger.error(e);
  } finally {
    toolTracer.clearSession(sessionId); // always executes
  }
}

Variations and Gotchas

Environment differences. On macOS, watch isn't installed by default — brew install watch first. On Linux it's in procps. In Docker, add curl and jq to your base image; they're not always there.

GC timing. When you take the "after" snapshot to check for leaks, wait long enough for V8's GC to run. setTimeout of 10 seconds is usually enough; under heavy load, bump it to 30. If you call global.gc() manually (requires --expose-gc flag), you can tighten this window considerably — useful in CI leak checks.
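A small sketch of that choice between forcing GC and waiting: `forceGc` and `settleHeap` are hypothetical helper names, and the code assumes Node exposes gc on the global object only when started with --expose-gc.

```typescript
// Trigger a collection if the process was started with --expose-gc.
// Returns false when gc is not exposed, so the caller can fall back to waiting.
function forceGc(): boolean {
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  if (typeof gc !== "function") return false;
  gc();
  return true;
}

// Let the heap settle before the "after" snapshot: force GC when possible,
// otherwise wait for the given fallback delay.
async function settleHeap(fallbackWaitMs: number = 10_000): Promise<"forced" | "waited"> {
  if (forceGc()) return "forced";
  await new Promise<void>((resolve) => setTimeout(resolve, fallbackWaitMs));
  return "waited";
}
```

Calling `await settleHeap()` just before fetching the second snapshot keeps the same script usable both in CI (with the flag) and against a production process (without it).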

Stack depth explosion. If your tool handlers are deeply nested or recursive, toolTracer.getStack() can return hundreds of entries. Cap it:

toolCallStack: toolTracer.getStack().slice(0, 50), // cap at 50 entries
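A plain slice hides the fact that entries were dropped. A slightly more informative variant, sketched here with a hypothetical `capStack` helper, also reports how many entries were omitted so a diff or dashboard can tell a capped stack from a genuinely short one.

```typescript
// Keep at most `limit` entries and report how many were left out.
function capStack<T>(stack: T[], limit: number = 50): { entries: T[]; omitted: number } {
  return {
    entries: stack.slice(0, limit),
    omitted: Math.max(0, stack.length - limit),
  };
}
```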

Snapshot overhead. The endpoint is read-only and synchronous — overhead is minimal. But if you're running 1-second polling in production at scale, use a 5-second or 10-second interval instead. The flamegraph granularity is still useful and you won't add noise to your latency metrics.

Closing

Running an MCP server without a snapshot endpoint is operating blind. One /debug/snapshot endpoint, a watch loop, and a flamegraph render give you the tool call timeline. Two snapshots and a diff function give you memory accountability. Neither requires a profiler, an APM subscription, or any changes to your MCP client.

The next step, if you want to go further: wire the snapshot diff into your CI pipeline as a leak regression check — take a snapshot before and after each integration test run, and fail the build if heapDeltaMB > threshold.
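The CI check could be as small as the sketch below. `checkLeak`, the `HeapSample` shape, and the threshold value are assumptions for illustration; the right threshold depends on how noisy your heap is between runs.

```typescript
// The two snapshot fields the check needs.
interface HeapSample {
  heapUsedMB: number;
  activeSessions: number;
}

interface LeakCheckResult {
  ok: boolean;
  heapDeltaMB: number;
  message: string;
}

// Compare snapshots taken before and after a test run and flag heap growth
// above the threshold. Open sessions are mentioned in the message because
// they can explain growth that is not a leak.
function checkLeak(before: HeapSample, after: HeapSample, thresholdMB: number): LeakCheckResult {
  const heapDeltaMB = after.heapUsedMB - before.heapUsedMB;
  const ok = heapDeltaMB <= thresholdMB;
  const sessionNote =
    after.activeSessions > before.activeSessions
      ? ` (${after.activeSessions - before.activeSessions} session(s) still open)`
      : "";
  const message = ok
    ? `Heap delta ${heapDeltaMB.toFixed(1)} MB within ${thresholdMB} MB threshold`
    : `Heap grew ${heapDeltaMB.toFixed(1)} MB, above ${thresholdMB} MB threshold${sessionNote}`;
  return { ok, heapDeltaMB, message };
}
```

In a test runner, the build step would fetch both snapshots, call checkLeak, and exit non-zero when ok is false, printing the message for whoever reads the CI log.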


Seunghyeon's Agentic Lab
