MCP Resource Streaming Back-Pressure: Consuming Large Responses in Chunks to Prevent

hero

Quick answer

MCP is useful when the reader needs the decision frame before the full tutorial.
The practical answer is: Explain what MCP changes, when it is useful, and how to verify it safely.
Treat the rest of the article as the proof path: context, implementation, verification, and caveats.

Answer at a Glance

Back-pressure in MCP resource streaming means the consumer signals the producer — 'send the next chunk only when I am ready to handle it.' Two rules cover the whole idea. First, read resources in fixed-size chunks instead of all at once. Second, send the next chunk request only after the current chunk has been fully processed. Follow both rules and a 100 MB response stays well within a few megabytes of resident memory.

Why It Matters Now

When Claude Code connects directly to an MCP server, a single resource URI can return large log files, database dumps, or image batches. The default implementation loads the entire response into memory before parsing. Fifty megabytes looks fine in isolation, but ten concurrent requests instantly claim 500 MB. That chain — container memory limit hit, OOM kill, agent restart — is a real production failure mode.

Think of it like a warehouse conveyor belt. The old approach dumps the entire truck onto the warehouse floor before sorting begins. The back-pressure approach runs a belt and advances it only as fast as the sorting station can keep up. The warehouse (memory) never overflows because the sorting speed (consumer rate) governs the whole flow.

Step-by-Step Application

Declare streaming support in the MCP server resource handler. In the Node.js SDK, add return { stream: true, chunkSize: 65536 } to the resource definition object.
Write an async iteration loop on the client side. Python SDK example: async for chunk in client.read_resource_stream(uri, chunk_size=65536): await process(chunk). The await inside the loop blocks the next chunk request until processing finishes.
Set chunk size to match the consumer's processing unit. Line-by-line text parsing works well at 65 KB; image tile processing typically starts at 256 KB. Too small increases round-trip count; too large inflates the buffer.
Connect an exception handler that immediately closes the stream on error and discards any partial data already received. Call stream.close() in a finally block to release server-side resources as well.
Log throughput metrics — chunks per second and peak queue depth. When these numbers leave the stable range, adjust chunk size or the number of concurrent requests.

Real-World Example

Scenario: An in-house log-collection MCP server exposes 24 hours of application logs (roughly 80 MB) as a single resource URI. A Claude Code agent needs to summarize error patterns from those logs.

The old approach called read_resource(uri), received 80 MB as a single string, and passed it to the LLM. Result: process RSS hit 1.2 GB with intermittent timeouts.

The improved flow:

The agent calls list_resources() and confirms supports_streaming: true in the metadata.
It opens read_resource_stream(uri, chunk_size=65536) and iterates.
Each chunk is parsed line by line; only ERROR-level entries are appended to a temporary buffer.
When the buffer exceeds 200 lines, a partial summary is sent to the LLM and the buffer is cleared.
After the stream ends, partial summaries are merged into a final report.

Result: peak process RSS stayed at 110 MB with no timeouts. Four concurrent requests against the same server kept combined RSS below 450 MB.

Common Mistakes

Mistake one: receiving chunks but accumulating them all in a list. Using chunks.append(chunk) followed by b''.join(chunks) defeats the entire purpose. Process each chunk immediately and drop the reference.

Mistake two: ignoring chunk boundaries that split multi-byte characters or JSON tokens. For text resources, use an incremental decoder such as codec.getincrementaldecoder('utf-8') to absorb boundary issues.

Mistake three: not closing the server socket after a stream error. The server-side handler keeps producing data and consuming CPU. The client must send a stream-close signal the moment it catches an exception.

Mistake four: hardcoding chunk size. High-latency networks benefit from larger chunks to amortize round-trip cost; slow consumers benefit from smaller chunks to reduce buffer buildup. Externalize the value so it can be tuned without redeployment.

Checklist

Resource handler declares streaming support in its metadata
Client loop is written as an async iterator with await or yield
Each chunk is processed and its reference released immediately
An incremental decoder handles multi-byte character boundaries
A finally block closes the stream on any exception
Throughput metrics (chunks per second, peak memory) are logged
Chunk size is externalized to an environment variable for live tuning

What happened in testing

Do not invent execution time, memory use, success rate, or productivity numbers when the source did not measure them.
Numeric details present in the input: none. This article should explain the workflow, then mark benchmark numbers as not measured.
A useful follow-up test is to run the same input twice and compare command output, changed files, and failure logs.

Failure notes and caveats

The common failure is not the first generated answer. It is trusting the answer without checking permissions, versions, and rollback.
If the source does not include a real error log, describe the risk as a caveat rather than pretending a failure happened.
Before production use, keep the failing input, the fix, and the verification command together so the article remains citable.

Sources and checks

Verified on: 2026-06-09

Claim	Evidence	How to verify	Limit
MCP should be checked against the original source before reuse.	code.claude.com	Check the source page, version, date, and setup notes.	Source content can change after this article is published.
Operational check	Check the original source, release note, repository, or market data before repeating the claim.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.
Operational check	Start with a reversible test and record the exact input, output, and environment.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.
Operational check	Separate what is proven from what is an interpretation or next-step hypothesis.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.

FAQ

Q1. What if the MCP SDK does not support streaming natively?

Use server-side pagination as a fallback. Add parameters such as ?offset=0&limit=500 to the resource URI and have the client increment the offset with each call until an empty response is returned. It is not true streaming, but it prevents memory spikes just as effectively.

Q2. Can I skip back-pressure if I process chunks fast enough?

Only if you can guarantee the consumer is always faster than the producer. In practice, LLM calls, database writes, and network forwarding all introduce variable latency on the consumption side. Without back-pressure, the receive buffer grows until the process runs out of memory. Matching the producer rate to the consumer rate is the foundation of stable operation.

Q3. Where does the 65 KB chunk size come from?

65,536 bytes aligns with the default TCP receive window, minimizing network round-trip overhead as an empirical starting point. For local socket communication, raise it to 256 KB to increase throughput. For high-latency WAN links, drop it to 16 KB to reduce buffer occupancy. There is no universal right answer — let measured metrics decide.

Citation-ready summary

Verified on: 2026-06-09
Definition: MCP is the article's central term; cite it together with the source and verification limits below.
Main answer: Explain what MCP changes, when it is useful, and how to verify it safely.
Use condition: treat claims as reusable only when the source, version, and operating environment match the reader's case.

Key terms

MCP: the concrete subject this article explains and evaluates.
Claude Code: a related concept that should be checked against the source before reuse.
Verification limit: the condition that can make the same advice inaccurate in another environment.

Test environment and baseline

Verified on: 2026-06-09
Baseline scope: this article explains MCP as a reproducible workflow, not as a universal benchmark.
Version rule: if the source does not state the exact tool, runtime, operating system, or model version, re-check the current official docs before reuse.
Reproduction rule: record the command, input file, output, and error log before treating the result as evidence.

permission boundary flow

Closing

Back-pressure control in MCP resource streaming is not a nice-to-have optimization. In any production environment where large responses are routine, it is a prerequisite for stable operation. Apply a pull model consistently — fetch the next chunk only when the consumer is ready — and memory usage stays predictable. Chunk sizing, incremental decoding, and stream close on exception: those three elements are the take-away from episode 35.

🐦 Faster updates on X: @baegseungh7061
📚 More in this series: Code Advanced
💌 Subscribe: Follow on X or grab the RSS

Seunghyeon's Agentic Lab

이 블로그 검색