Claude Agents Running Real Code on Cloudflare's Infrastructure

Claude Agents Now Execute Code Inside Cloudflare's Sandbox

If you're running multi-step Claude agents in production, the biggest blocker usually isn't model capability — it's trust. Trusting that the agent won't escape its sandbox, won't touch the wrong system, won't silently fail three steps in. Cloudflare just changed the calculus here by integrating Anthropic's Claude Managed Agents directly into its infrastructure, giving agents a real, isolated execution environment rather than just a place to generate code.

This matters if you're already using Claude Code or the Anthropic API to orchestrate workflows. The gap between "agent writes a script" and "agent runs the script and returns a result" used to require you to build and secure that bridge yourself. Now it's part of the stack.

1. Why this matters now

For the last two years, the practical ceiling on agent deployments has been execution risk, not reasoning quality. You could get a model to plan a deployment pipeline, write a migration script, or design an ETL workflow — but actually letting it run those artifacts in a live environment meant you had to answer a hard question first: what happens if it gets it wrong?

The typical answer involved a lot of defensive scaffolding. You'd wrap the agent in a sandboxed VM, restrict network egress, add a human approval gate before any destructive operation, and still feel nervous about it. That scaffolding is expensive to build and fragile to maintain, especially when your agent needs to iterate — run something, read the result, adjust, run again.

The Cloudflare integration directly targets this gap. Cloudflare's Workers environment already enforces strict execution isolation by design. An agent running inside that environment inherits those constraints without any extra work on your part. The boundary isn't a policy you configure; it's an infrastructure property you get for free.

2. The core idea

The main conclusion is simple: Claude agents can now write and run code inside Cloudflare's isolated edge environment, with the security boundary enforced at the infrastructure level rather than the application level.

Think of it this way:

| Previous model | New model |
| --- | --- |
| Agent generates code → you run it somewhere | Agent generates code → agent runs it inside the sandbox |
| You build and maintain the execution boundary | Cloudflare enforces the boundary by default |
| Iteration requires external orchestration | Agent reads results and iterates within the same session |
| Latency depends on your infra | Edge execution keeps the round-trip short |

The analogy that clicked for me: before, Claude was an architect handing you blueprints. Now it walks onto the job site and drives the nails itself — but Cloudflare's sandbox is the fenced construction zone it can't leave.

This integrates with existing Claude Code workflows and the Anthropic API. You're not replacing your agent logic; you're adding an execution layer on top of it. If you already have a multi-step agent that produces bash scripts or Python snippets, plugging in Cloudflare Environments is a targeted upgrade to where those artifacts land.

3. How to implement it

The integration entry point is Cloudflare's Workers AI and the Managed Agents API. The minimal path shown here starts smaller: a Cloudflare Worker that calls the Anthropic Messages API directly and handles the agent's tool calls itself.

First, set up your Cloudflare Worker project (choose a "Hello World" Worker and TypeScript when prompted):

npm create cloudflare@latest my-claude-agent
cd my-claude-agent
npm install

Configure your wrangler.toml to enable the AI binding:

name = "my-claude-agent"
main = "src/index.ts"
compatibility_date = "2024-11-01"

[ai]
binding = "AI"

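# Do not declare ANTHROPIC_API_KEY under [vars]; it is stored as a secret
# with `wrangler secret put`, and a plain-text var of the same name would conflict.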

Set your API key as a secret (never in the config file):

wrangler secret put ANTHROPIC_API_KEY

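Here's what a minimal agent handler looks like inside the Worker. It receives a task and makes a single call to Claude with a code-execution tool attached:

interface Env {
  ANTHROPIC_API_KEY: string; // set with `wrangler secret put`
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { task } = (await request.json()) as { task: string };

    // Call Claude with one custom tool it can use to request code execution.
    // "run_code" is an illustrative tool name and schema, not a built-in: the
    // Worker has to execute each tool_use call and send back a tool_result.
    const response = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": env.ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-opus-4-5",
        max_tokens: 4096,
        tools: [
          {
            name: "run_code",
            description: "Run a code snippet in the sandbox and return its output.",
            input_schema: {
              type: "object",
              properties: {
                language: { type: "string" },
                code: { type: "string" },
              },
              required: ["language", "code"],
            },
          },
        ],
        messages: [{ role: "user", content: task }],
      }),
    });

    // Claude's first reply is returned as-is. If it requests the tool,
    // stop_reason is "tool_use" and an agent loop has to run the call.
    const result = await response.json();
    return new Response(JSON.stringify(result), {
      status: response.status,
      headers: { "Content-Type": "application/json" },
    });
  },
};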
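
The handler above trusts whatever JSON arrives in the request body. A small validator, sketched below, rejects malformed input before any tokens are spent. The `parseTask` name and the 4,000-character cap are assumptions for illustration, not part of any Cloudflare or Anthropic API.

```typescript
// Hypothetical helper: validate the POST body before calling Claude.
const MAX_TASK_CHARS = 4000; // assumed cap; tune for your workload

type ParsedTask =
  | { ok: true; task: string }
  | { ok: false; error: string };

function parseTask(body: unknown): ParsedTask {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const task = (body as { task?: unknown }).task;
  if (typeof task !== "string" || task.trim() === "") {
    return { ok: false, error: "task must be a non-empty string" };
  }
  if (task.length > MAX_TASK_CHARS) {
    return { ok: false, error: `task exceeds ${MAX_TASK_CHARS} characters` };
  }
  return { ok: true, task: task.trim() };
}
```

Inside `fetch`, the handler could call `parseTask(await request.json())` and return a 400 response whenever `ok` is false.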

For a multi-step agent that runs scripts and reads results, the pattern looks like this — the agent loop stays inside the Worker:

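// callClaude, executeTools, extractText and the Message type are helpers
// assumed to be defined elsewhere in the Worker.
const MAX_STEPS = 10; // cap iterations so a confused agent cannot loop forever

async function runAgentLoop(task: string, env: Env): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];

  for (let step = 0; step < MAX_STEPS; step++) {
    const response = await callClaude(messages, env);

    if (response.stop_reason === "tool_use") {
      // Execute the tool call inside the sandbox; results stay isolated
      const toolResults = await executeTools(response.content, env);
      messages.push({ role: "assistant", content: response.content });
      messages.push({ role: "user", content: toolResults });
      continue;
    }

    // "end_turn", "max_tokens", and so on: return whatever text Claude produced
    return extractText(response.content);
  }

  throw new Error(`Agent did not finish within ${MAX_STEPS} steps`);
}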
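
The loop above relies on helpers it does not define. A minimal sketch of two of them follows, assuming the content-block shapes the Anthropic Messages API uses for tool calls (`tool_use` blocks carrying an `id`, and `tool_result` blocks echoing it as `tool_use_id`). The `ToolRunner` callback and `buildToolResults` are illustrative names; in the loop, `executeTools` would wrap something like `buildToolResults` around whatever actually runs code in the sandbox.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Hypothetical callback: runs one tool call and resolves with its output.
type ToolRunner = (name: string, input: unknown) => Promise<string>;

// Join all text blocks of a reply into one string.
function extractText(content: ContentBlock[]): string {
  const parts: string[] = [];
  for (const block of content) {
    if (block.type === "text") parts.push(block.text);
  }
  return parts.join("\n");
}

// Run every tool_use block and pair each result with the id of its call.
// A failing tool is reported to the model as an error result, not thrown.
async function buildToolResults(
  content: ContentBlock[],
  run: ToolRunner,
): Promise<ToolResultBlock[]> {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    try {
      const output = await run(block.name, block.input);
      results.push({ type: "tool_result", tool_use_id: block.id, content: output });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: message,
        is_error: true,
      });
    }
  }
  return results;
}
```

Reporting a failed tool as an `is_error` result rather than throwing lets the agent read the failure and adjust on the next iteration, which is the run, read, adjust cycle described earlier.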

Verify the deployment works:

wrangler deploy
curl -X POST https://my-claude-agent.your-subdomain.workers.dev \
  -H "Content-Type: application/json" \
  -d '{"task": "Write a Python function to calculate fibonacci numbers and test it with inputs 1 through 10"}'

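Expected output shape once the Worker runs the full agent loop and returns the final message (the single-call handler on its own returns Claude's first reply, which may stop at "tool_use"):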

{
  "stop_reason": "end_turn",
  "content": [
    {
      "type": "text",
      "text": "Here is the function and the test results: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]"
    }
  ]
}

The agent wrote the function, ran it inside the sandbox, and returned the output. You didn't touch a VM.

4. What to watch in production

Execution time limits. Cloudflare Workers cap CPU time per request: 10 ms on the free plan and 30 seconds by default on paid plans. Time spent waiting on network calls, such as the request to the Anthropic API, does not count toward CPU time, but any computation the Worker itself performs does. If your agent needs to run long computations, you'll hit this ceiling. For CPU-intensive work, consider offloading to a Cloudflare Queue and processing asynchronously.
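
One way to act on the offloading advice is to have the request handler enqueue heavy jobs and return immediately, leaving the work to a queue consumer. The sketch below models only the producer side. `QueueLike` is a stand-in for the one method of a Cloudflare Queues producer binding it uses (`send`); the `estimatedCpuMs` field and the budget parameter are invented for illustration, and how you estimate CPU cost is up to you.

```typescript
// Minimal structural stand-in for a Queues producer binding.
interface QueueLike<T> {
  send(message: T): Promise<void>;
}

interface AgentJob {
  id: string;
  task: string;
  estimatedCpuMs: number; // hypothetical caller-supplied estimate
}

type DispatchResult =
  | { status: "queued" }
  | { status: "done"; result: string };

// Enqueue jobs that would exceed the CPU budget; run the rest inline.
async function dispatch(
  job: AgentJob,
  queue: QueueLike<AgentJob>,
  runInline: (job: AgentJob) => Promise<string>,
  cpuBudgetMs: number,
): Promise<DispatchResult> {
  if (job.estimatedCpuMs > cpuBudgetMs) {
    await queue.send(job);
    return { status: "queued" };
  }
  return { status: "done", result: await runInline(job) };
}
```

A queued job needs some way to deliver its result later, for example by storing it under the job `id` for the client to poll.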

Egress restrictions. The sandbox isolation that makes this safe also constrains what the agent can reach. If your workflow requires the agent to fetch data from internal APIs or VPCs, you'll need Cloudflare Tunnel or Service Bindings configured. Don't assume the agent can reach arbitrary URLs without testing egress paths explicitly.
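
A complementary application-level guard is to check every URL the agent asks to fetch against an explicit allowlist before the request leaves the Worker. This is a sketch, not a Cloudflare feature: `isEgressAllowed` and the hostnames are made up for illustration.

```typescript
// Hypothetical allowlist of hosts the agent's tools may contact.
const ALLOWED_HOSTS: readonly string[] = ["api.example.com", "data.example.org"];

// True only for https URLs whose host is on the list (exact match or subdomain).
function isEgressAllowed(
  rawUrl: string,
  allowed: readonly string[] = ALLOWED_HOSTS,
): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  return allowed.some((h) => host === h || host.endsWith("." + h));
}
```

Matching on the parsed hostname, not on the raw string, avoids lookalike tricks such as `api.example.com.evil.net`.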

Tool call surface area. Every tool you expose to the agent is an attack surface. Be deliberate about which tools the agent can invoke. Start with read-only tools, verify behavior, then expand write permissions incrementally. The sandbox enforces the execution boundary; you still control the tool permissions above that layer.
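
Incremental expansion is easier to enforce when each tool carries an explicit access level and the set exposed to the agent is filtered by the current rollout stage. A minimal sketch, with hypothetical tool names and a two-level `read`/`write` scheme assumed for illustration:

```typescript
type Access = "read" | "write";

interface ToolSpec {
  name: string;
  access: Access;
}

// Hypothetical registry; real tools would also carry a description and schema.
const REGISTRY: ToolSpec[] = [
  { name: "list_files", access: "read" },
  { name: "read_file", access: "read" },
  { name: "write_file", access: "write" },
  { name: "delete_file", access: "write" },
];

// Tools to expose: read-only by default, write tools only if individually approved.
function exposedTools(
  registry: ToolSpec[],
  approvedWrites: ReadonlySet<string>,
): ToolSpec[] {
  return registry.filter((t) => t.access === "read" || approvedWrites.has(t.name));
}

// Re-check at call time: the model may request a tool name it was never offered.
function canInvoke(name: string, exposed: ToolSpec[]): boolean {
  return exposed.some((t) => t.name === name);
}
```

Starting with an empty approval set exposes only the read tools; adding `"write_file"` later widens the surface by exactly one tool.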

Multi-cloud or on-premise conflicts. If your organization has a policy against cloud lock-in or routes sensitive workloads to on-premise infrastructure, this integration creates a new dependency that needs sign-off. Cloudflare runs the execution environment, which means data processed by the agent passes through their network. For regulated workloads (PII, financial data, healthcare), audit this against your compliance requirements before moving fast.

Cost model. Cloudflare Workers pricing is based on requests and CPU time. An agent loop that iterates 10 times per request with tool calls counts differently than a single-shot inference call. Run a test workload at realistic scale before estimating production costs.
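
The arithmetic is worth writing down before running that test workload. The sketch below multiplies out Worker requests, Worker CPU time, and model tokens for an agent loop. The field names are invented, and any rates you pass in are your own inputs: nothing here encodes an actual Cloudflare or Anthropic price.

```typescript
interface Rates {
  perMillionRequests: number; // Worker requests, currency units per 1M
  perMillionCpuMs: number;    // Worker CPU time, per 1M CPU-milliseconds
  perMillionTokens: number;   // model tokens (blended input/output), per 1M
}

interface Workload {
  requests: number;             // incoming agent requests
  iterationsPerRequest: number; // model calls per request in the loop
  cpuMsPerIteration: number;    // Worker CPU spent per iteration
  tokensPerIteration: number;   // average tokens per model call
}

function estimateCost(w: Workload, r: Rates) {
  const iterations = w.requests * w.iterationsPerRequest;
  const workerRequests = (w.requests / 1e6) * r.perMillionRequests;
  const workerCpu = ((iterations * w.cpuMsPerIteration) / 1e6) * r.perMillionCpuMs;
  const modelTokens = ((iterations * w.tokensPerIteration) / 1e6) * r.perMillionTokens;
  return {
    workerRequests,
    workerCpu,
    modelTokens,
    total: workerRequests + workerCpu + modelTokens,
  };
}
```

Because each iteration resends the conversation so far, `tokensPerIteration` should be an average over the whole loop, not the size of the first call.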

Local vs. production behavior differences. If you're testing locally with wrangler dev, the local sandbox closely mirrors production but is not identical. Always validate tool execution results in a staging Worker deployment before promoting to production.


The practical takeaway: if you're already running Claude agents and the main friction is "how do I trust this thing to actually run code," Cloudflare Environments removes most of the DIY sandboxing work. The next step is to check the Cloudflare Environments docs for the specific permission scopes your agent needs, map those against your existing Workers setup, and benchmark latency on a realistic agent loop before committing to the architecture.

What I'd look at next: Cloudflare Durable Objects for stateful agent sessions, and the Queues integration for workloads that exceed the CPU time window.

