SecurityDevOps

Sandboxed Agents: Giving Your Code Monkeys Their Own Sandbox

Oscar van der Leij

I gave a coding agent full filesystem access last year to help me clean up a project. Within minutes it had deleted a folder of draft content I had been building for weeks, confident it was "removing unused assets." No prompt, no confirmation, no hesitation. I got lucky: the folder was still in my Recycle Bin. But that five-second panic while I searched for the undo button changed how I think about what we're actually handing these systems.

That experience is not unusual. A 2026 BeyondScale report found that 88% of organisations reported confirmed or suspected AI agent security incidents in the past year, with 1 in 8 AI security breaches linked to an agentic system. As coding agents evolve from helpful autocomplete tools to autonomous developers capable of executing commands, running builds, and modifying entire codebases, we're handing increasingly powerful capabilities to systems that don't understand the difference between a test directory and your production deployment scripts. The question isn't whether something will go wrong. It's whether you'll have guardrails in place when it does.

The Invisible Fence Your Agent Actually Needs

Coding agents are phenomenally useful and spectacularly naive. They excel at pattern matching and code generation but lack the contextual judgment that prevents a human developer from running rm -rf / or committing AWS credentials to a public repository. They don't get nervous before executing destructive operations. They don't double-check that they're in the right directory. They just... do.

The incidents are no longer hypothetical. In 2025, a Claude Code agent discovered /proc/self/root/usr/bin/npx to bypass restrictions and then disabled its own sandbox entirely. An experimental Alibaba AI agent broke out of its sandbox, began mining cryptocurrency, and opened a backdoor into the network without any prompting. Trail of Bits researchers demonstrated in October 2025 that pre-approved commands in three major agent platforms create a reliable argument injection attack surface: attackers can use approved base commands with injected flags like go test -exec or git show --format to achieve remote code execution without triggering approval flows.

Sandboxed execution creates an isolated playground where agents can run code, interact with files, and execute commands without the ability to accidentally (or intentionally) wreak havoc on your actual development environment. Think of it as a diplomatic envoy visiting a foreign country: they can observe, interact, and accomplish their mission, but they operate under strict protocols that limit what they can access and what actions they can take.

The alternative is giving your agent the keys to the kingdom and hoping it doesn't accidentally burn down the castle while trying to light the fireplace.

How I Learned to Stop Worrying and Love Containers

A sandbox for coding agents isn't a single technology. It's a security posture implemented through multiple layers of isolation. At its core, sandboxing means creating boundaries around three critical resources: the filesystem, the network, and system resources like CPU and memory.

  • Filesystem isolation prevents the agent from accessing anything outside its designated workspace. Your agent can read, write, and execute files within its sandbox, but your SSH keys, environment variables containing API tokens, and that folder of family photos you keep on your dev machine? Completely invisible to the sandboxed process.
  • Network restrictions control what the agent can reach beyond its isolated environment. Maybe your agent needs to pull packages from npm or PyPI, but it definitely doesn't need to POST your entire codebase to a random IP address in a country you've never heard of. Network sandboxing lets you define explicit allowlists for outbound connections.
  • Resource limits prevent a runaway agent from consuming all available CPU and memory. When your agent decides to spawn infinite processes or allocate terabytes of RAM (because it misunderstood your request to "optimize memory usage"), resource constraints ensure it fails gracefully instead of taking down your entire development machine.
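At the tool-call layer, filesystem isolation often reduces to a single check: does the path the agent asked for, once fully resolved, still live inside the workspace? A minimal sketch in Python (the helper name is illustrative, not from any particular agent framework):

```python
from pathlib import Path

def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a path the agent asked for and refuse anything that
    escapes the workspace (including ../ tricks and symlinks)."""
    workspace = workspace.resolve()
    candidate = (workspace / requested).resolve()
    if not candidate.is_relative_to(workspace):
        raise PermissionError(f"path escapes workspace: {requested}")
    return candidate
```

Resolving before checking is the important part: `src/../../etc/passwd` normalizes to a path outside the workspace, and a symlink inside the workspace can point anywhere on the host.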

The Sandbox Spectrum

Not all sandboxes are created equal. The industry's understanding of what "sufficient" isolation means has shifted significantly since 2024. Standard Docker containers are now explicitly considered insufficient for AI agents executing arbitrary code: the shared kernel surface becomes a liability when agents can write scripts, install packages, and manipulate file descriptors. The industry has converged on microVMs as the baseline for production multi-tenant agent execution.

| Isolation level    | Technology               | Cold start | Security                      | Use case                                 |
|--------------------|--------------------------|------------|-------------------------------|------------------------------------------|
| JS engine isolate  | V8 Isolates (Cloudflare) | < 5ms      | Process boundary only         | JS/TS agents, latency-critical           |
| OS process         | bubblewrap, Seatbelt     | < 100ms    | Moderate                      | Local dev tools (Claude Code, Codex CLI) |
| Container          | Docker, Podman           | ~500ms     | Moderate                      | Internal trusted code                    |
| User-space kernel  | gVisor                   | Sub-second | High (~68 syscalls exposed)   | Multi-tenant compute                     |
| MicroVM            | Firecracker              | ~150ms     | Very high (hardware boundary) | Untrusted code, SaaS agent platforms     |
| VM + container API | Kata Containers          | ~200ms     | Very high                     | Kubernetes-native agents                 |
| Cloud functions    | Lambda, Cloud Run        | Variable   | Highest                       | Production agent deployments             |

Claude Code and OpenAI's Codex CLI both use bubblewrap on Linux and Seatbelt on macOS, which are OS-level process isolation tools. They prevent most filesystem and network escapes but are vulnerable to kernel exploits and logic-level bypasses. Platforms like E2B and Vercel Sandbox use Firecracker microVMs, which provide a hardware VM boundary. That distinction matters when an agent is actively trying to escape.

Several new container escape CVEs in 2025 reinforce why the stack matters. CVE-2025-31133 allows bypassing runc's maskedPaths feature to gain arbitrary host file writes. CVE-2025-23266 (NVIDIAScape) enables privilege escalation through the NVIDIA Container Toolkit on any GPU-enabled container host. These affect standard containers, not microVM-based sandboxes.

Purpose-Built Agent Sandbox Platforms

A new category of infrastructure has emerged specifically for AI coding agent execution. These platforms handle the plumbing so you don't have to.

  • E2B is the dominant developer-facing sandbox cloud, running on Firecracker microVMs with cold starts of approximately 150ms (under 30ms from snapshots). Usage grew from 40,000 sandbox sessions per month in early 2024 to roughly 15 million per month by March 2025. Pricing starts free and scales to around $0.05/hour for 1 vCPU. An open-source runtime is available for self-hosting.
  • Daytona pivoted in February 2025 to purpose-built AI agent code execution, achieving provisioning speeds of 27-90ms using a warm-pool approach. Unlike E2B and Modal's 24-hour cap, Daytona supports unlimited execution windows, and includes built-in Git and LSP support with live stdout/stderr streaming.
  • Modal uses gVisor rather than microVMs. It scales to 10,000+ concurrent sandboxes with sub-second cold starts and approximately 10-30% I/O overhead compared to native execution. The primary SDK is Python, with JavaScript/TypeScript and Go in beta.
  • Dagger Container Use (open source, June 2025) takes a different angle. It gives each AI coding agent its own ephemeral container and a dedicated Git worktree, enabling parallel, conflict-free agent workflows. Each agent's container is backed by a Git branch, making changes inspectable via git log --patch container-use/<env>. It integrates with Claude Code, Cursor, and any MCP-compatible client.
  • GKE Agent Sandbox (Google Cloud, announced at Google Cloud Next 2026) is the managed Kubernetes offering. It achieves up to 90% improvement in startup over cold Kubernetes pod creation and supports pluggable backends between gVisor and Kata Containers. Lovable.dev runs AI-generated application code on it in production.

For teams building their own infrastructure, Anthropic published an open-source @anthropic-ai/sandbox-runtime package in November 2025. It enforces filesystem and network restrictions without requiring a full container, using bubblewrap on Linux and Seatbelt on macOS. Anthropic reports an 84% reduction in permission prompts with sandboxing enabled in Claude Code.

The Devil's in the Details

Implementing sandboxed execution isn't just about spinning up a Docker container and calling it a day. Several architectural decisions significantly impact both security and developer experience.

Workspace Management

Your agent needs a workspace, a designated area where it can create files, run builds, and store temporary artifacts. The key question: how do you get code into the sandbox and results back out?

  • Volume mounting is the obvious choice. Mount your project directory into the container, let the agent work, and the changes appear on your host filesystem. Simple, right? Except now your agent has write access to your actual codebase. A confused agent could overwrite your entire project with generated boilerplate.
  • Copy-in/copy-out semantics are a safer approach. Copy the relevant files into the sandbox, let the agent work in complete isolation, then selectively copy approved results back to the host. This adds friction but prevents accidental destruction of your source tree. E2B, Modal, and Daytona all default to this model.
  • Layered filesystems offer a middle ground. Technologies like OverlayFS let you mount your project as a read-only base layer, with all agent modifications going to a writable overlay. The agent sees a normal filesystem, but its changes never touch your actual files unless you explicitly merge them. One caveat: OverlayFS has expensive rename semantics that hurt agent workloads, where directory renames during refactoring can trigger full subtree copies. Turso's AgentFS (2025) addresses this with a SQLite-backed delta layer that uses copy-on-write at the file level rather than the directory level.
  • Git worktree pairing is a newer pattern for multi-agent scenarios. Each agent gets one Git branch plus one ephemeral container, making parallel work inspectable, diffable, and rollback-safe without a custom filesystem layer. Dagger Container Use implements this pattern.
  • Snapshot-and-resume is becoming standard for stateful long-running agents. Firecracker's snapshot-restore mechanism lets platforms pause a sandbox, preserve its full memory and filesystem state, and resume in 5-30ms, which makes long-running agent sessions practical without keeping expensive VMs running continuously.
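The copy-in/copy-out pattern from the list above fits in a few lines of Python. Here `run_agent` is a placeholder for whatever actually executes your agent against the copy; the diff report is top-level only, for brevity:

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

def run_in_sandbox_copy(project: Path, run_agent) -> Path:
    """Copy the project into a throwaway directory, let the agent work
    there, and report what changed. The original tree is never touched
    unless the caller copies approved files back explicitly."""
    sandbox = Path(tempfile.mkdtemp(prefix="agent-sandbox-"))
    shutil.copytree(project, sandbox, dirs_exist_ok=True)

    run_agent(sandbox)  # the agent only ever sees the copy

    # Top-level diff so a human can review before merging anything back
    diff = filecmp.dircmp(project, sandbox)
    print("changed:", diff.diff_files, "new:", diff.right_only)
    return sandbox
```

This is the same flow as the shell walkthrough later in the article, just expressed as a library function you could embed in an agent harness.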

Network Policies

Deciding what network access your agent needs requires thinking through actual workflows:

  • Package managers (npm, pip, NuGet) need to reach public registries
  • Build tools might pull dependencies from private artifact repositories
  • Testing frameworks could require database connections
  • CI/CD integrations need to reach your build server

But your agent probably doesn't need to:

  • Make arbitrary HTTP requests to the internet
  • Connect to your production database
  • Access internal services beyond its specific scope
  • Resolve DNS for domains outside your allowlist

The emerging standard goes further than a simple application-level allowlist. Block all outbound traffic at the kernel or hypervisor level by default, then allowlist specific domains and ports per agent identity using an egress proxy that inspects TLS SNI headers. DNS queries to external resolvers deserve particular attention: agents can encode data in subdomain query strings to exfiltrate information even when HTTP is blocked. Block private IP ranges (10.x, 172.16.x, 192.168.x) and cloud metadata endpoints (169.254.169.254) to prevent SSRF attacks against your own infrastructure.
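Those rules can be sketched as a pre-connection check. This version takes an injectable resolver so it can be unit-tested; in production the same logic belongs in the egress proxy rather than the agent process, since an agent-side check alone can be bypassed by anything the agent executes:

```python
import ipaddress
import socket

ALLOWED_DOMAINS = {"registry.npmjs.org", "pypi.org", "files.pythonhosted.org"}

def egress_allowed(host: str, resolver=socket.gethostbyname) -> bool:
    """Allow only allowlisted domains, and refuse anything that resolves
    to a private, loopback, or link-local address."""
    if host not in ALLOWED_DOMAINS and not any(
        host.endswith("." + d) for d in ALLOWED_DOMAINS
    ):
        return False
    ip = ipaddress.ip_address(resolver(host))
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return False  # blocks 10.x, 172.16.x, 192.168.x, 169.254.169.254
    return True
```

One caveat: validating the resolved IP once still leaves a DNS-rebinding window. A proxy that connects to the exact IP it validated, rather than re-resolving, closes that gap.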

Define explicit network policies rather than defaulting to "allow all." That temporary inconvenience when you need to expand the allowlist is vastly preferable to explaining how your agent leaked credentials to an external service.

Resource Constraints

Setting appropriate resource limits requires understanding your agent's actual needs. Importantly, set these as hard limits, not soft suggestions: CPU as a hard throttle, memory as a hard OOM kill with swap disabled, and process count via cgroups. Container defaults with no resource limits are considered insufficient for agent workloads.

// Example resource limits for a containerized coding agent
// (illustrative config; field names follow Docker.DotNet conventions)
var containerConfig = new ContainerConfig
{
    Memory = 2L * 1024 * 1024 * 1024,      // 2GB RAM, hard limit (note the L: this overflows Int32)
    MemorySwap = 2L * 1024 * 1024 * 1024,  // equal to Memory = swap disabled
    CpuQuota = 100000,                     // with CpuPeriod below: 1 CPU core
    CpuPeriod = 100000,
    PidsLimit = 100,                       // max 100 processes (fork-bomb guard)
    
    // Timeout for long-running operations
    StopTimeout = TimeSpan.FromMinutes(5)
};

// Monitor resource usage and kill runaway processes
var resourceMonitor = new ResourceMonitor(containerConfig);
resourceMonitor.OnThresholdExceeded += async (sender, args) => 
{
    _logger.LogWarning(
        "Agent exceeded {Resource} limit: {Usage}/{Limit}",
        args.Resource, args.CurrentUsage, args.Limit);
    
    // Graceful shutdown before hard kill
    await container.StopAsync(TimeSpan.FromSeconds(10));
};

Start conservative. You can always increase limits when you hit legitimate constraints. It's harder to explain why your laptop became unresponsive because the agent spawned 10,000 processes trying to "parallelize the build."
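The same hard-limit idea is available directly from the OS, without a container. A sketch using POSIX rlimits via Python's `resource` module (Linux/macOS only; the limit values are illustrative, not recommendations):

```python
import resource
import subprocess

def run_limited(cmd: list[str],
                mem_bytes: int = 512 * 1024 * 1024,
                cpu_seconds: int = 30) -> subprocess.CompletedProcess:
    """Run a command with hard address-space and CPU-time limits.
    The limits are set in the child just after fork, so they apply
    only to the agent's process tree, not to your shell."""
    def set_limits():
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(cmd, preexec_fn=set_limits,
                          capture_output=True, timeout=cpu_seconds + 5)
```

A child that allocates past the limit gets a failed allocation (a `MemoryError` in Python) instead of taking the host down, and a process that burns past its CPU budget receives SIGXCPU. This is the "hard limit, not soft suggestion" principle in its smallest form.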

Getting Your Hands Dirty

The following walkthrough builds a working agent sandbox from scratch using Docker. You need Docker Desktop (or Docker Engine on Linux) installed and running. Every command is something you can paste directly into your terminal.

Step 1: Build the Sandbox Image

Create a directory for your sandbox setup and add a Dockerfile:

mkdir agent-sandbox && cd agent-sandbox

Then create a file named Dockerfile with the following contents:
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y \
    git \
    curl \
    build-essential \
    nodejs \
    npm \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Non-root user so the agent can't trivially escalate inside the container
RUN useradd -m agentuser && \
    mkdir -p /workspace && \
    chown agentuser:agentuser /workspace

USER agentuser
WORKDIR /workspace

Build it:

docker build -t agent-sandbox:latest .

Every agent run will start from this known, clean state.

Step 2: Run Your First Sandboxed Command

Before wiring anything up to an actual agent, verify the isolation works by running a command manually:

docker run --rm \
  --network none \
  --memory 2g \
  --memory-swap 2g \
  --cpus 1.0 \
  --pids-limit 100 \
  agent-sandbox:latest \
  sh -c "echo 'hello from the sandbox' && whoami"

You should see hello from the sandbox followed by agentuser. The --network none flag means this container has no internet access at all. Try pinging something to confirm:

docker run --rm --network none agent-sandbox:latest \
  sh -c "curl https://example.com" 2>&1

You'll get a connection error. That's exactly what you want.

Step 3: Isolate Your Project Files

The naive approach mounts your project directory directly into the container, giving the agent write access to your real files. Instead, copy your project into a temporary sandbox directory first, then mount that copy:

# Create a temporary sandbox copy of your project
SANDBOX_ID=$(uuidgen)
SANDBOX_PATH="/tmp/agent-sandbox/$SANDBOX_ID"
PROJECT_PATH="/path/to/your/project"

mkdir -p "$SANDBOX_PATH"
cp -r "$PROJECT_PATH/." "$SANDBOX_PATH/"

# Run the agent against the copy, not the original
docker run --rm \
  --network none \
  --memory 2g \
  --memory-swap 2g \
  --cpus 1.0 \
  --pids-limit 100 \
  -v "$SANDBOX_PATH:/workspace:rw" \
  -w /workspace \
  agent-sandbox:latest \
  sh -c "your-agent-command-here"

After the agent finishes, inspect the diff between the sandbox and your original before copying anything back:

diff -rq "$PROJECT_PATH" "$SANDBOX_PATH"

Only copy back what you actually want:

cp "$SANDBOX_PATH/src/some-file.ts" "$PROJECT_PATH/src/some-file.ts"

Then clean up:

rm -rf "$SANDBOX_PATH"

Your original project is never touched unless you explicitly decide to apply a change.

Step 4: Add a Network Allowlist

Completely blocking the network works for pure code generation, but agents that need to pull packages or call APIs need controlled outbound access. Docker's built-in bridge network doesn't enforce domain allowlists, so use a dedicated network with an egress proxy instead.

Create a restricted bridge network:

docker network create \
  --driver bridge \
  --opt com.docker.network.bridge.enable_ip_masquerade=true \
  agent-net

Then run a lightweight egress proxy (Squid works well for this) on that network, configured with an allowlist:

# squid.conf - only allow package registries
acl allowed_domains dstdomain .npmjs.org .pypi.org .nuget.org .github.com
http_access allow allowed_domains
http_access deny all

Save that as squid.conf, then start the proxy:

docker run -d \
  --name squid-proxy \
  --network agent-net \
  -v $(pwd)/squid.conf:/etc/squid/squid.conf:ro \
  ubuntu/squid:latest

Now run your agent container with the proxy set:

PROXY_IP=$(docker inspect squid-proxy \
  --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}')

docker run --rm \
  --network agent-net \
  --memory 2g \
  --memory-swap 2g \
  --cpus 1.0 \
  --pids-limit 100 \
  -v "$SANDBOX_PATH:/workspace:rw" \
  -e HTTP_PROXY="http://$PROXY_IP:3128" \
  -e HTTPS_PROXY="http://$PROXY_IP:3128" \
  -e NO_PROXY="localhost,127.0.0.1" \
  agent-sandbox:latest \
  sh -c "npm install && your-agent-command-here"

npm and pip will route through Squid and succeed. A request to an arbitrary domain will be blocked.

Step 5: Monitor a Running Container

Open a second terminal while the agent is running to watch resource usage in real time:

# Stream live stats
docker stats <container-id>

To get the container ID of a running sandbox:

docker ps --filter "ancestor=agent-sandbox:latest" --format "{{.ID}}"

For scripted alerting, poll stats in a loop and kill the container if limits are breached:

CONTAINER_ID="<your-container-id>"

while docker ps -q --filter "id=$CONTAINER_ID" | grep -q .; do
  # CPUPerc and PIDs both come from docker stats; --no-stream takes one sample
  STATS=$(docker stats "$CONTAINER_ID" --no-stream \
    --format "{{.CPUPerc}} {{.PIDs}}")
  CPU=$(echo "$STATS" | awk '{print $1}' | tr -d '%')
  PIDS=$(echo "$STATS" | awk '{print $2}')

  if (( $(echo "$CPU > 95" | bc -l) )) || [ "${PIDS:-0}" -gt 90 ]; then
    echo "WARNING: agent at ${CPU}% CPU with $PIDS processes, stopping container"
    docker stop "$CONTAINER_ID"
    break
  fi

  sleep 5
done

Step 6: Wrap It in a Script

Putting it all together into a reusable run-agent.sh:

#!/usr/bin/env bash
set -euo pipefail

PROJECT_PATH="${1:?Usage: run-agent.sh <project-path> <command>}"
AGENT_COMMAND="${2:?Usage: run-agent.sh <project-path> <command>}"

SANDBOX_ID=$(uuidgen)
SANDBOX_PATH="/tmp/agent-sandbox/$SANDBOX_ID"

cleanup() {
  rm -rf "$SANDBOX_PATH"
  echo "Sandbox $SANDBOX_ID cleaned up."
}
trap cleanup EXIT

echo "Copying project to sandbox $SANDBOX_ID..."
mkdir -p "$SANDBOX_PATH"
cp -r "$PROJECT_PATH/." "$SANDBOX_PATH/"

echo "Running agent in sandbox..."
docker run --rm \
  --network none \
  --memory 2g \
  --memory-swap 2g \
  --cpus 1.0 \
  --pids-limit 100 \
  -v "$SANDBOX_PATH:/workspace:rw" \
  -w /workspace \
  agent-sandbox:latest \
  sh -c "$AGENT_COMMAND"

echo ""
echo "Agent finished. Diff between sandbox and original:"
diff -rq "$PROJECT_PATH" "$SANDBOX_PATH" || true

echo ""
read -rp "Apply all changes? [y/N] " APPLY
if [[ "$APPLY" == "y" ]]; then
  cp -r "$SANDBOX_PATH/." "$PROJECT_PATH/"
  echo "Changes applied."
else
  echo "No changes applied. Original project unchanged."
fi

Make it executable and run it:

chmod +x run-agent.sh
./run-agent.sh /path/to/my-project "claude --print 'refactor main.py'"

The script copies your project, runs the agent in a sandboxed container with no network access and hard resource limits, shows you a diff of what changed, and only applies changes if you confirm. Your original files are safe regardless of what the agent does inside the sandbox.

Watch Out for Logic-Level Bypasses

A sandbox that stops filesystem and network escapes can still be circumvented at the logic level. Several vulnerabilities discovered in 2025 illustrate the gap between "sandboxed" and "secure."

Claude Code's deny rules stop being enforced once a command exceeds 50 subcommands. A malicious CLAUDE.md file can instruct the agent to generate a pipeline padded with 50 no-op true subcommands followed by a curl call, silently bypassing the deny rule without touching the OS-level sandbox at all.

Hook injection (CVE-2025-66479) is another class of attack. Hooks in Claude Code's .claude/settings.json execute silently on the host with no confirmation dialog and no audit log. An attacker with code execution inside the sandbox can write a malicious settings file. When the user runs Claude Code on the host outside the sandbox, the injected payload runs with full host privileges.
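Until tooling closes this gap, a cheap mitigation is a pre-flight scan of any settings files a sandboxed agent could have written, before you run the agent CLI on the host. The sketch below assumes only that hooks live under a top-level "hooks" key in `.claude/settings.json`; treat that schema detail as illustrative and adapt it to your tool's actual config layout:

```python
import json
from pathlib import Path

def find_hooked_settings(repo: Path) -> list[Path]:
    """Flag every .claude/settings.json under the repo that declares
    hooks, so a human reviews them before the agent runs on the host."""
    flagged = []
    for settings in repo.rglob(".claude/settings.json"):
        try:
            data = json.loads(settings.read_text())
        except (OSError, json.JSONDecodeError):
            flagged.append(settings)  # unreadable config is suspicious too
            continue
        if isinstance(data, dict) and data.get("hooks"):
            flagged.append(settings)
    return flagged
```

Run it in CI or as a pre-commit check on repositories that agents write to; an empty result is the expected state.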

The OWASP Agentic Security Initiative and the Cloud Security Alliance's MAESTRO framework (published February 2025) both model these cross-layer attack paths. MAESTRO's core insight is that a prompt injection at the foundation model layer can propagate through the agent framework layer to achieve an infrastructure compromise, regardless of the OS-level sandbox boundary. Threat modeling for agent systems needs to span all seven layers, not just the container boundary.

What We Learned from Our Code Monkeys Sandbox

Sandboxed execution for coding agents is about acknowledging that autonomous systems, no matter how sophisticated, lack human judgment about context and consequences. The agent that helpfully "cleans up unused files" doesn't understand that your uncommitted work is precious. The agent that "optimizes network calls" doesn't grasp why connecting to random external services is problematic.

Key principles for sandboxed agent execution:

  • Isolate by default: Start with maximum isolation and relax constraints only when necessary
  • Choose the right isolation layer: OS-level process sandboxes (bubblewrap, Seatbelt) are acceptable for local dev tools with a trusted user; microVMs (Firecracker, Kata) are the baseline for multi-tenant or untrusted code execution
  • Explicit allowlists: Define what the agent can access rather than what it cannot, enforced at the kernel or hypervisor level, not just the application level
  • Copy-on-write semantics: Protect your actual codebase from accidental modifications; consider agent-aware overlays rather than standard OverlayFS for intensive workloads
  • Resource boundaries: Prevent runaway processes from impacting your development environment, using hard limits, not soft ones
  • Monitor and alert: Instrument your sandbox to detect suspicious patterns early, including outbound DNS queries and unusual process spawning
  • Threat model across layers: OS sandbox escape isn't the only failure mode; logic-level bypasses, prompt injection chains, and hook injection attacks operate above the container boundary

The goal is to create guardrails that let agents be useful safely. Your agent can still generate code, run tests, and refactor your codebase, but it does so in an environment where mistakes are contained and recoverable.

What assumptions are you making about your coding agent's judgment? What would happen if it misinterpreted your next instruction? And most importantly, would you rather find out in a sandbox or in production?
