Why Your AI Developer Needs a Sandbox (And What Goes Inside)
Let's state the obvious: an AI that can write and execute arbitrary code on your machine is a security nightmare waiting to happen.
The model hallucinates a path. The path happens to be /. The command happens to be rm -rf. Your morning just got a lot worse.
This isn't theoretical. AI models make mistakes. They misunderstand context. They sometimes do exactly what you asked, which turns out to be exactly what you didn't want. Any serious AI coding tool needs to account for this.
Here's how we think about it.
The Threat Model
When an AI writes code, several things can go wrong:
Accidental destruction: The AI misunderstands scope and modifies files outside the project. Or runs a cleanup command that's a bit too enthusiastic.
Resource exhaustion: An infinite loop, a memory leak, a fork bomb. The AI doesn't know it's creating one—it's just following a pattern that happens to be catastrophic.
Secret exposure: The AI logs debug output that includes environment variables. Your API keys are now in the PR description.
Supply chain attacks: The AI installs a package that sounds right (loadsh instead of lodash). Congratulations, you've got malware.
Persistence: The AI creates a cron job or background process that outlives the task. Something is now running on your machine that shouldn't be.
None of these require malicious intent. They're just bugs—the kind of bugs that happen when you let code run unsupervised.
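The supply chain case is the easiest of these to check mechanically. Here is a minimal Python sketch that flags near-miss package names before anything gets installed. The `TRUSTED_PACKAGES` set, the function name, and the 0.8 similarity cutoff are all assumptions made for illustration, not part of Vince.

```python
from difflib import get_close_matches

# Hypothetical allowlist; in practice it might be derived from the project's lockfile.
TRUSTED_PACKAGES = {"lodash", "express", "react", "typescript"}

def check_package_name(name: str, trusted: set[str] = TRUSTED_PACKAGES) -> str:
    """Classify a requested package as 'trusted', 'suspicious', or 'unknown'."""
    if name in trusted:
        return "trusted"
    # A near-miss of a trusted name is a typosquatting red flag.
    if get_close_matches(name, sorted(trusted), n=1, cutoff=0.8):
        return "suspicious"
    return "unknown"

print(check_package_name("loadsh"))  # suspicious
```

A "suspicious" result would be a natural trigger for a human approval step rather than an outright block, since legitimate packages sometimes have similar names.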
The Solution: Isolated Execution
Every task Vince executes runs in its own isolated container. Not directly on your host. Not with access to your files. In a fresh, disposable environment that gets destroyed when the task completes.
┌──────────────────────────────────────────────┐
│  YOUR MACHINE                                │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │  DOCKER SANDBOX                        │  │
│  │                                        │  │
│  │  • Cloned repository (read-write)      │  │
│  │  • Code Intelligence tools             │  │
│  │  • Test runners, linters               │  │
│  │  • Network access (configurable)       │  │
│  │  • Resource limits enforced            │  │
│  │                                        │  │
│  │  ✗ No access to host filesystem        │  │
│  │  ✗ No access to other containers       │  │
│  │  ✗ No persistent storage after task    │  │
│  │                                        │  │
│  └────────────────────────────────────────┘  │
│                                              │
└──────────────────────────────────────────────┘
The sandbox contains everything needed to work on your code—and nothing else.
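Most of these properties map onto standard `docker run` flags. Here is a minimal Python sketch, assuming the sandbox is launched through the Docker CLI (the article does not say how Vince starts containers). The `SandboxConfig` class and the `--pids-limit` value are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SandboxConfig:
    # Field names mirror the article's YAML keys; the class itself is hypothetical.
    image: str = "node:20-slim"
    memory_limit: str = "4g"
    cpu_limit: int = 4
    network_mode: str = "bridge"

def docker_run_args(cfg: SandboxConfig, command: list[str]) -> list[str]:
    """Build a `docker run` argv for a disposable, resource-limited container."""
    return [
        "docker", "run",
        "--rm",                         # remove the container when it exits
        "--memory", cfg.memory_limit,   # hard memory cap
        "--cpus", str(cfg.cpu_limit),   # CPU quota
        "--network", cfg.network_mode,  # "none" disables networking entirely
        "--pids-limit", "512",          # assumed value; bounds fork bombs
        cfg.image,
        *command,
    ]

print(" ".join(docker_run_args(SandboxConfig(), ["npm", "test"])))
```

Note what is absent: there is no `-v` or `--mount` flag, so nothing from the host filesystem is visible inside the container.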
What Lives Inside the Sandbox
1. Your Repository (Cloned Fresh)
Each task starts with a fresh git clone. The AI works on this copy, not your local checkout. Changes only escape the sandbox through Git—pushed to a branch, opened as a PR.
This means:
- Your working directory is never touched
- Uncommitted changes on your machine are safe
- The AI can't accidentally commit your local debugging hacks
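A rough Python sketch of the fresh-clone step, under some assumptions: the function name is hypothetical, the shallow clone (`--depth 1`) is a choice made here for speed rather than something the article specifies, and the commands are only built, not executed.

```python
import tempfile
from pathlib import Path

def fresh_clone_commands(repo_url: str, task_branch: str) -> tuple[Path, list[list[str]]]:
    """Return a throwaway working directory and the git commands to populate it."""
    # A new temporary directory per task, so no state carries over between tasks.
    workdir = Path(tempfile.mkdtemp(prefix="sandbox-")) / "repo"
    commands = [
        ["git", "clone", "--depth", "1", repo_url, str(workdir)],
        ["git", "-C", str(workdir), "checkout", "-b", task_branch],
    ]
    return workdir, commands
```

Because the clone comes from the remote and not from your local checkout, uncommitted local changes never enter the sandbox in the first place.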
2. Code Intelligence Server
This is where the AI's "eyes" live. The Code Intelligence MCP server provides AST-aware tools for navigating and editing code:
// Search across the codebase
search_code("handlePayment.*error")
// Get a file's structure without reading every line
get_file_outline("src/services/PaymentService.ts")
// Extract just the function needed
get_function("processRefund", "src/services/PaymentService.ts")
// Make surgical edits
replace_function("processRefund", updatedImplementation)
These tools run inside the sandbox, operating on the cloned repository. They use tree-sitter for AST parsing and ripgrep for fast search, both battle-tested tools that won't surprise you.
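To give a feel for what an outline tool returns, here is a minimal sketch that uses Python's built-in `ast` module as a stand-in for tree-sitter. It is not the Code Intelligence server's implementation, it only handles Python source, and the name `file_outline` is hypothetical.

```python
import ast

def file_outline(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) for each top-level class and function."""
    outline = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            outline.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            outline.append(("class", node.name, node.lineno))
    return outline

sample = '''
class PaymentService:
    def process_refund(self, order_id):
        return order_id

def handle_payment_error(err):
    raise err
'''
print(file_outline(sample))
```

The point of an outline is that the model sees names and line numbers without spending context on function bodies it does not need.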
3. Runtime Environment
The container includes everything needed to build and test your project:
runtime:
  docker:
    default_image: "node:20-slim"  # Or python:3.12, rust:latest, etc.
    memory_limit: "4g"
    cpu_limit: 4
    timeout_seconds: 3600          # Hard cap on execution time
The AI can run npm install, npm test, npm run lint, or whatever else your project needs. But it can't install system packages, modify the Docker image, or break out of the container.
4. Controlled Network Access
By default, sandboxes have network access for:
- Pulling dependencies (npm, pip, cargo)
- Pushing to Git
- Calling configured APIs
You can lock this down further:
runtime:
  docker:
    network_mode: "none"  # Complete isolation
Or allow specific hosts only. The AI doesn't need to reach the internet to fix a CSS bug.
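The article does not show the configuration key for host allowlisting, so here is only a sketch of the check itself, in Python. The `ALLOWED_HOSTS` set and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real configuration key is not shown in this article.
ALLOWED_HOSTS = {"registry.npmjs.org", "github.com"}

def is_allowed(url: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow a request only if its host is on the list or is a subdomain of an entry."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed)

print(is_allowed("https://registry.npmjs.org/lodash"))  # True
print(is_allowed("https://evilgithub.com/x"))           # False
```

Matching on `"." + h` instead of a bare suffix is what keeps `evilgithub.com` from passing as `github.com`.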
Resource Limits: The Guardrails
Even inside a sandbox, an AI can cause problems. A test suite that runs forever. A build that consumes all available memory. We enforce hard limits:
Time Limits
runtime:
  docker:
    timeout_seconds: 3600  # 1 hour max per task
If a task runs longer than the timeout, it gets killed. The sandbox is destroyed. You're notified of the timeout.
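The same idea at the level of a single process, as a minimal Python sketch. The helper name is hypothetical, and it shows a hard timeout in general, not how Vince enforces one on a container.

```python
import subprocess
import sys

def run_with_timeout(argv: list[str], timeout_seconds: float) -> str:
    """Run a command and return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(argv, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"

status = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print(status)  # timeout
```

Killing one process does not necessarily kill the processes it spawned, which is one reason destroying the whole container is the more reliable cleanup.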
Memory Limits
runtime:
  docker:
    memory_limit: "4g"  # Container can't exceed this
If the AI writes code that leaks memory, the container hits the limit and the process dies. Your host machine stays healthy.
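For reference, "4g" here follows Docker's size notation, where the suffixes b, k, m, and g are powers of 1024. A small Python sketch of the conversion, with a hypothetical helper name:

```python
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_memory_limit(value: str) -> int:
    """Convert a Docker-style size string such as '4g' or '512m' to bytes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is taken as bytes

print(parse_memory_limit("4g"))  # 4294967296
```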
CPU Limits
runtime:
  docker:
    cpu_limit: 4  # Max 4 CPU cores
This prevents the sandbox from starving other processes on your machine.
Tool Call Limits
guardrails:
  max_tool_calls_per_task: 200
  max_cost_per_task_usd: 20.00
Even if the AI gets stuck in a loop, it can only make so many tool calls before we cut it off. The separate cost ceiling caps your API spend per task.
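A minimal sketch of how such a budget could be tracked, in Python. The class and method names are hypothetical, and per-call cost is assumed to be known to the caller.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when a task passes its tool call or cost limit."""

@dataclass
class TaskBudget:
    max_tool_calls: int = 200
    max_cost_usd: float = 20.00
    tool_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one tool call and its cost; raise once either cap is passed."""
        self.tool_calls += 1
        self.cost_usd += cost_usd
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool call limit of {self.max_tool_calls} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of ${self.max_cost_usd:.2f} exceeded")
```

Calling `charge()` before dispatching each tool call means a runaway loop stops at call 201 instead of running until someone notices.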
Tool-Level Safety
Not all tools are created equal. Some are read-only and harmless. Others can delete data or cost money. We categorize them:
Blocked Tools
Some operations are never allowed, regardless of context:
guardrails:
  blocked_tools:
    - "filesystem.delete_recursive"
    - "git.force_push"
    - "git.delete_branch_remote"  # main/master protection
If the AI tries to call a blocked tool, it gets an error and has to find another approach.
Approval-Required Tools
Some operations need human confirmation:
guardrails:
  require_approval:
    - "*delete*"   # Anything with 'delete' in the name
    - "*destroy*"  # Anything with 'destroy'
    - "git.push"   # Confirm before pushing
When the AI wants to use one of these tools, execution pauses:
⏸️ Approval Required
Vince wants to execute: git.push
Arguments: { branch: "BACKEND-123", remote: "origin" }
Task: BACKEND-123 - Fix login button on mobile
[Approve] [Reject] [View Context]
You decide whether to proceed. The AI waits.
Safe Tools
Everything else runs without interruption. Reading files, searching code, running tests—these happen automatically as part of normal execution.
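The three categories can be sketched as a single lookup in Python. This assumes glob-style matching via `fnmatch` and that blocked tools take precedence over approval patterns. The article implies both but does not state either, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

BLOCKED = ["filesystem.delete_recursive", "git.force_push", "git.delete_branch_remote"]
REQUIRE_APPROVAL = ["*delete*", "*destroy*", "git.push"]

def classify_tool(name: str) -> str:
    """Return 'blocked', 'approval', or 'safe' for a tool name."""
    # Blocked is checked first, because "*delete*" also matches blocked tool names.
    if any(fnmatchcase(name, pattern) for pattern in BLOCKED):
        return "blocked"
    if any(fnmatchcase(name, pattern) for pattern in REQUIRE_APPROVAL):
        return "approval"
    return "safe"

for tool in ("filesystem.delete_recursive", "git.push", "search_code"):
    print(tool, "->", classify_tool(tool))
```

The ordering matters: if approval patterns were checked first, a human could approve `filesystem.delete_recursive`, which defeats the point of a blocklist.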
Pluggable Backends
Docker is just the default. As your needs grow, you can swap in different backends:
Docker (Default)
Best for: Local development, quick tasks, low latency
runtime:
  backend: docker
Containers spin up in seconds. Great for iteration speed.
Hetzner Cloud VMs (Coming Soon)
Best for: Production workloads, long-running tasks, stronger isolation
runtime:
  backend: hetzner
  hetzner:
    server_type: "cpx21"  # 3 vCPU, 4GB RAM
    location: "fsn1"
    auto_delete: true     # Destroy VM after task
Each task gets a dedicated VM. Complete isolation between tasks. Auto-cleanup ensures you're not paying for idle resources.
Kubernetes (Future)
Best for: Enterprise, auto-scaling, multi-tenant
runtime:
  backend: kubernetes
Tasks run as pods with resource quotas and network policies. Scales with your workload.
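One way to picture "pluggable" is a small interface plus a registry keyed by the `backend` value. The Python sketch below is an assumption about the shape of such a design, not Vince's actual code. Every class and function name in it is hypothetical, and the Docker backend is a placeholder that launches nothing.

```python
from typing import Protocol

class SandboxBackend(Protocol):
    def start(self, task_id: str) -> str: ...
    def destroy(self, sandbox_id: str) -> None: ...

class DockerBackend:
    def start(self, task_id: str) -> str:
        return f"docker-{task_id}"  # placeholder for launching a container

    def destroy(self, sandbox_id: str) -> None:
        pass  # placeholder for removing the container

# Only backends that exist today are registered.
BACKENDS: dict[str, type] = {"docker": DockerBackend}

def make_backend(config: dict) -> SandboxBackend:
    """Pick a backend from the parsed config, defaulting to Docker."""
    name = config.get("runtime", {}).get("backend", "docker")
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown or not yet available backend: {name!r}") from None

print(make_backend({"runtime": {"backend": "docker"}}).start("t1"))  # docker-t1
```

With this shape, adding a VM or Kubernetes backend means registering one more class, while the task runner keeps calling `start` and `destroy`.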
The Escape Hatch: Git
Here's the key insight: the only way code escapes the sandbox is through Git.
The AI can modify files all it wants inside the container. It can create, delete, rename. But those changes only matter when they're:
- Committed to a branch
- Pushed to your remote
- Opened as a pull request
And pull requests go through your normal review process. The sandbox isn't the last line of defense—it's the first. Your code review is the second. Your CI pipeline is the third.
What This Means for You
You can let an AI write code without worrying about:
- Your local files being touched
- Your machine resources being exhausted
- Secrets leaking through debug output
- Persistent processes surviving the task
- One task interfering with another
The worst-case scenario is a failed task and a destroyed container. Your machine, your files, and your secrets remain untouched.
Configuring Your Sandbox
Here's a complete configuration example:
runtime:
  backend: docker
  docker:
    default_image: "vince/code-sandbox:node-20"
    memory_limit: "4g"
    cpu_limit: 4
    timeout_seconds: 3600
    network_mode: "bridge"
guardrails:
  blocked_tools:
    - "filesystem.delete_recursive"
    - "git.force_push"
  require_approval:
    - "*delete*"
    - "git.push"
  max_tool_calls_per_task: 200
  max_cost_per_task_usd: 20.00
Start restrictive. Loosen as you build trust. The defaults are conservative for a reason.
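A configuration like this is worth sanity-checking before any task runs. The Python sketch below assumes the YAML has already been parsed into a dict by a YAML library. The specific checks are illustrative choices, not Vince's validation rules, and `validate_config` is a hypothetical name.

```python
def validate_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed sandbox config."""
    problems = []
    docker = config.get("runtime", {}).get("docker", {})
    guardrails = config.get("guardrails", {})

    for key in ("cpu_limit", "timeout_seconds"):
        if docker.get(key, 0) <= 0:
            problems.append(f"runtime.docker.{key} must be a positive number")
    # Docker's "host" mode shares the host's network stack with the container.
    if docker.get("network_mode") == "host":
        problems.append("runtime.docker.network_mode 'host' weakens isolation")
    for key in ("max_tool_calls_per_task", "max_cost_per_task_usd"):
        if guardrails.get(key, 0) <= 0:
            problems.append(f"guardrails.{key} must be a positive number")
    return problems

example = {
    "runtime": {"backend": "docker", "docker": {
        "default_image": "vince/code-sandbox:node-20",
        "memory_limit": "4g", "cpu_limit": 4,
        "timeout_seconds": 3600, "network_mode": "bridge"}},
    "guardrails": {
        "blocked_tools": ["filesystem.delete_recursive", "git.force_push"],
        "require_approval": ["*delete*", "git.push"],
        "max_tool_calls_per_task": 200, "max_cost_per_task_usd": 20.00},
}
print(validate_config(example))  # []
```

Treating a missing limit as an error, instead of as "unlimited", keeps the failure mode on the restrictive side.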
Security questions? Edge cases we haven't covered? We'd love to hear them—our threat model is only as good as the scenarios we've considered.