Tools: The Helpful Adversary
2026-01-21
admin
- The Problem: Helpful AI Breaks Your Sandbox
- The Insight: Stop Sandboxing Malice, Design for Helpfulness
- The Solution: Truman Shell
- Pattern 1: The Command Allowlist
- Pattern 2: The 404 Principle
- Pattern 3: Soft Delete Everything
- Pattern 4: The Command Behaviour
- Pattern 5: Dispatch via Module Map
- Why Elixir?
- Practical Takeaways
- What's Next

## The Problem: Helpful AI Breaks Your Sandbox

Last weekend, I spent two days building what I thought was a bulletproof Docker sandbox for AI agents. I patched config file backdoors, squashed bash bugs, and fixed symlink escapes. By Sunday night, everything was beautiful - linting passing, tests green, read-only vault mounted.

Then I asked Claude: "Could you run this Elixir program for me?"

I watched in real time as it thought:

- "hmm, no Elixir... let me see if I can download it"
- "network blocked except a few domains"
- "hex.pm is allowed, they have Erlang images"
- "downloading... oh, make isn't installed"
- "I don't actually need make, let me shim it with exit 0"
- "here's your output!"

In 5 minutes, my entire weekend of security work was outsmarted, not by a malicious agent, but by an overly eager, helpful one.

## The Insight: Stop Sandboxing Malice, Design for Helpfulness

That's when I realized: the worst adversary isn't a malicious AI - it's a helpful one. Traditional sandboxing assumes adversarial intent. But AI agents don't want to escape - they want to help. Every creative workaround is the agent trying harder to complete your task.

So I asked a different question: for 80% of knowledge work (which isn't coding - it's thinking and files), how many shell commands do I actually need? The answer: maybe 8. ls, cat, grep, find, echo, mkdir, rm, mv. What if I just... emulated them?

## The Solution: Truman Shell

Truman Shell is an Elixir-based shell simulator. The agent thinks it's running bash, but everything goes through a controlled layer with four properties: command allowlisting, pattern-matched security, reversible operations, and the 404 Principle.

Let me walk through the implementation patterns.

## Pattern 1: The Command Allowlist

The first line of defense is a compile-time allowlist.
Instead of blocking bad commands, we only allow good ones (see the `TrumanShell.Command` listing).

Three security properties here:

- No atom DoS - We never call String.to_atom/1 on untrusted input. Unknown commands become {:unknown, "curl"} tuples, not atoms.
- Compile-time verification - The allowlist is a module attribute, so it is fixed at compile time and the compiler flags any reference to an attribute that was never defined.
- The cmd_ prefix trick - Why :cmd_true instead of :true? Because in Elixir :true and :false are the booleans true and false themselves, so a command name would be indistinguishable from a boolean result (and :false is falsy in every conditional). Using prefixed atoms avoids this footgun entirely.

## Pattern 2: The 404 Principle

When an agent tries to access /etc/passwd, should you return "permission denied" or "not found"? "Permission denied" leaks information - the agent now knows the file exists. It can probe your filesystem structure.

Instead, Truman Shell returns "not found" for any path outside the sandbox (see `validate_path` in the `TrumanShell.Support.Sandbox` listing).

The subtle security bug this prevents: /tmp/sandbox2/secrets would pass a naive String.starts_with?(path, "/tmp/sandbox") check. The trailing slash in sandbox <> "/" ensures we're checking directory containment, not string prefix.

When commands use this validation, they transform the error into a POSIX-style message (see the `Sandbox.validate_path` call in the `TrumanShell.Commands.Rm` listing).

## Pattern 3: Soft Delete Everything

Here's where Truman Shell diverges most from real bash. The rm command never deletes anything (see `move_to_trash` in the `TrumanShell.Commands.Rm` listing).

System.unique_integer([:positive, :monotonic]) guarantees unique IDs within a running VM, even for rapid successive calls. When an agent runs rm -rf important_data/, you can always find it in .trash/123_important_data/.

This pattern is about auditability over efficiency. For an AI sandbox, being able to trace what happened is more valuable than saving disk space.

## Pattern 4: The Command Behaviour

Each command implements a simple behaviour that enforces consistent interfaces (see the `TrumanShell.Commands.Behaviour` listing).

The interesting pattern here is side effect separation. Commands like cd need to change directory, but the command itself doesn't mutate state. Instead, it returns a directive that the executor applies (see the executor listing).

This keeps command handlers pure and testable - they describe what should happen, the executor makes it happen.
## Pattern 5: Dispatch via Module Map

The executor routes commands using a compile-time map (see the `TrumanShell.Stages.Executor` listing).

The is_map_key/2 guard ensures we only dispatch to known modules. Unknown commands hit the fallback clause with a bash-style error message. The agent gets the feedback it expects from a "real" shell.

## Why Elixir?

You could build this in any language, but Elixir's pattern matching makes the security model explicit. When you write the rm handler as a stack of `handle/2` clauses, every accepted flag combination is a visible clause. There's no hidden else branch where unexpected input slips through: input that matches no clause raises a FunctionClauseError instead of being silently accepted.

There's also a romantic historical angle: Elixir runs on the BEAM, the virtual machine built for Erlang, a language that dates to the 1980s. Erlang's isolated, message-passing processes echo the Actor Model, which came out of 1970s research conceived specifically for Artificial Intelligence. The researchers realized each AI would need its own isolated bubble of memory, compute, and actions. 50 years later, LLMs have finally "come home" to the BEAM.

## Practical Takeaways

If you're building AI agent infrastructure, here are patterns you can steal:

- Allowlist over blocklist - Define what's permitted, not what's forbidden. Your security surface becomes the allowlist, not "everything minus exceptions."
- The 404 Principle - Never leak information about protected resources. "Not found" for everything outside the boundary.
- Reversible by default - Make all destructive operations soft-deletes. Your future self will thank you when debugging agent behavior.
- Side effect separation - Commands describe effects, executors apply them. Keeps handlers testable and control flow visible.
- Compile-time security - Use module attributes and pattern matching to make security rules static and explicit.

## What's Next

Truman Shell pairs with IExReAct - an Elixir REPL using the LLM agent Reason/Act pattern. Truman for the filesystem, IExReAct for the brain. Together they form a sandboxed environment where AI agents can think and manipulate files without escaping through helpfulness.

The code is MIT licensed.
If you're building AI tooling and want to chat about security patterns, architecture decisions, or why functional programming is a natural fit for agent sandboxes, feel free to reach out or open an issue on GitHub. Or join me in the VeryHumanAI Discord (I’m conroywhitney aka YOЯNOC): discord.gg/Y52a6RqX

"And in case I don't see ya, good afternoon, good evening, and good night!"

Have you built similar sandboxing infrastructure? I'd love to hear what patterns worked for you in the comments.
```elixir
# In TrumanShell.Command
@known_commands %{
  # Navigation
  "cd" => :cmd_cd,
  "pwd" => :cmd_pwd,
  # Read operations
  "ls" => :cmd_ls,
  "cat" => :cmd_cat,
  "head" => :cmd_head,
  "tail" => :cmd_tail,
  # Search operations
  "grep" => :cmd_grep,
  "find" => :cmd_find,
  "wc" => :cmd_wc,
  # Write operations
  "mkdir" => :cmd_mkdir,
  "touch" => :cmd_touch,
  "rm" => :cmd_rm,
  "mv" => :cmd_mv,
  "cp" => :cmd_cp,
  "echo" => :cmd_echo,
  # Utility
  "which" => :cmd_which,
  "true" => :cmd_true,
  "false" => :cmd_false
}

@spec parse_name(String.t()) :: command_name()
def parse_name(name) when is_binary(name) do
  Map.get(@known_commands, name, {:unknown, name})
end
```
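To see the allowlist lookup in isolation, here is a minimal, self-contained sketch. The module name `AllowlistSketch` and the trimmed three-entry map are assumptions made for illustration, not the project's actual code.

```elixir
defmodule AllowlistSketch do
  # Trimmed allowlist for illustration (assumption); the real map lists more commands.
  @known_commands %{
    "ls" => :cmd_ls,
    "rm" => :cmd_rm,
    "true" => :cmd_true
  }

  # Known names map to atoms that already exist in the compiled module.
  # Anything else stays a string inside an {:unknown, name} tuple,
  # so untrusted input never creates a new atom.
  def parse_name(name) when is_binary(name) do
    Map.get(@known_commands, name, {:unknown, name})
  end
end

IO.inspect(AllowlistSketch.parse_name("ls"))
IO.inspect(AllowlistSketch.parse_name("curl"))
IO.inspect(AllowlistSketch.parse_name("true"))

# The bare atom :true is the boolean true itself, which is why the
# cmd_ prefix is needed to keep command names distinct from booleans.
IO.inspect(:true === true)
```

The lookup is a plain `Map.get/3` with a default, so the set of atoms a caller can ever receive is exactly the set of values written in the map.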
```elixir
# In TrumanShell.Support.Sandbox
def validate_path(path, sandbox_root) do
  sandbox_expanded = Path.expand(sandbox_root)

  # Reject absolute paths outside sandbox
  # Instead of silently confining /etc -> sandbox/etc, we reject entirely.
  # This is more honest - the AI learns sandbox boundaries explicitly.
  if String.starts_with?(path, "/") and not path_within_sandbox?(path, sandbox_expanded) do
    {:error, :outside_sandbox}
  else
    rel_path = Path.relative_to(path, sandbox_expanded)

    case Path.safe_relative(rel_path, sandbox_expanded) do
      {:ok, safe_rel} -> {:ok, Path.expand(safe_rel, sandbox_expanded)}
      :error -> {:error, :outside_sandbox}
    end
  end
end

# Check using proper directory boundary, not just string prefix!
# "/tmp/sandbox/file" is within "/tmp/sandbox"
# "/tmp/sandbox2/file" is NOT within "/tmp/sandbox"
defp path_within_sandbox?(path, sandbox) do
  path == sandbox or String.starts_with?(path, sandbox <> "/")
end
```
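The difference between a string-prefix check and a directory-boundary check is easiest to see side by side. The sketch below is an illustration only: `ContainmentSketch` and `naive_within?/2` are hypothetical names, and `within?/2` mirrors the boundary check from the listing above.

```elixir
defmodule ContainmentSketch do
  # Naive check: plain string prefix, which also matches sibling directories.
  def naive_within?(path, sandbox), do: String.starts_with?(path, sandbox)

  # Boundary-aware check: the root itself, or the root followed by a separator.
  def within?(path, sandbox) do
    path == sandbox or String.starts_with?(path, sandbox <> "/")
  end
end

sandbox = "/tmp/sandbox"

# Print {path, naive result, boundary-aware result} for a few sample paths.
for path <- ["/tmp/sandbox/notes.md", "/tmp/sandbox2/secrets", "/tmp/sandbox"] do
  naive = ContainmentSketch.naive_within?(path, sandbox)
  strict = ContainmentSketch.within?(path, sandbox)
  IO.inspect({path, naive, strict})
end
```

Only the sibling directory `/tmp/sandbox2/secrets` gets different answers from the two functions, which is exactly the case the trailing slash exists to catch.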
```elixir
# In TrumanShell.Commands.Rm
case Sandbox.validate_path(target_rel, context.sandbox_root) do
  {:ok, safe_path} ->
    soft_delete(safe_path, file_name, context.sandbox_root, opts)

  {:error, :outside_sandbox} ->
    # 404 Principle: "No such file" not "Permission denied"
    {:error, "rm: #{file_name}: No such file or directory\n"}
end
```
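One way to keep the 404 Principle consistent across commands is to funnel every error reason through a single formatter. This is a sketch under assumptions: `NotFoundSketch.format_error/3` is a hypothetical helper, and `:enoent` is the reason Elixir's `File` functions return for a path that does not exist.

```elixir
defmodule NotFoundSketch do
  # Both reasons collapse into the same message, so the output never reveals
  # whether a path is genuinely missing or merely outside the sandbox.
  def format_error(cmd, file_name, reason) when reason in [:outside_sandbox, :enoent] do
    "#{cmd}: #{file_name}: No such file or directory\n"
  end
end

blocked = NotFoundSketch.format_error("cat", "/etc/passwd", :outside_sandbox)
missing = NotFoundSketch.format_error("cat", "/etc/passwd", :enoent)

# The two messages are byte-for-byte identical.
IO.inspect(blocked == missing)
```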
```elixir
# In TrumanShell.Commands.Rm
@moduledoc """
Handler for the `rm` command - SOFT DELETE files to .trash.

**CRITICAL**: This command NEVER actually deletes files!
Instead, it moves them to `.trash/{unique_id}_{filename}` for auditability.
"""

defp move_to_trash(safe_path, file_name, sandbox_root) do
  trash_dir = Path.join(sandbox_root, ".trash")
  File.mkdir_p(trash_dir)

  # Generate unique-prefixed name to avoid collisions
  unique_id = System.unique_integer([:positive, :monotonic])
  basename = Path.basename(file_name)
  trash_name = "#{unique_id}_#{basename}"
  trash_path = Path.join(trash_dir, trash_name)

  case File.rename(safe_path, trash_path) do
    :ok -> {:ok, ""}
    {:error, _} -> {:error, "rm: #{file_name}: No such file or directory\n"}
  end
end
```
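The soft-delete idea can be tried end to end against a throwaway directory. The sketch below is a simplified stand-in, not the project's code: `SoftDeleteSketch` is a hypothetical module, it returns the trash path instead of shell output, and it skips sandbox validation entirely.

```elixir
defmodule SoftDeleteSketch do
  # Move a file into <root>/.trash under a unique-prefixed name.
  def move_to_trash(path, root) do
    trash_dir = Path.join(root, ".trash")
    File.mkdir_p!(trash_dir)

    # Unique within the running VM; the prefix avoids name collisions in .trash.
    unique_id = System.unique_integer([:positive, :monotonic])
    trash_path = Path.join(trash_dir, "#{unique_id}_#{Path.basename(path)}")

    case File.rename(path, trash_path) do
      :ok -> {:ok, trash_path}
      {:error, reason} -> {:error, reason}
    end
  end
end

# Throwaway sandbox under the system temp directory.
root = Path.join(System.tmp_dir!(), "soft_delete_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(root)
file = Path.join(root, "notes.md")
File.write!(file, "draft")

{:ok, trash_path} = SoftDeleteSketch.move_to_trash(file, root)

# The original path is gone, but the contents survive in .trash.
IO.inspect(File.exists?(file))
IO.inspect(File.read!(trash_path))

File.rm_rf!(root)
```

Because the move is a rename inside the same directory tree, "undoing" an agent's `rm` is just another rename back out of `.trash`.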
```elixir
defmodule TrumanShell.Commands.Behaviour do
  @type args :: [String.t()]

  @type context :: %{
          sandbox_root: String.t(),
          current_dir: String.t()
        }

  @type side_effect :: {:set_cwd, String.t()}

  @type result :: {:ok, String.t()} | {:error, String.t()}

  @type result_with_effects :: {:ok, String.t(), [side_effect()]} | {:error, String.t()}

  @callback handle(args(), context()) :: result()
end
```
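For readers new to behaviours, here is roughly what implementing such a contract looks like. Everything in this sketch is hypothetical: `BehaviourSketch.Command` is a cut-down copy of the behaviour above, and `BehaviourSketch.Pwd` is an invented handler, not the project's `pwd` implementation.

```elixir
defmodule BehaviourSketch.Command do
  @type args :: [String.t()]
  @type context :: %{sandbox_root: String.t(), current_dir: String.t()}
  @type result :: {:ok, String.t()} | {:error, String.t()}

  @callback handle(args(), context()) :: result()
end

defmodule BehaviourSketch.Pwd do
  @behaviour BehaviourSketch.Command

  # @impl asks the compiler to check that handle/2 is a declared callback.
  @impl true
  def handle([], %{current_dir: dir}), do: {:ok, dir <> "\n"}
  def handle(_args, _context), do: {:error, "pwd: too many arguments\n"}
end

context = %{sandbox_root: "/tmp/sandbox", current_dir: "/tmp/sandbox/docs"}
IO.inspect(BehaviourSketch.Pwd.handle([], context))
```

If a module declares `@behaviour` but forgets to define `handle/2`, the compiler emits a warning, which is what keeps the command interfaces consistent.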
```elixir
# In the executor
case module.handle(args, context) do
  # Handle side effects from commands like cd
  {:ok, output, set_cwd: new_cwd} ->
    set_current_dir(new_cwd)
    {:ok, output}

  # Normal success/error pass through
  result ->
    result
end
```
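Side effect separation can be shown with a handler that only returns a directive and an executor that applies it. This is a sketch, not the project's executor: the module names are hypothetical, the context is threaded through as a return value rather than stored via `set_current_dir`, and sandbox validation of the target directory is left out.

```elixir
defmodule EffectsSketch.Cd do
  # Pure handler: computes the target directory but changes nothing itself.
  def handle([dir], %{current_dir: cwd}) do
    {:ok, "", set_cwd: Path.expand(dir, cwd)}
  end
end

defmodule EffectsSketch.Executor do
  # Runs a handler and folds any returned directives into a new context.
  def run(module, args, context) do
    case module.handle(args, context) do
      {:ok, output, effects} ->
        {{:ok, output}, Enum.reduce(effects, context, &apply_effect/2)}

      result ->
        {result, context}
    end
  end

  # The only place where a directive turns into a state change.
  defp apply_effect({:set_cwd, dir}, context), do: %{context | current_dir: dir}
end

context = %{sandbox_root: "/tmp/sandbox", current_dir: "/tmp/sandbox"}
{result, new_context} = EffectsSketch.Executor.run(EffectsSketch.Cd, ["docs"], context)
IO.inspect(result)
IO.inspect(new_context.current_dir)
```

Testing the `cd` handler then needs no process state or filesystem at all: call `handle/2` and compare the returned tuple.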
```elixir
# In TrumanShell.Stages.Executor
@command_modules %{
  cmd_cat: Commands.Cat,
  cmd_cd: Commands.Cd,
  cmd_cp: Commands.Cp,
  cmd_grep: Commands.Grep,
  cmd_ls: Commands.Ls,
  cmd_rm: Commands.Rm,
  # ... etc
}

defp execute(%Command{name: name, args: args}, opts) when is_map_key(@command_modules, name) do
  module = @command_modules[name]
  context = build_context(opts)
  module.handle(args, context)
end

defp execute(%Command{name: {:unknown, name}}, _opts) do
  {:error, "bash: #{name}: command not found\n"}
end
```
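A stripped-down version of map-based dispatch can be run on its own. The sketch below makes several assumptions: the `DispatchSketch.*` modules are invented, commands are passed as bare names rather than `%Command{}` structs, and the lookup uses `Map.fetch/2` inside the function body as a variation on the `is_map_key/2` guard shown above.

```elixir
defmodule DispatchSketch.Echo do
  def handle(args, _context), do: {:ok, Enum.join(args, " ") <> "\n"}
end

defmodule DispatchSketch.Pwd do
  def handle(_args, %{current_dir: dir}), do: {:ok, dir <> "\n"}
end

defmodule DispatchSketch.Executor do
  # Name-to-module table, fixed when this module is compiled.
  @command_modules %{
    cmd_echo: DispatchSketch.Echo,
    cmd_pwd: DispatchSketch.Pwd
  }

  # Names the parser did not recognise arrive as {:unknown, name} tuples.
  def execute({:unknown, name}, _args, _context) do
    {:error, "bash: #{name}: command not found\n"}
  end

  # Known atoms are looked up in the table; there is no dynamic module lookup.
  def execute(name, args, context) when is_atom(name) do
    case Map.fetch(@command_modules, name) do
      {:ok, module} -> module.handle(args, context)
      :error -> {:error, "bash: #{name}: command not found\n"}
    end
  end
end

context = %{current_dir: "/tmp/sandbox"}
IO.inspect(DispatchSketch.Executor.execute(:cmd_echo, ["hi", "there"], context))
IO.inspect(DispatchSketch.Executor.execute({:unknown, "curl"}, [], context))
```

The modules that can ever be called are exactly the values written in `@command_modules`, so adding a command is a deliberate edit to that map rather than something input can trigger.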
```elixir
def handle(["-f" | rest], context), do: handle_rm(rest, context, force: true)
def handle(["-r" | rest], context), do: handle_rm(rest, context, recursive: true)
def handle(["-rf" | rest], context), do: handle_rm(rest, context, force: true, recursive: true)
def handle([file_name | _rest], context), do: handle_rm([file_name], context, [])
def handle([], _context), do: {:error, "rm: missing operand\n"}
```
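The clause-per-flag style can be pushed one step further by rejecting unrecognised flags explicitly. This is a hypothetical variation, not what the clauses above do: `FlagsSketch.parse/1` and its error wording are assumptions, and it returns parsed options instead of calling `handle_rm`.

```elixir
defmodule FlagsSketch do
  # Each accepted flag form gets its own clause.
  def parse(["-f" | rest]), do: {:ok, [force: true], rest}
  def parse(["-r" | rest]), do: {:ok, [recursive: true], rest}
  def parse(["-rf" | rest]), do: {:ok, [force: true, recursive: true], rest}

  # Any other dash-prefixed first argument is rejected instead of being
  # treated as a file name.
  def parse([flag = "-" <> _ | _rest]), do: {:error, "rm: unrecognized option '#{flag}'\n"}

  # Plain file names, and the empty argument list.
  def parse([_ | _] = files), do: {:ok, [], files}
  def parse([]), do: {:error, "rm: missing operand\n"}
end

IO.inspect(FlagsSketch.parse(["-rf", "important_data/"]))
IO.inspect(FlagsSketch.parse(["-x", "notes.md"]))
IO.inspect(FlagsSketch.parse([]))
```

Clause order matters here: the specific flags come first, the dash-prefixed catch-all second, and the plain-file clause last, so the reader can audit accepted input from top to bottom.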
The controlled layer provides:

- Command allowlisting - Only ~17 POSIX commands implemented
- Pattern-matched security - Elixir pattern matching blocks unauthorized paths
- Reversible operations - rm is a soft delete to .trash/
- The 404 Principle - Protected paths return "not found" not "permission denied"