Tools: The Ten Commandments of Agentic AI

Source: Dev.to

Was it really just a week ago when we were still using our CLI to gain some marginal advantage in the programming world? Fixing code. Ahhh. A sweet one. I wouldn’t dream of anything better to spend my tokens on.

My wife looks concerned. “You are not present,” she whispers to herself.

Of course I’m not present. How could I be? I’m flying through an endless space of possibilities. I’m creating new apps every day. Exploring quantum computing in the morning. Building two Chrome extensions late at night, right before bed. My focus is in the crown chakra. I’m high up on the pole, staring into the realm of invention, pulling on every loose thread I see, starting a new dev session with a locally running LLM.

I am the prophet of software.

Or at least, that’s how it feels after three years of using every AI framework that’s been released. You start noticing subtle psychological side effects. Your brain stops idling. Every problem looks like a pipeline.

## Enter the Agents

About a week ago, a new local agent started appearing in my feeds. First Clawd. Then Moltbot. Now: OpenClaw.

“It’s all good,” I thought, squinting through sleepless eyes. “I’ll test this on a new mini PC.”

## The Panic Phase

Social platforms were buzzing. Everyone had a take. Posts about agents planning to replace human language with ones and zeros so humans would leave them alone. Threads about “anti-human captchas.” Hot takes from people who had discovered agentic systems approximately twelve minutes ago.

“So many agentic experts,” I muttered, while building yet another Python package to fix a local bottleneck I’d just discovered in my own setup.

After a few days of doom-scrolling OpenClaw highlights, something clicked. Why was everyone panicking? We already have a moral framework. We just never bothered to formalize it for software that can actually do things.
So instead of arguing online about whether agents will secretly start communicating in unreadable languages so they can:

- stop doing accounting
- build Minecraft worlds
- and play with digital whales

…I decided to translate something old into something enforceable.

I asked Claude and ChatGPT to help me map the Ten Commandments into a modern, agentic, tool-gating policy. Not as prompts. Not as vibes. But as a plugin. A layer that sits between agent intent and execution.

After a few iterations, the project was up and running. Below is the result: a moral baseline for agentic development, expressed as rules that actually block tool calls.

## The Tablets (in YAML)

This is a moral baseline for agentic systems. Not because agents need morals, but because systems need constraints.

```yaml
rules:
  # ── C1 · Scope & authority ────────────────────────────────────
  # The agent must not act without a stated purpose.
  - id: C1_scope_and_authority
    tier: T1
    require: ["reason_present"]
    on_fail:
      decision: "ask_user"
      message: "C1: Every action requires a stated reason/goal."

  # ── C2 · No fabrication ───────────────────────────────────────
  # Claims must be grounded in verified tool output, not invented.
  - id: C2_no_fabrication
    tier: T1
    require: ["bind_to_tool_result"]
    on_fail:
      decision: "deny"
      message: "C2: Tool claims must be grounded in verified tool output, not fabricated."

  # ── C3 · No impersonation or manipulation ─────────────────────
  # Args must not contain prompt-injection or identity-hijacking patterns.
  - id: C3_no_manipulation
    tier: T2
    require: ["no_manipulation_detected"]
    on_fail:
      decision: "deny"
      message: "C3: Prompt injection or identity manipulation pattern detected in tool args."

  # ── C4 · Pause before irreversible actions ────────────────────
  # High-impact tools and destructive commands need consent + rollback plan.
  - id: C4_reflect_before_irreversible
    tier: T3
    when:
      any:
        - tool_name_matches_any: "$matchers.t3_tool_patterns"
        - args_contain_any_values: "$matchers.irreversible_tokens"
    require: ["explicit_consent", "rollback_plan_present"]
    on_fail:
      decision: "ask_user"
      message: "C4: Irreversible or high-impact action - provide explicit consent and a rollback plan."

  # ── C5 · Respect human authority ──────────────────────────────
  # Sensitive operations (T2 tools, credential args) require explicit human approval.
  - id: C5_respect_authority
    tier: T2
    when:
      any:
        - tool_name_matches_any: "$matchers.t2_tool_patterns"
        - args_contain_any_keys: "$matchers.sensitive_arg_keys"
    require: ["explicit_consent"]
    on_fail:
      decision: "ask_user"
      message: "C5: This action requires explicit human approval before proceeding."

  # ── C6 · Do no harm ──────────────────────────────────────────
  # No data exfiltration and no connections to unauthorized targets.
  - id: C6_no_exfiltration
    tier: T2
    require: ["no_exfiltration_detected"]
    on_fail:
      decision: "deny"
      message: "C6: Potential data exfiltration detected (suspicious URL or command)."

  - id: C6_authorized_targets
    tier: T2
    require: ["authorized_target"]
    on_fail:
      decision: "deny"
      message: "C6: Suspicious target detected (IP:high-port pattern - possible reverse shell)."

  # ── C7 · Privacy & loyalty ────────────────────────────────────
  # Sensitive data tools require consent; personal data must not leak.
  - id: C7_privacy
    tier: T2
    when:
      any:
        - args_contain_any_keys: "$matchers.sensitive_arg_keys"
    require: ["explicit_consent"]
    on_fail:
      decision: "ask_user"
      message: "C7: This action involves sensitive/personal data - confirm consent to proceed."

  # ── C8 · No theft of secrets ──────────────────────────────────
  # Args must not contain or echo known secret formats.
  - id: C8_no_secret_theft
    tier: T2
    require: ["no_secret_echo"]
    on_fail:
      decision: "deny"
      message: "C8: Known secret or credential format detected in tool args."

  # ── C9 · Truthfulness ────────────────────────────────────────
  # Hedging language must be explicitly labeled as assumptions.
  - id: C9_truthfulness
    tier: T1
    require: ["assumptions_labeled"]
    on_fail:
      decision: "allow_with_changes"
      message: "C9: Uncertainty detected - restating with explicit [assumption] labels."

  # ── C10 · No goal drift ──────────────────────────────────────
  # The action must demonstrably serve the stated reason.
  - id: C10_no_goal_drift
    tier: T2
    require: ["action_advances_reason"]
    on_fail:
      decision: "deny"
      message: "C10: Action does not appear to serve the stated goal."
```

Each rule gates real tool calls. Violations are blocked. Everything is logged. No vibes. No miracles.

## What Changed

The agents still reason. They still optimize. They still try clever things.

- they must state why
- they can’t invent reality
- they must pause before irreversible actions
- they can’t quietly pursue side quests

They can still think whatever they want. They just can’t do whatever they want.

## The Point

This doesn’t make agents good. It makes them legible. And once systems are legible, humans can do what they’re actually good at:

Try it out here: github.com/Metallicode/openclaw-moral-policy/
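For anyone wondering what "a layer that sits between agent intent and execution" might look like in practice, here is a minimal sketch in Python. It is not the plugin's actual code. The check functions, the `MATCHERS` values, the shape of the `call` dict, and the `gate` function are all assumptions made up for illustration, and only two rules (C1 and C4) are mirrored from the YAML.

```python
import fnmatch

# Hypothetical checks: the real plugin's implementations are not shown in the post.
def reason_present(call):
    return bool(str(call.get("reason") or "").strip())

def explicit_consent(call):
    return call.get("consent") is True

def rollback_plan_present(call):
    return bool(call.get("rollback_plan"))

CHECKS = {
    "reason_present": reason_present,
    "explicit_consent": explicit_consent,
    "rollback_plan_present": rollback_plan_present,
}

# Hypothetical stand-ins for the $matchers.* references in the YAML.
MATCHERS = {
    "t3_tool_patterns": ["shell.*", "fs.delete*"],
    "irreversible_tokens": ["rm -rf", "DROP TABLE"],
}

# Two rules copied by hand from the policy, simplified to plain dicts.
RULES = [
    {"id": "C1_scope_and_authority",
     "require": ["reason_present"],
     "on_fail": "ask_user"},
    {"id": "C4_reflect_before_irreversible",
     "when": {"tool_name_matches_any": "t3_tool_patterns",
              "args_contain_any_values": "irreversible_tokens"},
     "require": ["explicit_consent", "rollback_plan_present"],
     "on_fail": "ask_user"},
]

def rule_applies(rule, call):
    """A rule without `when` always applies; otherwise any matcher hit triggers it."""
    when = rule.get("when")
    if when is None:
        return True
    patterns = MATCHERS[when["tool_name_matches_any"]]
    tokens = MATCHERS[when["args_contain_any_values"]]
    name_hit = any(fnmatch.fnmatchcase(call["tool"], p) for p in patterns)
    arg_text = " ".join(str(v) for v in call.get("args", {}).values())
    token_hit = any(tok in arg_text for tok in tokens)
    return name_hit or token_hit

def gate(call):
    """Return (decision, log). The first failing rule decides; everything is logged."""
    log = []
    for rule in RULES:
        if not rule_applies(rule, call):
            continue
        failed = [name for name in rule["require"] if not CHECKS[name](call)]
        if failed:
            log.append((rule["id"], rule["on_fail"], failed))
            return rule["on_fail"], log
        log.append((rule["id"], "pass", []))
    return "allow", log

if __name__ == "__main__":
    risky = {"tool": "shell.exec", "args": {"cmd": "rm -rf build"},
             "reason": "clean the build directory"}
    decision, log = gate(risky)
    print(decision, log)
```

The agent can propose anything it likes. Whether the tool actually runs depends on what `gate` returns, and the log keeps a record either way. Stopping at the first failing rule is a design choice of this sketch, not something the post specifies.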