Tools: CLAUDE.md Is Not Enough: The Governance Stack for Agentic Development
The Gap Between Orientation and Governance
The Five-Layer Stack
Layer 1: Navigation Files
Layer 2: Constitutional Governance
Layer 3: Agent Specialization
Layer 4: Runtime Enforcement
Layer 5: External Validation
Why This Makes Output More Deterministic
Interpretation
Practical Implications for Teams Considering the Pattern
Get the templates
References

The standard advice for governing AI coding agents is "write a good CLAUDE.md." That is like saying the standard advice for software quality is "write good code." Both are true. Neither is sufficient.

I have been building production AI tools with agentic coding as the primary implementation workflow for roughly five months. In conversations with engineering leaders evaluating AI adoption, three concerns surface consistently and in the same order: governance, error rates, and security vulnerabilities. A well-written CLAUDE.md addresses none of them. It addresses orientation: how a coding agent finds its way around the project. Orientation is necessary. It is not governance.

Across the projects in my development directory, the answer to those three concerns has converged on a five-file document foundation (CONSTITUTION.md, DIRECTIVES.md, SECURITY.md, AGENTS.md, and CLAUDE.md) paired with two execution layers that the documents alone cannot supply: runtime enforcement and external validation. The documents are the layer the agent reads. The execution layers are the layer the system runs. Both are required.

This article introduces the full five-layer stack, explains how the document foundation maps onto its first three layers, and shows why the two execution layers are what turn agentic coding from a productivity tool into a system a regulated business can trust. It is the first article in a new EthereaLogic series on the agentic governance stack.

Each layer closes failure modes that the layers above it cannot catch. The top three layers live in documents and templates. The bottom two layers run as code. The distinction between an instruction and an execution barrier is the distinction between orientation and governance.

The Gap Between Orientation and Governance

CLAUDE.md is a navigation file. It tells a coding agent where things live, what commands to run, what conventions the project uses. Done well, it removes the time an agent wastes rediscovering context at the start of every session.
AGENTS.md, the open standard now governed by the Linux Foundation's Agentic AI Foundation and adopted by more than 60,000 projects, plays the same role with portability across multiple agent runtimes. Both are essential. Both are documentation, not policy.

A map tells you where the roads are. It does not tell you which roads you are allowed to take, under what conditions, with what authorization, with what evidence required afterward. Enterprise software engineering has decades of infrastructure for that second layer: coding standards with enforcement, code review requirements, branch protections, audit trails, quality gates that block deployment unless specific criteria are met. We learned long ago that "write good code" was not enough. We needed systems that made good code the path of least resistance and bad code structurally harder to ship.

Agentic coding is at exactly that inflection point. The question is not whether an LLM can write code. The question is whether the system around it is governed well enough that a regulated business can trust the output.

The Five-Layer Stack

The stack I am building toward in every active project has five layers. Each layer answers a question the layer above it cannot. The document-foundation layers are in place across all six active projects today. The two execution layers are deployed end-to-end in GovForge, partially in others, and are the active migration target for the rest. The framework below is therefore the target architecture that recent production experience has crystallized: accurate as a description of where the projects are headed, not as a uniform claim about where each one stands today.

Layer 1: Navigation Files

CLAUDE.md, AGENTS.md, and where applicable GEMINI.md are the project-specific orientation layer. They handle commands, file maps, technology stack, workflow shortcuts, agent roles, and precedence rules when pattern conflicts arise.
I maintain a separate file per model because different agents have different context conventions and different attention to long files.

This is the layer most teams already have. It is necessary, and on its own it is the layer that keeps an agent productive within a single session. It is also the layer most likely to change weekly as the project evolves.

Layer 2: Constitutional Governance

CONSTITUTION.md, DIRECTIVES.md, and SECURITY.md form the policy layer above the prompt.

The Constitution defines the governing principles and a decision order for resolving conflicts between them. When safety and performance disagree, safety wins. When evidence traceability and speed disagree, evidence traceability wins. The ordering is the statement. A constitution without a declared decision order is significantly less useful than one with one: projects that list principles as equal peers produce agent behavior that optimizes for whichever principle is locally easiest to satisfy at the moment of decision, not the principle the project most needs to defend.

DIRECTIVES.md converts the Constitution's principles into enforceable rules at three levels: Critical (blocking), Important (requires written justification to bypass), and Recommended. Critical directives include the dual-evidence rule for PASS claims, the no-fabricated-metrics rule, the no-placeholder-content rule for production files, and per-project boundary rules. GovForge's CRIT-003, for example, forbids the product repository from taking a runtime dependency on any sibling research repository.

SECURITY.md defines what constitutes a vulnerability in this project, how to report it, severity classifications, and response targets. It scopes what is in and out of bounds, explicitly including prompt injection and credential leakage, which are not hypothetical risks in agentic development.

These three files are required reading before substantive work begins.
They are referenced from the navigation layer above, but they live in their own files because their mutation rate and audience are different.

Layer 3: Agent Specialization

.claude/agents/ defines specialized sub-agents: lead software engineer, test automator, technical writer, Python specialist, UX specialist, security engineer, governance engineer. Each has a scoped system prompt that limits its role and sets its evidence standards. A test automator with explicit instructions, a required evidence format, and a no-simulated-data rule closes a failure mode that a general-purpose agent cannot: the test agent that fabricates passing tests is a known and recurring failure in agentic coding. A specialized agent narrows the surface where that failure can occur.

.claude/commands/ defines the slash command library: /prime, /implement, /review, /verify, /audit, /commit, /pull-request. These are not shortcuts. They are policy-encoded workflows. The /verify command does not just run tests; it requires independent confirmation of claims. The /commit command enforces conventional commit format and checks that governance files are intact before allowing the commit to proceed. The command is the contract.

Layer 4: Runtime Enforcement

This is the layer most teams have not reached.

Claude Hooks are scripts that execute before and after every tool call. The PreToolUse hook runs before Claude takes any action: Read, Write, Edit, Bash, anything. A PostToolUse hook runs after. Hooks have access to the tool payload, and a PreToolUse hook can block execution with an exit code before the action lands.

In GovForge, the pre-tool-use.js hook is registered against PreToolUse:Bash and blocks any git commit or git push that would land directly on main or master. It handles nested shell bypasses: bash -c "git push origin main" is caught the same way a direct git push origin main is. The rule was already in AGENTS.md before the hook existed. It did not prevent the failure that motivated the hook.
On April 11, 2026 at 19:56 PDT, an automated subagent operating against an empty pre-tool-use.js stub pushed commit 3f3b7f9 directly to main, violating a rule that was clearly written in AGENTS.md and in user memory. Forty-nine minutes later, commit b404fbe replaced the stub with a real protected-branch guard, registered it under PreToolUse:Bash in .claude/settings.json, and tested it against twelve representative Bash payloads: push and commit variants, chained commands, non-git commands. All twelve behaved as expected. Later hook-hardening commits expanded the guard and its automated test coverage to the current 320-line, 32-test state. The same class of attempt now exits with status 2 before the tool call lands.

The instruction existed. The instruction was not enough. The hook is the enforcement.

The same rule appeared in AGENTS.md and in pre-tool-use.js. Only one of those held when the empty stub met the subagent. Forty-nine minutes later, the second implementation existed and the same class of attempt began exiting with status 2 before the tool call landed.

This is the precise gap the rest of the layers cannot close on their own. An instruction in a document can be ignored, reasoned around, or context-windowed out. A hook that exits 2 cannot. Hooks turn governance from advice into infrastructure.

Layer 5: External Validation

The final layer is independent of the agent entirely. The four production projects (ADWS Pro, AetheriaForge, GovForge, DriftSentinel) each run a parallel suite of GitHub Actions checks on every push and pull request. The shape varies by project: GovForge runs three jobs (lint-and-test, codacy, snyk); ADWS Pro decomposes into test, post-merge-signal, and security; DriftSentinel and AetheriaForge upload coverage to Codecov alongside lint and security checks. The principle is constant: a quality job, a static-analysis job, and a dependency-vulnerability job, each running independently from a clean environment, with no access to the agent's session state.
Earlier-stage projects in the directory run lighter or differently-shaped CI surfaces (sdlc_app runs a single validate job, spec-driven-docs-system runs a docs-oriented smoke/security/isolated-install triad) and have not yet been brought up to the production-project bar.

If the agent claims tests pass, CI confirms it. If CI disagrees, the claim is unverified.

Two implementation details are load-bearing. First, in the production projects (ADWS Pro, AetheriaForge, GovForge, DriftSentinel), every GitHub Action invoked from the workflow files is pinned to a specific commit SHA, not a version tag. Version tags are mutable, and a supply-chain compromise through a mutable tag is a documented attack vector that GitHub's own secure-use guidance now recommends defending against by SHA-pinning. Pinning to a SHA removes the entire class. Earlier-stage projects in the directory have not all been brought up to that bar yet (sdlc_app and spec-driven-docs-system still resolve some actions by tag), a known gap rather than a deliberate choice. Second, the static-analysis and dependency-scan tools (Codacy, Codecov, and Snyk) produce reports independent of the agent's reporting. The agent can write whatever summary it wants. The external tools generate their own.

This layer makes one assumption: the agent's self-report is not authoritative. That assumption is the one most agentic coding deployments quietly omit, and it is the one that turns the entire stack from "interesting" to "trustworthy."

Why This Makes Output More Deterministic

LLMs are probabilistic by nature. The governance stack does not change that. What it changes is the operating envelope.

With the full stack in place, as it is in GovForge, an agent cannot commit or push directly to main, because the protected-branch hook exits 2 before the tool call lands. It cannot ship placeholder content into enforced scan roots, because the guardrail check fails the build.
It cannot mark a test PASS without machine-verifiable output and a human-readable artifact, because the dual-evidence directive blocks the claim. It cannot produce an output that CI will not independently validate, because Codacy, Codecov, and Snyk run from a clean environment with no access to the agent's session.

None of these constraints are prompts. None of them depend on the agent reading instructions correctly. They are runtime barriers and external checks. The other production projects sit at the same document foundation but have not yet wired the runtime-enforcement layer to the same depth. That gap is the next active piece of work the rest of this series is being written to support.

The result: agentic coding output that is auditable, traceable, and repeatable. Not because the model is more deterministic (it isn't) but because the system around it constrains the variance to a band the business can tolerate.

Constitutional principles set the direction. Directives make principles enforceable on paper. Hooks make directives enforceable in execution. CI makes execution claims enforceable independently. Each layer compounds.

The measured facts listed toward the end of this article are drawn from the development directory and the public repositories of the projects referenced, verified on April 30, 2026. They should be read within the scope of those projects.

Interpretation

The following are engineering judgments drawn from operating the stack across these projects. They should be read as claims about the author's experience, not universal prescriptions.

The single most important distinction in the stack is between layers that live in documents and layers that run as code. The top three layers (navigation, constitutional governance, agent specialization) are documents. The bottom two (runtime enforcement, external validation) are code. Documents are necessary for the agent to know what it should and should not do. Code is necessary for the system to enforce the answer when the agent gets it wrong.
Most public agentic-coding content lives entirely in the document layers. The distinguishing element of an enterprise-grade deployment is the code layers underneath them.

The hook layer is the highest-leverage single addition a team can make to a working four-file governance pattern. It is the layer that turns a written rule into a runtime barrier. The GovForge incident is the empirical demonstration: the rule existed in AGENTS.md and in user memory; the hook did not exist; the rule was violated within hours of the project beginning to operate at full speed. Once the hook existed, the same class of violation became impossible to commit, regardless of agent reasoning. The cost of writing the hook was a one-time engineering effort. The cost of not writing it was an actual incident.

The supply-chain hygiene of pinning every GitHub Action to a SHA is one of the lowest-cost, highest-value practices in the stack. It takes minutes per repository. It removes an entire attack class. It is also the practice that distinguishes a CI configuration that has been audited from one that has been copied from a tutorial. Most tutorials use version tags because version tags are easier to read; that ease is the same property that makes them mutable and vulnerable. SHAs trade legibility for integrity. For an agentic project, the trade is straightforward.

The five-layer framing is a sequence, not a checklist. Skipping ahead does not work. A team that wires hooks before authoring a constitution and directives will end up with hooks that enforce the wrong rules, or rules with no agreed-upon source of authority, or both. A team that wires CI before specializing agents will catch failures late, after the agent has already produced and reported on broken artifacts. The order in which the layers appear here is the order in which they tend to pay off, and it is the order in which I introduce them on a new project.
The framework is deliberately model-agnostic at the top and Claude-specific at the bottom. The navigation, constitutional, and external-validation layers work with any agent runtime; that is the AGENTS.md design intent, and it is why CONSTITUTION and DIRECTIVES live in their own files rather than inside CLAUDE.md. The agent-specialization and runtime-enforcement layers are currently Claude-specific because the hook surface and the sub-agent surface are Claude features. Equivalent surfaces are emerging in other agent platforms; the architectural pattern is portable even where the implementation today is not.

Practical Implications for Teams Considering the Pattern

If your team has a CLAUDE.md or an AGENTS.md and nothing else, the next layer to add is constitutional governance. Author a CONSTITUTION with a decision order, derive a DIRECTIVES file from it, and wire the directives to a lightweight repository-level guardrail check (file-presence, marker scan, secret hygiene, complexity budget) that fails the build when a critical directive is violated. That guardrail check is a narrow, repository-scoped script, distinct from the full external-validation suite of Layer 5; both are useful, and both are usually built in that order. This step produces the largest single shift in the agent's behavior under load.

If your team has the four-file governance pattern but no hooks, the next layer to add is runtime enforcement. Begin with a PreToolUse hook that blocks the highest-stakes destructive class: direct commits or pushes to main, deletions outside dist/ or build directories, anything that touches secrets. Test it against nested shell payloads. Register it in .claude/settings.json. The hook does not need to be sophisticated to be load-bearing; it needs to be present, registered, and tested. An empty hook stub is worse than no hook at all because it produces a false sense of governance without the enforcement.
If your team has hooks but is relying on the agent's own test-pass reports for quality assurance, the next layer to add is external validation. Wire a CI workflow with at least one quality job, one static-analysis job, and one dependency-vulnerability job. Pin every action to a SHA. Configure coverage upload to a tool the agent does not control. Treat any disagreement between the agent's self-report and CI as a CI win.

If you are starting a new project from scratch, plan the full stack from day one rather than assembling it in pieces. The layers compose well when introduced together and compose poorly when retrofitted. A scaffold that ships with the navigation files, governance files, agent and command catalogs, hook implementations, and CI workflows already wired produces a project that is governed from its first commit. Retrofitting governance onto an existing agentic project is harder than starting governed, in the same way that retrofitting tests onto an untested codebase is harder than writing tests alongside the code.

The five-layer stack is not a productivity tool. It is a trust tool. Productivity is what an unconstrained agent can produce in an afternoon. Trust is what a regulated business needs before it can ship that output into a customer environment. The gap between the two is what the governance stack closes.

Get the templates

The drop-in starter kit for this stack (CONSTITUTION.md, DIRECTIVES.md, SECURITY.md, AGENTS.md, CLAUDE.md, the protected-branch hook, and a SHA-pinned CI workflow) is published at etherealogic.ai/agentic-governance-stack-templates. Each template is on the page in copy-paste-ready form with download buttons. The page also includes a one-shot install prompt you can hand to a coding agent so it can install the stack in your project autonomously.

This is the first article in a new EthereaLogic series on the agentic governance stack.
The next article goes deep on the runtime-enforcement layer: what Claude Hooks actually look like in code, how to design a protected-branch guard that handles nested shell bypasses, and what failure modes the hook layer closes that documentation cannot. The article after that covers the external-validation layer in the same depth, including the Codacy, Codecov, and Snyk configurations used in production projects.

Measured facts referenced above, verified on April 30, 2026:

- Six top-level active projects (ADWS Pro, AetheriaForge, GovForge, DriftSentinel, sdlc_app, and spec-driven-docs-system) currently carry the full document-foundation surface (CONSTITUTION.md, DIRECTIVES.md, AGENTS.md, CLAUDE.md, and SECURITY.md).
- The GovForge pre-tool-use.js hook is 320 lines and is registered against PreToolUse:Bash in .claude/settings.json. Other hook scripts in the GovForge .claude/hooks/ directory (notification.js, post-tool-use.js, pre-compact.js, stop.js, subagent-stop.js, user-prompt-submit.js) are documented in the project's hook README as scaffolded stubs not currently wired, kept in place for incremental future enforcement.
- The April 11, 2026 incident in GovForge (an automated subagent pushing commit 3f3b7f9 directly to main at 19:56 PDT against an empty hook stub) was closed by commit b404fbe at 20:45 PDT, roughly 49 minutes later, replacing the stub with a real guard. The initial commit's validation covered twelve representative Bash payloads; a later commit (cffdc57) added the regression test suite that runs in CI today.
- DriftSentinel currently collects 416 tests under pytest and uploads coverage to Codecov on every push.
- Across the four production projects (ADWS Pro, AetheriaForge, GovForge, DriftSentinel), every GitHub Action invoked from the workflow files is pinned to a specific commit SHA rather than a version tag.
- Two of the six active projects (sdlc_app and spec-driven-docs-system) currently resolve at least some GitHub Actions by version tag rather than SHA. Those gaps are known and unaddressed at the time of writing rather than deliberate exceptions.
- Of the six active projects, only GovForge currently wires the protected-branch runtime barrier: its .claude/settings.json registers pre-tool-use.js against PreToolUse:Bash. ADWS Pro, AetheriaForge, and DriftSentinel keep hook scripts in .claude/hooks/ but do not wire them in settings.json. sdlc_app and spec-driven-docs-system wire hooks of a different shape (documentation pre-write checks rather than protected-branch guards).
The full five-layer stack as described in this article is therefore implemented end-to-end in one of the six projects today; the document foundation is in place across all six, and the runtime-enforcement and external-validation layers are at varying levels of completion across the remaining five.

References

- AGENTS.md open standard: agentsmd/agents.md, governed by the Linux Foundation's Agentic AI Foundation.
- Anthropic Claude Code documentation: Claude Hooks and sub-agent specifications.
- GitHub Actions secure-use guidance: recommends pinning third-party actions to a full commit SHA to defend against mutable-tag supply-chain risk.
- GovForge: public repository implementing the load-bearing examples in this article, including the protected-branch hook and the CI workflow with full SHA pinning.