Tools: Structured Output for AI Coding Agents: Why I Built Pare


Source: Dev.to

If you've spent any time watching Claude Code, Cursor, or Copilot work through a coding task, you've seen it: the agent runs git log and gets back 200 lines of formatted terminal output. It runs npm outdated and parses an ASCII table. It runs docker ps and tries to extract container IDs from column-aligned text that was designed for a human glancing at a terminal.

Most of the time it works. Sometimes it doesn't — the agent misreads a column boundary, hallucinates a field that wasn't there, or burns context window space on ANSI color codes and decorative characters that carry zero information. And every time, it spends your tokens on text it has to re-parse back into the structured data the CLI tool had internally before it was formatted for human eyes.

I started tracking this in my own workflows. In a typical 30-minute coding session, an agent might make 40-60 tool calls. Each one returns raw terminal text that the model has to interpret. The token overhead from progress bars, ANSI escape sequences, column padding, and repeated headers was consistently 3-10x the actual data the agent needed. On multi-file refactors with heavy test/build cycles, I watched context windows fill up with formatting noise while the agent lost track of the actual code changes it was supposed to be reasoning about.

The frustrating part: the data was already structured inside the tool. git knows the commit hash, author, and file list as distinct fields. eslint has a JSON formatter built in. cargo test tracks pass/fail per test case internally.
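As a quick illustration of that point (my own example, not part of Pare): git will hand you those fields directly if you ask for a machine-friendly format, using standard git placeholders (%H hash, %an author name, %aI ISO date, %s subject, %x09 tab), and parsing the result is trivial.

```typescript
// git can emit its structured fields directly, e.g.:
//   git log -1 --format=%H%x09%an%x09%aI%x09%s
// (%x09 is a tab separator). Parsing that is trivial compared to
// scraping the default human-oriented layout.

interface CommitMeta {
  hash: string;
  author: string;
  date: string; // ISO 8601 author date (%aI)
  subject: string;
}

// Split one tab-separated line into typed fields.
function parseFormattedLine(line: string): CommitMeta {
  const [hash, author, date, subject] = line.split("\t");
  return { hash, author, date, subject };
}
```

The structure was always there; the default rendering just discards it.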
But the default output mode — the one every agent uses — throws all that structure away and paints a picture for a human reading a terminal.

## What if the tools just spoke the agent's language?

That was the premise behind Pare: build MCP servers that wrap real CLI tools and return typed, schema-validated JSON. Not approximations — real parsers that handle the full output surface of each tool, including edge cases, errors, and platform differences.

The scope grew fast. What started as a few servers for git, npm, and Docker turned into 25 MCP servers covering 222 tools across the developer CLI landscape. Each one uses the Model Context Protocol (MCP) — the standard for AI-tool communication supported by Claude, Cursor, Windsurf, VS Code, Zed, Gemini CLI, OpenAI Codex, and others. Every tool call returns both structured JSON with a Zod-validated schema and human-readable text for chat display.

Take git log --stat as a before/after. The raw output the agent sees today is ~95 tokens, and the agent has to extract the hash, author, date, file list, and diff stats by pattern-matching against whitespace-aligned text. Usually that works. Sometimes it miscounts the + characters or misreads the file path boundaries. The same commit through Pare is ~55 tokens, and every field is typed and directly addressable — no regex, no guessing. The gap widens fast: run git log --stat on 10 commits and you're looking at ~950 tokens of terminal formatting versus ~310 tokens of structured JSON.

## The token savings are real

I ran extensive benchmarks comparing tool outputs against their raw CLI equivalents. Pare also has an automatic compact mode: when the structured JSON would exceed the raw CLI token count (which can happen with very terse commands), it automatically applies a compact projection — stripping verbose fields while keeping everything the agent needs to make decisions. This means Pare always uses fewer tokens than raw output.

## Some things that weren't obvious

From the outside, "wrap a CLI and return JSON" sounds like a weekend project.
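To make the compact-mode idea concrete, here's a rough sketch (my illustration, not Pare's actual code; the 4-characters-per-token heuristic and the choice of which fields survive the compact projection are assumptions):

```typescript
// Illustrative sketch of an automatic compact-mode decision.

// Crude token estimate: ~4 characters per token is a common heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface Commit {
  hash: string;
  hashShort: string;
  message: string;
  author: string;
  date: string;
  files: string[];
  insertions: number;
  deletions: number;
}

// Compact projection: drop verbose fields, keep what an agent acts on.
function compactCommit(c: Commit) {
  const { hashShort, message, files, insertions, deletions } = c;
  return { hashShort, message, files, insertions, deletions };
}

// Emit full structured JSON when it beats the raw CLI text on tokens;
// otherwise fall back to the compact projection.
function render(commits: Commit[], rawCliOutput: string): string {
  const full = JSON.stringify({ commits, total: commits.length });
  if (estimateTokens(full) <= estimateTokens(rawCliOutput)) return full;
  return JSON.stringify({
    commits: commits.map(compactCommit),
    total: commits.length,
  });
}
```

The design point: the check runs per call, so terse commands degrade gracefully instead of inflating the context.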
In practice, a few categories of problems kept coming up.

**Every CLI is its own parsing problem.** git log output varies by platform — Windows cmd.exe misinterprets angle brackets in format strings. docker ps column widths change based on content. cargo test interleaves compiler output with test results. ansible-playbook has a PLAY RECAP section with a completely different structure from the rest of its output. Each tool needed its own parser, and each parser had its own edge cases to discover.

**Cross-platform differences add up.** Pare runs CI on Linux, macOS, and Windows with Node.js 20 and 22. Path separators, line endings, shell quoting, and process group handling all differ in ways that don't surface until CI catches them. One example: async taskkill on Windows was leaving orphan processes after timeouts, which required switching to synchronous execFileSync for the kill logic.

**Schema design is an iterative process.** The first version of the output schemas included everything each CLI returned. Over time it became clear that agents don't benefit from — and are sometimes confused by — fields like resolvedUrl in npm packages or endLine/endColumn in lint diagnostics. Each round of pruning was guided by watching what agents actually used versus what they read and ignored. It ended up closer to API design than CLI wrapping.

**The testing regime is substantial.** 222 tools need more than unit tests. This isn't test theater: the fidelity and smoke layers exist because I kept finding bugs that unit tests missed — parsers that worked on the fixture but broke on real output, schema changes that compiled fine but broke the MCP response format, compact mode projections that accidentally dropped error information.

## The architecture that makes 25 servers maintainable

Scaling to 25 servers without the codebase becoming a mess required deliberate architectural investment.

**Shared foundations.**
A common library (@paretools/shared) provides the dual-output system, command execution with execFile (no shell injection surface), input validation, error categorization, and a createServer() factory that eliminates boilerplate. When I add a new server, the entry point is 6 lines of code.

**Structured error recovery.** Every Pare tool classifies failures into categories an agent can match on programmatically — command-not-found, permission-denied, timeout, network-error, authentication-error, conflict, and others. Instead of parsing "Error: EACCES: permission denied" from stderr, the agent gets { "category": "permission-denied", "command": "git", "exitCode": 128 } and can decide what to do next without guessing.

**Centralized input schemas.** Common parameters like path, compact, fix, config, and filePatterns are defined once in the shared library and reused across all 222 tools. This ensures consistent behavior and makes it impossible for one server to accidentally define path differently from another.

**Automatic compact mode.** Every tool measures whether structured output saves tokens compared to raw CLI output. If structured is more expensive (rare, but possible for very terse commands), it automatically switches to a compact projection. The agent can override this with compact: false if it needs full details.

## Security when agents are the users

When an AI agent constructs CLI commands from natural language, the attack surface changes. A prompt injection that tricks the agent into passing --output=/etc/passwd as a "filename" is a real threat. Every Pare tool defends against this.

## "But loading all those tools costs tokens too"

This is one of the most frequent pieces of pushback I hear when discussing Pare with other devs, and it's fair. Every MCP server registers tool definitions upfront — that's context the model carries for the whole session. But the math works out differently than you'd expect.

First, there's rarely a need to load all 222 tools. Pare has 25 servers with ~9 tools each.
It provides the flexibility to install and use only what you need — if you just want git and test, that's 27 tool definitions. You can filter even further with environment variables.

Second, the savings compound on the output side. Each tool call returns structured JSON that's typically 30-90% leaner than raw CLI output. In a benchmark across a real coding session, the aggregate reduction was 72% — and that's counting the upfront tool registration cost. After two or three tool calls, you're ahead on net context usage. By the end of a session with 40-60 tool calls, the savings are substantial.

Third — and this is the part people miss — even when the token count is similar, structured output is higher-quality context. An agent reading { "success": false, "category": "permission-denied", "exitCode": 128 } doesn't need to pattern-match against stderr text. It reasons about typed fields directly. That means fewer wasted inference cycles, fewer misinterpretations, and less backtracking — which saves tokens downstream in ways that don't show up in a simple input/output comparison.

## All coding agents are welcome

Pare works with any MCP-compatible client. Setup for some of the popular ones (the config snippets are collected at the end of this post):

- Claude Code: one command per server
- Claude Desktop / Cursor / Windsurf / Cline / Gemini CLI: JSON config
- VS Code / GitHub Copilot: .vscode/mcp.json
- OpenAI Codex: .codex/config.toml

That's it. The agent immediately gets access to structured tool output. No configuration, no API keys, no runtime dependencies beyond the CLI tools themselves.

## Telling the agent to prefer Pare

Once the servers are configured, add a one-liner to your project's agent instruction file so the agent reaches for Pare tools instead of raw CLI commands — CLAUDE.md for Claude Code, AGENTS.md for OpenAI Codex and Gemini CLI, or .cursor/rules/pare.mdc for Cursor. With this in place, the agent will automatically use mcp__pare-git__status instead of running git status through Bash — and get typed JSON back instead of terminal text.

## What's next

The MCP ecosystem is young.
The patterns established now — how tools structure their output, what schemas look like, how errors are categorized — will shape how AI agents interact with developer infrastructure for years. I've open-sourced Pare under the MIT license because this should be shared infrastructure, not a proprietary advantage. The codebase is designed to make contributing straightforward: each server is self-contained, follows the same architecture, and has the same test patterns. If there's a CLI tool you wish your agent handled better, the pattern for adding it is well-established.

GitHub: github.com/Dave-London/Pare
npm: All 25 packages at npmjs.com/org/paretools

Built by Dave London.

---

Before — raw git log --stat output as the agent sees it (~95 tokens):

```
commit a1b2c3d4e5f67890abcdef1234567890abcdef12
Author: Jane Developer <[email protected]>
Date:   Mon Feb 10 14:32:01 2026 +0200

    Add user authentication middleware

 src/auth/middleware.ts | 45 +++++++++++++++++++++++++++++++++++++++++++++
 src/routes/api.ts      |  2 +-
 2 files changed, 46 insertions(+), 1 deletion(-)
```

After — the same commit through Pare (~55 tokens):

```json
{
  "commits": [
    {
      "hash": "a1b2c3d4e5f6",
      "hashShort": "a1b2c3d",
      "message": "Add user authentication middleware",
      "author": "Jane Developer",
      "date": "2026-02-10T14:32:01+02:00",
      "files": ["src/auth/middleware.ts", "src/routes/api.ts"],
      "insertions": 46,
      "deletions": 1
    }
  ],
  "total": 1
}
```

Filtering which tools a server registers:

```bash
# Only register status and log in the git server
PARE_GIT_TOOLS=status,log npx @paretools/git
```

Claude Code (one command per server):

```bash
claude mcp add --transport stdio pare-git -- npx -y @paretools/git
claude mcp add --transport stdio pare-test -- npx -y @paretools/test
```

Claude Desktop / Cursor / Windsurf / Cline / Gemini CLI (JSON config):

```json
{
  "mcpServers": {
    "pare-git": { "command": "npx", "args": ["-y", "@paretools/git"] },
    "pare-test": { "command": "npx", "args": ["-y", "@paretools/test"] }
  }
}
```

VS Code / GitHub Copilot (.vscode/mcp.json):

```json
{
  "servers": {
    "pare-git": { "type": "stdio", "command": "npx", "args": ["-y", "@paretools/git"] }
  }
}
```

OpenAI Codex (.codex/config.toml):

```toml
[mcp_servers.pare-git]
command = "npx"
args = ["-y", "@paretools/git"]
```

CLAUDE.md (Claude Code):

```markdown
## MCP Tools

When Pare MCP tools are available (prefixed with mcp__pare-*), prefer them over running raw CLI commands via Bash. Pare tools return structured JSON with ~85% fewer tokens than CLI output.
```

AGENTS.md (OpenAI Codex, Gemini CLI):

```markdown
## MCP Servers

This project uses Pare MCP servers for structured, token-efficient dev tool output. Prefer Pare MCP tools over raw CLI commands for git, testing, building, linting, npm, docker, python, cargo, and go.
```
.cursor/rules/pare.mdc (Cursor):

```markdown
---
description: Use Pare MCP tools for structured dev tool output
globs: ["**/*"]
alwaysApply: true
---

When Pare MCP tools are available, prefer them over running CLI commands in the terminal. Pare tools return structured JSON with up to 95% fewer tokens than raw CLI output.
```

The test layers:

- Parser and formatter tests cover every output format with realistic CLI fixtures — currently over 4,500 tests across 218 test files
- Fidelity tests run the real CLI tool and the Pare parser against the same inputs, then diff the results. If the parser drops or misrepresents data, the test fails. This catches regressions that unit tests miss because the fixture is stale.
- Security tests on every package verify that flag injection is blocked on all positional parameters and that Zod input limits prevent DoS via oversized payloads
- Smoke tests replay recorded MCP sessions — real tool call transcripts captured from actual agent usage — to verify the full request/response cycle hasn't regressed
- Integration tests spawn real MCP servers via StdioClientTransport and make actual tool calls, validating the entire chain from input schema through CLI execution to output schema

The security defenses:

- execFile everywhere: argument arrays, never shell string concatenation
- Flag injection detection: assertNoFlagInjection() on every positional string parameter — anything starting with - is rejected
- Input size limits: Zod .max() constraints on all strings and arrays prevent payload-based DoS
- Policy gates: destructive operations like vagrant destroy or terraform apply require explicit opt-in via environment variables
- Docker volume blocking: mount validation prevents access to sensitive host paths
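The execFile, flag-injection, and input-limit defenses can be sketched in a few lines (an illustrative approximation, not Pare's actual implementation; the function bodies, the safeGitArgs helper, and the 4096-character limit are my assumptions):

```typescript
// Reject any positional value that could be parsed as a flag by the
// underlying CLI, e.g. "--output=/etc/passwd" smuggled in as a filename.
function assertNoFlagInjection(value: string, paramName: string): void {
  if (value.startsWith("-")) {
    throw new Error(`Parameter "${paramName}" must not start with "-": ${value}`);
  }
}

// Length cap standing in for a Zod .max() constraint, blocking
// payload-based denial of service via oversized inputs.
function assertMaxLength(value: string, max: number, paramName: string): void {
  if (value.length > max) {
    throw new Error(`Parameter "${paramName}" exceeds ${max} characters`);
  }
}

// Build an argument array for execFile (never interpolated into a shell
// string), validating the positional parameter first.
function safeGitArgs(path: string): string[] {
  assertNoFlagInjection(path, "path");
  assertMaxLength(path, 4096, "path");
  return ["log", "--", path]; // "--" ends option parsing for git itself
}
```

Passing the returned array to execFile means there is no shell to inject into, and the "--" separator stops git itself from treating a hostile value as an option.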