CI/CD in the Era of AI and Platform Engineering: A Deep Dive into Dagger CI (Part 4)
Part 4: The AI-Native CI/CD Stack: Agents, Modules, and Spec-Driven Development
What Is a Dagger Agent?
The Problem With Setting Up CI
Generating the Setup With Daggie
Meet the Agents
The Generated Setup
Running the Checks
When a Check Fails
Integrating Into CI/CD
The Developer Experience Shift
Before: Pipeline Specialists
After: Agents Configure and Fix CI
From Developer Platform to Agent Factory
Spec-Driven Development With Speck
The Pipeline Pattern
Setting Up the Workflow
Running the Workflow
What Just Happened
Agents That Learn: Self-Improvement Across Runs
Three modes
Would I Recommend Dagger CI in Production Right Now?
What the Agent Layer Needs
What's Coming, and Why It Matters
Conclusion
## Key Takeaways

Fixed pipelines for speed and reliability. AI agents to write them and to fix them when they break.

In Part 1 we built pipelines as real code. In Part 2 we decoupled them from infrastructure. In Part 3 we built AcmeCorp's private module library (acme-backend, acme-frontend, and acme-deploy), which wraps public daggerverse modules with organization-specific compliance, naming, and security. Now let's talk about where AI actually belongs in CI/CD, and where it doesn't.

The thesis is simple: AI doesn't replace the pipeline. It writes the pipeline and fixes it when it breaks. The pipeline itself stays fixed, deterministic, and fast.

## What Is a Dagger Agent?

Just as container primitives allow us to build CI pipelines, Dagger introduces an LLM() primitive that lets you create agents the same way you'd call any other pipeline function. Under the hood, dag.llm() connects to any supported model (Claude, GPT, Gemini) and gives you a composable builder to layer on system prompts, environment bindings, and tool access.

What makes this powerful is the tool story. Any Dagger module, including the same private modules we built in Part 3, can be exposed as MCP tools that the agent calls at runtime. Your acme-deploy module becomes a cloud_run tool. Your acme-backend module becomes build and test tools. You can also attach any local MCP server (a language linter, a CLI wrapper, a documentation server) alongside those module tools, giving the agent both your custom CI abstractions and third-party capabilities in a single environment.

The result is a tight synergy between modules and agents: modules are the typed, testable building blocks; agents are the orchestration layer that composes them through natural language. You don't choose between writing pipelines and using AI. You write modules once, and agents compose them for you.

LLM provider required. The dag.llm() primitive needs access to a language model.
Dagger detects your provider from environment variables. In CI, add the key as a repository secret and pass it via env:. Locally, export the variable in your shell before running dagger call. If no provider is detected, agent functions fail at runtime with a clear error message.

## The Problem With Setting Up CI

We solved the YAML problem in Part 1. Pipelines are real code now. And in Part 3, we went further: toolchains let you install AcmeCorp's private modules as zero-code CI, with dagger check running all your checks from a single dagger.json. No SDK, no .dagger/ directory, no pipeline code.

But there's still a bottleneck: configuring that setup requires knowing the module library. AcmeCorp's platform team maintains a growing set of private modules: acme-backend, acme-frontend, acme-deploy. Each module has its own @check functions, its own parameters, its own DefaultPath conventions. Knowing which modules to install as toolchains, which customizations to add for a monorepo layout, and how to wire the deployment step in GitHub Actions still requires familiarity with the internal module library.

What if you could point an AI agent at your private modules and your source code, and have it generate the complete toolchain setup and CI workflow for you?

## Generating the Setup With Daggie

Daggie is a Dagger CI specialist agent. It reads module source code, understands each module's API, and generates the right toolchain configuration for your project. You give it your source directory and the Git URL of your module repository. Daggie discovers all available modules inside it and picks the ones relevant to the assignment.

Let's pick up from where we left off in Part 3. We're in the dagger-ci-demo monorepo (FastAPI backend + Angular frontend), and AcmeCorp's private modules live at github.com/telchak/acme-dagger-modules. AcmeCorp's coding agents (Monty, Angie, Daggie) live at github.com/telchak/daggerverse.
If you still have local changes from Part 3, you can stash them with git stash -u or simply delete the repo and clone it fresh — we want a clean starting point with no existing Dagger configuration.

First, initialize Dagger in the project and write the assignment file. Then point Daggie at both repositories — the module library and the daggerverse (so it can discover Monty and Angie's real URLs and versions).

Daggie clones both repositories and auto-discovers all Dagger modules within them by finding dagger.json files. It reads each module's source code and @check-decorated functions — acme-backend (test, lint), acme-frontend (test, lint, audit), acme-deploy (scan) — detects the monorepo layout, and finds the coding agents (Monty, Angie) with their version tags. It also fetches the latest dagger/dagger-for-github action version automatically. The export --path=. writes the generated dagger.json and .github/workflows/ci.yml to your project root, ready to review, test with dagger check, and commit.

## Meet the Agents

Before we look at what Daggie generates, let's introduce the three agents that work together in this setup. They're all Dagger modules, and you call them the same way you call any other module:

Daggie: the CI specialist. It reads your source code and the available modules, then generates the toolchain configuration and CI workflow. You've just seen it in action. Daggie writes the setup; it doesn't run in the pipeline.

Monty: the Python coding agent. When a check fails on Python code (a test failure, a lint error, a broken import), Monty reads the error output and the source code, analyzes the root cause, and posts an inline code fix suggestion directly on the pull request.

Angie: the Angular/TypeScript coding agent. Same role as Monty, but for the frontend stack. When an Angular build or test fails, Angie diagnoses the issue and suggests the fix.

The key design: Daggie generates the toolchain setup once. Monty and Angie are called from the CI workflow only when something fails.
The happy path (dagger check: lint, test, audit, scan) is pure deterministic module execution with no LLM involved. AI only enters the picture when a human needs help.

## The Generated Setup

Here's what Daggie generates. No .dagger/ directory, no SDK, no Python pipeline code. Just a dagger.json with toolchains and a CI workflow. The code blocks below are what Daggie consistently produced after 10+ runs with gemini-2.5-pro.

Notice what Daggie understood from the project structure and the module library. No GCP credentials are needed for the checks — they run entirely in containers.

## Running the Checks

Six checks, three toolchains, zero lines of code. All six run in parallel. No tokens consumed. The private modules handled base images, cache volumes, coverage thresholds, and vulnerability scanning — all invisible to the project.

## When a Check Fails

Let's say a developer pushes a PR and the backend tests fail. The CI workflow's failure step kicks in. Monty reads the error output and the source code, analyzes the root cause, and posts an inline code suggestion directly on the PR:

🐍 Monty suggested a fix for backend/auth.py: The test expects a 401 when the token is expired, but validate_token doesn't check the exp claim. This adds the expiry check before returning.

The developer gets actionable fix suggestions, with code they can accept in one click, instead of a wall of logs to interpret.

## Integrating Into CI/CD

Daggie also generates the GitHub Actions workflow. Here's what it produces: dagger check for PRs, deployment on main, and a failure handler that calls Monty or Angie directly.

The check job uses zero LLM tokens. It's pure dagger check — six deterministic checks from three toolchains. The suggest-fix steps only run on failure, calling Monty and Angie directly as Dagger modules (not pipeline functions). The deploy job calls acme-deploy's functions via dagger call on the installed toolchain. You get deterministic, fast CI with intelligent failure handling.

The platform team builds the modules and agents.
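For orientation, a minimal sketch of what such a toolchain dagger.json can look like — the module sources are inferred from Part 3 of this series, and the exact schema is an assumption that may differ by Dagger version:

```json
{
  "name": "dagger-ci-demo",
  "toolchains": [
    { "name": "acme-backend", "source": "github.com/telchak/acme-dagger-modules/acme-backend" },
    { "name": "acme-frontend", "source": "github.com/telchak/acme-dagger-modules/acme-frontend" },
    { "name": "acme-deploy", "source": "github.com/telchak/acme-dagger-modules/acme-deploy" }
  ]
}
```

With a file shaped like this in place, dagger check discovers and runs every @check function from all three installed toolchains.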
Daggie configures toolchains and generates the CI workflow. dagger check runs fast and deterministic. When things break, coding agents step in with targeted fixes.

## The Developer Experience Shift

So far we've seen how Dagger improves CI performance, maintainability, and developer experience. But there's a larger shift happening. As coding agents become more capable, the developer's core role is evolving from pure coder to agent orchestrator. You still need to understand the code, review the output, and make architectural decisions. But more and more of the mechanical work (implementing a well-specified feature, writing tests for existing code, fixing a lint error) can be delegated to agents that understand your codebase.

## From Developer Platform to Agent Factory

Follow this evolution to its conclusion, and an Internal Developer Platform starts looking like an Internal Agent Factory: a system that manages not just infrastructure and deployments, but how coding agents are built, composed, and deployed — which agents run on which tasks, with what models, under what constraints, producing what artifacts.

The building blocks are already here. What's missing is the orchestration layer: something that takes a feature request, breaks it into agent-assignable tasks, and dispatches them through CI. That's Speck.

## Spec-Driven Development With Speck

Speck is a Dagger agent that implements spec-driven development, inspired by GitHub's spec-kit methodology. The idea is simple: specifications first, code second. Given a feature request (either a prompt or a GitHub issue), Speck runs a three-step pipeline. The output is a structured JSON object designed for GitHub Actions fromJson() + matrix strategy consumption. Each task includes a suggested_agent (which Dagger agent should execute it), a suggested_model (which LLM complexity tier it needs), and an order field that defines the execution sequence.
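To make the output shape and the suggested_model assignment concrete, here's an illustrative Python sketch. It is not Speck's actual code: the task structure and mapping function are assumptions based on the description here, and the concrete model IDs are the ones from the example run later in this article.

```python
# Illustrative sketch — not Speck's implementation. It shows the
# complexity → model mapping and one task from a decomposition.

TIERS = ["simple", "standard", "complex"]

# Model IDs per tier for the Claude family (IDs taken from the
# article's example run).
CLAUDE_MODELS = {
    "simple": "claude-haiku-4-5",
    "standard": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}

def suggested_model(complexity: str, is_test_task: bool = False) -> str:
    """Map a task's complexity tier to a concrete model ID.

    Test tasks get one tier above their implementation task's
    complexity, capped at the top tier.
    """
    idx = TIERS.index(complexity)
    if is_test_task:
        idx = min(idx + 1, len(TIERS) - 1)
    return CLAUDE_MODELS[TIERS[idx]]

# One task in the decomposition output. The three field names
# (suggested_agent, suggested_model, order) come from the article;
# the overall shape is an assumption.
task = {
    "description": "Add favorites list endpoint with pagination",
    "suggested_agent": "monty",
    "suggested_model": suggested_model("standard"),
    "order": 1,
}

print(task["suggested_model"])  # → claude-sonnet-4-6
```

Because each task carries its own model ID, a cheap model handles trivial edits while expensive models are reserved for the work that needs them.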
## The Pipeline Pattern

When --include-tests and --include-review are enabled, Speck organizes tasks into phases that follow an implement → test → review pipeline. Phases run in parallel (each on its own CI runner). Tasks within a phase use the prompt chaining pattern: a workflow where the output of one agent becomes the input of the next, forming a sequential pipeline. Concretely, each agent receives a source Directory, modifies it, and exports the result back to the workspace. The next agent in the chain picks up that modified workspace as its input. This is different from running agents independently: the test agent sees the code the implementation agent wrote, and the review agent sees both the implementation and the tests. One PR is created per phase from the accumulated changes.

The model assignment is automatic: Speck maps task complexity to concrete model IDs based on the chosen provider family. Simple config changes get Haiku. Standard feature implementations get Sonnet. Cross-cutting architectural changes get Opus. Test tasks get one tier above their implementation task's complexity, since understanding the implementation requires more context.

## Setting Up the Workflow

Let's see this in action. We'll fork a real-world application, the FastAPI RealWorld Example App (a production-like REST API with authentication, articles, comments, and favorites), and turn GitHub Actions into a spec-driven development platform.

Step 1: Fork the repository

Step 2: Add the Speck workflow

Create .github/workflows/speck.yml. A few things to note in this workflow:

Step 3: Configure secrets

The workflow needs an LLM API key. Add it as a repository secret. The GITHUB_TOKEN is provided automatically by GitHub Actions with the permissions declared in the workflow.
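For reference, the permission and secret wiring for a workflow like this typically looks like the fragment below. This is a sketch, not the post's verbatim speck.yml: the secret name and the exact permission set are assumptions.

```yaml
# Illustrative fragment only — not the full speck.yml.
permissions:
  contents: write        # push branches for the generated PRs
  pull-requests: write   # open one PR per phase
  issues: write          # comment the decomposition on the issue

env:
  # LLM provider key, stored as a repository secret (name assumed).
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  # Provided automatically by GitHub Actions with the permissions above.
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The declared permissions matter: without pull-requests and issues write access, the automatic GITHUB_TOKEN cannot open PRs or post the decomposition comment.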
## Running the Workflow

Step 4: Commit, push, and create a test issue

Now create a GitHub issue with a feature request (see issue #1):

Title: Add article bookmarking/favorites list endpoint

Add the ability for authenticated users to retrieve their list of favorited articles with pagination and optional filtering.

Add the speck label to the issue. This triggers the workflow.

Step 1, Decomposition (Opus): Speck reads the issue, explores the FastAPI codebase (models, routes, repositories, existing test patterns), and produces a structured decomposition. It posts the result as a comment on the issue. In this case, Speck decomposed the feature into 3 phases with 9 tasks. Each task has a suggested_model based on complexity: simple schema additions get claude-haiku-4-5, standard implementations get claude-sonnet-4-6, and comprehensive test writing gets claude-opus-4-6 (since tests need to understand the full implementation context).

Step 2, Execution (parallel phases, sequential tasks): GitHub Actions fans out a matrix by phase. Each phase runs on its own runner. Within each phase, tasks are chained sequentially: Monty implements the feature, then writes tests on top of the implementation, then reviews the accumulated changes.

Step 3, Pull Requests: Each phase produced one PR with all accumulated changes, linked to the original issue. Each PR includes implementation, tests, and a review pass, all generated by Monty working sequentially on the same codebase within the phase.

## What Just Happened

A developer wrote a feature request with acceptance criteria. The system decomposed it, executed it, and opened pull requests. No pipeline code was written. No agent was invoked manually. The developer's job is now to review the PRs: read the code, check the tests, verify the approach. The mechanical work of translating a spec into code, tests, and PRs happened automatically.

This is the shift from Internal Developer Platform to Internal Agent Factory: the platform doesn't just run your CI.
It runs your agents, manages their model costs, chains their outputs, and produces reviewable artifacts from natural language specifications.

Image generated with Google's Gemini "Nano Banana Pro"

## Agents That Learn: Self-Improvement Across Runs

There's one more capability worth covering. Every agent (Monty, Angie, Daggie, and Goose, a GCP deployment orchestrator) reads per-repo context files to understand project conventions. But until now, the context was static: the developer wrote it once and maintained it by hand. With --self-improve, the agents can update those files themselves.

As Monty works through the codebase (reading models, tracing routes, checking existing patterns), it discovers things: "This project uses Pydantic v2 field validators, not v1-style @validator." "Tests use httpx.AsyncClient, not the sync test client." "Custom exceptions live in app/errors.py." Instead of those discoveries dying with the session, Monty records them in two files: MONTY.md for Python-specific knowledge, and AGENTS.md for general project knowledge shared across all agents.

The next time any agent runs on this repo — whether it's Monty, Angie, or a different developer running one of them — it reads both the agent-specific file and the shared AGENTS.md, starting with better knowledge. Python patterns stay in MONTY.md where only Monty reads them; project-wide conventions go in AGENTS.md where every agent benefits. No one had to write documentation. The agents documented the project by working on it.

## Three modes

The commit mode is useful for automation. When combined with develop-github-issue, the context file updates get included in the PR, so the PR carries both the code changes and a commit with the learned context. Over time, the context files become living documents: a compressed summary of the project's architecture, conventions, and gotchas, maintained by the agents that work on it.

## Would I Recommend Dagger CI in Production Right Now?

I've been following the Dagger project for several years now. And I can say with confidence: it has never been closer to production-ready than it is today.
The core primitives (typed functions, composable modules, containerized execution, deterministic caching) are solid. The dagger call experience is genuinely portable across local development and CI. The module ecosystem is growing. And as we've seen throughout this series, the LLM integration through the dag.llm() primitive opens up a category of workflows that simply didn't exist before.

That said, there are areas where the platform still needs to mature. Here's what I'd like to see, and what's already on the roadmap.

## What the Agent Layer Needs

The current LLM primitive is functional but minimal. To build truly capable agents in Dagger, a few key features would make a significant difference.

## What's Coming, and Why It Matters

Some of the most exciting changes are already in active development:

Cloud Engines: Fully managed Dagger execution environments with auto-scaling and distributed caching built in. Run dagger --cloud and your pipeline executes on managed infrastructure, with secrets and local context securely streamed to the cloud. No more managing Kubernetes daemonsets or custom cache layers.

Cloud Checks: This is the big one. Cloud Checks connects directly to your Git provider and triggers dagger check on every change, running on Cloud Engines. No YAML. No vendor syntax. No orchestration layer. Just your Dagger modules.

These two features are welcome because the more complex our Dagger workflows get, the more trying to fit them into GitHub Actions or GitLab CI feels like forcing circles into squares. Our Speck-driven development workflow is a perfect example: a decompose job that outputs dynamic JSON, a matrix strategy that fans out phases, shell scripts converting snake_case to kebab-case, environment variables carrying JSON between steps, conditional export commands based on return types... All of this ceremony exists because GitHub Actions was designed for static, declarative workflows, not for the kind of dynamic, graph-shaped execution that Dagger naturally produces.
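One tiny example of that ceremony is the snake_case-to-kebab-case conversion just mentioned: the decomposition emits snake_case task names, but CLI flags and branch names want kebab-case. The workflow does this in shell; here it is sketched in Python for readability (the function name is hypothetical):

```python
def to_kebab(name: str) -> str:
    """Convert a snake_case identifier to kebab-case,
    e.g. for CLI flags or branch names."""
    return name.replace("_", "-")

print(to_kebab("suggested_model"))  # → suggested-model
```

Trivial in isolation — but multiply it by every boundary where JSON crosses between jobs, and you get the translation-layer tax the next section is about.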
Cloud Checks would eliminate that entire translation layer. Your Dagger module is the CI platform. Add a native Graph core type on top, and you could have a fully native multi-agent workflow, completely independent of GitHub Actions or any other CI engine. Dagger CI would go from a "CI development toolkit" to a fully operational CI/CD platform.

Modules V2: A fundamental redesign of how modules interact with projects. Today, modules can't see your project structure unless you thread it through manually with --source flags, custom boilerplate, and static path patterns. Modules V2 introduces a typed Workspace API that lets modules parse configuration files, traverse directory trees, and adapt to any project layout, all through executable code rather than rigid pragmas. A new .dagger/config.toml file declares which modules a project uses in a human-editable format, and a lockfile ensures reproducible resolution across teams. This shifts complexity from users to module authors, which is exactly where it belongs.

These three features together (managed compute, native CI triggering, and smarter module integration) would close the gap between "Dagger as a portable pipeline SDK" and "Dagger as a complete CI platform." And from everything I've seen in the project's trajectory, that gap is closing fast.

## Conclusion

This is where the whole series comes together. The key insight: CI checks need to be fast, reliable, and deterministic. AI belongs at the edges — generating the configuration, diagnosing failures, decomposing specs into tasks, and learning from every run. Never in the hot path.

The example apps and Dagger module are at github.com/telchak/dagger-ci-demo. The AcmeCorp private modules from Part 3 are at github.com/telchak/acme-dagger-modules.

This concludes the 4-part series. Thanks for reading.

Tags: #cicd #dagger #ai-agents #platform-engineering #mcp #cloudrun #firebase #spec-driven-development
MONTY.md — Learned Context:

- Pydantic v2 with field validators (`field_validator`), not v1 `@validator`
- All route handlers are async; tests use `httpx.AsyncClient` with `pytest-asyncio`
- Input validation pattern: Pydantic model as request body, raises `ValidationError` → 422
AGENTS.md — Learned Context:

- Custom exception hierarchy in `app/errors.py`, handlers in `app/middleware.py`
- Project uses src layout with `app/` as the main package
- CI runs pytest with coverage; minimum threshold is 80%