# Tools: Evaluating GitHub Agentic Workflows — From a Claude Code User's Perspective [Part 10]
2026-02-17
*This article was originally published on Saru Blog.*

## What Are GitHub Agentic Workflows?

On February 13, 2026, GitHub released Agentic Workflows as a technical preview. Co-developed by GitHub Next, Microsoft Research, and Azure Core Upstream, it's open source under the MIT license. In short, it is a mechanism for automatically running AI coding agents on GitHub Actions.

## How It Works

Traditional GitHub Actions strictly define "when X happens, do Y" in YAML. Agentic Workflows instead write "when X happens, make this kind of judgment" in Markdown, and the AI makes the judgment.

### Workflow Definition

Place Markdown files in `.github/workflows/`. Markdown, not YAML. The frontmatter specifies the trigger, permissions, and the AI engine to use. The body describes "what you want done" in natural language.

### Compilation and Execution

`gh aw compile` parses the Markdown and generates a `.lock.yml` for GitHub Actions. This lock file is the actual workflow that runs: the Markdown is the human-readable specification; the `.lock.yml` is the machine-executable procedure.

### Available AI Engines

The ability to choose Claude Code as the engine makes it a natural choice for developers already using Claude Code.

## Differences from Traditional GitHub Actions

The important point is that Agentic Workflows are not a replacement for CI/CD. GitHub's official blog states this explicitly:

> Don't use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD.

Builds, tests, and deploys remain with traditional YAML workflows. Agentic Workflows handle "ambiguous tasks" requiring AI judgment.
Under the concept of "Continuous AI," they complement existing CI/CD.

## Considering Application to the Saru Project

### Current Automation

Saru already has several processes automated via GitHub Actions. These are all deterministic processes — no reason to replace them with Agentic Workflows.

### What Could Be Automated with Agentic Workflows

So what can be automated? Let me identify manual tasks that require judgment.

#### 1. Automatic Issue Triage

**Current state:** After creating an issue, I manually add labels and set priority. As a solo developer, I'm the only one doing this.

**With Agentic Workflows:** Trigger on issue creation to automatically analyze content, apply labels, set priority, and identify related files.

**Assessment:** Low impact for solo development. There is little need to triage issues I wrote myself. It would become effective once the project goes OSS and external issues increase.

#### 2. Automatic CI Failure Investigation

**Current state:** When CI fails, I read the logs, investigate the cause, and fix it. As covered in Part 7, CI stabilization required enormous effort.

**With Agentic Workflows:** Trigger on CI failure to analyze logs, identify root causes, and automatically create fix PRs.

**Assessment:** The most compelling use case, especially for flaky E2E test failures where root-cause identification takes time. Even having AI do only the initial investigation would save significant time.

#### 3. Automatic Dependabot PR Triage

**Current state:** When Dependabot PRs pile up, I review each one individually before merging.

**With Agentic Workflows:** Trigger on Dependabot PRs to review the changes and make judgments such as "patch version + tests pass → auto-merge" and "major version → add `needs-manual-review` label."

**Assessment:** Effective. Dependabot PR handling is monotonous yet requires judgment — exactly what Agentic Workflows excel at.

#### 4. Daily Status Report

**Current state:** None. Development status exists only in my head.

**With Agentic Workflows:** Auto-generate daily reports on issue/PR status, CI health, and outstanding items.

**Assessment:** Overkill for solo development.
Would be effective for team development or once the project has OSS contributors.

## Concerns About Adoption

### 1. Cost

Running Agentic Workflows incurs AI engine API calls. If the AI runs on every CI failure, monthly costs become unpredictable. E2E tests in particular have many jobs, so failure frequency × API cost per run must be estimated.

### 2. Technical Preview Instability

As of February 2026, this is still a technical preview. GitHub's official documentation explicitly states "at your own risk." It is too early to integrate into production CI/CD pipelines. The documentation is still developing, and details around the Markdown frontmatter specification and engine configuration require some trial-and-error exploration.

### 3. Trust in Non-Deterministic Execution

In the CI/CD world, "same input → same output" is a fundamental principle. Agentic Workflows are inherently non-deterministic — AI judgment may differ each time. Safe outputs and read-only defaults provide safety margins, but you still need to handle cases like "the AI applied the wrong label" or "the AI created an irrelevant fix PR."

### 4. Compatibility with Self-Hosted Runners

Saru runs parallel E2E tests on 15 self-hosted runners. Whether Agentic Workflows function correctly on self-hosted runners is unverified; the official documentation mostly assumes GitHub-hosted runners.

### 5. Coexistence with Claude Code CLI

This is the most important consideration. Saru already uses Claude Code CLI locally for development. If Claude Code also runs automatically on GitHub, clear role separation becomes essential: multiple AIs operating on the same repository with different contexts need clearly defined roles to avoid confusion.

## Next Steps

This article stays at the investigation and evaluation level. In the next article, I plan to actually implement Agentic Workflows in the Saru repository and verify how they behave in practice.
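As a rough starting point for that experiment, here is a sketch of what a CI-failure investigation workflow file might look like, following the same frontmatter format as the issue-triage example later in this article. Note that the `workflow_run` trigger value and the investigation criteria are my assumptions; I have not verified them against the gh-aw documentation.

```markdown
---
on:
  workflow_run: completed   # assumed trigger value (unverified)
permissions:
  contents: read
  actions: read
safe-outputs:
  add-comment: true
engine: claude
---

## CI Failure Investigation

When a CI run fails, analyze the run logs, identify the most likely
root cause, and leave the findings as a comment on the related PR.

## Criteria

- Flaky E2E failure → name the test and suggest a retry or quarantine
- Genuine regression → point at the suspect commit and files
- Infrastructure issue (runner, network) → say so; do not propose code changes
```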
```
Traditional GitHub Actions:  Event → YAML-defined steps → Deterministic execution
Agentic Workflows:           Event → Markdown-described objectives → AI judges and executes
```
```markdown
---
on:
  issues: opened
permissions:
  contents: read
  issues: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.

## Criteria

- Bug report → `bug` label
- Feature request → `enhancement` label
- Question → `question` label
- Security-related → `security` label + raise priority

## Comments

Leave triage results as a comment.
```
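The same file format extends naturally to the Dependabot triage idea evaluated above. The sketch below is hypothetical: the `pull_request` trigger value is an assumption (a real workflow would also need to be restricted to Dependabot-authored PRs), and only the frontmatter fields already shown in the triage example are used.

```markdown
---
on:
  pull_request: opened      # assumed; would need restricting to Dependabot PRs
permissions:
  contents: read
  pull-requests: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Dependabot PR Triage

For each Dependabot PR, read the version bump and changelog, then judge:

- Patch version and all checks pass → comment that it looks safe to merge
- Major version → add the `needs-manual-review` label and summarize breaking changes
```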
```shell
# Compile with the CLI (generates the .lock.yml from Markdown)
gh aw compile

# Manual trigger
gh aw run
```
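The cost concern raised earlier ("failure frequency × API cost must be estimated") can be made concrete with a back-of-the-envelope calculation. All numbers below are hypothetical placeholders; real figures would come from actual runner history and the chosen engine's pricing.

```shell
# Back-of-the-envelope monthly cost estimate for AI-driven CI investigation.
# All inputs are hypothetical placeholders, not measured values.
failures_per_day=3        # assumed E2E failure/flake rate
cost_per_run_usd=0.40     # assumed API cost per agent investigation
days=30

awk -v f="$failures_per_day" -v c="$cost_per_run_usd" -v d="$days" \
  'BEGIN { printf "Estimated monthly cost: $%.2f\n", f * c * d }'
```

With these placeholder inputs the estimate comes out to a few tens of dollars per month; the point is that the failure rate, not the per-run price, dominates the total.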
```
Local development:
  Human + Claude Code CLI → Code implementation, test creation

On GitHub:
  Copilot           → PR review (already in use)
  Agentic Workflows → CI failure investigation, triage (under consideration)
```
- What GitHub Agentic Workflows are
- How they differ from traditional GitHub Actions
- Benefits for Claude Code users
- What can be automated in a 200K-line SaaS project
- Key factors in the adoption decision

Cost per execution by engine:

- Copilot: ~2 premium requests per execution (agent execution + safe outputs)
- Claude Code: API billing via ANTHROPIC_API_KEY
- Codex: API billing via OPENAI_API_KEY

Planned verification for the next article:

- Building a CI failure auto-investigation workflow
- Execution with the Claude Code engine
- Operation on self-hosted runners
- Actual cost measurement

Series index:

- Part 1: Fighting Unmaintainable Complexity with Automation
- Part 2: Automating WebAuthn Tests in CI
- Part 3: Next.js x Go Monorepo Architecture
- Part 4: Multi-Tenant Isolation with PostgreSQL RLS
- Part 5: Multi-Portal Authentication Pitfalls
- Part 6: Developing a 200K-Line SaaS Alone with Claude Code
- Part 7: Landmines and Solutions in Self-Hosted CI/CD
- Part 8: Turning Solo Development into Team Development with Claude Code Agent Teams
- Part 9: pnpm + Next.js Standalone + Docker: 5 Failures Before Success
- Part 10: Evaluating GitHub Agentic Workflows (this article)