Tools: Aurora Actions: User-Defined Background Automations for Incident Response - Full Analysis
Key Takeaways
What is an Aurora Action?
Why this matters
Where Actions sit on the agentic capability spectrum
How Actions work under the hood
Trigger types in detail
Manual triggers
On-incident-completion triggers
Scheduled triggers
What Actions don't do (and why)
Safety: what to think about before enabling
How to ship your first Action
1. Pick a recurring task you currently do manually
2. Write the prompt as if you were typing it into chat
3. Create the Action with a manual trigger
4. Inspect the run trace
5. Promote to the right trigger type
6. Add it to your team's incident review
Aurora Actions vs traditional incident-management automation
What's next
Get Aurora

We shipped one of the most-requested features in Aurora's history: Aurora Actions, user-defined background automations that run on Aurora's agent. An Aurora Action is a named, natural-language instruction the user writes once and then triggers manually, on incident completion, or on a recurring schedule; Aurora's agent executes it as a background task with full access to every connected integration. Where traditional incident management tools force you to pick from a fixed catalog of "automations" (close incident, post to Slack, run runbook), Actions are written in plain English and inherit the full reasoning capability of the agent.

This post is for SRE and platform teams already running Aurora, or evaluating it, who want to understand what Actions actually do, where they fit on the agentic spectrum, and how to use them safely.

What is an Aurora Action?

An Aurora Action has four parts: a name, a natural-language instruction, a trigger type, and an on/off toggle.

The implementation is a thin layer over Aurora's existing chat agent. When an Action triggers, the executor service creates a background chat session with the action's instruction as the user message, runs it through the same LangGraph workflow that powers interactive chat, and persists the run history. The agent has full tool access (kubectl, cloud CLIs, Terraform, Slack, GitHub, Confluence, Memgraph, Weaviate) and eager-loaded skills. The only differences from interactive chat are the scaffolded prompts and the absence of any RCA mandate.

Why this matters

Most incident management automation today is workflow automation: PagerDuty fires, a Slack channel is created, the status page is updated, a runbook link is posted. The "automation" is a directed graph of static actions. There is no reasoning, no investigation, no judgment. Tools like Rootly, FireHydrant, and incident.io are excellent at this, but they don't do anything an SRE wouldn't have to manually verify after the fact. Aurora's bet has always been the opposite: automate the investigation itself.
Aurora Actions extend that bet from one-shot incident investigations to recurring or post-incident workflows. A few concrete examples: noisy alert tuning, post-incident health checks, and scheduled infrastructure audits. None of these are runbook automation. Each requires the agent to query infrastructure, reason about results, and produce a structured output. Each one was previously the job of an on-call engineer doing follow-up between pages.

Where Actions sit on the agentic capability spectrum

In our Open-Source AI SRE comparison, we proposed a four-level spectrum for AI SRE capability. Actions don't change the level; they change when the agent runs. The post-incident and scheduled triggers are the genuinely new capability. Before Actions, anything recurring or post-incident required gluing Aurora to an external scheduler, an external prompt store, and bespoke trigger code. Actions collapse all three into the product surface.

How Actions work under the hood

This is for the technically curious. A few architecturally interesting things from the implementation:

1. Background chat sessions, not a separate runtime. When an Action triggers, the executor service creates a regular chat session with the action's instruction as the seed message and dispatches it as a background Celery task. The agent doesn't know it's running an Action; it just runs the workflow. This means every capability the interactive agent has (tool calls, RAG, graph traversal, sub-agent orchestration) is available inside Actions for free.
2. Eager-loaded skills, no RCA mandate. Interactive chat lazy-loads skills based on the user message. Background actions eager-load all skills because there is no human to clarify ambiguity. The system prompt also strips the "your job is to find root cause" framing, so Actions can do anything the agent can do, not just investigate.
3. RLS context is preserved. Aurora uses PostgreSQL row-level security for multi-tenancy. The executor explicitly sets RLS context (org_id, user_id) before running, so background tasks see only their own org's data even though they run under a service identity.
4. Stale run cleanup is integrated.
Aurora's existing background-chat janitor already handles orphaned chat sessions from crashed pods. Action runs go through the same path, so a worker pod dying mid-action doesn't leave the run state inconsistent.
5. RBAC is enforced at the route layer. Action CRUD is gated by Aurora's Casbin-based RBAC. Org admins can restrict which roles can create or trigger actions, which matters because an Action with cloud-CLI access has real blast radius.

Trigger types in detail

Manual triggers

The simplest case. An admin creates the action, and an engineer triggers it from the Actions page or via /action <name> in chat. Useful for codifying common operational tasks ("rotate ECS task definitions for service X", "scan Confluence for stale runbooks") into named, repeatable commands.

The chat integration is worth calling out: /action is implemented as an LLM tool call using the same pattern as Aurora's /rca slash command. The agent processes the action dispatch and then continues responding to the rest of the user's message, so you can write "kick off the IAM audit and tell me what changed since last week" and the agent will dispatch the audit action and answer your question in the same turn.

On-incident-completion triggers

When an incident transitions to "resolved", any action with this trigger type runs against the incident context. The incident's metadata, RCA, and timeline are available to the action's agent without the user having to paste anything in. This is the trigger that turns Aurora from a reactive tool ("investigate this page") into a continuous one ("investigate, then run health checks, then file the postmortem").

Scheduled triggers

Interval-based, driven by Celery Beat. Choose a cadence (every N minutes / hours / days), and the action runs without user involvement. This is the building block for the CI/CD auto-remediation and scheduled audit use cases, which is why we're calling this post and the CI/CD Auto-Remediation guide sister posts.
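To make the interval model concrete, here is a minimal sketch of how "every N minutes / hours / days" cadences could be turned into a Celery Beat schedule. It is not Aurora's actual code: `ScheduledAction`, `build_beat_schedule`, and the task path `aurora.actions.run_action` are hypothetical names chosen for illustration. The only Celery-specific assumption is the documented shape of `beat_schedule` entries (a `task` name, a `schedule` given as a `timedelta`, and optional `args`).

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical task path; Aurora's real task name is not stated in this post.
RUN_ACTION_TASK = "aurora.actions.run_action"

_UNITS = {"minutes", "hours", "days"}


@dataclass(frozen=True)
class ScheduledAction:
    """Illustrative stand-in for an Action with an on-schedule trigger."""
    name: str
    every: int            # the "N" in "every N minutes / hours / days"
    unit: str             # "minutes", "hours", or "days"
    enabled: bool = True


def build_beat_schedule(actions):
    """Return a dict in the shape Celery expects for app.conf.beat_schedule."""
    schedule = {}
    for action in actions:
        if not action.enabled:
            continue  # disabled actions are simply left out of the schedule
        if action.unit not in _UNITS or action.every < 1:
            raise ValueError(f"bad cadence for action {action.name!r}")
        schedule[f"action:{action.name}"] = {
            "task": RUN_ACTION_TASK,
            "schedule": timedelta(**{action.unit: action.every}),
            "args": (action.name,),
        }
    return schedule


if __name__ == "__main__":
    entries = build_beat_schedule([
        ScheduledAction("iam-audit", every=7, unit="days"),
        ScheduledAction("alert-noise-review", every=12, unit="hours"),
    ])
    for key, entry in entries.items():
        print(key, entry["schedule"])
```

In a real deployment the returned dict would be assigned to `app.conf.beat_schedule` on a Celery application. Building it as plain data first keeps cadence validation testable without a running broker.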
What Actions don't do (and why)

A few capability decisions worth being explicit about: this release has no external webhook triggers, no agent-authored Actions, and no conditional / DAG composition.

Safety: what to think about before enabling

Every Action is a small program with access to your cloud environment. A few rules we use ourselves: start read-only, use scheduled triggers conservatively, audit who can create Actions, pin the model, and review action runs weekly.

How to ship your first Action

1. Pick a recurring task you currently do manually

Anything you do every week or after every incident. Examples: stale-PR review, alert-noise audit, on-call handover summary. The smaller and more deterministic, the better for v1.

2. Write the prompt as if you were typing it into chat

Don't translate to "automation language." Write it the way you would write a chat message to a smart junior SRE. "Look at..." "Check whether..." "Open a PR that..."

3. Create the Action with a manual trigger

Settings → Actions → New Action. Paste the prompt, set trigger = manual, and leave it disabled if you want to review before enabling. Trigger it once and watch the run.

4. Inspect the run trace

Click the run in the history view. Read every tool call. Look for tool misuse (wrong cloud account), excessive tool calls (3 attempts at the same thing), and hallucinated paths or resource IDs. Iterate on the prompt until the trace is clean for three consecutive runs.

5. Promote to the right trigger type

If the action makes sense after every incident, use on-incident-completion. If it's a routine sweep, use on-schedule with the longest cadence that still meets your need. Only use short cadences when you have a clear understanding of cost and blast radius.

6. Add it to your team's incident review

Treat agent runs the same way you treat human runs: include them in your weekly incident review. Look for actions that produced wrong output, actions whose output nobody read, and actions whose output nobody acted on. Delete or downgrade as needed.

Aurora Actions vs traditional incident-management automation

The category most people compare us to is "workflow automation in incident-management SaaS": Rootly, FireHydrant, incident.io. The comparison is informative, but the two are ultimately different categories.

The honest framing: traditional incident-management tools automate the process around the incident. Aurora Actions automate what happens inside the agent. Both have value; they cover non-overlapping work. If you live in PagerDuty and use Rootly for incident channels, Aurora Actions sit alongside that; they don't replace it.
What's next

Aurora Actions is the foundation for several capabilities on our roadmap: DAG composition, approval gates, CI/CD auto-remediation hooks, and an Action marketplace. We'll publish each of these as they ship.

Get Aurora

Aurora is fully open source under Apache 2.0. Self-host with Docker Compose or Helm. Actions ship in the next tagged release after aurora-oss-1.2.15 (April 15, 2026); the feature is available on main today.

Key Takeaways

- Aurora Actions are reusable, natural-language automations that Aurora's agent executes in the background using all 22+ connected integrations. Available today on the main branch of Aurora.
- Three trigger types out of the box: manual ("run now"), on incident completion (chain follow-up work after every RCA), and recurring schedule (Celery Beat-driven intervals).
- Same agent, same tools, different prompt scaffolding. Actions reuse Aurora's existing LangGraph agent and 30+ tools (kubectl, aws, gcloud, az, Terraform, Confluence, Slack, GitHub); they just run as background chat sessions with eager-loaded skills and no RCA mandate.
- /action <name> is a first-class chat primitive. Slash-command autocomplete in the chat input, "Run Action" dropdown on completed incidents, and full RBAC-gated CRUD UI in Settings.
- Aurora Actions turn the agent into a programmable platform. This is the building block for CI/CD auto-remediation, scheduled audits, and post-incident health checks, covered in our CI/CD Auto-Remediation guide.

The four parts of an Aurora Action:

- A name: used as the slash-command handle (/action <name>) and as the dropdown label on incident cards.
- A natural-language instruction: the prompt the agent will execute. The same instruction the user would type into chat, except it can reference incident context placeholders when triggered post-incident.
- A trigger type: manual, on-incident-completion, or on-schedule (interval-based via Celery Beat).
- An on/off toggle: actions can be disabled without deletion, with full RBAC for who can create, edit, or trigger them.

Concrete examples:

- Noisy alert tuning: "Every Friday at 5pm, review which Datadog alerts fired more than 20 times this week with mean time-to-acknowledge over 10 minutes. Open a Terraform PR to widen the thresholds or move them to a warning channel."
- Post-incident health check: "After every completed RCA, run a 15-minute observation on the affected service: check error rate, p99 latency, and pod restart count. Post results to #incident-followup."
- Scheduled infrastructure audit: "Every Monday at 9am, audit IAM roles in the production AWS account that have not been used in 90 days. List candidates for removal in a Confluence page."
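The four parts listed above map naturally onto a small record type. The sketch below is only an illustration of that shape, not Aurora's schema: the `Action` and `Trigger` names, the slug rule for action names, and the example action name are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    MANUAL = "manual"
    ON_INCIDENT_COMPLETION = "on-incident-completion"
    ON_SCHEDULE = "on-schedule"


@dataclass
class Action:
    """Hypothetical model of an Action's four parts."""
    name: str             # slash-command handle and dropdown label
    instruction: str      # the natural-language prompt the agent executes
    trigger: Trigger      # manual, on-incident-completion, or on-schedule
    enabled: bool = True  # on/off toggle: disable without deleting

    def __post_init__(self):
        # Assumption: names are slugs so "/action <name>" parses unambiguously.
        if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", self.name):
            raise ValueError(f"invalid action name: {self.name!r}")
        if not self.instruction.strip():
            raise ValueError("instruction must not be empty")

    @property
    def slash_command(self) -> str:
        return f"/action {self.name}"


health_check = Action(
    name="post-incident-health-check",
    instruction=(
        "After every completed RCA, run a 15-minute observation on the "
        "affected service: check error rate, p99 latency, and pod restart "
        "count. Post results to #incident-followup."
    ),
    trigger=Trigger.ON_INCIDENT_COMPLETION,
)

if __name__ == "__main__":
    print(health_check.slash_command)
    print(health_check.trigger.value)
```

Note that the instruction stays a plain string. Everything specific to the task lives in the prompt, which is what separates an Action from a fixed catalog of workflow steps.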
What Actions don't do:

- No external webhook triggers in this release. We could have added "trigger on arbitrary webhook", but it overlaps with the existing alert-triggered investigation flow. We may add it if we see demand for triggers from systems that don't go through PagerDuty / Datadog / Grafana.
- No agent-authored Actions yet. The agent can't create or modify Actions on its own. Self-modification is a serious security boundary; we'd want approval gating and audit logging before opening that door. (See our AI Agent kubectl Safety guide for the threat model.)
- No conditional / DAG composition in this release. Actions are single-prompt for now. If you need a multi-step workflow, write a single prompt that describes the steps; the agent is good at sequencing. We'll add explicit composition if the natural-language form proves limiting.

Safety rules:

- Start read-only. Actions inherit Aurora's tool permissions. If your tool config restricts write actions (no kubectl apply, no aws ec2 terminate-instances), Actions inherit that posture. Keep it that way for the first few weeks.
- Use scheduled triggers conservatively. A daily audit is cheap. A 5-minute polling loop with cloud CLI calls is not. Watch the LLM bill.
- Audit who can create Actions. RBAC defaults to org-admin-only creation. Leave it there unless you have a clear reason to widen.
- Pin the model. Action prompts can be sensitive to model behavior. Pin a known-good model per action (gpt-5.5, claude-sonnet-4.6, opus-4.7, etc.) using Aurora's per-org model dropdown until you have confidence in cross-model stability.
- Review action runs weekly. Every action has a run-history view. Spend 10 minutes a week reading the agent's traces for your scheduled actions; anomalous reasoning is the leading indicator of prompt drift or tool drift.
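The "daily audit versus 5-minute polling loop" rule is easy to quantify before you pick a cadence. The back-of-the-envelope sketch below uses an assumed cost of $0.25 per run purely for illustration. That figure is not an Aurora number, so substitute what you observe in your own run history.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: float) -> float:
    """How many times an interval-triggered Action fires in one day."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return MINUTES_PER_DAY / interval_minutes


def monthly_cost_usd(interval_minutes: float,
                     cost_per_run_usd: float,
                     days: int = 30) -> float:
    """Rough monthly spend for one scheduled Action."""
    return runs_per_day(interval_minutes) * days * cost_per_run_usd


if __name__ == "__main__":
    ASSUMED_COST_PER_RUN = 0.25  # illustrative only; measure your own runs
    for label, interval in [("every 5 minutes", 5), ("daily", MINUTES_PER_DAY)]:
        cost = monthly_cost_usd(interval, ASSUMED_COST_PER_RUN)
        print(f"{label}: {runs_per_day(interval):.0f} runs/day, ${cost:.2f}/month")
```

Under that assumption a 5-minute cadence fires 288 times a day and costs $2,160 a month, while the daily cadence costs $7.50. The ratio between the two (288x) holds whatever the real per-run cost turns out to be.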
Roadmap:

- DAG composition: explicit multi-step Action chains where each step is itself an Action.
- Approval gates: Actions that pause for human approval before destructive tool calls (already supported in chat; explicit Action-level gating coming).
- CI/CD auto-remediation hooks: first-class integration with GitHub Actions, Jenkins, and ArgoCD so a failing pipeline becomes a triggered Aurora investigation. (Background and detailed write-up in our CI/CD Auto-Remediation guide.)
- Action marketplace: community-contributed Actions you can install with one click. Bring-your-own prompt store.

Get Aurora:

- GitHub: github.com/Arvo-AI/aurora
- Docs: arvo-ai.github.io/aurora
- Compare against alternatives: Open-Source AI SRE: Aurora vs HolmesGPT vs K8sGPT · Aurora vs traditional incident-management tools