🚀 Building an AI Incident Copilot: How I Automated the First 15 Minutes of Every Production Incident

Project structure

Quick test (no real services needed)

Systemd journal triage

Docker triage

What you get

What to improve next

Every production incident follows the same painful ritual. An alert fires at 2am. An engineer wakes up, SSHes into a server, and begins the manual loop: pulling logs, scanning for errors, guessing what to check next. This loop can take 15 to 45 minutes before the real diagnosis even begins. Multiply that by every incident across every team in your organisation, and you have thousands of engineering hours lost every year to work that is repetitive, stressful, and largely automatable.

I've been on that on-call rotation. I know what it costs, not just in time but in cognitive load, in missed context, and in the compounding pressure of an active incident. So I built incopilot: a CLI tool that automates the entire first-pass triage so engineers can skip straight to actual problem-solving.

This post walks through the architecture, the design decisions, and exactly how to build it yourself. Everything is open source at https://github.com/AutoShiftOps/incopilot.

What you get

out/report.md: paste into your incident doc

out/report.json: attach to a ticket or POST to a webhook

Sudhakar Sajja is an Application Architect at TechMahindra with 13 years of experience across protocol testing, SDET, DevOps, and cloud architecture. He specialises in AI-powered DevOps operations, building tools that use LLMs to replace manual incident response and query diagnostics. He writes weekly at AutoShiftOps (autoshiftops.com) and built QueryTuner (querytuner.com), an AI-driven SQL query analysis tool. He is based in Mississauga, Canada.
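To make the two output files (out/report.md and out/report.json) more concrete, here is a minimal sketch of how a reporter step could write them. It is not the actual reporter.py from the repository: the function name `write_reports` and the `source` and `pattern_hits` fields are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def write_reports(findings: dict, out_dir: str = "out") -> tuple[Path, Path]:
    """Write a Markdown and a JSON report for one triage run.

    `findings` is a hypothetical structure with two keys:
    "source" (where the logs came from) and "pattern_hits" (name -> count).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Markdown report: most frequent patterns first, ready to paste into a doc.
    md_lines = [f"# Incident triage: {findings['source']}", "", "## Pattern hits", ""]
    ranked = sorted(findings["pattern_hits"].items(), key=lambda kv: -kv[1])
    for name, count in ranked:
        md_lines.append(f"- {name}: {count}")

    md_path = out / "report.md"
    json_path = out / "report.json"
    md_path.write_text("\n".join(md_lines) + "\n", encoding="utf-8")

    # JSON report: the same data, machine-readable for tickets or webhooks.
    json_path.write_text(
        json.dumps(findings, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return md_path, json_path
```

Keeping both files derived from one dictionary means the human-readable and machine-readable views cannot drift apart.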

Project structure

    incopilot/
      __init__.py
      cli.py          # argument parsing + console output
      collectors.py   # journalctl, docker logs, file, bundle
      analyzer.py     # pattern detection + line normalization
      reporter.py     # report.md / report.json generation
      config.py       # patterns, golden-signal map, safe-command list
    scripts/
      demo_generate_sample_logs.py
    posts/
    requirements.txt
    pyproject.toml
    README.md

Clone and install:

    git clone https://github.com/AutoShiftOps/incopilot.git
    cd incopilot
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Quick test (no real services needed)

    python scripts/demo_generate_sample_logs.py
    python -m incopilot file --path sample.log
    ls out/

Systemd journal triage

    python -m incopilot journal --unit nginx --since "30 min ago"

Docker triage

    python -m incopilot docker --container my-api --since 1h

Bundle (journal and Docker in one run):

    python -m incopilot bundle \
      --unit nginx \
      --container my-api \
      --since-journal "30 min ago" \
      --since-docker 1h

What to improve next

- Per-service pattern packs (nginx, postgres, java, node)
- Slack/Teams webhook posting (--webhook <url>)
- Unit tests + GitHub Actions CI
- Scheduled timer (systemd timer unit) for proactive reports
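The project structure describes analyzer.py as "pattern detection + line normalization". The sketch below shows one way that idea could work; it is not the repository's code. The pattern names, the regexes, and the `normalize` and `analyze` functions are all assumptions made for illustration, and a real pattern pack would likely be larger.

```python
import re
from collections import Counter

# Hypothetical pattern table; the real config.py may use other names and regexes.
PATTERNS = {
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "conn_refused": re.compile(r"connection refused", re.IGNORECASE),
}

# Replace variable parts of a line so repeated errors collapse into one shape.
# Order matters: timestamps and IPs are replaced before bare numbers.
_NORMALIZERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<ts>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(line: str) -> str:
    """Strip timestamps, IPs, hex ids, and numbers from a log line."""
    out = line.strip()
    for rx, placeholder in _NORMALIZERS:
        out = rx.sub(placeholder, out)
    return out


def analyze(lines, top_n=3):
    """Count pattern hits and group matching lines by their normalized shape."""
    hits = Counter()
    shapes = Counter()
    for raw in lines:
        if not raw.strip():
            continue
        matched = [name for name, rx in PATTERNS.items() if rx.search(raw)]
        hits.update(matched)
        if matched:
            shapes[normalize(raw)] += 1
    return {"pattern_hits": dict(hits), "top_lines": shapes.most_common(top_n)}


if __name__ == "__main__":
    sample = [
        "2024-05-01T02:00:01Z upstream timed out after 30s from 10.0.0.5:8080",
        "2024-05-01T02:00:07Z upstream timed out after 31s from 10.0.0.9:8080",
        "2024-05-01T02:00:09Z connect() failed: Connection refused",
    ]
    print(analyze(sample))
```

Normalizing before counting is what lets two timeouts from different hosts and at different times show up as a single line shape with a count of 2, rather than as two unrelated log lines.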