Kronveil v0.2: Dashboard, gRPC, Secret Management, and Local Deployment - Here's What Changed
Quick Recap
What's New in v0.2
1. Full Dashboard UI (React + TypeScript)
2. gRPC API with TLS/mTLS
3. Secret Management: Vault + AWS Secrets Manager
4. Three New Collectors
5. Capacity Planner
6. Policy Engine (OPA/Rego)
7. Prometheus Metrics Export
8. OpenTelemetry (OTel) Integration
9. PagerDuty Integration
10. Audit Logging
11. Helm Chart for Kubernetes
Upgraded Stack
Run Kronveil Locally (5 Minutes)
Prerequisites
Step 1: Clone and Build
Step 2: Verify Everything Is Running
Step 3: Access the Endpoints
Step 4: Check Agent Health
Step 5: Open the Dashboard
Overview Page
Incidents Page
Anomalies Page
Collectors Page
Step 6: Explore the API
Step 7: Prometheus Metrics
Step 8: Tail the Logs
Cleanup
Architecture Diagram (Updated)
CI Pipeline
What's Next (v0.3 Roadmap)
Try It

A week ago, I launched Kronveil - an AI-powered infrastructure observability agent that detects anomalies, performs root cause analysis, and auto-remediates incidents in milliseconds. The response was incredible. But that first version had a lot of stubs. The roadmap listed features like "Dashboard UI", "Prometheus metrics", and "multi-cloud secret management" as coming soon.

This post covers every new feature shipped in v0.2, a step-by-step guide to run Kronveil locally with Docker Compose, and live screenshots from the running dashboard.

What's New in v0.2

1. Full Dashboard UI (React + TypeScript)

The biggest visible change: Kronveil now ships with a production-ready dashboard built with React 18, TypeScript, Tailwind CSS, and Recharts. Six pages, zero fluff.

The dashboard runs as a separate container behind nginx, which reverse-proxies /api/ requests to the agent. No CORS headaches.

2. gRPC API with TLS/mTLS

The REST API was always there. Now there's a full gRPC API on port 9091 with four services. It is built with reflection support, so you can debug with grpcurl out of the box. TLS and mutual TLS are configurable - just point it at your cert/key files.

3. Secret Management: Vault + AWS Secrets Manager

Two new integrations for secret lifecycle management. Both use the graceful degradation pattern - if credentials aren't configured, the agent logs a warning and continues running without them.

4. Three New Collectors

The original had Kubernetes and Kafka. Now there are five, adding a Cloud Collector (AWS/Azure/GCP), a CI/CD Collector (GitHub Actions), and a Logs collector.

5. Capacity Planner

New intelligence module that goes beyond anomaly detection.

6. Policy Engine (OPA/Rego)

Compliance and governance built into the agent.

7. Prometheus Metrics Export

Kronveil now exposes a full Prometheus scrape endpoint on port 9090.

8. OpenTelemetry (OTel) Integration

Full OpenTelemetry support for distributed tracing. This means you can plug Kronveil into your existing OTel collector pipeline and see traces from anomaly detection through incident creation to remediation execution - all in one trace.
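To make "all in one trace" concrete, here is a minimal sketch of how spans relate to each other. It is a toy model written against the Python standard library, not the OpenTelemetry SDK and not Kronveil's code. The stage names (`anomaly.detect`, `incident.create`, `remediation.execute`) and the `start_span` helper are assumptions made up for illustration. The only details taken from OpenTelemetry itself are the ID sizes: a 16-byte trace ID and an 8-byte span ID.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str
    trace_id: str             # 32 hex chars, shared by every span in the trace
    span_id: str              # 16 hex chars, unique per span
    parent_id: Optional[str]  # None marks the root span


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child that inherits the parent's trace ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


# Hypothetical stage names, chosen only for illustration.
detect = start_span("anomaly.detect")
incident = start_span("incident.create", parent=detect)
remediate = start_span("remediation.execute", parent=incident)

for span in (detect, incident, remediate):
    print(span.name, span.trace_id, span.parent_id)
```

All three printed lines carry the same trace ID, and each child points at the span before it. That parent chain is what lets a tracing backend draw detection, incident creation, and remediation as a single waterfall.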
9. PagerDuty Integration

Full Events API v2 support.

10. Audit Logging

Security-grade audit trail.

11. Helm Chart for Kubernetes

Production-ready Helm chart with security-hardened defaults.

Run Kronveil Locally (5 Minutes)

Here's the full local deployment walkthrough with live screenshots.

Step 1: Clone and Build

This builds two images and starts four containers.

Step 2: Verify Everything Is Running

All four containers should show Up (healthy).

Step 3: Access the Endpoints

Once deployed, you have three endpoints available.

Step 5: Open the Dashboard

Open http://localhost:3000 in your browser.

Overview Page

The Overview page shows real-time infrastructure intelligence at a glance - 10.2M events/sec throughput, 2 active incidents, 23-second average MTTR, and 47 anomalies detected in the last 24 hours. The cluster health matrix shows three clusters across US, EU, and AP regions with live node and pod counts.

Incidents Page

AI-detected and auto-remediated incidents with filtering by status (all, active, acknowledged, resolved). Each incident shows the title, description, MTTR, and number of affected resources. Notice the resolved OOM incident with 23s MTTR - that's the auto-remediation in action.

Anomalies Page

ML-powered anomaly detection and prediction. The distribution chart shows detected vs. predicted anomalies over 24 hours. Each anomaly has a score (0-100%) - the Kafka consumer lag spike scored 94%, and the system predicted a pod OOM 15 minutes before it happened.

Collectors Page

Telemetry collection agents across your infrastructure. Five active collectors process 10.2M events/sec across 487 targets with only a 0.001% error rate. Kubernetes leads at 4.2M events/sec, monitoring 3 clusters, 54 nodes, and 312 pods. Each collector shows real-time health status.

Scroll down to see all five collectors - Kubernetes, Apache Kafka, AWS CloudWatch, GitHub Actions (CI/CD), and the Logs collector. GitHub Actions shows a degraded status with 3 errors, which is expected when webhook endpoints aren't publicly accessible in a local deployment.
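To put the Collectors page figures in perspective, here is a small worked calculation using only the numbers quoted above. It assumes the 0.001% error rate is a share of processed events, which the dashboard description does not state explicitly.

```python
from fractions import Fraction

# Figures quoted on the Collectors page.
TOTAL_EVENTS_PER_SEC = 10_200_000   # 10.2M events/sec across all collectors
K8S_EVENTS_PER_SEC = 4_200_000      # 4.2M events/sec from Kubernetes
ERROR_RATE = Fraction(1, 100_000)   # 0.001% expressed as an exact fraction

# Assumption: the error rate is a share of processed events.
errors_per_sec = TOTAL_EVENTS_PER_SEC * ERROR_RATE
k8s_share_pct = round(100 * K8S_EVENTS_PER_SEC / TOTAL_EVENTS_PER_SEC, 1)

print(f"errors/sec: {float(errors_per_sec):.0f}")
print(f"Kubernetes share: {k8s_share_pct}%")
```

Under that assumption, the quoted rate works out to roughly 102 failed events per second at full throughput, and the Kubernetes collector accounts for about 41.2% of all events.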
Step 6: Explore the API

List collectors and their health. Inject a single test event, then inject a burst of events to trigger anomaly detection. After the burst injection, check for detected anomalies and for the incidents that were auto-created.

Step 7: Prometheus Metrics

You'll see standard Go metrics plus Kronveil-specific counters for events processed, collector errors, and policy evaluations. Wire this into your Grafana instance for dashboards.

Step 8: Tail the Logs

Watch the agent detect anomalies, correlate incidents, and execute remediation in real time.

CI Pipeline

Every push to main runs seven jobs. All green before merge. No exceptions.

Try It

GitHub: github.com/kronveil/kronveil
License: Apache 2.0

If you find it useful, star the repo. If you find a bug, open an issue. PRs welcome - especially for new collectors, dashboard improvements, and LLM prompt tuning.

Follow me for more updates on building production-grade infrastructure tooling with Go and AI.