Tools: Kronveil v0.2: Dashboard, gRPC, Secret Management, and Local Deployment - Here's What Changed

Quick Recap

What's New in v0.2

1. Full Dashboard UI (React + TypeScript)

2. gRPC API with TLS/mTLS

3. Secret Management: Vault + AWS Secrets Manager

4. Three New Collectors

5. Capacity Planner

6. Policy Engine (OPA/Rego)

7. Prometheus Metrics Export

8. OpenTelemetry (OTel) Integration

9. PagerDuty Integration

10. Audit Logging

11. Helm Chart for Kubernetes

Upgraded Stack

Run Kronveil Locally (5 Minutes)

Prerequisites

Step 1: Clone and Build

Step 2: Verify Everything Is Running

Step 3: Access the Endpoints

Step 4: Check Agent Health

Step 5: Open the Dashboard

Overview Page

Incidents Page

Anomalies Page

Collectors Page

Step 6: Explore the API

Step 7: Prometheus Metrics

Step 8: Tail the Logs

Cleanup

Architecture Diagram (Updated)

CI Pipeline

What's Next (v0.3 Roadmap)

Try It

A week ago, I launched Kronveil - an AI-powered infrastructure observability agent that detects anomalies, performs root cause analysis, and auto-remediates incidents in milliseconds. The response was incredible. But that first version had a lot of stubs. The roadmap listed features like "Dashboard UI", "Prometheus metrics", and "multi-cloud secret management" as coming soon.

This post covers every new feature shipped in v0.2, a step-by-step guide to run Kronveil locally with Docker Compose, and live screenshots from the running dashboard.

What's New in v0.2

1. Full Dashboard UI (React + TypeScript)

The biggest visible change. Kronveil now ships with a production-ready dashboard built with React 18, TypeScript, Tailwind CSS, and Recharts. Six pages, zero fluff. The dashboard runs as a separate container behind nginx, which reverse-proxies /api/ requests to the agent. No CORS headaches.

2. gRPC API with TLS/mTLS

The REST API was always there. Now there's a full gRPC API on port 9091 with four services. It is built with reflection support, so you can debug with grpcurl out of the box. TLS and mutual TLS are configurable - just point it at your cert/key files.

3. Secret Management: Vault + AWS Secrets Manager

Two new integrations for secret lifecycle management: Vault and AWS Secrets Manager. Both use the graceful degradation pattern - if credentials aren't configured, the agent logs a warning and continues running without them.

4. Three New Collectors

The original had Kubernetes and Kafka. Now there are five. The new ones are the Cloud Collector (AWS/Azure/GCP), the CI/CD Collector (GitHub Actions), and the Logs collector.

5. Capacity Planner

New intelligence module that goes beyond anomaly detection.

6. Policy Engine (OPA/Rego)

Compliance and governance built into the agent.

7. Prometheus Metrics Export

Kronveil now exposes a full Prometheus scrape endpoint on port 9090.

8. OpenTelemetry (OTel) Integration

Full OpenTelemetry support for distributed tracing. This means you can plug Kronveil into your existing OTel collector pipeline and see traces from anomaly detection through incident creation to remediation execution - all in one trace.
9. PagerDuty Integration

Full Events API v2 support.

10. Audit Logging

Security-grade audit trail.

11. Helm Chart for Kubernetes

Production-ready Helm chart with security-hardened defaults.

Run Kronveil Locally (5 Minutes)

Here's the full local deployment walkthrough with live screenshots.

Step 1: Clone and Build

The Docker Compose command builds two images and starts four containers.

Step 2: Verify Everything Is Running

All four containers should show Up (healthy).

Step 3: Access the Endpoints

Once deployed, you have three endpoints available.

Step 4: Check Agent Health

Query the agent's health and status endpoints on port 8080.

Step 5: Open the Dashboard

Open http://localhost:3000 in your browser.

Overview Page

The Overview page shows real-time infrastructure intelligence at a glance - 10.2M events/sec throughput, 2 active incidents, 23-second average MTTR, and 47 anomalies detected in the last 24 hours. The cluster health matrix shows three clusters across US, EU, and AP regions with live node and pod counts.

Incidents Page

AI-detected and auto-remediated incidents with filtering by status (all, active, acknowledged, resolved). Each incident shows the title, description, MTTR, and number of affected resources. Notice the resolved OOM incident with 23s MTTR - that's the auto-remediation in action.

Anomalies Page

ML-powered anomaly detection and prediction. The distribution chart shows detected vs. predicted anomalies over 24 hours. Each anomaly has a score (0-100%) - the Kafka consumer lag spike scored 94%, and the system predicted a pod OOM 15 minutes before it happened.

Collectors Page

Telemetry collection agents across your infrastructure. Five active collectors process 10.2M events/sec across 487 targets with only a 0.001% error rate. Kubernetes leads at 4.2M events/sec, monitoring 3 clusters, 54 nodes, and 312 pods. Each collector shows real-time health status. Scroll down to see all five collectors - Kubernetes, Apache Kafka, AWS CloudWatch, GitHub Actions (CI/CD), and the Logs collector. GitHub Actions shows a degraded status with 3 errors, which is expected when webhook endpoints aren't publicly accessible in a local deployment.
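The CI/CD collector receives GitHub webhooks and, per the feature list, validates a webhook secret. Kronveil's own implementation isn't shown in this post, so the following is only a sketch of how GitHub's documented `X-Hub-Signature-256` scheme (an HMAC-SHA256 of the raw request body, hex-encoded and prefixed with `sha256=`) is typically checked in Go. The function names and the sample secret are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign builds the header value GitHub would send for this body and secret.
// It is only used here to produce a test input.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature reports whether header carries the HMAC-SHA256 of body
// computed with secret.
func validSignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time, which avoids timing leaks.
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("local-dev-secret") // hypothetical secret, for illustration
	body := []byte(`{"action":"completed"}`)
	header := sign(secret, body)

	fmt.Println(validSignature(secret, body, header))               // prints: true
	fmt.Println(validSignature(secret, []byte("tampered"), header)) // prints: false
}
```

The check has to run on the raw request bytes, before any JSON decoding, because re-serialising the payload can change the bytes and break the signature.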
Step 6: Explore the API

List collectors and their health, inject a single test event, then inject a burst of events to trigger anomaly detection. After the burst injection, check for detected anomalies and for the incidents that were auto-created.

Step 7: Prometheus Metrics

You'll see standard Go metrics plus Kronveil-specific counters for events processed, collector errors, and policy evaluations. Wire this into your Grafana instance for dashboards.

Step 8: Tail the Logs

Watch the agent detect anomalies, correlate incidents, and execute remediation in real time.

CI Pipeline

Every push to main runs seven jobs. All green before merge. No exceptions.

Try It

GitHub: github.com/kronveil/kronveil

License: Apache 2.0

If you find it useful, star the repo. If you find a bug, open an issue. PRs welcome - especially for new collectors, dashboard improvements, and LLM prompt tuning.

Follow me for more updates on building production-grade infrastructure tooling with Go and AI.

gRPC services:

- StreamEvents - Server-side streaming of real-time telemetry events with source and severity filtering
- GetIncident / ListIncidents - Incident queries with status filtering
- GetHealth - Component-level health reporting

Secret management (Vault + AWS Secrets Manager):

- Kubernetes auth method
- TLS certificate lifecycle tracking
- Secret caching for performance
- Prefix-based secret organization (kronveil/ default)
- Rotation monitoring with configurable windows (default 30 days)
- Secret expiration tracking
- Built-in caching layer

Cloud Collector (AWS/Azure/GCP):

- CloudWatch metrics for EC2, RDS, ELB, Lambda, S3
- Multi-region support with resource enumeration
- Cost tracking per resource

CI/CD Collector (GitHub Actions):

- Webhook-based pipeline monitoring
- Job and step-level tracking with duration metrics
- Repository filtering with webhook secret validation

Logs collector:

- File tailing with structured log parsing
- JSON, logfmt, and raw text format support
- Configurable error pattern matching (error, fatal, panic, OOM, killed)

Capacity Planner:

- Linear regression-based forecasting (default 30-day horizon)
- Right-sizing recommendations: scale_up, scale_down, right_size, optimize
- Days-to-capacity projection
- Cost savings calculations with confidence intervals
- Historical data retention (90 days default)

Policy Engine (OPA/Rego):

- Open Policy Agent integration with Rego language
- Default policies pre-loaded (compliance, security)
- Resource evaluation against all enabled policies
- Policy violation tracking with evaluation metrics

Prometheus Metrics Export:

- Standard Go runtime metrics (goroutines, memory, GC)
- Custom Kronveil metrics: event counts per source, collector errors, policy evaluations, processing latency
- Ready-to-use with Grafana dashboards

OpenTelemetry (OTel) Integration:

- gRPC exporter to any OTLP-compatible endpoint (Jaeger, Tempo, Datadog, etc.)
- Configurable export intervals (default 30s)
- Span and trace propagation across the agent pipeline
- Insecure mode for local development, TLS for production
- Default endpoint: localhost:4317

PagerDuty Integration:

- Incident triggering, acknowledgment, resolution
- Deduplication keys for idempotent alerts
- Severity mapping (critical, high, warning, info)
- Links back to Kronveil dashboard

Audit Logging:

- Event types: auth, incident, remediation, policy_change, config_change, secret_access, api_call
- In-memory buffer with file sink
- Structured JSON output via slog

Helm Chart for Kubernetes:

- Non-root containers (UID 1000)
- Read-only root filesystem
- Seccomp: RuntimeDefault
- NetworkPolicy for ingress/egress
- RBAC: ClusterRole with minimal permissions (pods, nodes, events, deployments)
- Prometheus scrape annotations built-in
- Liveness and readiness probes

    helm install kronveil helm/kronveil/ \
      --namespace kronveil \
      --create-namespace \
      --set agent.bedrock.region=us-east-1

Prerequisites:

- Docker Desktop installed and running
- ~2GB free RAM (Kafka needs memory)

Step 1: Clone and Build

    git clone https://github.com/kronveil/kronveil.git
    cd kronveil
    docker-compose -f deploy/docker-compose.yaml up --build -d

Step 2: Verify Everything Is Running

    docker-compose -f deploy/docker-compose.yaml ps

    NAME                 STATUS                        PORTS
    deploy-agent-1       Up About a minute (healthy)   127.0.0.1:8080->8080/tcp
    deploy-dashboard-1   Up About a minute (healthy)   127.0.0.1:3000->8080/tcp
    deploy-kafka-1       Up About a minute (healthy)   127.0.0.1:9092->9092/tcp
    deploy-zookeeper-1   Up About a minute (healthy)   2181/tcp

Step 4: Check Agent Health

    curl http://localhost:8080/api/v1/health

    {
      "data": {
        "status": "healthy"
      }
    }

    curl http://localhost:8080/api/v1/status | python3 -m json.tool

Step 6: Explore the API

    # List collectors and their health
    curl http://localhost:8080/api/v1/collectors | python3 -m json.tool

    # Inject a test event (single); the URL is quoted so the shell does not glob the "?"
    curl -X POST "http://localhost:8080/api/v1/test/inject?mode=single"

    # Inject a burst of events to trigger anomaly detection
    curl -X POST "http://localhost:8080/api/v1/test/inject?mode=burst"

    # After the burst injection, check for detected anomalies
    curl http://localhost:8080/api/v1/anomalies | python3 -m json.tool

    # And incidents that were auto-created
    curl http://localhost:8080/api/v1/incidents | python3 -m json.tool

Step 7: Prometheus Metrics

    curl http://localhost:9090/metrics

Step 8: Tail the Logs

    docker-compose -f deploy/docker-compose.yaml logs -f agent

Cleanup

    docker-compose -f deploy/docker-compose.yaml down

Architecture Diagram (Updated)

                             +------------------+
                             |   Dashboard UI   |
                             |  (React + nginx) |
                             |      :3000       |
                             +--------+---------+
                                      |
                                 /api/ proxy
                                      |
    +------------------+    +---------v---------+    +------------------+
    |    Collectors    |    |  Kronveil Agent   |    |   Integrations   |
    |                  +--->+                   +--->+                  |
    | - Kubernetes     |    | REST API  :8080   |    | - Slack          |
    | - Kafka          |    | gRPC API  :9091   |    | - PagerDuty      |
    | - Cloud (AWS)    |    | Metrics   :9090   |    | - Prometheus     |
    | - CI/CD          |    |                   |    | - OpenTelemetry  |
    | - Logs           |    | +==============+  |    | - AWS Bedrock    |
    +------------------+    | | Intelligence |  |    | - Vault          |
                            | | - Anomaly    |  |    | - AWS Secrets    |
                            | | - RootCause  |  |    +------------------+
                            | | - Capacity   |  |
                            | | - Incident   |  |         +----------+
                            | +==============+  |    +--->| OTel     |
                            |                   +----+    | Collector|
                            | +==============+  |         +----------+
                            | | Policy (OPA) |  |
                            | | Audit Log    |  |
                            | +==============+  |
                            +---------+---------+
                                      |
                             +--------v---------+
                             |   Apache Kafka   |
                             |      :9092       |
                             +------------------+

CI Pipeline (seven jobs):

- Lint - golangci-lint v2 with staticcheck, errcheck, govet
- Test - go test -race with 40% coverage threshold
- Security Scan - govulncheck for Go stdlib/dependency CVEs
- Build - Cross-compile with ldflags (version, commit, date)
- Docker Build & Scan - Multi-stage build + Trivy vulnerability scan (CRITICAL/HIGH)
- Dashboard - npm ci, ESLint, Vite production build
- Helm Lint - Chart validation

What's Next (v0.3 Roadmap):

- Multi-cluster support - Federated monitoring across Kubernetes clusters
- Custom collector SDK - Build your own collectors with a plugin interface
- Runbook automation - Attach runbooks to incident types
- Cost anomaly detection - Spot unexpected cloud spend spikes
- Grafana dashboards - Pre-built dashboards for Kronveil Prometheus metrics
- Mobile alerts - Push notifications via native apps

Try It

    git clone https://github.com/kronveil/kronveil.git
    cd kronveil
    docker-compose -f deploy/docker-compose.yaml up --build -d
    # Open http://localhost:3000