# I Built a Dashboard in 30 Seconds with AI

## The Problem

It's 2 AM. An alert fires. Cart service is throwing errors. You've got five minutes before someone escalates.

The runbook says: "Check the dashboard. Look at the logs." But which dashboard? What query? You're half-asleep, the alert description tells you nothing useful, and now you're supposed to write SQL from scratch while someone in Slack asks "any update?"

Most of us have been there. And most runbooks were written by someone who never had to use them under pressure.

What if you could just type: "cart is throwing errors. find the root cause." and get a real answer?

That's what I tested with the new AI Assistant in OpenObserve. Here's what happened.
## It's Not Anomaly Detection. It's Something Simpler.

Most AI + observability discussions jump straight to anomaly detection or ML-powered forecasting. Those are interesting. But the thing that's actually changing how I work right now is simpler: an assistant embedded in the platform that lets me ask questions in plain English and get answers from my own production data.

No SQL. No PromQL. Just describe what you want.

I ran four real scenarios against live data from an otel-demo microservices app and a Kubernetes cluster. Here's how each one went.
## 1. The Dashboard Request That Normally Kills Your Afternoon

Someone from the business team asks for a dashboard. They don't know SQL. They don't know PromQL. They just want to see what's happening with nginx — request rate, how fast it's responding, how many errors. Normally this kills thirty minutes: finding the right log stream, writing queries, dragging panels, tweaking units.

Instead, I typed one prompt:

```
create a dashboard for my nginx logs showing request rate, latency percentiles, and 4xx vs 5xx errors.
```

Thirty seconds later I had a production-ready dashboard. It picked the right log stream. It listed the relevant fields. It wrote the SQL queries. It chose appropriate visualizations — line chart for request rate, heatmap for latency distribution, stacked bar for status codes. These were real queries against actual data. Not a template.

Here's what stuck with me: the person who asked for this could have done it themselves. They don't need to know what a PromQL query looks like. They just describe what they want to see.
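To make that concrete, here is roughly the shape of query the assistant generates for the status-code panel. A sketch only: the stream name (nginx_access_logs), the status field, and the histogram() time-bucketing call are my assumptions, not the assistant's literal output.

```sql
-- Sketch of a status-code panel query. The stream (nginx_access_logs)
-- and field (status) names are assumptions; adjust to your own schema.
SELECT histogram(_timestamp, '1 minute') AS ts,
       CASE WHEN status >= 500 THEN '5xx'
            WHEN status >= 400 THEN '4xx'
            ELSE '2xx/3xx' END AS status_class,
       COUNT(*) AS requests
FROM nginx_access_logs
GROUP BY ts, status_class
ORDER BY ts
```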
## 2. Same Thing, Different Domain: Infrastructure

Application logs worked. But what about infrastructure? Completely different data source — Kubernetes metrics, not nginx logs.

```
build a K8s host metrics dashboard showing CPU, memory, disk per node.
```

Same experience. The assistant figured out where the data lived, what metrics to pull, and how to visualize them.

What impressed me was the panel design. Usage per node and cumulative across the cluster. Separate tabs for CPU, memory, and disk. It understood that "CPU per node" implies a time series grouped by host, not a single aggregate gauge. That's the kind of design decision a human SRE makes after looking at the data — and the assistant just did it.

The assistant had enough context about the infrastructure to know what clusters were running and what hosts were connected. I didn't explain my setup. It already knew.
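That "time series grouped by host" decision translates to one extra GROUP BY dimension. A minimal sketch, assuming node CPU lands in a stream called k8s_node_cpu with a k8s_node_name label and a numeric value field (all hypothetical names):

```sql
-- Hypothetical stream (k8s_node_cpu), label (k8s_node_name), and value
-- field: placeholders for whatever your collector actually ships.
SELECT histogram(_timestamp, '5 minutes') AS ts,
       k8s_node_name AS node,
       AVG(value) AS avg_cpu
FROM k8s_node_cpu
GROUP BY ts, node
ORDER BY ts
```

Grouping by node is exactly the difference between a fleet view and a single cluster-wide gauge.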
## 3. Proactive: Don't Wait Until Something Breaks

Dashboards are great, but nobody wants to stare at them all day. I wanted to see if I could use the assistant proactively — scan everything, find problems before they escalate.

```
what's the health of the otel-demo right now? if anything is red, create an alert.
```

This isn't asking for one dashboard or one service. It's saying: scan all services, tell me how we're doing, and if something looks off, lock in an alert so I'm covered.

It checked error rates and latencies across every service. Found the ones running green, identified the ones that weren't. And for anything red — it created an alert. Right there. No configuration. No navigating to the alerts page.

This is the kind of thing most teams only set up after an incident, during the postmortem, when someone says "we should have caught this earlier." One sentence and you're covered before the page goes off.
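Under the hood, a sweep like that reduces to a per-service error-rate query. This is my own reconstruction rather than the assistant's actual tool call; the "default" stream and the severity field follow common OTel conventions and may not match your setup:

```sql
-- Reconstructed health sweep: error percentage per service over the
-- last hour. Stream and field names are assumptions.
SELECT service_name,
       SUM(CASE WHEN severity = 'ERROR' THEN 1 ELSE 0 END) * 100.0
         / COUNT(*) AS error_pct
FROM "default"
WHERE _timestamp >= NOW() - INTERVAL '1 hour'
GROUP BY service_name
ORDER BY error_pct DESC
```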
## 4. Something's Actually Broken: Root Cause Analysis

Now the real test. The cart service in the otel-demo app is throwing errors. Not a synthetic scenario — a real incident.

```
otel-demo app cart is throwing errors. find the root cause.
```

What happened next is worth breaking down step by step:

- It searched across both logs and traces — not one or the other, both at once
- It looked for errors in the last six hours and found none
- It automatically widened the search window — I didn't tell it to do that
- It identified the pattern: cart service failing on database writes under load
- It showed me the exact traces, the error distribution over time, and the specific downstream call that was failing

Every step was visible. I could expand any tool call, see the exact query it ran, and verify the result. It's not a black box. It shows its work — and if I disagreed with where it was going, I could redirect it.

Once I had the root cause, I stayed in the same conversation:

```
alert me if cart error rate crosses 10 errors in 5 minutes.
```

Same context. Same conversation. Investigation to prevention in two sentences.

That last part is what I keep coming back to. The assistant doesn't just help you find problems — it helps you lock in the fix so you don't get paged for the same thing at 3 AM next week.
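For reference, the "10 errors in 5 minutes" condition boils down to a threshold query like the one below. A hedged sketch: a real OpenObserve alert also carries trigger and destination settings that a bare query does not capture, and the stream and field names are assumptions.

```sql
-- Sketch of the "10 errors in 5 minutes" condition; the alert the
-- assistant actually created may be configured differently.
SELECT COUNT(*) AS error_count
FROM "default"
WHERE service_name = 'cart'
  AND severity = 'ERROR'
  AND _timestamp >= NOW() - INTERVAL '5 minutes'
HAVING COUNT(*) >= 10
```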
## Beyond the UI: Take It to Your IDE

Here's the part that changes the workflow entirely. You don't have to be inside the OpenObserve UI to get this.

OpenObserve exposes all of this through an MCP server. Connect your AI coding assistant (Claude Code, Cursor, whatever you use) directly to your production observability data. One command:

```bash
claude mcp add o2 https://api.openobserve.ai/api/default/mcp \
  -t http \
  --header "Authorization: Basic <YOUR_TOKEN>"
```

That's it. Under five minutes. Now your IDE can query production logs, metrics, and traces. Debug a deploy from your terminal. Pull up a trace without leaving your editor. Check error rates during a code review.

The assistant follows you wherever you work — not just inside the observability platform.
## What This Actually Changes

There's been a lot of noise about AI in observability. Most of it falls into two camps:

- Anomaly detection — useful in theory, unpredictable in practice, hard to trust
- AI replaces on-call — not happening, and most engineers don't want it to

The thing that's working right now is neither of those. It's reducing the friction between "something is wrong" and "here's what I know."

Not replacing your judgment. Not replacing your experience. Just removing the parts of incident response that feel like operating a query builder with one eye open at 2 AM.

From "I need to see what's happening" to "I know what happened and we're covered next time" — in one conversation.

Have you tried connecting AI assistants to your observability stack? What's working? What's still painful? Drop a comment — I'm genuinely curious what others are seeing.

## Resources

- OpenObserve MCP Setup Guide
- Integration with AI Tools Using MCP — Workshop
- OpenObserve on GitHub