Tools: OpenTelemetry for Node.js: Distributed Tracing, Metrics, and Logs

Introduction

1. Why Observability Matters: The Three Pillars

2. OpenTelemetry vs. Commercial APM Tools

3. Setting Up the OTel SDK in Node.js/Express

Installation

The Tracing Entrypoint

Starting Your App

The Express Server

4. Auto-Instrumentation for HTTP, DB, and Redis

Disabling Noisy Instrumentations

5. Custom Spans and Attributes

Nested Spans

Adding Span Events

6. Metrics: Counters, Histograms, and Gauges

Using Metrics in Middleware

7. Correlating Logs with Trace IDs

8. Exporting to Jaeger (Local Dev) and OTLP (Production)

Local Development with Jaeger

Production: OTel Collector + OTLP

9. Sampling Strategies

Head-Based Sampling (SDK-Level)

Always-Sample Errors (Tail-Based via Collector)

10. Visualizing Traces in Grafana Tempo

Full Stack with Docker Compose

What You Get

Putting It All Together

Conclusion

Introduction

Modern backend systems are rarely a single process. A single user request might touch an API gateway, three microservices, a PostgreSQL database, a Redis cache, and an external payment provider — all in under 200 milliseconds. When something goes wrong (and it will), you need to know exactly where.

Traditional logging — console.log("request received") — doesn't cut it here. You need observability: the ability to ask arbitrary questions about your system's behavior from the outside, without modifying the code.

OpenTelemetry (OTel) is the open-source standard that gives you that power. It's vendor-neutral, CNCF-graduated, and has become the de facto way to instrument Node.js services. This guide walks you through everything: setting up the SDK, auto-instrumenting Express and database drivers, writing custom spans, collecting metrics, correlating logs with trace IDs, and shipping data to Jaeger or Grafana Tempo. The full code listings appear at the end of the article.

1. Why Observability Matters: The Three Pillars

Observability is built on three complementary signals:

Traces answer "where did time go?" A trace is a directed acyclic graph of spans, each representing a unit of work. A root span covers the entire HTTP request; child spans cover the database query, the Redis lookup, the downstream API call. Traces reveal latency hotspots and error propagation paths.

Metrics answer "how is the system behaving right now?" Request rates, error rates, p99 latency, queue depth, memory usage. Metrics are cheap to store and great for dashboards and alerting.

Logs answer "what exactly happened?" Structured log lines with timestamps and context. When correlated with a trace ID, a log line becomes surgically precise — you can jump straight from a metric alert to the exact trace to the exact log line that caused it.

Without all three signals connected, you're debugging in the dark.

2. OpenTelemetry vs. Commercial APM Tools

Before OTel, every APM vendor (Datadog, New Relic, Dynatrace) had a proprietary agent you'd install and be permanently coupled to. Switching vendors meant re-instrumenting your entire codebase.
OpenTelemetry changes that. OTel doesn't replace Datadog entirely — Datadog still has excellent UX and ML-based anomaly detection. But OTel lets you choose your backend, or even fan out to multiple backends simultaneously. Your instrumentation code is write-once.

3. Setting Up the OTel SDK in Node.js/Express

Create tracing.js and require it before anything else. This is critical — OTel patches modules at import time, so it must run first. You can wire it into a package.json start script or pass the --require flag directly on the command line (both shown in the listings). With tracing.js loaded, every HTTP request, Postgres query, and Redis command is automatically captured as a span — zero additional code required.

4. Auto-Instrumentation for HTTP, DB, and Redis

getNodeAutoInstrumentations() wraps over 40 popular libraries. Here's what you get for free:

HTTP/Express: Every inbound request becomes a root span with attributes like http.method, http.route, http.status_code, and http.url. Every outbound https.request() or axios/fetch call becomes a child span with the remote URL and status.

PostgreSQL (pg): Every pool.query() call becomes a span with db.system=postgresql, db.statement (the SQL), and db.name. You'll see the exact query text in your trace.

Redis: Every Redis command (GET, SET, HGET, etc.) becomes a span with db.system=redis and db.statement.

Other auto-instrumented libraries include mysql2, mongodb, grpc-js, graphql, kafkajs, aws-sdk, ioredis, knex, typeorm, and many more.

Disabling Noisy Instrumentations

The filesystem instrumentation (@opentelemetry/instrumentation-fs) creates a span for every fs.readFile call, which is typically too noisy. Disable it explicitly, as shown in the SDK setup.

5. Custom Spans and Attributes

Auto-instrumentation is great, but business logic needs custom spans. Use the trace API to create them. Child spans are automatically associated with the parent when created inside startActiveSpan. Span events are timestamped annotations within a span — useful for checkpoints such as cache misses or query completion.

6. Metrics: Counters, Histograms, and Gauges

OpenTelemetry Metrics gives you the instrument types you need for production dashboards: counters, histograms, up/down counters, and observable gauges.
Using Metrics in Middleware

Wire the counter and histogram into an Express middleware, and Prometheus (or any OTLP-compatible backend) receives per-route request counts and latency histograms, enabling p50/p95/p99 latency dashboards.

7. Correlating Logs with Trace IDs

The power of the three pillars comes from connecting them. When a log line carries the same traceId and spanId as the trace you're investigating, you can jump directly between them. OpenTelemetry makes this trivial with the context API: read the active span's context and stamp every structured log line with its traceId and spanId. In Grafana, you can then click traceId in a log line and jump directly to the Tempo trace, or pivot from a slow trace to its correlated logs. This is the three-pillar payoff.

8. Exporting to Jaeger (Local Dev) and OTLP (Production)

Local Development with Jaeger

Run Jaeger all-in-one with Docker Compose. Make a few requests to your Express app, then open http://localhost:16686. Select order-service from the dropdown. You'll see a waterfall of every request with its child spans — HTTP, Postgres, Redis — with exact durations.

Production: OTel Collector + OTLP

For production, route telemetry through the OpenTelemetry Collector. It handles batching, retries, filtering, and fan-out to multiple backends. Your Node.js service just points to the collector, and the collector handles everything downstream — you can swap backends without touching application code.

9. Sampling Strategies

Sampling is essential. In high-traffic production systems, recording every single trace is prohibitively expensive. The goal is to capture enough data to debug issues without burning storage and money.

Head-Based Sampling (SDK-Level)

Decide at the start of a request whether to record it. ParentBasedSampler is critical in microservices — if Service A decides to sample a trace, Service B will continue sampling it even at a lower rate. This keeps traces complete.

Always-Sample Errors (Tail-Based via Collector)

The collector's tail_sampling processor makes sampling decisions after seeing the full trace, enabling you to always keep error traces. This is the gold standard for production sampling — you never miss an interesting trace.

10. Visualizing Traces in Grafana Tempo

Grafana Tempo is the open-source distributed tracing backend that integrates natively with Grafana dashboards.
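A minimal sketch of the head-based sampling setup described in section 9, wired into the NodeSDK. The sampler classes come from @opentelemetry/sdk-trace-node; the 10% ratio is an illustrative choice, not a recommendation.

```javascript
// tracing.js (excerpt) — head-based sampling at the SDK level (sketch).
// ParentBasedSampler honors an upstream service's sampling decision, so
// traces stay complete across service boundaries; TraceIdRatioBasedSampler
// records roughly 10% of new root traces.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const {
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} = require('@opentelemetry/sdk-trace-node');

const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    // For root spans (no parent): sample 10% of traces by trace ID.
    root: new TraceIdRatioBasedSampler(0.1),
    // For child spans, the defaults follow the parent's decision.
  }),
  // ...traceExporter, metricReader, instrumentations as in the full tracing.js
});

sdk.start();
```

Because the decision is keyed on the trace ID, every service using the same ratio agrees on which traces to keep.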
It's built for scale, storing traces in object storage (S3/GCS) at a fraction of the cost of Jaeger's Elasticsearch backend.

Full Stack with Docker Compose

The whole stack (app, OTel Collector, Tempo, Prometheus, Grafana) can run under a single Docker Compose file. In Grafana's provisioning config, link Tempo and Prometheus as data sources so traces and metrics can cross-reference each other.

What You Get

With this stack running, Grafana lets you pivot between traces, metrics, and logs in one UI. The TraceQL query language (Grafana Tempo's trace query DSL) lets you filter traces programmatically, for example by service name and HTTP status code. Such a query returns a time series of error rates across all spans matching those conditions — a metric derived directly from trace data, without separate instrumentation.

Putting It All Together

A production-ready tracing.js handles environment-based configuration: the tracing.js in the code listings reads OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, and NODE_ENV from the environment, with localhost defaults for development. Set those variables in your deployment and you're done.

Conclusion

OpenTelemetry in Node.js has reached production maturity. The auto-instrumentation layer handles the heavy lifting for HTTP, databases, and cache — giving you detailed traces from day one. Custom spans let you annotate your business logic with the context that matters. Metrics and correlated logs complete the observability picture.

The vendor-neutral design is the real win: instrument once, export anywhere. Start with Jaeger locally to get familiar with traces. Graduate to the OTel Collector + Grafana Tempo + Prometheus stack for production. When (or if) you need to add Datadog or New Relic for specific features, you add a collector exporter — your application code doesn't change.

The full stack outlined here costs nothing to run on your own infrastructure except compute. For most teams, that means replacing $1,000+/month APM bills with a self-hosted stack that gives you more control and equal visibility.

Start with node -r ./tracing.js server.js. You'll have your first traces in under five minutes.

Wilson Xu is a backend engineer specializing in distributed systems and developer tooling. He writes about Node.js, observability, and cloud-native infrastructure.
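The Grafana provisioning mentioned in section 10 might look like the following sketch. The file path, uid values, and container URLs are illustrative assumptions for a Docker Compose setup where Tempo serves on port 3200 and Prometheus on 9090.

```yaml
# provisioning/datasources/datasources.yaml (sketch — paths and UIDs assumed)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      # Back the service-graph view with metrics from Prometheus
      serviceMap:
        datasourceUid: prometheus
```

With both data sources provisioned, trace panels can link out to the metrics that cover the same service.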


$ -weight: 500;">npm -weight: 500;">install \ @opentelemetry/sdk-node \ @opentelemetry/auto-instrumentations-node \ @opentelemetry/exporter-trace-otlp-http \ @opentelemetry/exporter-prometheus \ @opentelemetry/sdk-metrics \ @opentelemetry/semantic-conventions -weight: 500;">npm -weight: 500;">install \ @opentelemetry/sdk-node \ @opentelemetry/auto-instrumentations-node \ @opentelemetry/exporter-trace-otlp-http \ @opentelemetry/exporter-prometheus \ @opentelemetry/sdk-metrics \ @opentelemetry/semantic-conventions -weight: 500;">npm -weight: 500;">install \ @opentelemetry/sdk-node \ @opentelemetry/auto-instrumentations-node \ @opentelemetry/exporter-trace-otlp-http \ @opentelemetry/exporter-prometheus \ @opentelemetry/sdk-metrics \ @opentelemetry/semantic-conventions // tracing.js 'use strict'; const { NodeSDK } = require('@opentelemetry/sdk-node'); const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node'); const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http'); const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http'); const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics'); const { Resource } = require('@opentelemetry/resources'); const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions'); const resource = new Resource({ [SemanticResourceAttributes.SERVICE_NAME]: 'order--weight: 500;">service', [SemanticResourceAttributes.SERVICE_VERSION]: '1.4.2', [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development', }); const traceExporter = new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces', }); const metricExporter = new OTLPMetricExporter({ url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || 'http://localhost:4318/v1/metrics', }); const sdk = new NodeSDK({ resource, traceExporter, metricReader: new PeriodicExportingMetricReader({ exporter: 
metricExporter, exportIntervalMillis: 15_000, // export every 15 seconds }), instrumentations: [ getNodeAutoInstrumentations({ '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy '@opentelemetry/instrumentation-http': { enabled: true }, '@opentelemetry/instrumentation-express': { enabled: true }, '@opentelemetry/instrumentation-pg': { enabled: true }, '@opentelemetry/instrumentation-redis': { enabled: true }, }), ], }); sdk.-weight: 500;">start(); process.on('SIGTERM', () => { sdk.shutdown() .then(() => console.log('OTel SDK shut down')) .catch(err => console.error('Error shutting down OTel SDK', err)) .finally(() => process.exit(0)); }); // tracing.js 'use strict'; const { NodeSDK } = require('@opentelemetry/sdk-node'); const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node'); const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http'); const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http'); const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics'); const { Resource } = require('@opentelemetry/resources'); const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions'); const resource = new Resource({ [SemanticResourceAttributes.SERVICE_NAME]: 'order--weight: 500;">service', [SemanticResourceAttributes.SERVICE_VERSION]: '1.4.2', [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development', }); const traceExporter = new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces', }); const metricExporter = new OTLPMetricExporter({ url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || 'http://localhost:4318/v1/metrics', }); const sdk = new NodeSDK({ resource, traceExporter, metricReader: new PeriodicExportingMetricReader({ exporter: metricExporter, exportIntervalMillis: 15_000, // export every 15 seconds }), instrumentations: [ 
getNodeAutoInstrumentations({ '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy '@opentelemetry/instrumentation-http': { enabled: true }, '@opentelemetry/instrumentation-express': { enabled: true }, '@opentelemetry/instrumentation-pg': { enabled: true }, '@opentelemetry/instrumentation-redis': { enabled: true }, }), ], }); sdk.-weight: 500;">start(); process.on('SIGTERM', () => { sdk.shutdown() .then(() => console.log('OTel SDK shut down')) .catch(err => console.error('Error shutting down OTel SDK', err)) .finally(() => process.exit(0)); }); // tracing.js 'use strict'; const { NodeSDK } = require('@opentelemetry/sdk-node'); const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node'); const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http'); const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http'); const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics'); const { Resource } = require('@opentelemetry/resources'); const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions'); const resource = new Resource({ [SemanticResourceAttributes.SERVICE_NAME]: 'order--weight: 500;">service', [SemanticResourceAttributes.SERVICE_VERSION]: '1.4.2', [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development', }); const traceExporter = new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces', }); const metricExporter = new OTLPMetricExporter({ url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || 'http://localhost:4318/v1/metrics', }); const sdk = new NodeSDK({ resource, traceExporter, metricReader: new PeriodicExportingMetricReader({ exporter: metricExporter, exportIntervalMillis: 15_000, // export every 15 seconds }), instrumentations: [ getNodeAutoInstrumentations({ '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy 
'@opentelemetry/instrumentation-http': { enabled: true }, '@opentelemetry/instrumentation-express': { enabled: true }, '@opentelemetry/instrumentation-pg': { enabled: true }, '@opentelemetry/instrumentation-redis': { enabled: true }, }), ], }); sdk.-weight: 500;">start(); process.on('SIGTERM', () => { sdk.shutdown() .then(() => console.log('OTel SDK shut down')) .catch(err => console.error('Error shutting down OTel SDK', err)) .finally(() => process.exit(0)); }); // package.json { "scripts": { "-weight: 500;">start": "node -r ./tracing.js server.js" } } // package.json { "scripts": { "-weight: 500;">start": "node -r ./tracing.js server.js" } } // package.json { "scripts": { "-weight: 500;">start": "node -r ./tracing.js server.js" } } node --require ./tracing.js server.js node --require ./tracing.js server.js node --require ./tracing.js server.js // server.js const express = require('express'); const { Pool } = require('pg'); const redis = require('redis'); const app = express(); const pool = new Pool({ connectionString: process.env.DATABASE_URL }); const redisClient = redis.createClient({ url: process.env.REDIS_URL }); redisClient.connect(); app.get('/orders/:id', async (req, res) => { const { id } = req.params; // Redis cache lookup const cached = await redisClient.get(`order:${id}`); if (cached) { return res.json(JSON.parse(cached)); } // Postgres query const { rows } = await pool.query('SELECT * FROM orders WHERE id = $1', [id]); if (!rows.length) return res.-weight: 500;">status(404).json({ error: 'not found' }); await redisClient.setEx(`order:${id}`, 300, JSON.stringify(rows[0])); res.json(rows[0]); }); app.listen(3000, () => console.log('Listening on :3000')); // server.js const express = require('express'); const { Pool } = require('pg'); const redis = require('redis'); const app = express(); const pool = new Pool({ connectionString: process.env.DATABASE_URL }); const redisClient = redis.createClient({ url: process.env.REDIS_URL }); redisClient.connect(); 
app.get('/orders/:id', async (req, res) => { const { id } = req.params; // Redis cache lookup const cached = await redisClient.get(`order:${id}`); if (cached) { return res.json(JSON.parse(cached)); } // Postgres query const { rows } = await pool.query('SELECT * FROM orders WHERE id = $1', [id]); if (!rows.length) return res.-weight: 500;">status(404).json({ error: 'not found' }); await redisClient.setEx(`order:${id}`, 300, JSON.stringify(rows[0])); res.json(rows[0]); }); app.listen(3000, () => console.log('Listening on :3000')); // server.js const express = require('express'); const { Pool } = require('pg'); const redis = require('redis'); const app = express(); const pool = new Pool({ connectionString: process.env.DATABASE_URL }); const redisClient = redis.createClient({ url: process.env.REDIS_URL }); redisClient.connect(); app.get('/orders/:id', async (req, res) => { const { id } = req.params; // Redis cache lookup const cached = await redisClient.get(`order:${id}`); if (cached) { return res.json(JSON.parse(cached)); } // Postgres query const { rows } = await pool.query('SELECT * FROM orders WHERE id = $1', [id]); if (!rows.length) return res.-weight: 500;">status(404).json({ error: 'not found' }); await redisClient.setEx(`order:${id}`, 300, JSON.stringify(rows[0])); res.json(rows[0]); }); app.listen(3000, () => console.log('Listening on :3000')); const { trace, SpanStatusCode } = require('@opentelemetry/api'); const tracer = trace.getTracer('order--weight: 500;">service', '1.4.2'); async function processPayment(orderId, amount, currency) { // Create a custom span return tracer.startActiveSpan('payment.process', async (span) => { try { // Add semantic attributes span.setAttributes({ 'order.id': orderId, 'payment.amount': amount, 'payment.currency': currency, 'payment.provider': 'stripe', }); const result = await stripeClient.charges.create({ amount: amount * 100, currency, source: await getPaymentToken(orderId), }); span.setAttributes({ 'payment.charge_id': 
result.id, 'payment.-weight: 500;">status': result.-weight: 500;">status, }); span.setStatus({ code: SpanStatusCode.OK }); return result; } catch (err) { // Record the exception — this adds a span event with the stack trace span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR, message: err.message, }); throw err; } finally { span.end(); } }); } const { trace, SpanStatusCode } = require('@opentelemetry/api'); const tracer = trace.getTracer('order--weight: 500;">service', '1.4.2'); async function processPayment(orderId, amount, currency) { // Create a custom span return tracer.startActiveSpan('payment.process', async (span) => { try { // Add semantic attributes span.setAttributes({ 'order.id': orderId, 'payment.amount': amount, 'payment.currency': currency, 'payment.provider': 'stripe', }); const result = await stripeClient.charges.create({ amount: amount * 100, currency, source: await getPaymentToken(orderId), }); span.setAttributes({ 'payment.charge_id': result.id, 'payment.-weight: 500;">status': result.-weight: 500;">status, }); span.setStatus({ code: SpanStatusCode.OK }); return result; } catch (err) { // Record the exception — this adds a span event with the stack trace span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR, message: err.message, }); throw err; } finally { span.end(); } }); } const { trace, SpanStatusCode } = require('@opentelemetry/api'); const tracer = trace.getTracer('order--weight: 500;">service', '1.4.2'); async function processPayment(orderId, amount, currency) { // Create a custom span return tracer.startActiveSpan('payment.process', async (span) => { try { // Add semantic attributes span.setAttributes({ 'order.id': orderId, 'payment.amount': amount, 'payment.currency': currency, 'payment.provider': 'stripe', }); const result = await stripeClient.charges.create({ amount: amount * 100, currency, source: await getPaymentToken(orderId), }); span.setAttributes({ 'payment.charge_id': result.id, 
'payment.-weight: 500;">status': result.-weight: 500;">status, }); span.setStatus({ code: SpanStatusCode.OK }); return result; } catch (err) { // Record the exception — this adds a span event with the stack trace span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR, message: err.message, }); throw err; } finally { span.end(); } }); } async function fulfillOrder(orderId) { return tracer.startActiveSpan('order.fulfill', async (parentSpan) => { parentSpan.setAttribute('order.id', orderId); // This span is automatically a child of order.fulfill const payment = await processPayment(orderId, 99.99, 'usd'); // Another child span await tracer.startActiveSpan('order.notify', async (notifySpan) => { await sendConfirmationEmail(orderId); notifySpan.end(); }); parentSpan.end(); return { orderId, payment }; }); } async function fulfillOrder(orderId) { return tracer.startActiveSpan('order.fulfill', async (parentSpan) => { parentSpan.setAttribute('order.id', orderId); // This span is automatically a child of order.fulfill const payment = await processPayment(orderId, 99.99, 'usd'); // Another child span await tracer.startActiveSpan('order.notify', async (notifySpan) => { await sendConfirmationEmail(orderId); notifySpan.end(); }); parentSpan.end(); return { orderId, payment }; }); } async function fulfillOrder(orderId) { return tracer.startActiveSpan('order.fulfill', async (parentSpan) => { parentSpan.setAttribute('order.id', orderId); // This span is automatically a child of order.fulfill const payment = await processPayment(orderId, 99.99, 'usd'); // Another child span await tracer.startActiveSpan('order.notify', async (notifySpan) => { await sendConfirmationEmail(orderId); notifySpan.end(); }); parentSpan.end(); return { orderId, payment }; }); } span.addEvent('cache.miss', { 'cache.key': `order:${id}` }); span.addEvent('db.query.-weight: 500;">start'); // ... query executes ... 
span.addEvent('db.query.complete', { 'db.rows_returned': rows.length }); span.addEvent('cache.miss', { 'cache.key': `order:${id}` }); span.addEvent('db.query.-weight: 500;">start'); // ... query executes ... span.addEvent('db.query.complete', { 'db.rows_returned': rows.length }); span.addEvent('cache.miss', { 'cache.key': `order:${id}` }); span.addEvent('db.query.-weight: 500;">start'); // ... query executes ... span.addEvent('db.query.complete', { 'db.rows_returned': rows.length }); // metrics.js const { metrics } = require('@opentelemetry/api'); const meter = metrics.getMeter('order--weight: 500;">service', '1.4.2'); // Counter: monotonically increasing (requests, errors, events) const requestCounter = meter.createCounter('http.requests.total', { description: 'Total number of HTTP requests', }); // Histogram: distribution of values (latency, payload size) const latencyHistogram = meter.createHistogram('http.request.duration_ms', { description: 'HTTP request latency in milliseconds', unit: 'ms', advice: { explicitBucketBoundaries: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000], }, }); // UpDownCounter: can go up or down (queue depth, active connections) const activeConnections = meter.createUpDownCounter('db.connections.active', { description: 'Active database connections', }); // Observable Gauge: sampled on demand (CPU, memory — use callbacks) const memoryGauge = meter.createObservableGauge('process.memory_mb', { description: 'Process memory usage in MB', }); memoryGauge.addCallback((observableResult) => { const usage = process.memoryUsage(); observableResult.observe(usage.heapUsed / 1024 / 1024, { type: 'heap' }); observableResult.observe(usage.rss / 1024 / 1024, { type: 'rss' }); }); module.exports = { requestCounter, latencyHistogram, activeConnections }; // metrics.js const { metrics } = require('@opentelemetry/api'); const meter = metrics.getMeter('order--weight: 500;">service', '1.4.2'); // Counter: monotonically increasing (requests, errors, events) 
const requestCounter = meter.createCounter('http.requests.total', { description: 'Total number of HTTP requests', }); // Histogram: distribution of values (latency, payload size) const latencyHistogram = meter.createHistogram('http.request.duration_ms', { description: 'HTTP request latency in milliseconds', unit: 'ms', advice: { explicitBucketBoundaries: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000], }, }); // UpDownCounter: can go up or down (queue depth, active connections) const activeConnections = meter.createUpDownCounter('db.connections.active', { description: 'Active database connections', }); // Observable Gauge: sampled on demand (CPU, memory — use callbacks) const memoryGauge = meter.createObservableGauge('process.memory_mb', { description: 'Process memory usage in MB', }); memoryGauge.addCallback((observableResult) => { const usage = process.memoryUsage(); observableResult.observe(usage.heapUsed / 1024 / 1024, { type: 'heap' }); observableResult.observe(usage.rss / 1024 / 1024, { type: 'rss' }); }); module.exports = { requestCounter, latencyHistogram, activeConnections }; // metrics.js const { metrics } = require('@opentelemetry/api'); const meter = metrics.getMeter('order--weight: 500;">service', '1.4.2'); // Counter: monotonically increasing (requests, errors, events) const requestCounter = meter.createCounter('http.requests.total', { description: 'Total number of HTTP requests', }); // Histogram: distribution of values (latency, payload size) const latencyHistogram = meter.createHistogram('http.request.duration_ms', { description: 'HTTP request latency in milliseconds', unit: 'ms', advice: { explicitBucketBoundaries: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000], }, }); // UpDownCounter: can go up or down (queue depth, active connections) const activeConnections = meter.createUpDownCounter('db.connections.active', { description: 'Active database connections', }); // Observable Gauge: sampled on demand (CPU, memory — use callbacks) const 
memoryGauge = meter.createObservableGauge('process.memory_mb', { description: 'Process memory usage in MB', }); memoryGauge.addCallback((observableResult) => { const usage = process.memoryUsage(); observableResult.observe(usage.heapUsed / 1024 / 1024, { type: 'heap' }); observableResult.observe(usage.rss / 1024 / 1024, { type: 'rss' }); }); module.exports = { requestCounter, latencyHistogram, activeConnections }; // middleware/metrics.js const { requestCounter, latencyHistogram } = require('../metrics'); function metricsMiddleware(req, res, next) { const -weight: 500;">start = Date.now(); res.on('finish', () => { const duration = Date.now() - -weight: 500;">start; const labels = { method: req.method, route: req.route?.path || 'unknown', status_code: String(res.statusCode), }; requestCounter.add(1, labels); latencyHistogram.record(duration, labels); }); next(); } module.exports = metricsMiddleware; // middleware/metrics.js const { requestCounter, latencyHistogram } = require('../metrics'); function metricsMiddleware(req, res, next) { const -weight: 500;">start = Date.now(); res.on('finish', () => { const duration = Date.now() - -weight: 500;">start; const labels = { method: req.method, route: req.route?.path || 'unknown', status_code: String(res.statusCode), }; requestCounter.add(1, labels); latencyHistogram.record(duration, labels); }); next(); } module.exports = metricsMiddleware; // middleware/metrics.js const { requestCounter, latencyHistogram } = require('../metrics'); function metricsMiddleware(req, res, next) { const -weight: 500;">start = Date.now(); res.on('finish', () => { const duration = Date.now() - -weight: 500;">start; const labels = { method: req.method, route: req.route?.path || 'unknown', status_code: String(res.statusCode), }; requestCounter.add(1, labels); latencyHistogram.record(duration, labels); }); next(); } module.exports = metricsMiddleware; // server.js app.use(require('./middleware/metrics')); // server.js 
app.use(require('./middleware/metrics')); // server.js app.use(require('./middleware/metrics')); // logger.js — structured logger with automatic trace correlation const { trace, context } = require('@opentelemetry/api'); function getTraceContext() { const span = trace.getActiveSpan(); if (!span) return {}; const { traceId, spanId, traceFlags } = span.spanContext(); return { traceId, spanId, traceSampled: (traceFlags & 0x01) === 1, }; } const logger = { info(message, extra = {}) { console.log(JSON.stringify({ level: 'info', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', ...getTraceContext(), ...extra, })); }, error(message, err, extra = {}) { console.error(JSON.stringify({ level: 'error', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', error: { name: err?.name, message: err?.message, stack: err?.stack }, ...getTraceContext(), ...extra, })); }, }; module.exports = logger; // logger.js — structured logger with automatic trace correlation const { trace, context } = require('@opentelemetry/api'); function getTraceContext() { const span = trace.getActiveSpan(); if (!span) return {}; const { traceId, spanId, traceFlags } = span.spanContext(); return { traceId, spanId, traceSampled: (traceFlags & 0x01) === 1, }; } const logger = { info(message, extra = {}) { console.log(JSON.stringify({ level: 'info', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', ...getTraceContext(), ...extra, })); }, error(message, err, extra = {}) { console.error(JSON.stringify({ level: 'error', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', error: { name: err?.name, message: err?.message, stack: err?.stack }, ...getTraceContext(), ...extra, })); }, }; module.exports = logger; // logger.js — structured logger with automatic trace correlation const { trace, context } = 
require('@opentelemetry/api'); function getTraceContext() { const span = trace.getActiveSpan(); if (!span) return {}; const { traceId, spanId, traceFlags } = span.spanContext(); return { traceId, spanId, traceSampled: (traceFlags & 0x01) === 1, }; } const logger = { info(message, extra = {}) { console.log(JSON.stringify({ level: 'info', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', ...getTraceContext(), ...extra, })); }, error(message, err, extra = {}) { console.error(JSON.stringify({ level: 'error', message, timestamp: new Date().toISOString(), -weight: 500;">service: 'order--weight: 500;">service', error: { name: err?.name, message: err?.message, stack: err?.stack }, ...getTraceContext(), ...extra, })); }, }; module.exports = logger; // In your route handler const logger = require('./logger'); app.get('/orders/:id', async (req, res) => { logger.info('Fetching order', { orderId: req.params.id }); try { const order = await getOrder(req.params.id); logger.info('Order retrieved', { orderId: req.params.id, -weight: 500;">status: order.-weight: 500;">status }); res.json(order); } catch (err) { logger.error('Failed to fetch order', err, { orderId: req.params.id }); res.-weight: 500;">status(500).json({ error: 'internal error' }); } }); // In your route handler const logger = require('./logger'); app.get('/orders/:id', async (req, res) => { logger.info('Fetching order', { orderId: req.params.id }); try { const order = await getOrder(req.params.id); logger.info('Order retrieved', { orderId: req.params.id, -weight: 500;">status: order.-weight: 500;">status }); res.json(order); } catch (err) { logger.error('Failed to fetch order', err, { orderId: req.params.id }); res.-weight: 500;">status(500).json({ error: 'internal error' }); } }); // In your route handler const logger = require('./logger'); app.get('/orders/:id', async (req, res) => { logger.info('Fetching order', { orderId: req.params.id }); try { const order = 
Every log line now carries the active trace context:

```json
{
  "level": "info",
  "message": "Order retrieved",
  "timestamp": "2026-03-22T14:23:01.882Z",
  "service": "order-service",
  "traceId": "3e8a1b2c4d5e6f7a8b9c0d1e2f3a4b5c",
  "spanId": "a1b2c3d4e5f6a7b8",
  "traceSampled": true,
  "orderId": "ord_9182",
  "status": "shipped"
}
```
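Correlation works across service boundaries because the trace context also travels on the wire. The auto-instrumented HTTP layer propagates it in the W3C `traceparent` header; as an illustrative sketch (you never build this header yourself — the SDK's propagator does it), the header is just four dash-separated hex fields:

```javascript
// Sketch: build and parse a W3C traceparent header by hand, only to
// show what the OTel propagator sends between services.
// Format: version(2 hex)-traceId(32 hex)-spanId(16 hex)-flags(2 hex)
function buildTraceparent({ traceId, spanId, sampled }) {
  const flags = sampled ? '01' : '00';
  return `00-${traceId}-${spanId}-${flags}`;
}

function parseTraceparent(header) {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null; // malformed header: start a fresh trace instead
  const [, version, traceId, spanId, flags] = m;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

const header = buildTraceparent({
  traceId: '3e8a1b2c4d5e6f7a8b9c0d1e2f3a4b5c',
  spanId: 'a1b2c3d4e5f6a7b8',
  sampled: true,
});
console.log(header); // 00-3e8a1b2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-01
```

This is why the same `traceId` shows up in the logs of every service a request touches.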
## 8. Exporting to Jaeger (Local Dev) and OTLP (Production)

### Local Development with Jaeger

```yaml
# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.54
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  app:
    build: .
    environment:
      - OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://jaeger:4318/v1/traces
      - NODE_ENV=development
    depends_on:
      - jaeger
```

```bash
docker-compose up -d
# App runs on :3000, Jaeger UI at http://localhost:16686
```

### Production: OTel Collector + OTLP

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  logging:
    loglevel: warn

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```
Point your service at the collector instead of a backend:

```
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
```

## 9. Sampling Strategies

### Head-Based Sampling (SDK-Level)

```js
const { TraceIdRatioBasedSampler, ParentBasedSampler } = require('@opentelemetry/sdk-trace-base');

// Sample 10% of traces, but always respect the parent's sampling decision
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1), // 10% sampling rate
});

const sdk = new NodeSDK({
  sampler,
  // ...rest of config
});
```
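The key property of ratio-based sampling is that the decision is derived from the trace ID itself, so every service in a distributed trace reaches the same verdict without coordination. As a simplified model (the real `TraceIdRatioBasedSampler` differs in its exact arithmetic):

```javascript
// Simplified model of trace-ID ratio sampling — not the SDK's actual
// implementation. Map the leading bytes of the traceId into [0, 1)
// and keep the trace if that value falls below the configured ratio.
function shouldSampleTraceId(traceId, ratio) {
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000; // [0, 1)
  return bucket < ratio;
}

// Pure function of (traceId, ratio): stable across processes, so a
// trace is either kept everywhere or dropped everywhere.
console.log(shouldSampleTraceId('00000000aaaabbbbccccddddeeeeffff', 0.1)); // true
console.log(shouldSampleTraceId('ffffffffaaaabbbbccccddddeeeeffff', 0.1)); // false
```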
### Always-Sample Errors (Tail-Based via Collector)

```yaml
# otel-collector-config.yaml (tail sampling)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces-policy
        type: latency
        latency: { threshold_ms: 2000 }
      - name: random-policy
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
```
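The collector buffers each trace for `decision_wait`, then keeps it if any policy matches. An illustrative sketch of that decision logic (not the collector's real code — it evaluates policies in parallel, while this returns the first match):

```javascript
// Sketch of the tail-sampling decision: a buffered trace is kept if
// ANY policy matches. `random` is injectable so the logic is testable.
function keepTrace(trace, random = Math.random) {
  if (trace.hasError) return 'errors-policy';               // status_code: ERROR
  if (trace.durationMs > 2000) return 'slow-traces-policy'; // latency > 2s
  if (random() < 0.05) return 'random-policy';              // 5% probabilistic
  return null; // trace is dropped
}

console.log(keepTrace({ hasError: true, durationMs: 50 }));     // 'errors-policy'
console.log(keepTrace({ hasError: false, durationMs: 3200 }));  // 'slow-traces-policy'
console.log(keepTrace({ hasError: false, durationMs: 50 }, () => 0.5)); // null
```

Note the trade-off: tail sampling sees whole traces (so it can keep every error), but it costs collector memory proportional to `num_traces` times average trace size.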
## 10. Visualizing Traces in Grafana Tempo

### Full Stack with Docker Compose

```yaml
# docker-compose.prod-local.yml
version: '3.8'
services:
  tempo:
    image: grafana/tempo:2.3.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - "4317:4317"  # OTLP gRPC
      - "3200:3200"  # Tempo query API
  prometheus:
    image: prom/prometheus:v2.48.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:10.2.2
    ports:
      - "3001:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
volumes:
  tempo-data:
```

```yaml
# tempo.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:
storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
```

```yaml
# grafana/provisioning/datasources/datasources.yaml
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    uid: prometheus
```
With the stack up, you can query traces with TraceQL — for example, the rate of server errors in the order service:

```
{ resource.service.name = "order-service" && span.http.status_code >= 500 } | rate()
```

### What You Get

- Service Map — an auto-generated DAG of all your services and their dependencies, with error rate and latency for each edge
- Trace Waterfall — click any span to see attributes, events, and linked logs
- RED Dashboard — Rate, Errors, Duration for each service endpoint, derived automatically from trace data
- Metrics Correlation — jump from a Prometheus alert to the traces that fired during the anomaly window

Combined with the tail-sampling collector configuration from section 9, this pipeline:

- Always keeps traces with errors
- Always keeps traces slower than 2 seconds
- Randomly samples 5% of everything else

## Putting It All Together

```js
// tracing.js — production-ready
'use strict';
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const isProd = process.env.NODE_ENV === 'production';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'my-service',
  [SemanticResourceAttributes.SERVICE_VERSION]: process.env.npm_package_version || '0.0.0',
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
});

const sdk = new NodeSDK({
  resource,
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(isProd ? 0.1 : 1.0), // 100% in dev, 10% in prod
  }),
  traceExporter: new OTLPTraceExporter(), // uses OTEL_EXPORTER_OTLP_ENDPOINT env var
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: isProd ? 15_000 : 5_000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();
console.log(`OTel SDK started [${process.env.NODE_ENV}]`);

process.on('SIGTERM', async () => {
  await sdk.shutdown();
  process.exit(0);
});
```

Configure it entirely through the environment:

```
OTEL_SERVICE_NAME=order-service
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
NODE_ENV=production
```
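A RED dashboard's Duration panel is computed from histogram buckets rather than raw samples. As a hedged sketch of where a p99 number comes from (Prometheus' `histogram_quantile()` performs a similar interpolation, with more edge-case handling):

```javascript
// Sketch: estimate a quantile from cumulative histogram buckets, the
// way a dashboard derives p99 latency. Buckets are sorted by upper
// bound `le`, and `count` is cumulative (Prometheus-style).
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total; // how many observations fall at or below the quantile
  let prevLe = 0, prevCount = 0;
  for (const { le, count } of buckets) {
    if (count >= rank) {
      // linear interpolation inside the bucket that contains the rank
      return prevLe + (le - prevLe) * ((rank - prevCount) / (count - prevCount));
    }
    prevLe = le;
    prevCount = count;
  }
  return buckets[buckets.length - 1].le;
}

const buckets = [
  { le: 100, count: 900 },   // 900 requests took <= 100 ms
  { le: 500, count: 990 },   // 90 more took <= 500 ms
  { le: 2000, count: 1000 }, // the last 10 took <= 2000 ms
];
console.log(estimateQuantile(0.99, buckets)); // 500
```

The practical upshot: your p99 is only as precise as your bucket boundaries, so pick bucket bounds around the latencies you actually care about.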