Tools: From System Monitoring to Business Insights: A Comprehensive Analysis of the Custom Metric Collection Feature of ARMS

Source: Dev.to

This article introduces Alibaba Cloud ARMS' custom metric collection capability.
Introduction
In the wave of digital transformation, application performance monitoring (APM) has become a cornerstone of stable system operation. However, traditional APM systems provide mainly system-level performance data and offer little visibility into the business itself. The custom metric collection feature of Alibaba Cloud Application Real-Time Monitoring Service (ARMS) addresses this limitation so that monitoring can directly support business growth.
Traditional APM systems typically focus on metrics such as the following:
● CPU utilization and memory usage
● Request response time and throughput
● Database query performance
● API call success rate
These metrics are designed to surface performance issues, errors, and slow responses. They rarely reflect how the business itself is operating. As a result, monitoring blind spots occur in the following business scenarios:

Scenario 1: E-commerce sales promotions
During sales promotions such as Double 11, the CPU and memory metrics of the system may look normal. However, system metrics often fail to reveal business issues in time, such as a sudden drop in the order conversion rate or an anomaly in the payment success rate.

Scenario 2: E-commerce system operation
For an e-commerce system, the key business metrics include:
● Real-time order quantity and order amount
● Conversion rate from the shopping cart
These business metrics directly reflect business health and operational efficiency, but traditional APM systems cannot collect them.

Scenario 3: Financial risk control system
A financial system needs to monitor the following metrics in real time:
● Number of transactions and transaction amount
● Percentage of abnormal transactions
● Capital turnover speed
These metrics are critical to business decisions, but traditional APM systems cannot collect them.

1.2 Value of Custom Metric Collection
The custom metric collection feature of ARMS provides the following core value:
✅ Business observability: Business metrics and system metrics are monitored together to form a complete observability system.
✅ Quick issue identification: When a business exception occurs, it can be correlated with system metrics to locate the root cause.
✅ Data-driven decision-making: Real-time business metrics provide data support for operations and product decisions.
✅ End-to-end tracing: The combination of business metrics and traces enables end-to-end business process monitoring.

2.1 Micrometer
Introduction: Micrometer is a metrics facade for the Spring ecosystem, similar to SLF4J for logging.
● Provides a unified API and supports multiple monitoring system backends, such as Prometheus, InfluxDB, and Datadog.
● Deeply integrates with Spring Boot.
● Supports dimensional metrics by using tags (labels).
● ✅ Supports multiple backends. One set of code is compatible with multiple monitoring systems.
● ✅ Supports Spring Boot auto-configuration for out-of-the-box use.
● ✅ Supports dimensional metrics for flexible queries.
● ✅ Has an active community and is continuously updated.
● ❌ Highly dependent on the Spring ecosystem.
● ❌ Does not support distributed tracing and logging.
● ❌ Complex configurations.
● ❌ Lacks unified observability standards.
Scenarios: Spring Boot microservices applications

2.2 Prometheus clients
Introduction: Prometheus clients are the Java client libraries provided by Prometheus. A Prometheus client connects directly to the Prometheus ecosystem and is the preferred way for many components in the Kubernetes ecosystem to expose metrics.
● Native integration: seamlessly integrates with the Prometheus monitoring system.
● Pull model: the Prometheus server pulls metrics from the application. Applications do not need to push metrics.
● Powerful query: supports the query and aggregation capabilities of Prometheus Query Language (PromQL).
● Rich ecosystem: works with the Grafana visualization tool and Alertmanager alerts.
Expose the Metric Endpoint (Spring Boot):
Visit http://localhost:8080/metrics to view metric data in the Prometheus format.
● ✅ Natively integrates with the Prometheus ecosystem.
● ✅ Supports the pull model. Applications do not need to push metrics.
● ✅ Supports powerful queries, including complex aggregation and calculation with PromQL.
● ✅ Seamlessly connects to visualization tools such as Grafana.
● ✅ Supports flexible label mechanisms for multi-dimensional queries.
● ✅ Lightweight framework with low performance overhead.
● ❌ Only metrics can be collected. Distributed tracing and logging are not supported.
● ❌ The pull model is complex to deploy in some network environments because ports must be exposed.
● ❌ Integration with non-Prometheus monitoring systems requires additional configurations.
● ❌ Data persistence depends on Prometheus servers. Prometheus clients do not store historical data.
● ❌ Automatic instrumentation is not provided. All metrics must be manually defined.
Scenarios:
● Teams that use the Prometheus monitoring system
● Cloud-native applications in Kubernetes environments
● Monitoring scenarios that require powerful query capabilities
● Projects that prefer open source solutions
Prometheus advantages compared to other frameworks:
OpenTelemetry core features:
● Diverse data types: supports traces, metrics, and logs.
● Vendor neutral: supports standard data models and protocols.
● Automatic instrumentation: automatically collects framework metrics by using a Java agent.
● Flexible extension: provides a comprehensive plug-in ecosystem.
● ✅ Cloud-native standard with wide support.
● ✅ Provides a unified observability system that integrates traces, metrics, and logs.
● ✅ Supports automatic instrumentation. OpenTelemetry can collect framework metrics without the need to write code.
● ✅ Provides rich context information and supports associations of metrics and traces.
● ✅ Has an active community and is supported by major cloud service providers.
● ❌ The learning curve is steep.
● ❌ Additional collector deployment is required.
● ❌ Some features are still evolving.
● ❌ Configuration is relatively complex.
Scenarios: Cloud-native microservices, distributed systems, and scenarios that require unified observability

2.5 Framework Comparison
Recommended framework selection:
● Spring Boot applications: We recommend that you select Micrometer.
● Prometheus systems: We recommend that you select a Prometheus client.
● Cloud-native or distributed systems: We recommend that you select OpenTelemetry.
● Existing Grafana dashboards: We recommend that you select a Prometheus client or Micrometer.
Deep comparison between Prometheus clients and OpenTelemetry:
For cloud-native applications, a Prometheus client or OpenTelemetry is a common choice. Prometheus clients and OpenTelemetry have the following differences:
Prometheus-only stack: Prometheus client + Prometheus + Grafana
Hybrid solution: OpenTelemetry collection + metric data export in the Prometheus format + Grafana

3.1 Scenario Introduction
Suppose that you want to monitor a flash sale system and need to track the following key metrics in real time:
● Number of flash sale requests: counted separately by result (success or failure).
● Current inventory: the real-time inventory.
● Flash sale success rate: used for alerts and dashboard display.

3.2 Step 1: Add Dependencies
Add the OpenTelemetry dependency to the pom.xml file of your project.
Note:
● The ARMS Java agent automatically initializes an OpenTelemetry instance.
● The application code only needs to depend on opentelemetry-api.
● You do not need to configure an exporter. Data is automatically reported to ARMS.

3.3 Step 2: Define Custom Metrics
Create a flash sale service and define business metrics.
Meter naming: "seckill" in getMeter("seckill") is the namespace, which needs to be configured in the ARMS console.
Counter and gauge comparison:
● Counter: records a cumulative value (can be increased but not decreased), such as the total number of flash sale requests.
● Gauge: records an instantaneous value (can be increased or decreased), such as the current inventory.
Dimension design: Use Attributes to add dimensions, such as result (success or failed) and product_id, for multi-dimensional analysis.
Thread safety: Use AtomicInteger to ensure data accuracy in high-concurrency scenarios.

3.4 Step 3: Configure Custom Metric Collection in the ARMS Console
Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List.
On the Application List page, click the name of an application.
On the page that appears, click the Configuration tab and select Custom Configurations.
Enable custom metric collection. In the Probe switch settings section of the Configuration tab, configure the metrics to be collected.
meters parameter: Enter the name of the meter (seckill) defined in Step 2.
You can configure multiple meters. Separate multiple meters with commas (,). Example: seckill,order,payment.
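The link between the meter name used in code and the meters parameter can be illustrated with a small sketch. It only demonstrates the comma-separated format. How the ARMS agent actually parses and matches this value is not described in this article, so the whitespace trimming, the exact-match check, and the names MeterListSketch, parse, and isCollected are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ARMS or OpenTelemetry.
final class MeterListSketch {
    // Split a value such as "seckill,order,payment" into individual meter names.
    // Trimming and dropping empty entries are assumptions, not documented behavior.
    static Set<String> parse(String metersValue) {
        return Arrays.stream(metersValue.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // A meter created with getMeter(meterName) is assumed to be collected
    // only if its name appears in the configured list.
    static boolean isCollected(String meterName, String metersValue) {
        return parse(metersValue).contains(meterName);
    }
}
```

Under these assumptions, isCollected("seckill", "seckill,order,payment") is true, while a meter named "cart" would not be picked up until it is added to the list.
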
3.5 Step 4: View Metric Data

3.6 Step 5: Configure an Alert Rule
Go to the Prometheus Alert Rules page of the ARMS console.
In the top navigation bar, select the region in which the application resides.
Click Create Prometheus Alert Rule and configure the rule, as shown in the following figure.
Alert: inventory alert
For more information about alert rules, see Create an alert rule for a Prometheus instance.

3.7 Recommended Best Practices
✅ Metric naming conventions
✅ Dimension design principles
● The cardinality of a dimension should not be too large, because each distinct combination of dimension values produces a separate time series.
● Prefer enumeration-type dimensions, such as status (success or failed).
● Avoid high-cardinality dimensions, such as userId or orderId.
✅ Performance optimization
● Create metric objects in advance to avoid creating them repeatedly on the request path.
● Use the batch API to reduce overheads.
● Keep the logic of the gauge callback function simple.
✅ Metric type selection
● ✅ Non-intrusive collection: Framework metrics can be automatically collected, and business metrics can be defined as required.
● ✅ Unified reporting: Metrics can be automatically reported to ARMS without the need to deploy a collector.

4.2 Associations of Metrics and Traces
The core advantage of ARMS is that it associates custom metrics with distributed traces.
Value: If an order metric is abnormal, you can go to the related trace with one click to quickly locate the issue.

4.3 Powerful Data Visualization Capabilities
● 📊 Multi-dimensional aggregation queries
● 📈 Trend comparison analysis
● 🎯 Custom dashboards
● 🔔 Flexible alert rules

4.4 Enterprise-class Features
● 🔒 Secure data isolation
● 📦 Long-term data storage
● ⚡ High-performance queries
● 🌐 Cross-region deployment
✨ Standardization: supports cloud-native standards to prevent vendor lock-in.
✨ Simplification: requires only one line of configuration, enabling out-of-the-box use.
✨ Visualization: supports metrics, traces, and logs.
✨ Intelligence: supports AI-powered anomaly detection and root cause analysis.
● E-commerce systems: order, payment, and inventory monitoring
● Financial systems: transaction volume and risk control metrics
● Game systems: number of online users and top-up amount
● IoT systems: online rate of devices and number of messages
ARMS will continue to deepen its custom metric capabilities and support custom metric collection for more frameworks and metric types.
● Supports the Micrometer and Prometheus frameworks.
● Supports quantile and histogram metric types.
Try the custom metric collection feature of ARMS now and let monitoring truly serve business growth.

References
● Official documentation for custom metric collection in ARMS
● OpenTelemetry official website
● ARMS product homepage
[Try now] 👉 https://www.alibabacloud.com/en/product/arms
This article is presented by the Alibaba Cloud ARMS team.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

@Autowired
MeterRegistry registry;

public void processOrder(Order order) {
    // Count processed orders, tagged by order status and sales channel.
    Counter.builder("orders.processed")
            .tag("status", order.getStatus())
            .tag("channel", order.getChannel())
            .register(registry)
            .increment();
}

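The Micrometer example above relies on dimensional metrics: each distinct combination of tag values is tracked as its own series. The following is a minimal sketch of that idea using only the JDK. It is not how Micrometer is implemented, and the class name TaggedCounterSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration of a counter with two dimensions (status, channel).
final class TaggedCounterSketch {
    // One independent running total per distinct (status, channel) combination.
    private final Map<List<String>, LongAdder> series = new ConcurrentHashMap<>();

    void increment(String status, String channel) {
        series.computeIfAbsent(List.of(status, channel), key -> new LongAdder()).increment();
    }

    long count(String status, String channel) {
        LongAdder adder = series.get(List.of(status, channel));
        return adder == null ? 0L : adder.sum();
    }

    // Number of distinct series created so far.
    int seriesCount() {
        return series.size();
    }
}
```

Because every new tag combination adds an entry to the map, the number of series grows with the number of distinct tag values, which is why low-cardinality tags are preferred.
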
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;

public class OrderMetrics {
    // Define a counter to record the total number of orders.
    private static final Counter orderCounter = Counter.build()
            .name("orders_total")
            .help("Total number of orders")
            .labelNames("status", "channel") // Define labels
            .register();

    // Define a gauge to record the number of orders that are being processed.
    private static final Gauge processingOrders = Gauge.build()
            .name("orders_processing")
            .help("Number of orders currently processing")
            .register();

    // Define a histogram to record the statistics on order amount distribution.
    private static final Histogram orderAmount = Histogram.build()
            .name("order_amount")
            .help("Order amount distribution")
            .buckets(50, 100, 200, 500, 1000, 5000) // Custom buckets
            .register();

    public void processOrder(Order order) {
        // Number of total orders + 1, with labels.
        orderCounter.labels(order.getStatus(), order.getChannel()).inc();
        // Record the order amount.
        orderAmount.observe(order.getAmount());
        // Number of orders that are being processed + 1.
        processingOrders.inc();
        try {
            // Order processing logic...
        } finally {
            // Processing completed. Number of orders that are being processed - 1.
            processingOrders.dec();
        }
    }
}

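The histogram in the example above sorts each observed order amount into buckets, and Prometheus exposes those buckets as cumulative counts. The sketch below illustrates that bookkeeping with plain JDK code. It is a simplified model, not the Prometheus client's implementation, and BucketSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical model of a histogram with cumulative buckets.
final class BucketSketch {
    private final double[] upperBounds; // finite bucket bounds, ascending
    private final long[] counts;        // per-bucket counts; the last slot is the +Inf bucket
    private double sum;

    BucketSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.counts = new long[this.upperBounds.length + 1];
    }

    synchronized void observe(double value) {
        int i = 0;
        // Find the first bucket whose upper bound is >= value.
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts[i]++;
        sum += value;
    }

    // Entry i is the number of observations <= upperBounds[i];
    // the final entry is the total number of observations.
    synchronized long[] cumulativeCounts() {
        long[] out = new long[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            out[i] = running;
        }
        return out;
    }

    synchronized double sum() {
        return sum;
    }
}
```

With the bucket bounds from the example (50, 100, 200, 500, 1000, 5000), an order amount of 150 is counted in the 200 bucket and in every larger bucket of the cumulative view.
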
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.16.0</version>
</dependency>
<!-- Used to expose an HTTP endpoint. -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_servlet</artifactId>
    <version>0.16.0</version>
</dependency>

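import io.prometheus.client.exporter.MetricsServlet;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PrometheusConfig {
    // Register the Prometheus servlet so that metrics are served at /metrics.
    @Bean
    public ServletRegistrationBean<MetricsServlet> metricsServlet() {
        return new ServletRegistrationBean<>(
                new MetricsServlet(),
                "/metrics"
        );
    }
}
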
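The /metrics endpoint registered above returns metrics in the Prometheus text exposition format. As a rough illustration of what that output looks like for a counter with one label, the sketch below renders it by hand. In a real application the client library does this for you. ExpositionSketch, renderCounter, and escape are hypothetical names, and only a single label is handled.

```java
import java.util.Map;

// Hypothetical renderer for a single-label counter in the Prometheus text format.
final class ExpositionSketch {
    // Escape backslashes, double quotes, and newlines in a label value.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String renderCounter(String name, String help, String labelName,
                                Map<String, Long> valueByLabel) {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" counter\n");
        // One sample line per label value, in the map's iteration order.
        for (Map.Entry<String, Long> e : valueByLabel.entrySet()) {
            sb.append(name).append('{').append(labelName).append("=\"")
              .append(escape(e.getKey())).append("\"} ")
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Each sample line carries the metric name, its labels, and the current value, which is what the Prometheus server reads on every scrape.
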
# Calculate the per-second order rate over the last 5 minutes.
rate(orders_total[5m])
# Aggregate the order count by channel.
sum by (channel) (orders_total)
# Query the P99 of the order amount over the last 5 minutes.
histogram_quantile(0.99, sum by (le) (rate(order_amount_bucket[5m])))

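The histogram_quantile function estimates a quantile from cumulative bucket counts by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below reproduces that idea in a simplified form. It assumes that the lowest bucket starts at 0 and ignores the edge cases that Prometheus handles, and QuantileSketch and estimate are hypothetical names.

```java
// Hypothetical, simplified model of quantile estimation from cumulative buckets.
final class QuantileSketch {
    // upperBounds: finite bucket bounds in ascending order.
    // cumulative: cumulative counts, one per finite bound plus a final +Inf entry (the total).
    static double estimate(double q, double[] upperBounds, long[] cumulative) {
        long total = cumulative[cumulative.length - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulative[i] >= rank) {
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                long below = (i == 0) ? 0L : cumulative[i - 1];
                long inBucket = cumulative[i] - below;
                if (inBucket == 0) {
                    return lower;
                }
                // Assume observations are spread evenly inside the bucket.
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        // The rank falls into the +Inf bucket: report the highest finite bound.
        return upperBounds[upperBounds.length - 1];
    }
}
```

Because of the interpolation, the result is only as precise as the bucket layout, so bucket bounds should be chosen around the values that matter for alerts.
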
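import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
Meter meter = openTelemetry.getMeter("order-service");
LongCounter orderCounter = meter.counterBuilder("orders.total")
        .setUnit("1")
        .setDescription("Total number of orders")
        .build();
// Record one order with two attributes (dimensions).
orderCounter.add(1, Attributes.of(
        AttributeKey.stringKey("status"), "success",
        AttributeKey.stringKey("payment_method"), "alipay"));
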
<dependencies>
    <!-- OpenTelemetry API -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
    </dependency>
    <!-- OpenTelemetry SDK (Optional. Used for local testing.) -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
    </dependency>
</dependencies>
<!-- Unified version management -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-bom</artifactId>
            <version>1.32.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongGauge;
import org.springframework.stereotype.Service;
import javax.annotation.PreDestroy;
import java.util.concurrent.atomic.AtomicInteger;

@Service
public class SeckillService {
    // Inventory counter (thread-safe)
    private final AtomicInteger stock = new AtomicInteger(0);
    // Counter for flash sale requests
    private final LongCounter seckillCounter;
    // Inventory gauge
    private final ObservableLongGauge stockGauge;
    // Metric dimension keys
    private static final AttributeKey<String> RESULT_KEY = AttributeKey.stringKey("result");
    private static final AttributeKey<String> PRODUCT_KEY = AttributeKey.stringKey("product_id");

    public SeckillService() {
        // Obtain the OpenTelemetry instance initialized by the ARMS Java agent.
        OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
        // Create a meter whose namespace is seckill.
        Meter meter = openTelemetry.getMeter("seckill");
        // Define a counter to record the number of flash sale requests (cumulative value).
        seckillCounter = meter.counterBuilder("product_seckill_count")
                .setUnit("1")
                .setDescription("The number of flash sale requests. The statistics information is classified by success or failure.")
                .build();
        // Define a gauge to record the current inventory (instantaneous value).
        stockGauge = meter.gaugeBuilder("product_current_stock")
                .ofLongs()
                .setDescription("The current product inventory.")
                .buildWithCallback(measurement -> {
                    // Execute a callback upon each collection to report the current inventory.
                    measurement.record(stock.get());
                });
    }

    /**
     * Initialize the inventory.
     */
    public void initStock(int count) {
        stock.set(count);
    }

    /**
     * Process a flash sale request for a product.
     */
    public String seckill(String productId, String userId) {
        int currentStock = stock.get();
        // The inventory is insufficient. The flash sale request fails.
        if (currentStock <= 0) {
            // Record the number of failed flash sale requests.
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "failed",
                    PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
        // Try to deduct the inventory. decrementAndGet is atomic, which ensures thread safety.
        int remaining = stock.decrementAndGet();
        if (remaining >= 0) {
            // The flash sale request is successful.
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "success",
                    PRODUCT_KEY, productId
            ));
            return "Congratulations. The flash sale request is successful. Remaining inventory: " + remaining;
        } else {
            // Concurrent requests used up the inventory first. Roll back.
            stock.incrementAndGet();
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "failed",
                    PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
    }

    /**
     * Destroy resources.
     */
    @PreDestroy
    public void destroy() {
        // Close the gauge and stop collection.
        stockGauge.close();
    }
}

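The service above deducts inventory with decrementAndGet and rolls back on failure, so the stock value can briefly drop below zero under contention, and a gauge callback that runs at that moment could report a negative inventory. One possible alternative, shown here as a JDK-only sketch, is an explicit compare-and-set loop that never lets the value go negative. StockSketch, tryDeduct, and simulate are hypothetical names, and metric recording is left out to keep the example self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical inventory holder that deducts stock with a compare-and-set loop.
final class StockSketch {
    private final AtomicInteger stock;

    StockSketch(int initial) {
        this.stock = new AtomicInteger(initial);
    }

    // Returns the remaining stock after a successful deduction, or -1 if sold out.
    int tryDeduct() {
        while (true) {
            int current = stock.get();
            if (current <= 0) {
                return -1;
            }
            // Only succeeds if no other thread changed the value in between.
            if (stock.compareAndSet(current, current - 1)) {
                return current - 1;
            }
        }
    }

    int remaining() {
        return stock.get();
    }

    // Run several threads against one stock and count the successful deductions.
    static int simulate(int initialStock, int threads, int attemptsPerThread) {
        StockSketch s = new StockSketch(initialStock);
        AtomicInteger successes = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (s.tryDeduct() >= 0) {
                        successes.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return successes.get();
    }
}
```

In this variant a success or failed counter increment would follow the return value of tryDeduct, and the number of successes can never exceed the initial stock.
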
<namespace>_<metric_name>
Examples:
- order_created_count // The number of created orders.
- payment_success_rate // The payment success rate.
- user_login_duration // The logon duration.
CODE_BLOCK:
// ❌ Invalid: The cardinality of userId is too large.
counter.add(1, Attributes.of( AttributeKey.stringKey("user_id"), userId
));
CODE_BLOCK:
// ✅ Valid: Use an enumeration type dimension.
counter.add(1, Attributes.of( AttributeKey.stringKey("user_type"), "vip"
));
COMMAND_BLOCK:
Request trace:
Frontend -> Gateway -> Order service -> Payment service
                       ↓
                       Custom metric: An order is created.
                       ↓
                       Trace: the complete trace of the order.
- Why Is Custom Metric Collection Required?
1.1 Monitoring Blind Spots of Traditional APM Systems
Traditional APM systems generally focus on the following system-level metrics:
- Comparison of Common Metric Definition Frameworks in Java
In the Java ecosystem, there are multiple mature metric collection frameworks. Understanding their characteristics helps you choose an appropriate technical solution.
- Pull model:
You do not need to configure a data push address for your application. This reduces coupling.
Prometheus can detect the health of applications. If Prometheus fails to scrape the metrics of an application, the failure itself signals that the application may be unhealthy or unreachable.
The pull model facilitates service discovery and dynamic monitoring.
- Powerful PromQL:
- Cloud-native standards:
Kubernetes natively supports metric data in the Prometheus format.
A large number of open source components provide the /metrics endpoint.
Monitoring as Code (MaC) and configuration version management are supported.
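To make the pull model and the /metrics endpoint more concrete, the following is a minimal sketch that uses only JDK classes to expose one counter in the Prometheus text exposition format. The class name MetricsEndpointSketch, the metric name demo_requests_total, and the label result are assumptions made for illustration. A real application would normally rely on a Prometheus or Micrometer client library instead of rendering the text by hand.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: a hand-rolled /metrics endpoint for a single counter. */
final class MetricsEndpointSketch {

    // One LongAdder per value of the "result" label (kept low-cardinality on purpose).
    private final Map<String, LongAdder> requestsByResult = new ConcurrentHashMap<>();

    /** Increments the counter for the given label value, for example "success". */
    public void record(String result) {
        requestsByResult.computeIfAbsent(result, k -> new LongAdder()).increment();
    }

    /** Renders the counter as Prometheus text. Label values are assumed to need no escaping. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP demo_requests_total Requests handled, by result.\n");
        sb.append("# TYPE demo_requests_total counter\n");
        // Copy into a TreeMap so that the output order is deterministic.
        Map<String, LongAdder> sorted = new TreeMap<>(requestsByResult);
        for (Map.Entry<String, LongAdder> entry : sorted.entrySet()) {
            sb.append("demo_requests_total{result=\"")
              .append(entry.getKey())
              .append("\"} ")
              .append(entry.getValue().sum())
              .append('\n');
        }
        return sb.toString();
    }

    /** Starts an HTTP server whose /metrics path returns the rendered text on every scrape. */
    public HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) {
        MetricsEndpointSketch metrics = new MetricsEndpointSketch();
        metrics.record("success");
        metrics.record("failed");
        // Print what a scrape would return; call start(port) to serve it over HTTP instead.
        System.out.print(metrics.render());
    }
}
```

In this sketch the application never needs to know where the monitoring server lives: it only answers scrapes, which is the decoupling that the pull model provides.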
2.3 OpenTelemetry
Introduction: OpenTelemetry is a Cloud Native Computing Foundation (CNCF) observability standard that resulted from the merger of OpenTracing and OpenCensus.
- Best Practices for Using ARMS to Collect Custom Metrics
The preceding comparisons show that each metric definition framework has its own advantages and disadvantages. ARMS integrates deeply with OpenTelemetry. Compared with open source solutions, ARMS builds on the OpenTelemetry SDK technology stack to greatly simplify defining metrics, collecting metrics, and configuring dashboards and alerts. In the future, ARMS will support quick collection of Micrometer and Prometheus metrics. The following example shows how to use ARMS to collect custom metrics in a flash sale scenario.
- Meter naming: "seckill" in getMeter("seckill") is the namespace, which needs to be configured in the ARMS console.
- Counter and gauge comparison:
- Dimension design: Use Attributes to add dimensions. The result (success or failed) and product_id dimensions enable multi-dimensional analysis.
- Thread safety: Use AtomicInteger to ensure data accuracy in high-concurrency scenarios.
- Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List. On the Application List page, click the name of an application. On the page that appears, click the Configuration tab and select Custom Configurations.
- Enable custom metric collection. In the Probe switch settings section of the Configuration tab, configure the metrics to be collected.
- Configuration description:
- Go to the Instances page of the ARMS console. In the top navigation bar, select the region in which the application resides. The instance whose type is Prometheus Instance for Application Monitoring is the storage instance of APM metrics and custom metrics of all ARMS applications in the current region, as shown in the following figure.
- Click Shared Edition in the Grafana Workspace column of the instance to go to the Grafana page. Click Explore and select the Prometheus instance from the previous step as the data source.
- Use PromQL to query the metrics that you defined in the code, as shown in the following figure. You can also create a custom dashboard in Grafana.
- Core Benefits of Custom ARMS Metrics
4.1 Seamless Integration and Zero-cost Integration
● ✅ Automatic injection: The ARMS Java agent is used. You do not need to manually configure OpenTelemetry.
- Summary and Future Outlook
The custom metric collection feature is a key step for APM systems to move from monitoring to observability. Alibaba Cloud ARMS deeply integrates with OpenTelemetry to provide users with the following features: