How to aggregate metrics but retain critical data: Introducing Exemptions in Adaptive Metrics

2024-08-22 5 min

When you hear about Adaptive Metrics in Grafana Cloud, it is usually described as a game changer.

Adaptive Metrics, which aggregates unused and partially used metrics into lower cardinality versions, has delivered a 35% reduction in metrics costs on average for more than 1,200 organizations.

Companies have also spoken candidly about the cost savings they gained from the feature. At Mux, “it not only saves us hundreds of thousands of dollars a year, but it’s also a forcing function for us to look closely at our metrics to find additional opportunities for time series reduction and cardinality improvements,” says Kyle Weaver, Staff Software Engineer at Mux.

And yet, you’re still hesitant about implementing automated aggregations and possibly losing critical data.

We get it. Balancing optimal application performance against reduced metrics volume is delicate, and the “what if” anxiety among your teams is real. That is why we have enhanced Adaptive Metrics with the new Exemptions capability. Adaptive Metrics continues to provide daily recommendations for aggregating or dropping high cardinality metrics. But with Exemptions, your teams can now proactively preserve critical data by identifying specific metrics and labels and excluding them from aggregation.

In this blog post, we’ll show you how Exemptions work and how they can help you manage your Adaptive Metrics recommendations and aggregations in Grafana Cloud.

How Adaptive Metrics works

Adaptive Metrics was developed to achieve three goals:

  1. Provide an automated way to aggregate or drop high cardinality metrics and, in turn, save users money.
  2. Continually identify opportunities to aggregate metrics based on usage patterns.
  3. Do all of the above while impacting as few queries as possible.

Put another way, Adaptive Metrics helps cut down on costs without compromising your ability to troubleshoot and diagnose issues in production.

At Grafana Labs, we have been using Adaptive Metrics for a couple of years now to reduce our internal ops metric volume by around 30-40%.

The process is straightforward: every weekday morning, an automated workflow makes a PR to our configuration repo to apply the latest recommendations. A few minutes later, that PR is merged automatically without a human review.

The aggregations are made possible by:

  1. The aggregation service: This aggregates incoming data as it is received in Grafana Cloud according to aggregation rules, ultimately lowering the amount of data that must be stored.
  2. The recommendation engine: This generates aggregation rules and automatically adapts them based on usage patterns as they change.
  3. Exemptions: This new capability allows you to fine-tune the recommendation engine by providing additional context for critical metrics that may not be reflected in your organization’s usage patterns.
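To make "aggregating into a lower cardinality version" concrete, here is a minimal Python sketch of the idea behind the aggregation service. It is an illustration only, not Grafana Cloud's implementation: the function name `aggregate_sum`, the sample data, and the choice of summation as the aggregation are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_sum(samples, drop_labels):
    """Sum samples whose label sets are identical once `drop_labels` are removed.

    `samples` is a list of (labels_dict, value) pairs. This is a hypothetical
    in-memory model, not the real ingestion path.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Build a hashable key from the labels that survive aggregation.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Three incoming series that differ by the high cardinality `pod` label.
samples = [
    ({"service": "api", "pod": "api-1"}, 3.0),
    ({"service": "api", "pod": "api-2"}, 5.0),
    ({"service": "web", "pod": "web-1"}, 2.0),
]

# Dropping `pod` leaves one series per service.
aggregated = aggregate_sum(samples, drop_labels={"pod"})
```

In this sketch, three stored series become two, which is the kind of reduction that lowers the amount of data to be stored. Queries that group by `pod` would no longer be answerable from the aggregated data, which is why the recommendation engine looks at usage before proposing a rule.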

How metrics aggregations adapt to your usage

When you set up Adaptive Metrics, you instantly begin aggregating and dropping low-value metrics to achieve significant savings. This is a great start!

As your usage patterns change — for example, by adding new dashboards or alerts — the recommendations engine will generate updates to your rules. In some cases, it may even recommend removing a rule altogether if, for example, the full cardinality is now being used.
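The adaptation described above can be sketched as a comparison between the labels a metric carries and the labels that dashboards, alerts, and queries actually reference. The following Python is a simplified model under stated assumptions: `recommend_rules`, the data shapes, and the metric name are hypothetical and do not reflect how the real recommendations engine is built.

```python
def recommend_rules(metric_labels, used_labels):
    """Return {metric: sorted labels to aggregate away} for labels nobody uses.

    `metric_labels` maps a metric name to all labels it carries.
    `used_labels` maps a metric name to the labels referenced by usage
    (dashboards, alerts, queries). Both are assumptions of this sketch.
    """
    rules = {}
    for metric, labels in metric_labels.items():
        unused = set(labels) - used_labels.get(metric, set())
        if unused:
            rules[metric] = sorted(unused)
        # If every label is in use, no rule is emitted for this metric.
    return rules


metric_labels = {"http_requests_total": {"service", "pod", "status"}}

# Initially nothing references `pod`, so it is recommended for aggregation.
before = recommend_rules(
    metric_labels, {"http_requests_total": {"service", "status"}}
)

# A new dashboard starts grouping by `pod`: the full cardinality is now used,
# so the sketch no longer recommends a rule for this metric.
after = recommend_rules(
    metric_labels, {"http_requests_total": {"service", "status", "pod"}}
)
```

Here `before` contains a rule that drops `pod`, while `after` is empty, mirroring the case where the engine recommends removing a rule altogether.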

However, we’ve found that this usage-based workflow is only a piece of the puzzle. As we rolled out aggregation automatically, we noticed a pattern of requests to remove or modify rules.

Here are some common reasons for these requests:

  1. Discovery of new use cases for the data: Teams might find new applications for previously discarded metrics, making them valuable again.
  2. Development of experimental features: New projects or experimental features often require comprehensive data sets that may include aggregated or dropped metrics.
  3. Incident reviews: When conducting incident reviews, teams may need detailed data to create new alerts or refine existing ones, which requires storing the full cardinality of some data going forward.

The initial workflow went like this: any engineer at Grafana Labs was empowered to open PRs against our production aggregation rules. The Adaptive Metrics team would review these PRs only to understand their use cases, then approve. Over time, we realized that what our users were really after was a way to protect the data they need regardless of how it’s being used.

How Exemptions work in Adaptive Metrics

An Exemption is a capability in Adaptive Metrics that allows you to identify critical metrics and labels that should be preserved and therefore excluded from the recommendations engine. This gives your teams more control over your metrics and helps maintain the integrity of your important data.

A screenshot of the Exemptions feature in Adaptive Metrics.

There are four types of Exemptions in Adaptive Metrics:

  1. Keep a whole metric intact: Ensure that certain metrics remain untouched by aggregation, preserving their complete data set.
  2. Preserve a specific label across all metrics: Maintain a particular label in all metrics to ensure consistent data categorization.
  3. Retain a label on a particular metric: Keep a specific label on a specific metric, maintaining detailed tracking for important metrics.
  4. Disable recommendations entirely for a specific metric: Stop the recommendations engine from proposing any changes for a particular metric, so that whatever aggregation is currently applied to it remains in place.

By leveraging these Exemptions, you can customize how Adaptive Metrics handles your data, ensuring that essential information is always available while enjoying the benefits of reduced cardinality and costs.
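One way to picture how the four Exemption types listed above interact with recommendations is as a filter applied before rules are updated. The Python below is a sketch under stated assumptions, not the Adaptive Metrics API: the `Exemption` class, its fields (`metric`, `keep_labels`, `freeze_rule`), the `apply_exemptions` function, and all metric names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Exemption:
    """Hypothetical model of an exemption; field names are assumptions."""
    metric: Optional[str] = None          # None: applies to every metric
    keep_labels: frozenset = frozenset()  # labels that must never be dropped
    freeze_rule: bool = False             # True: ignore new recommendations


def apply_exemptions(recommended, current, exemptions):
    """Return the rules ({metric: labels to drop}) to apply after exemptions.

    `recommended` holds the engine's new proposals, `current` the rules
    already in place.
    """
    result = {}
    for metric in set(recommended) | set(current):
        matching = [e for e in exemptions if e.metric in (None, metric)]
        # Type 4: recommendations disabled, so the current rule stays as-is.
        if any(e.freeze_rule and e.metric == metric for e in matching):
            if metric in current:
                result[metric] = set(current[metric])
            continue
        if metric not in recommended:
            continue
        # Type 1: the whole metric is kept intact, so no rule is applied.
        if any(e.metric == metric and not e.keep_labels for e in matching):
            continue
        # Types 2 and 3: remove protected labels from the drop list.
        drop = set(recommended[metric])
        for e in matching:
            drop -= e.keep_labels
        if drop:
            result[metric] = drop
    return result


recommended = {
    "http_requests_total": {"pod", "instance"},
    "checkout_latency_seconds": {"pod"},
    "queue_depth": {"pod", "region"},
    "cache_hits_total": {"pod"},
}
current = {"cache_hits_total": {"pod", "instance"}}
exemptions = [
    Exemption(metric="checkout_latency_seconds"),                      # type 1
    Exemption(keep_labels=frozenset({"region"})),                      # type 2
    Exemption(metric="http_requests_total",
              keep_labels=frozenset({"instance"})),                    # type 3
    Exemption(metric="cache_hits_total", freeze_rule=True),            # type 4
]

final_rules = apply_exemptions(recommended, current, exemptions)
```

In this sketch, `checkout_latency_seconds` keeps its full cardinality, `region` survives on `queue_depth`, `instance` survives on `http_requests_total`, and `cache_hits_total` keeps its existing rule even though the new recommendation differs.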

Get started with Exemptions in Adaptive Metrics

Exemptions in Adaptive Metrics address the concern of data loss by providing you with the tools to retain critical data while benefiting from cost optimization. With Exemptions, you can effectively balance control and optimization, ensuring that your metrics remain accurate and useful.

We encourage you to explore how Exemptions can enhance your Adaptive Metrics setup. To get started, check out our detailed Adaptive Metrics documentation. If you have any questions or need further assistance, don’t hesitate to contact our support team.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!