Understand your Grafana Cloud Metrics invoice

To understand your Grafana Cloud Metrics invoice, start with two concepts: active series and data points per minute (DPM).

Understand active series and DPM for billing calculations

Grafana Cloud calculates metrics usage by looking at two components: active series and data points per minute (DPM). By understanding these components, you’ll be able to better manage usage and reduce costs for your organization.

Active series

The concept of an active series is specific to Grafana Cloud Metrics billing. A time series stops being considered active shortly after you stop writing new data points to it.

A time series is considered active if new data points have been received within the last 20 minutes. This applies to both Hosted Prometheus and Hosted Graphite.
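The 20-minute rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not how Grafana Cloud implements it: the function names are hypothetical, and treating a sample that is exactly 20 minutes old as still active is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the text above: a series is active if it received
# a data point within the last 20 minutes.
ACTIVE_WINDOW = timedelta(minutes=20)


def is_active(last_sample_at: datetime, now: datetime) -> bool:
    """Return True if the most recent sample falls inside the active window.

    Assumption: a sample exactly 20 minutes old still counts as active.
    """
    return now - last_sample_at <= ACTIVE_WINDOW


def count_active(last_sample_times: dict[str, datetime], now: datetime) -> int:
    """Count the series whose latest sample is recent enough to be active."""
    return sum(is_active(ts, now) for ts in last_sample_times.values())


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    latest = {
        "series_a": now - timedelta(minutes=5),   # still being written
        "series_b": now - timedelta(minutes=45),  # stopped a while ago
    }
    print(count_active(latest, now))  # 1
```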

To view active series data, query for the grafanacloud_instance_active_series metric in the grafanacloud-usage data source.

For information about how Synthetic Monitoring usage is billed, refer to Understand your Synthetic Monitoring invoice.

To understand metrics and time-series, see Prometheus time series or read a detailed explanation of the Prometheus data model in the Prometheus documentation. For information on Graphite series, see What are Graphite time series?

Data points per minute (DPM)

Data points per minute (DPM) is a concept that is specific to Grafana Cloud Metrics billing. In this case, a data point is a single measured occurrence, or sample, of a metric within a time series, consisting of a value and a timestamp.

To view DPM, query the grafanacloud_instance_samples_per_second metric in the grafanacloud-usage data source and multiply the result by 60 to convert samples per second to data points per minute.

The number of data points per minute (DPM) is the number of data points that are sent to Grafana Cloud Metrics per minute. We measure this directly from the data that is sent to our servers. You can also calculate it yourself by multiplying the count of active series by the count of data points per series per minute.

The number of data points per series per minute is calculated by dividing the number of seconds in a minute (60) by the scrape interval in seconds. For example:

A scrape interval of 15 seconds yields 4 data points per series per minute. This is the Prometheus default.
A scrape interval of 60 seconds yields 1 data point per series per minute. This is the default for Grafana Cloud integrations.
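As a quick check of the arithmetic, the following Python sketch computes data points per series per minute from a scrape interval, and total DPM from an active series count. The function names are hypothetical, and the sketch assumes every series is scraped at the same interval.

```python
SECONDS_PER_MINUTE = 60


def dpm_per_series(scrape_interval_seconds: float) -> float:
    """Data points per series per minute: 60 divided by the scrape interval."""
    if scrape_interval_seconds <= 0:
        raise ValueError("scrape interval must be positive")
    return SECONDS_PER_MINUTE / scrape_interval_seconds


def total_dpm(active_series: int, scrape_interval_seconds: float) -> float:
    """Total DPM, assuming all series share one scrape interval."""
    return active_series * dpm_per_series(scrape_interval_seconds)


if __name__ == "__main__":
    print(dpm_per_series(15))     # 4.0
    print(dpm_per_series(60))     # 1.0
    print(total_dpm(50_000, 30))  # 100000.0
```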

See how to optimize your scrape interval to improve your DPM.

Included DPM per series

Included DPM per series is a concept that is specific to Grafana Cloud Metrics billing. Grafana Cloud Pro includes a default resolution (included DPM) of 1 DPM per active series.

View included DPM per series by querying for the grafanacloud_org_metrics_included_dpm_per_series metric in the grafanacloud-usage data source.

You can shorten your scrape interval and ship data points more frequently (increase your DPM), for an additional charge.

Contracted plans for Grafana Cloud Advanced allow a choice of two options for the default resolution (included DPM): the default 1 DPM option and a 4 DPM option for higher-resolution use cases.

Billing calculations

Billing is based on usage, and usage is determined by two primary factors:

The number of active series, at the 95th percentile
The number of data points per minute (DPM), at the 95th percentile

To learn more about 95th percentile usage billing, refer to 95th percentile billing.

If your average DPM per active series exceeds the included DPM, your usage is based on total DPM divided by the included DPM, rather than on the active series count.

Illustrative example for Grafana Cloud Pro:

Scenario A: 50,000 active series at a 60-second scrape interval (that is, 1 DPM)
Pricing: 50,000 active series * (1 DPM / 1 DPM included) * ($8 / 1,000 active series) = $400 / month

Scenario B: 50,000 active series at a 30-second scrape interval (that is, 2 DPM)
Pricing: 50,000 active series * (2 DPM / 1 DPM included) * ($8 / 1,000 active series) = $800 / month

Note that Grafana Cloud Advanced also offers an included resolution of 4 DPM per active series for higher-resolution needs. Grafana can also provide volume-based discounts as your data needs grow. Contact us for more information on upgrading to Grafana Cloud Advanced.

Usage and cost calculations

The following sections show usage and cost calculations (in pseudo-PromQL):

Active series:

active_series = quantile_over_time(0.95, sum by (id)(grafanacloud_instance_active_series < Inf)[30d:])

DPM:

total_dpm = quantile_over_time(0.95, sum by (id)(grafanacloud_instance_samples_per_second < Inf)[30d:]) * 60

Total usage:

usage = max(active_series, total_dpm/included_dpm)

Total cost:

cost = (usage/1000) * $8
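The usage and cost formulas above translate directly into Python. The sketch below reproduces Scenario A and Scenario B from the illustrative example. The function names are hypothetical, the $8 per 1,000 series rate is the illustrative figure used in this document rather than a quoted price, and the inputs are assumed to already be 95th percentile values.

```python
def billable_usage(active_series: float, total_dpm: float,
                   included_dpm: float = 1.0) -> float:
    """usage = max(active_series, total_dpm / included_dpm)."""
    if included_dpm <= 0:
        raise ValueError("included_dpm must be positive")
    return max(active_series, total_dpm / included_dpm)


def monthly_cost(usage: float, price_per_1000: float = 8.0) -> float:
    """cost = (usage / 1000) * price, using the illustrative $8 rate."""
    return usage / 1000 * price_per_1000


if __name__ == "__main__":
    # Scenario A: 50,000 series at 1 DPM each -> 50,000 total DPM.
    print(monthly_cost(billable_usage(50_000, 50_000)))   # 400.0
    # Scenario B: 50,000 series at 2 DPM each -> 100,000 total DPM.
    print(monthly_cost(billable_usage(50_000, 100_000)))  # 800.0
```

Because usage is the larger of the two terms, sending fewer data points than the included DPM does not lower the bill below the active series count.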

95th percentile billing

Grafana Cloud tracks the number of active series shipped and the total DPM rate over each billing period.

For each new billing period, you are billed based on the 95th percentile of:

The total number of active series sent
The total DPM across all active series

This helps you avoid getting billed for unexpected or temporary spikes in usage, such as when initially configuring Prometheus or Grafana Agent.

In other words, Grafana Cloud forgives the top five percent of usage “time” in each billing period (month), which is roughly the top 36 hours of usage in a 30-day month (0.05 * 720 hours = 36 hours).

For example, if you normally send around 6,000 active series but spike up to 30,000 active series for a total of 24 hours in a month, you would still only be billed at the rate of 6,000 active series.
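To see why a 24-hour spike is forgiven, the following Python sketch models a 30-day month as 720 hourly readings and takes the 95th percentile. It rests on two simplifying assumptions that are not stated in this document: usage is sampled once per hour, and the percentile uses the nearest-rank method. The actual sampling granularity and quantile method may differ (for example, PromQL's quantile_over_time interpolates between values), so treat the numbers as illustrative.

```python
import math

HOURS_PER_MONTH = 720  # 30 days * 24 hours


def percentile_nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile: the value at rank ceil(pct% of n) when sorted."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def month_with_spike(spike_hours: int, baseline: int = 6_000,
                     spike: int = 30_000) -> list[int]:
    """Hourly active series counts: baseline usage plus a spike of given length."""
    return [baseline] * (HOURS_PER_MONTH - spike_hours) + [spike] * spike_hours


if __name__ == "__main__":
    print(percentile_nearest_rank(month_with_spike(24), 95))  # 6000
    print(percentile_nearest_rank(month_with_spike(36), 95))  # 6000
    print(percentile_nearest_rank(month_with_spike(37), 95))  # 30000
```

Under these assumptions, a spike only affects the billed value once it lasts longer than the forgiven 36 hours.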

Prometheus time series

A Prometheus time series is a list of timestamp and value pairs, or samples, identified by a metric name and zero or more pairs of label names and label values.

For example, consider the following output from a Prometheus metrics exporter:

node_cpu_seconds_total{host="host1",cpu="0",mode="user"}
node_cpu_seconds_total{host="host1",cpu="1",mode="user"}

In this example, we can determine the following:

The metric name is node_cpu_seconds_total.
There are three labels, where:
- The label names are host, cpu, and mode.
- The label values are host1, 0 or 1, and user.
There are two total time series, since the label value of cpu is different for each time series even though host and mode are the same.

When you scrape a metrics endpoint and a time series generates a sample, that sample has a specific timestamp and measured value. If you scrape the same time series again and it generates a new sample, for example 1 minute later, that creates a second data point in the same time series, with a new timestamp and value.

Metrics usage can ramp up quickly when a given metric has many different combinations of labels. This is called high cardinality, and it is best to avoid spikes in cardinality. For example, with 6 different CPU modes, 10 hosts, and 4 CPUs per host, node_cpu_seconds_total would count as 6 * 10 * 4 = 240 active series of your usage. With a scrape interval of 15 seconds, each series has a DPM of 4, for a total DPM of 4 * 240 = 960.
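The cardinality arithmetic can be checked by enumerating every label combination. In this Python sketch, the mode names and host names are placeholders chosen for illustration, and the function name is hypothetical. Only the counts (6 modes, 10 hosts, 4 CPUs) and the 15-second scrape interval come from the example in the text.

```python
from itertools import product


def enumerate_series(metric: str, hosts: list[str], cpus: list[str],
                     modes: list[str]) -> set[tuple[str, str, str, str]]:
    """Build one (metric, host, cpu, mode) tuple per unique label combination."""
    return {(metric, h, c, m) for h, c, m in product(hosts, cpus, modes)}


# Placeholder label values; only the counts matter for cardinality.
MODES = ["user", "system", "idle", "iowait", "irq", "softirq"]  # 6 modes
HOSTS = [f"host{i}" for i in range(1, 11)]                      # 10 hosts
CPUS = [str(i) for i in range(4)]                               # 4 CPUs per host

if __name__ == "__main__":
    series = enumerate_series("node_cpu_seconds_total", HOSTS, CPUS, MODES)
    active_series = len(series)
    dpm_per_series = 60 // 15  # 15-second scrape interval
    print(active_series)                   # 240
    print(active_series * dpm_per_series)  # 960
```

Adding one more label value to any dimension multiplies the series count, which is why a single new high-cardinality label can raise usage sharply.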

You can read a detailed explanation of the Prometheus data model in the Prometheus documentation.

What are Graphite time series?

In Graphite, each unique metric path is a unique time series. For example, the following Graphite output contains eight unique time series:

collect.host1.cpu-0.cpu-idle
collect.host1.cpu-0.cpu-user
collect.host1.cpu-0.cpu-wait
collect.host1.cpu-0.cpu-system
collect.host2.cpu-3.cpu-idle
collect.host2.cpu-3.cpu-user
collect.host2.cpu-3.cpu-wait
collect.host2.cpu-3.cpu-system

If Graphite tags are used, the following output also contains eight unique time series:

collect.cpu;host=host1;cpu=0;mode=idle
collect.cpu;host=host1;cpu=0;mode=user
collect.cpu;host=host1;cpu=0;mode=wait
collect.cpu;host=host1;cpu=0;mode=system
collect.cpu;host=host2;cpu=3;mode=idle
collect.cpu;host=host2;cpu=3;mode=user
collect.cpu;host=host2;cpu=3;mode=wait
collect.cpu;host=host2;cpu=3;mode=system
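As a rough way to count unique tagged series from a list like the one above, the following Python sketch splits each path into a metric name and its tags. The helper names are hypothetical. The sketch also assumes that tag order does not change a series' identity, so two paths with the same tags in a different order count once.

```python
def parse_tagged_series(path: str) -> tuple[str, dict[str, str]]:
    """Split 'name;tag1=value1;tag2=value2' into (name, {tag: value})."""
    name, *tag_parts = path.split(";")
    tags = dict(part.split("=", 1) for part in tag_parts)
    return name, tags


def count_unique_series(paths: list[str]) -> int:
    """Count distinct (name, tags) pairs, ignoring the order tags appear in."""
    unique = set()
    for path in paths:
        name, tags = parse_tagged_series(path)
        unique.add((name, tuple(sorted(tags.items()))))
    return len(unique)


if __name__ == "__main__":
    paths = [
        f"collect.cpu;host={host};cpu={cpu};mode={mode}"
        for host, cpu in (("host1", "0"), ("host2", "3"))
        for mode in ("idle", "user", "wait", "system")
    ]
    print(count_unique_series(paths))  # 8
```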

For more information about using Graphite tags, see Graphite Tags.