Overview
Grafana Enterprise Metrics (GEM) lets you analyze the cardinality of your metrics and labels, either through the cardinality analysis dashboards that ship with the Grafana Enterprise Metrics plugin or through an API.
The APIs and dashboards help you understand the active time series in GEM. An active time series is one that has not yet been written to long-term storage.
Configuration
The API endpoints are disabled by default. Use one of the following approaches to enable or disable the endpoints for all tenants:
- Add the CLI flag -querier.cardinality-analysis-enabled=true.
- Set cardinality_analysis_enabled to true in the limits section of the global configuration file, as shown below:
limits:
  cardinality_analysis_enabled: true
To disable the endpoints for specific tenants when they are enabled globally, or to enable them for specific tenants when they are disabled globally, use the runtime configuration file.
Limitations
- The cardinality analysis dashboards only work for single-tenant data sources. Similarly, the cardinality analysis APIs return cardinality information for only one tenant at a time. You cannot get a global view of the cardinality of multiple tenants simultaneously. This means that any call to the API where you provide multiple tenants in the username field will fail. For example, team-a|team-b will fail, but team-a or team-b will succeed.
- The cardinality analysis dashboards do not work for data sources that use label-based access controls. Similarly, calls to the cardinality analysis APIs that use a token for an access policy with label selectors also fail.
- The cardinality analysis APIs and dashboards will only work if you are running GEM using block storage. They are incompatible with chunks storage.
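The single-tenant restriction above can be checked on the client before a request is sent. The following is a minimal sketch only: the helper name is_single_tenant is hypothetical and not part of GEM, and it assumes that multiple tenants are always joined with the | character, as in the team-a|team-b example.

```python
def is_single_tenant(username: str) -> bool:
    """Return True if the username names exactly one tenant.

    Assumption: multi-tenant usernames join tenant IDs with "|",
    as in the "team-a|team-b" example above.
    """
    parts = username.split("|")
    return len(parts) == 1 and parts[0] != ""


for candidate in ("team-a", "team-b", "team-a|team-b"):
    verdict = "accepted" if is_single_tenant(candidate) else "rejected"
    print(f"{candidate}: {verdict}")
```

A check like this only avoids a request that is expected to fail. The server remains the authority on which usernames are valid.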
Operational considerations
We do not expect this new and experimental API to negatively affect the performance of ingesters in a GEM cluster. To be sure, monitor the cluster after enabling this feature.
To monitor the performance of the cardinality endpoints, use the exposed GEM API endpoints metrics.
The following example query returns the queries-per-second to the cardinality analysis endpoints:
sum by (route) (
  rate(cortex_querier_request_duration_seconds_count{
    route=~"prometheus_api_v1_cardinality_label_values|prometheus_api_v1_cardinality_label_names"
  }[1m])
)
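For intuition, a Prometheus histogram such as the one queried above exposes a _count series (the number of observed requests) and a _sum series (the total observed seconds). The sketch below shows the simplified arithmetic behind a per-second request rate and an average latency over a window. The sample numbers are made up, and the real rate() function additionally handles counter resets and extrapolation.

```python
def per_second_rate(start: float, end: float, window_seconds: float) -> float:
    """Simplified counter rate: increase divided by window length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (end - start) / window_seconds


# Hypothetical samples taken 60 seconds apart for one route.
count_start, count_end = 1200.0, 1500.0   # <metric>_count
sum_start, sum_end = 30.0, 45.0           # <metric>_sum, in seconds

requests_per_second = per_second_rate(count_start, count_end, 60.0)
seconds_spent_per_second = per_second_rate(sum_start, sum_end, 60.0)
average_latency = seconds_spent_per_second / requests_per_second

print(requests_per_second)   # requests per second
print(average_latency)       # seconds per request
```

Dividing the rate of _sum by the rate of _count gives the average request duration, which is a useful companion to the request rate when watching these endpoints.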
To monitor the performance of the whole cluster after enabling cardinality analysis, use the self-monitoring dashboards that are included in the GEM plugin.
Dashboards
The GEM plugin provides several useful dashboards that visualize and let you explore the data from this API.
Adding the cardinality analysis dashboards to Grafana Enterprise
The cardinality analysis dashboards are automatically installed when you install the Grafana Enterprise Metrics plugin. If you do not see the dashboards, or someone accidentally deletes them, add them back:
- Go to Configuration > Plugins > Grafana Enterprise Metrics > Dashboards.
- Install the dashboards Cardinality management - overview, Cardinality management - metrics, and Cardinality management - labels.
Cardinality management - overview dashboard
This dashboard shows the cardinality for the selected data source.
Cardinality management - metrics dashboard
This dashboard helps you understand the cardinality of an individual metric. At the top of the dashboard, you can select which metric you want to explore.
Cardinality management - labels dashboard
This dashboard shows a cardinality report for the selected label. For a given label name, it shows you which label values are attached to the most series. It also shows you the highest cardinality metrics for a given label-value pair.
HTTP API
You can use two API endpoints to understand a tenant’s metrics and label cardinality: label_names (/api/v1/cardinality/label_names) and label_values (/api/v1/cardinality/label_values). The cardinality analysis dashboards display information returned from these endpoints.
Because these endpoints generate their cardinality report using only values from currently opened TSDBs (time series databases) in the ingesters, two subsequent calls can return completely different results if an ingester cut or truncated an old block and opened a new one between calls.
Both API endpoints require authentication. Specifically, the user must provide a token that grants metrics:read access for that tenant.
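The curl examples in this section pass the tenant ID and the token with -u, which is HTTP Basic authentication. As a rough sketch, an equivalent Authorization header could be built in Python as follows. The function name and the tenant and token values are placeholders, not part of GEM.

```python
import base64


def basic_auth_header(tenant_id: str, api_token: str) -> dict:
    """Build an HTTP Basic Authorization header, equivalent to curl -u."""
    credentials = f"{tenant_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder tenant ID and token, for illustration only.
headers = basic_auth_header("team-a", "example-token")
print(headers)
```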
Label names cardinality endpoint
GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_names
# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_names
Returns realtime label names cardinality across all ingesters, for the authenticated tenant, in JSON format.
It counts distinct label values per label name.
The items in the field cardinality are sorted by label_values_count in descending order and by label_name in ascending order.
The count of items returned is limited by the limit request parameter.
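The ordering and truncation described above can be reproduced on the client, for example when merging or re-checking results. This is an illustrative sketch with made-up input, not the server's implementation. The function name order_label_names is hypothetical.

```python
def order_label_names(items: list, limit: int = 20) -> list:
    """Sort by label_values_count descending, then label_name ascending,
    and keep at most `limit` items."""
    ordered = sorted(
        items,
        key=lambda item: (-item["label_values_count"], item["label_name"]),
    )
    return ordered[:limit]


# Made-up input to show the tie-break on label_name.
sample = [
    {"label_name": "task", "label_values_count": 29},
    {"label_name": "zone", "label_values_count": 162},
    {"label_name": "worker", "label_values_count": 162},
]

for item in order_label_names(sample, limit=2):
    print(item["label_name"], item["label_values_count"])
```

Negating the count in the sort key yields descending order for the count while keeping ascending order for the name in a single pass.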
Request parameters
- selector - optional - specifies the PromQL selector used to filter the series that are analyzed.
- limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).
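As a sketch of how these parameters fit into a request, the following builds a label_names URL and checks limit against the documented range before sending anything. The function name, host, port, and prefix are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def label_names_url(base_url: str, selector: Optional[str] = None, limit: int = 20) -> str:
    """Build a label_names request URL with the optional parameters."""
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    params = {"limit": limit}
    if selector is not None:
        params["selector"] = selector
    # urlencode percent-encodes the braces and quotes in the selector.
    return f"{base_url}/api/v1/cardinality/label_names?{urlencode(params)}"


url = label_names_url(
    "http://gem.example:8080/prometheus",  # hypothetical host, port, and prefix
    selector="{__name__='flower_events_created'}",
    limit=2,
)
print(url)
```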
Example:
To understand which labels attached to the metric flower_events_created have the most values, use the following command:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
  "label_values_count_total": 206,
  "label_names_count": 12,
  "cardinality": [
    {
      "label_name": "worker",
      "label_values_count": 162
    },
    {
      "label_name": "task",
      "label_values_count": 29
    }
  ]
}
From this we see that the metric flower_events_created has 12 different label names attached to it. Across those 12 label names, there are 206 total values. The label “worker” has 162 values, and the label “task” has 29 values. The other label names are not shown because the sample command set limit=2.
If the flower_events_created selector were omitted, the API call
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2" | jq
would return the label names with the highest count of values across the entire tenant.
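To make the effect of limit concrete, the sketch below works out how much of the first example response was cut off. It assumes, as the explanation above reads, that label_values_count_total is the sum of the per-label value counts.

```python
import json

# Response body copied from the flower_events_created example above.
body = """
{
  "label_values_count_total": 206,
  "label_names_count": 12,
  "cardinality": [
    {"label_name": "worker", "label_values_count": 162},
    {"label_name": "task", "label_values_count": 29}
  ]
}
"""

report = json.loads(body)
shown = report["cardinality"]

# Label names that exist but were cut off by limit=2.
hidden_names = report["label_names_count"] - len(shown)
# Label values that belong to those hidden label names.
hidden_values = report["label_values_count_total"] - sum(
    item["label_values_count"] for item in shown
)

print(f"label names not shown: {hidden_names}")
print(f"label values on those names: {hidden_values}")
```

Here the two labels shown account for 191 of the 206 values, so the 10 remaining label names share 15 values between them.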
Response schema
{
  "label_values_count_total": <number>,
  "label_names_count": <number>,
  "cardinality": [
    {
      "label_name": <string>,
      "label_values_count": <number>
    }
  ]
}
Label values cardinality endpoint
GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_values
# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_values
Returns realtime label values cardinality associated with the request parameter label_names[] across all ingesters, for the authenticated tenant, in JSON format.
It returns the series count per label value for each label in the request parameter label_names[].
The items in the field labels are sorted by series_count in descending order and by label_name in ascending order.
The items in the field cardinality are sorted by series_count in descending order and by label_value in ascending order.
The count of cardinality items is limited by the request parameter limit.
Request parameters
- label_names[] - required - specifies the labels for which cardinality must be provided.
- selector - optional - specifies the PromQL selector used to filter the series that are analyzed.
- limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).
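Because label_names[] is required and can be given more than once, the query string repeats the parameter for each label. A minimal sketch of building such a URL follows. The function name and base URL are hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def label_values_url(
    base_url: str,
    label_names: Sequence[str],
    selector: Optional[str] = None,
    limit: int = 20,
) -> str:
    """Build a label_values request URL; label_names[] is repeated per label."""
    if not label_names:
        raise ValueError("label_names[] is required")
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    # A list of pairs lets the same key appear several times.
    params = [("label_names[]", name) for name in label_names]
    params.append(("limit", str(limit)))
    if selector is not None:
        params.append(("selector", selector))
    # urlencode percent-encodes the brackets in the key (label_names%5B%5D).
    return f"{base_url}/api/v1/cardinality/label_values?{urlencode(params)}"


url = label_values_url(
    "http://gem.example:8080/prometheus",  # hypothetical host, port, and prefix
    ["worker", "agent"],
    selector="{__name__='flower_events_created'}",
    limit=2,
)
print(url)
```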
Example 1 (label values cardinality):
To understand which label values have the highest number of flower_events_created series associated with them, run:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=worker&label_names[]=agent&limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
  "series_count_total": 5472781,
  "labels": [
    {
      "label_name": "worker",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "aws-worker",
          "series_count": 67
        },
        {
          "label_value": "gcp-worker",
          "series_count": 66
        }
      ]
    },
    {
      "label_name": "agent",
      "label_values_count": 2,
      "series_count": 11,
      "cardinality": [
        {
          "label_value": "grafana-agent",
          "series_count": 10
        },
        {
          "label_value": "jaeger-agent",
          "series_count": 1
        }
      ]
    }
  ]
}
From this, we see that there are 5,472,781 series with the metric name flower_events_created. Of those 5,472,781 series, there are 67 series with worker=aws-worker and 66 series with worker=gcp-worker. From the series_count, there are 1307 series with the label worker (across all 162 values of worker).
Similarly, of the 5,472,781 total series, there are 10 series with agent=grafana-agent and 1 series with agent=jaeger-agent. From the series_count, there are 11 total series with the label agent (across all 2 values of agent).
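The relationship between label_values_count, series_count, and the truncated cardinality list can be worked through in a few lines. The sketch below uses the numbers from the example response. The helper name summarize is hypothetical, and it relies on each series carrying at most one value per label name.

```python
# Trimmed copy of the "labels" array from the example response above.
labels = [
    {
        "label_name": "worker",
        "label_values_count": 162,
        "series_count": 1307,
        "cardinality": [
            {"label_value": "aws-worker", "series_count": 67},
            {"label_value": "gcp-worker", "series_count": 66},
        ],
    },
    {
        "label_name": "agent",
        "label_values_count": 2,
        "series_count": 11,
        "cardinality": [
            {"label_value": "grafana-agent", "series_count": 10},
            {"label_value": "jaeger-agent", "series_count": 1},
        ],
    },
]


def summarize(label: dict) -> dict:
    """Report how many values and series fall outside the returned top items."""
    shown = label["cardinality"]
    return {
        "label_name": label["label_name"],
        "values_not_shown": label["label_values_count"] - len(shown),
        "series_not_shown": label["series_count"]
        - sum(item["series_count"] for item in shown),
    }


for label in labels:
    print(summarize(label))
```

For worker, 160 values covering 1174 series were cut off by limit=2. For agent, both values are shown, so nothing is missing.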
Example 2 (metric names cardinality):
To understand which metrics have the highest cardinality (that is, the most time series), look at the cardinality of the __name__ label:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=__name__&limit=2" | jq
{
  "series_count_total": 1307,
  "labels": [
    {
      "label_name": "__name__",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "flower_events_created",
          "series_count": 67
        },
        {
          "label_value": "flower_events_consumed",
          "series_count": 66
        }
      ]
    }
  ]
}
In this example, there are 1307 total active time series for the tenant named tenant-id. Because there are 162 values for the label __name__, there are 162 metrics for this tenant. The label_value entries in the cardinality part of the payload are the names of the highest cardinality metrics. In this example, the metric flower_events_created has 67 series associated with it and the metric flower_events_consumed has 66 series associated with it.
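One way to put these numbers in context is to express each metric's series count as a share of the tenant's total. The following is a sketch under the assumption that the request was made without a selector, so that series_count_total covers the whole tenant. The function name metric_shares is hypothetical.

```python
def metric_shares(response: dict) -> list:
    """Return (metric name, series count, share of all series) tuples."""
    total = response["series_count_total"]
    shares = []
    for label in response["labels"]:
        if label["label_name"] != "__name__":
            continue
        for item in label["cardinality"]:
            share = item["series_count"] / total if total else 0.0
            shares.append((item["label_value"], item["series_count"], share))
    return shares


# Values copied from the example response above.
response = {
    "series_count_total": 1307,
    "labels": [
        {
            "label_name": "__name__",
            "label_values_count": 162,
            "series_count": 1307,
            "cardinality": [
                {"label_value": "flower_events_created", "series_count": 67},
                {"label_value": "flower_events_consumed", "series_count": 66},
            ],
        }
    ],
}

for name, count, share in metric_shares(response):
    print(f"{name}: {count} series ({share:.1%} of the tenant)")
```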
Response schema
{
  "series_count_total": <number>,
  "labels": [
    {
      "label_name": <string>,
      "label_values_count": <number>,
      "series_count": <number>,
      "cardinality": [
        {
          "label_value": <string>,
          "series_count": <number>
        }
      ]
    }
  ]
}
- series_count_total - total number of series across opened TSDBs in all ingesters
- labels[].label_name - label name requested via the request parameter label_names[]
- labels[].label_values_count - total number of label values for the label name (note that, depending on the limit request parameter, not all label values may be present in cardinality)
- labels[].series_count - total number of series having labels[].label_name
- labels[].cardinality[].label_value - label value associated with labels[].label_name
- labels[].cardinality[].series_count - total number of series having label_value for label_name
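For client code, the schema above maps naturally onto a few typed records. The following is one possible sketch. The class and function names are hypothetical and not part of any GEM client library, and the sample payload is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueCardinality:
    label_value: str
    series_count: int


@dataclass
class LabelCardinality:
    label_name: str
    label_values_count: int
    series_count: int
    cardinality: List[ValueCardinality]


@dataclass
class LabelValuesResponse:
    series_count_total: int
    labels: List[LabelCardinality]


def parse_label_values(payload: dict) -> LabelValuesResponse:
    """Convert a decoded JSON payload into typed objects."""
    labels = [
        LabelCardinality(
            label_name=entry["label_name"],
            label_values_count=entry["label_values_count"],
            series_count=entry["series_count"],
            cardinality=[ValueCardinality(**item) for item in entry["cardinality"]],
        )
        for entry in payload["labels"]
    ]
    return LabelValuesResponse(payload["series_count_total"], labels)


# Made-up payload that follows the schema above.
parsed = parse_label_values(
    {
        "series_count_total": 11,
        "labels": [
            {
                "label_name": "agent",
                "label_values_count": 2,
                "series_count": 11,
                "cardinality": [{"label_value": "grafana-agent", "series_count": 10}],
            }
        ],
    }
)
print(parsed.labels[0].cardinality[0].label_value)
```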