How relabeling in Prometheus works
Relabeling is a powerful tool that allows you to classify and filter Prometheus targets and metrics by rewriting their label set.
The purpose of this post is to explain the value of Prometheus’ relabel_config block, the different places where it can be found, and its usefulness in taming Prometheus metrics. Much of the content here also applies to Grafana Agent users.
For reference, here’s our guide to Reducing Prometheus metrics usage with relabeling.
So without further ado, let’s get into it!
Prometheus labels
Labels are sets of key-value pairs that allow us to characterize and organize what’s actually being measured in a Prometheus metric.
For example, when measuring HTTP latency, we might use labels to record the HTTP method and status returned, which endpoint was called, and which server was responsible for the request.
Each unique combination of key-value label pairs is stored as a new time series in Prometheus, so labels directly determine the cardinality of our data. For that reason, values drawn from an unbounded set (user IDs or request IDs, for example) should not be used as label values.
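To make the cardinality point concrete, here is a minimal Python sketch that counts the worst-case number of series for a metric. The label names and values are made up for illustration, and the count assumes every combination actually occurs.

```python
from itertools import product

# Hypothetical label values for an HTTP latency metric.
label_values = {
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
    "endpoint": ["/login", "/search"],
}

def series_count(values_by_label):
    """Count the distinct label combinations, i.e. the worst-case number of series."""
    return len(list(product(*values_by_label.values())))

print(series_count(label_values))  # 2 * 3 * 2 = 12

# Adding a label with many distinct values (here, 1000 user IDs) multiplies the total.
label_values["user_id"] = [str(i) for i in range(1000)]
print(series_count(label_values))  # 12 * 1000 = 12000
```

A single high-cardinality label is enough to turn a dozen series into thousands, which is why relabeling is so often used to drop such labels.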
Internal labels
But what about metrics with no labels? Prometheus also provides some internal labels for us. These begin with two underscores and are removed after all relabeling steps are applied; that means they will not be available afterwards unless we explicitly copy them to another label.
Some of the special labels available to us are:
Label name | Description |
---|---|
__name__ | The scraped metric’s name |
__address__ | host:port of the scrape target |
__scheme__ | URI scheme of the scrape target |
__metrics_path__ | Metrics endpoint of the scrape target |
__param_<name> | The value of the first URL parameter called <name> passed with the scrape request |
__scrape_interval__ | The target’s scrape interval (experimental) |
__scrape_timeout__ | The target’s timeout (experimental) |
__meta_ | Special labels set by the Service Discovery mechanism |
__tmp | Special prefix used to temporarily store label values before discarding them |
So now that we understand what the input is for the various relabel_config rules, how do we create one? And what can they actually be used for?
Stages of application
One source of confusion around relabeling rules is that they can be found in multiple parts of a Prometheus config file.
# A list of scrape configurations.
scrape_configs:
  - job_name: "some scrape job"
    ...
    # List of target relabel configurations.
    relabel_configs:
      [ - <relabel_config> ... ]
    # List of metric relabel configurations.
    metric_relabel_configs:
      [ - <relabel_config> ... ]
# Settings related to the remote write.
remote_write:
  - url: https://remote-write-endpoint.com/api/v1/push
    ...
    # List of remote write relabel configurations.
    write_relabel_configs:
      [ - <relabel_config> ... ]
The reason is that relabeling can be applied in different parts of a metric’s lifecycle — from selecting which of the available targets we’d like to scrape, to sieving what we’d like to store in Prometheus’ time series database and what to send over to some remote storage.
First off, the relabel_configs key can be found as part of a scrape job definition. These relabeling steps are applied before the scrape occurs and only have access to labels added by Prometheus’ Service Discovery. They allow us to filter the targets returned by our SD mechanism, as well as manipulate the labels it sets.
Once the targets have been defined, the metric_relabel_configs steps are applied after the scrape and allow us to select which series we would like to ingest into Prometheus’ storage.
Finally, the write_relabel_configs block applies relabeling rules to the data just before it’s sent to a remote endpoint. This can be used to filter metrics with high cardinality or route metrics to specific remote_write targets.
The base <relabel_config> block
A <relabel_config> consists of seven fields. These are:
- source_labels
- separator (default = ;)
- target_label
- regex (default = (.*))
- modulus
- replacement (default = $1)
- action (default = replace)
A Prometheus configuration may contain an array of relabeling steps; they are applied to the label set in the order in which they’re defined. Omitted fields take on their default value, so in practice most steps only spell out a few of these fields.
source_labels and separator
Let’s start off with source_labels. It expects an array of one or more label names, which are used to select the respective label values. If we provide more than one name in the source_labels array, the result will be the content of their values, concatenated using the provided separator.
As an example, consider the following two metrics:
my_custom_counter_total{server="webserver01",subsystem="kata"} 192 1644075044000
my_custom_counter_total{server="sqldatabase",subsystem="kata"} 147 1644075044000
The following relabel_config
source_labels: [subsystem, server]
separator: "@"
would extract these values.
kata@webserver01
kata@sqldatabase
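The extraction step can be sketched in a few lines of Python. This is only an illustration of the concatenation, not Prometheus' actual code; the function name extract is our own, and it assumes that a missing label contributes an empty string.

```python
def extract(labels, source_labels, separator=";"):
    """Join the values of source_labels, in order, using separator.

    Assumption: a label that is not present contributes an empty string.
    """
    return separator.join(labels.get(name, "") for name in source_labels)

series = [
    {"server": "webserver01", "subsystem": "kata"},
    {"server": "sqldatabase", "subsystem": "kata"},
]

for labels in series:
    print(extract(labels, ["subsystem", "server"], separator="@"))
# kata@webserver01
# kata@sqldatabase
```

Note that the order of the names in source_labels, not the order of the labels on the metric, decides the order of the concatenated values.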
regex
The regex field expects a valid RE2 regular expression and is used to match the extracted value from the combination of the source_labels and separator fields. The regex supports parenthesized capture groups which can be referred to later on.
This block would match the two values we previously extracted:
source_labels: [subsystem, server]
separator: "@"
regex: "kata@(.*)"
However, this block would not match the previous labels and would abort the execution of this specific relabel step
source_labels: [subsystem, server]
separator: "@"
regex: "(.*)@redis"
The default regex value is (.*), so if not specified, it will match the entire input.
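One detail worth keeping in mind is that Prometheus anchors the regex at both ends, so it has to match the whole extracted value and not just part of it. The sketch below approximates this with Python's re.fullmatch; Python's re module is not RE2, so treat it as an illustration for simple patterns only. The helper name matches is our own.

```python
import re

def matches(regex, value):
    """Approximate a fully anchored relabel regex with re.fullmatch."""
    return re.fullmatch(regex, value) is not None

print(matches("kata@(.*)", "kata@webserver01"))   # True
print(matches("(.*)@redis", "kata@webserver01"))  # False
print(matches("kata@web", "kata@webserver01"))    # False: a prefix is not a full match
```

If a partial match is what we want, the pattern needs an explicit wildcard, such as kata@web.*.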
replacement
If the extracted value matches the given regex, then replacement gets populated by performing a regex replace and utilizing any previously defined capture groups.
Going back to our extracted values, a block like this
source_labels: [subsystem, server]
separator: "@"
regex: "(.*)@(.*)"
replacement: "${2}/${1}"
would result in capturing what’s before and after the @ symbol, swapping them around, and separating them with a slash.
webserver01/kata
sqldatabase/kata
The default value of the replacement is $1, so it will resolve to the first capture group from the regex, or to the entire extracted value if no regex was specified.
target_label
If the relabel action results in a value being written to some label, target_label defines to which label the replacement should be written.
For example, the following block would set a label like {env="production"}:
replacement: "production"
target_label: "env"
action: "replace"
Continuing with the previous example, this relabeling step would write the replacement value to the my_new_label label
- source_labels: [subsystem, server]
  separator: "@"
  regex: "(.*)@(.*)"
  replacement: "${2}/${1}"
  target_label: "my_new_label"
  action: "replace"
resulting in:
{my_new_label="webserver01/kata"}
{my_new_label="sqldatabase/kata"}
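Putting the four fields together, a replace step can be sketched in Python as follows. This is a simplified model under a few assumptions: the function names are ours, Python's re stands in for RE2, and edge cases (such as an empty result) are ignored.

```python
import re

def to_python_template(replacement):
    """Translate $1 / ${1} style references into Python's \\g<1> syntax."""
    return re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)

def relabel_replace(labels, source_labels, target_label,
                    separator=";", regex="(.*)", replacement="$1"):
    """Simplified model of a single replace step; returns a new label set."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    result = dict(labels)
    result[target_label] = match.expand(to_python_template(replacement))
    return result

labels = {"server": "webserver01", "subsystem": "kata"}
print(relabel_replace(labels, ["subsystem", "server"], "my_new_label",
                      separator="@", regex="(.*)@(.*)", replacement="${2}/${1}"))
# {'server': 'webserver01', 'subsystem': 'kata', 'my_new_label': 'webserver01/kata'}
```

With no source_labels, the extracted value is empty, the default regex still matches, and a literal replacement such as production is written as-is. That is why the shorter env example above works.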
modulus
Finally, the modulus field expects a positive integer. The relabel_config step will use this number to populate the target_label with the result of the MD5(extracted value) % modulus expression.
Available actions
We’ve come a long way, but we’re finally getting somewhere. Now what can we do with those building blocks? How can they help us in our day-to-day work?
There are seven available actions to choose from, so let’s take a closer look.
keep/drop
The keep and drop actions allow us to filter out targets and metrics based on whether our label values match the provided regex.
Let’s go back to our previous example
my_custom_counter_total{server="webserver01",subsystem="kata"} 192 1644075074000
my_custom_counter_total{server="sqldatabase",subsystem="kata"} 14700 1644075074000
After concatenating the contents of the subsystem and server labels, we could drop the target which exposes webserver01 by using the following block. Because the regex has to match the entire concatenated value, it must spell out the full server name:
- source_labels: [subsystem, server]
  separator: "@"
  regex: "kata@webserver01"
  action: "drop"
Or if we were in an environment with multiple subsystems but only wanted to monitor kata, we could keep specific targets or metrics about it and drop everything related to other services.
- source_labels: [subsystem, server]
  separator: "@"
  regex: "kata@(.*)"
  action: keep
In many cases, here’s where internal labels come into play.
You can, for example, only keep specific metric names.
- source_labels: [__name__]
  regex: "my_custom_counter_total|my_custom_counter_sum|my_custom_gauge"
  action: keep
Or if you’re using Prometheus’ Kubernetes service discovery you might want to drop all targets from your testing or staging namespaces.
- source_labels: [__meta_kubernetes_namespace]
  regex: "testing|staging"
  action: drop
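The filtering logic behind keep and drop is small enough to sketch in Python. As before, this is a simplified model and not Prometheus' implementation: the function name and the sample targets are hypothetical, and re.fullmatch stands in for the anchored RE2 match.

```python
import re

def keep_or_drop(items, action, source_labels, regex, separator=";"):
    """Return the label sets that survive a keep or drop step."""
    survivors = []
    for labels in items:
        value = separator.join(labels.get(name, "") for name in source_labels)
        matched = re.fullmatch(regex, value) is not None
        # keep retains matches; drop retains everything that does not match.
        if matched == (action == "keep"):
            survivors.append(labels)
    return survivors

# Hypothetical targets as returned by Kubernetes service discovery.
targets = [
    {"__meta_kubernetes_namespace": "production"},
    {"__meta_kubernetes_namespace": "testing"},
    {"__meta_kubernetes_namespace": "staging"},
]

remaining = keep_or_drop(targets, "drop",
                         ["__meta_kubernetes_namespace"], "testing|staging")
print([t["__meta_kubernetes_namespace"] for t in remaining])
# ['production']
```

The two actions are mirror images of each other, so the same regex with keep instead of drop would leave only the testing and staging targets.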
labelkeep/labeldrop
The labelkeep and labeldrop actions allow for filtering the label set itself.
In the previous example, we may not be interested in keeping track of the subsystem label anymore.
The following relabeling would remove all {subsystem="<name>"} labels but keep other labels intact.
- regex: "subsystem"
  action: labeldrop
Of course, we can do the opposite and only keep a specific set of labels and drop everything else.
- regex: "subsystem|server|shard"
  action: labelkeep
We must make sure that all metrics are still uniquely labeled after applying labelkeep and labeldrop rules.
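The uniqueness requirement is easy to violate, so here is a rough Python sketch that applies a labeldrop rule and then checks whether any two series have collapsed into the same label set. The helper names are our own and the model is simplified.

```python
import re

def filter_labels(labels, action, regex):
    """labeldrop removes labels whose name matches; labelkeep removes all others."""
    keep_matching = action == "labelkeep"
    return {
        name: value for name, value in labels.items()
        if (re.fullmatch(regex, name) is not None) == keep_matching
    }

def has_collisions(series):
    """True if two or more series end up with an identical label set."""
    distinct = {tuple(sorted(labels.items())) for labels in series}
    return len(distinct) < len(series)

series = [
    {"__name__": "my_custom_counter_total", "server": "webserver01", "subsystem": "kata"},
    {"__name__": "my_custom_counter_total", "server": "sqldatabase", "subsystem": "kata"},
]

without_subsystem = [filter_labels(s, "labeldrop", "subsystem") for s in series]
without_server = [filter_labels(s, "labeldrop", "server") for s in series]

print(has_collisions(without_subsystem))  # False: server still tells them apart
print(has_collisions(without_server))     # True: both series are now identical
```

Dropping subsystem is safe here because server still distinguishes the two series, while dropping server is not.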
replace
Replace is the default action for a relabeling rule if we haven’t specified one; it allows us to overwrite the value of a single label with the contents of the replacement field.
As we saw before, the following block will set the env label to the replacement provided, so {env="production"} will be added to the labelset.
- action: replace
  replacement: production
  target_label: env
The replace action is most useful when you combine it with other fields.
Here’s another example:
- action: replace
  source_labels: [__meta_kubernetes_pod_name,__meta_kubernetes_pod_container_port_number]
  separator: ":"
  target_label: address
The above snippet will concatenate the values stored in __meta_kubernetes_pod_name and __meta_kubernetes_pod_container_port_number.
The extracted string would then be written out to the target_label and might result in {address="podname:8080"}.
hashmod
The hashmod action provides a mechanism for horizontally scaling Prometheus.
The relabeling step calculates the MD5 hash of the concatenated label values modulo a positive integer N, resulting in a number in the range [0, N-1].
An example might make this clearer. Consider the following metric and relabeling step
my_custom_metric{name="node",val="42"} 100
- action: hashmod
  source_labels: [name, val]
  separator: "-"
  modulus: 8
  target_label: __tmp_hashmod
The result of the concatenation is the string “node-42” and the MD5 of the string modulo 8 is 5.
$ python3
>>> import hashlib
>>> m = hashlib.md5(b"node-42")
>>> int(m.hexdigest(), 16) % 8
5
So ultimately {__tmp_hashmod="5"} would be appended to the metric’s label set.
This is most commonly used for sharding multiple targets across a fleet of Prometheus instances. The following rule could be used to distribute the load between 8 Prometheus instances, each responsible for scraping the subset of targets that end up producing a certain value in the [0, 7] range, and ignoring all others.
- action: keep
  source_labels: [__tmp_hashmod]
  regex: 5
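To get a feel for how targets spread across shards, here is a small Python sketch. It assumes, to the best of our knowledge of the Prometheus source, that only the last 8 bytes of the MD5 digest are used as the integer. For a power-of-two modulus such as 8, that gives the same result as hashing with the full digest. The target addresses are made up.

```python
import hashlib

def hashmod(value, modulus):
    """MD5 the value and reduce it modulo modulus.

    Assumption: the last 8 bytes of the digest are read as a big-endian integer.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus

def shard_targets(addresses, shards):
    """Group target addresses by the shard number their hash maps to."""
    assignment = {shard: [] for shard in range(shards)}
    for address in addresses:
        assignment[hashmod(address, shards)].append(address)
    return assignment

# Hypothetical scrape targets.
addresses = [f"10.0.0.{i}:9100" for i in range(1, 21)]

for shard, members in shard_targets(addresses, 8).items():
    print(shard, len(members))
```

Each Prometheus instance would then keep only the targets whose value equals its own shard number, as in the keep rule above. The distribution is usually not perfectly even, but every target lands on exactly one shard.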
labelmap
The labelmap action is used to map one or more label pairs to different label names.
Any label pairs whose names match the provided regex will be copied with the new label name given in the replacement field, by utilizing group references (${1}, ${2}, etc).
The replacement field defaults to just $1, the first captured regex group, so it’s sometimes omitted.
Here’s an example. If we’re using Prometheus’ Kubernetes SD, our targets would temporarily expose some labels such as:
__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_provider_id: The cloud provider's name for the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.
…
__meta_kubernetes_namespace: The namespace of the service object.
__meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
__meta_kubernetes_service_name: The name of the service object.
__meta_kubernetes_service_port_name: Name of the service port for the target.
…
__meta_kubernetes_pod_name: The name of the pod object.
__meta_kubernetes_pod_ip: The pod IP of the pod object.
__meta_kubernetes_pod_container_init: true if the container is an InitContainer
__meta_kubernetes_pod_container_name: Name of the container the target address points to.
…
Labels starting with double underscores will be removed by Prometheus after relabeling steps are applied, so we can use labelmap to preserve them by mapping them to a different name.
- action: labelmap
  regex: "__meta_kubernetes_(.*)"
  replacement: "k8s_${1}"
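Here is a rough Python model of that labelmap step, followed by the removal of double-underscore labels that happens once relabeling is done. The function names and the sample target are hypothetical, and the model ignores special handling that Prometheus applies to labels such as __address__.

```python
import re

def labelmap(labels, regex, replacement="$1"):
    """Copy each label whose name matches regex to a name built from replacement."""
    # Translate $1 / ${1} references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            result[match.expand(template)] = value
    return result

def drop_internal(labels):
    """Model the cleanup after relabeling: labels starting with __ are discarded."""
    return {name: value for name, value in labels.items() if not name.startswith("__")}

target = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_name": "api-0",
}

mapped = labelmap(target, "__meta_kubernetes_(.*)", "k8s_${1}")
print(drop_internal(mapped))
# {'k8s_namespace': 'default', 'k8s_pod_name': 'api-0'}
```

The original __meta_ labels are still present right after the labelmap step; only the copies under the k8s_ prefix survive the final cleanup.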
Common use cases for relabeling in Prometheus
Here’s a small list of common use cases for relabeling, and where the appropriate place is for adding relabeling steps.
- When you want to ignore a subset of applications, use relabel_configs
- When splitting targets between multiple Prometheus servers, use relabel_configs + hashmod
- When you want to ignore a subset of high cardinality metrics, use metric_relabel_configs
- When sending different metrics to different endpoints, use write_relabel_configs
Learn more
That’s all for today! Hope you learned a thing or two about relabeling rules and that you’re more comfortable with using them. For more information, check out our documentation and read more in the Prometheus documentation.