Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
Monitoring Tempo
Tempo is instrumented to expose metrics, logs and traces. Additionally, the Tempo repository has a mixin that includes a set of dashboards, rules and alerts. Together, these can be used to monitor Tempo in production.
Instrumentation
Tempo is already instrumented with metrics, logs and traces. These can be collected to observe Tempo.
Metrics
Tempo is instrumented with Prometheus metrics. It emits RED metrics for most services and backends. The Tempo mixin provides several dashboards using these metrics.
Logs
Tempo emits logs in the key=value (logfmt) format.
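Because logfmt is a flat list of key=value pairs, individual fields can be pulled out with standard shell tools. The following is a minimal sketch, not Tempo tooling: the sample line is made up for illustration and is not real Tempo output, the helper name logfmt_get is hypothetical, and the helper only handles unquoted values that contain no spaces.

```shell
# Hypothetical logfmt line, invented for illustration; not real Tempo output.
line='level=info ts=2024-01-01T00:00:00Z caller=example.go:1 msg=started'

# logfmt_get KEY LINE
# Prints the value of KEY from a logfmt LINE.
# Limitation: values must be unquoted and contain no spaces.
logfmt_get() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

logfmt_get level "$line"
```

For quoted values such as msg="two words", a dedicated logfmt parser is a better fit than this one-liner.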
Traces
Tempo uses the Jaeger Golang SDK for tracing instrumentation. As of this writing, the complete read path and some parts of the write path of Tempo are instrumented for tracing.
The tracer can be configured using environment variables.
To enable tracing, set one of the following: JAEGER_AGENT_HOST and JAEGER_AGENT_PORT, or JAEGER_ENDPOINT.
The Jaeger client uses remote sampling by default; if the management server is not available, no traces will be sent. To always send traces (no sampling), set the following environment variables:
JAEGER_SAMPLER_TYPE=const
JAEGER_SAMPLER_PARAM=1
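As a minimal sketch, the tracing settings can be exported in the shell that launches Tempo. The host, port and endpoint values are placeholder assumptions for a locally running Jaeger (6831 and 14268 are Jaeger's usual agent and collector ports), and the config file path in the final comment is hypothetical.

```shell
# Option A: report spans to a Jaeger agent.
# Host and port are assumptions for a local agent; adjust for your setup.
export JAEGER_AGENT_HOST="localhost"
export JAEGER_AGENT_PORT="6831"

# Option B (use instead of Option A): report directly to a collector endpoint.
# export JAEGER_ENDPOINT="http://localhost:14268/api/traces"

# Always sample, instead of relying on remote sampling.
export JAEGER_SAMPLER_TYPE="const"
export JAEGER_SAMPLER_PARAM="1"

# Start Tempo from this same shell so it inherits the variables, for example:
#   tempo -config.file=/path/to/tempo.yaml   (hypothetical path)
```

Sampling every trace is convenient while debugging, but it can produce a large volume of spans on a busy cluster.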
Dashboards
The Tempo mixin has four Grafana dashboards in the yamls folder that you can download and import into your Grafana UI.
Note that, at the moment, these dashboards work well when Tempo is run in a Kubernetes (k8s) environment and the scraped metrics have the cluster and namespace labels.
Tempo / Reads
This is available as tempo-reads.json.
The Reads dashboard gives information on Requests, Errors and Duration (R.E.D) on the query path of Tempo. Each query touches the Gateway, Tempo-Query, Query-Frontend, Queriers, Ingesters, Cache (if present) and the backend.
Use this dashboard to monitor the performance of each of the components listed above and to decide the number of replicas in each deployment.
Tempo / Writes
This is available as tempo-writes.json.
The Writes dashboard gives information on Requests, Errors and Duration (R.E.D) on the write/ingest path of Tempo. A write request touches the Gateway, Distributors, Ingesters and eventually the backend. This dashboard also gives information on the number of operations performed by the Compactor on the backend.
Use this dashboard to monitor the performance of each of the components listed above and to decide the number of replicas in each deployment.
Tempo / Resources
This is available as tempo-resources.json.
The Resources dashboard provides information on CPU, Container Memory and Go Heap Inuse, and is useful for resource provisioning for the different Tempo components.
Use this dashboard to see if any components are running close to their assigned limits!
Tempo / Operational
This is available as tempo-operational.json.
The Tempo Operational dashboard deserves special mention because it is probably a stack of dashboard anti-patterns. It’s big and complex, doesn’t use jsonnet and displays far too many metrics in one place. And I love it. For just getting started, the Reads, Writes and Resources dashboards are great places to learn how to monitor Tempo in an opaque way.
This dashboard is included in this repo for two reasons:
- It provides a stack of metrics for other operators to consider monitoring while running Tempo.
- We want it in our internal infrastructure and we vendor the tempo-mixin to do this.
Rules and Alerts
The Rules and Alerts are available as yaml files in the mixin on the repository.
To set up alerting, download the provided yaml files and configure them for use on your Prometheus monitoring server.
Check the runbook to understand the various steps that can be taken to fix firing alerts!