Simplify managing Grafana Tempo instances in Kubernetes with the Tempo Operator
Andreas Gerstmayr is a Software Engineer at Red Hat. He’s working on simplifying the deployment and operations of a modern distributed tracing stack using Tempo and OpenTelemetry on OpenShift.
I’ve been working with Grafana Tempo for about half a year now, and one thing I like about it is that Tempo requires only object storage for storing traces, which is easy to set up in both cloud environments and on-premises. Another outstanding feature is TraceQL, which allows searching for relevant traces with a powerful query language.
Now, let’s imagine you’re a busy system administrator who wants to set up Tempo in your Kubernetes cluster. Even though you’ve read through the Tempo documentation and know Tempo is extremely flexible, you’re overwhelmed by the number of configuration settings and deployment options.
I’m here with good news: There’s a solution for that!
One thing my team has been working on lately is the new Tempo operator, which simplifies deploying a Tempo stack on Kubernetes. It creates and manages all required objects, exposes metrics, and supports upgrading the Tempo instance in the cluster. In this post, I’ll walk you through how to install it.
What is the Tempo operator?
If you’re familiar with Kubernetes, you probably know that a Kubernetes operator extends the Kubernetes API by creating and managing a new Custom Resource. Similarly, the Tempo operator creates a new TempoStack custom resource. In the same way a Kubernetes Deployment creates one or more Pods, a TempoStack instance creates all objects (Deployments, StatefulSets, Services, ConfigMaps, etc.) required to manage a Tempo cluster in microservices mode.
The operator continuously watches the cluster and converges the current state to match the expected state as defined in the TempoStack object. What sets an operator apart from other deployment methods (manifest files, Helm charts) is that it can react dynamically to changes, such as high load, and take action (for example, by increasing the number of replicas of a component).
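To make the idea of converging the current state toward the expected state more concrete, here is a toy sketch of a single reconcile step in shell. It is not the Tempo operator's code; the reconcile_replicas function and its messages are hypothetical and only illustrate the compare-and-act pattern.

```shell
# Toy reconcile step: compare the desired replica count (what a spec would
# declare) with the observed count and print the action that closes the gap.
# reconcile_replicas is a hypothetical name, not part of any real tool.
reconcile_replicas() {
  desired=$1
  observed=$2
  if [ "$observed" -lt "$desired" ]; then
    echo "scale up by $((desired - observed))"
  elif [ "$observed" -gt "$desired" ]; then
    echo "scale down by $((observed - desired))"
  else
    echo "in sync"
  fi
}

reconcile_replicas 3 1   # prints "scale up by 2"
reconcile_replicas 3 3   # prints "in sync"
```

A real operator runs a step like this repeatedly, for every object it owns, whenever the cluster or the custom resource changes.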
Installing the operator
Note: The following instructions were tested on Kubernetes v1.26.3 and OpenShift v4.12.
One way to install the operator is via the Operator Lifecycle Manager. To do that, visit the tempo-operator page on OperatorHub and follow the installation instructions.
An alternative is to install it by applying Kubernetes manifests directly to the cluster. This requires having cert-manager installed in the cluster. If it’s not there already, please follow the cert-manager installation instructions. Once this step is completed, run the following to install the Tempo operator:
kubectl apply -f https://github.com/grafana/tempo-operator/releases/latest/download/tempo-operator.yaml
You can verify the installation by listing the pods in the operator namespace (tempo-operator-system when installed via manifests):
$ kubectl -n tempo-operator-system get pod
NAME                                        READY   STATUS    RESTARTS   AGE
tempo-operator-controller-7cd46dcd4-gs47m   2/2     Running   0          14s
Setting up object storage
Tempo supports various object stores for storing traces, such as Amazon S3, Azure Storage, and Google Cloud Storage. You can also use any S3-compatible object store, for example MinIO.
In this example, we’ll use MinIO for storage. Run the following to set up a basic MinIO instance in the minio namespace. (It is intended for testing purposes only.)
kubectl apply -f https://raw.githubusercontent.com/grafana/tempo-operator/41d57e9ec1f78bc9789d3cf55241b2fed2faa269/minio.yaml
Next, configure access to the object storage by creating the following Secret:
apiVersion: v1
kind: Secret
metadata:
  name: tempo-storage
type: Opaque
stringData:
  endpoint: http://minio.minio:9000
  bucket: tempo
  access_key_id: tempo
  access_key_secret: supersecret
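The Secret still has to be applied to the cluster. One way to do that, sketched here, is a small shell function that prints the manifest with your own values so it can be piped to kubectl. The function name tempo_storage_secret is made up for this example, the arguments shown are the test credentials from this post, and the sketch assumes the values need no YAML quoting.

```shell
# Print a Secret manifest with the keys used in this post.
# Arguments: endpoint, bucket, access key ID, access key secret.
# tempo_storage_secret is a hypothetical helper, not part of the operator.
tempo_storage_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: tempo-storage
type: Opaque
stringData:
  endpoint: $1
  bucket: $2
  access_key_id: $3
  access_key_secret: $4
EOF
}

# Usage against a live cluster (commented out so the sketch runs anywhere):
# tempo_storage_secret http://minio.minio:9000 tempo tempo supersecret | kubectl apply -f -
```

Saving the manifest to a file and running kubectl apply -f on that file works just as well.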
Deploying a Tempo cluster
The final step is to configure a Tempo cluster. This manifest creates a basic, ready-to-use one:
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack1
spec:
  storage:
    secret:
      name: tempo-storage
      type: s3
  storageSize: 2Gi
You can watch your brand new Tempo cluster being created:
kubectl get pod -l app.kubernetes.io/instance=tempostack1 --watch
Run the following command to confirm that all pods and services are created and ready:
$ kubectl get pod,svc -l app.kubernetes.io/instance=tempostack1
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/tempo-tempostack1-compactor-75dc75d565-jxzrh        1/1     Running   0          84s
pod/tempo-tempostack1-distributor-64d486d5b6-smwhb      1/1     Running   0          84s
pod/tempo-tempostack1-ingester-0                        1/1     Running   0          84s
pod/tempo-tempostack1-querier-7f95f8dbf5-hhvmh          1/1     Running   0          84s
pod/tempo-tempostack1-query-frontend-5c49496898-fbldg   1/1     Running   0          84s

NAME                                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/tempo-tempostack1-compactor                  ClusterIP   10.109.222.156   <none>        7946/TCP,3200/TCP            84s
service/tempo-tempostack1-distributor                ClusterIP   10.109.177.3     <none>        4317/TCP,3200/TCP            84s
service/tempo-tempostack1-gossip-ring                ClusterIP   None             <none>        7946/TCP                     84s
service/tempo-tempostack1-ingester                   ClusterIP   10.98.10.112     <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-querier                    ClusterIP   10.102.129.172   <none>        7946/TCP,3200/TCP,9095/TCP   84s
service/tempo-tempostack1-query-frontend             ClusterIP   10.110.69.170    <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-query-frontend-discovery   ClusterIP   None             <none>        3200/TCP,9095/TCP,9096/TCP   84s
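If you would rather have a script wait until everything is rolled out, the workload names can be derived from the pod names in the output above. The sketch below assumes the tempo-<name>-<component> naming seen there, with the ingester running as a StatefulSet (its pod is named ingester-0) and the other components as Deployments; tempo_workloads is a hypothetical helper.

```shell
# List the workloads a TempoStack with the given name is expected to create,
# based on the naming pattern visible in the kubectl output of this post.
# tempo_workloads is a hypothetical helper, not part of the operator.
tempo_workloads() {
  stack=$1
  for component in compactor distributor querier query-frontend; do
    echo "deployment/tempo-${stack}-${component}"
  done
  echo "statefulset/tempo-${stack}-ingester"
}

# Usage against a live cluster (commented out so the sketch runs anywhere):
# for workload in $(tempo_workloads tempostack1); do
#   kubectl rollout status "$workload" --timeout=120s
# done
```

kubectl rollout status exits with a non-zero status if a workload does not become ready within the timeout, which makes this loop usable in CI scripts.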
Sending traces and configuring Grafana
Now it’s time to get traces into Tempo. All you have to do is configure your application, Grafana Agent, or OpenTelemetry Collector to send OTLP traces via gRPC to:
tempo-tempostack1-distributor.default.svc.cluster.local:4317
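The default segment in that address is the namespace of the TempoStack, so the endpoint changes if you deploy it elsewhere. A minimal sketch of deriving the address, assuming the tempo-<name>-distributor service naming shown earlier (otlp_endpoint is a hypothetical helper, and tracing is just an example namespace):

```shell
# Build the in-cluster OTLP/gRPC endpoint of the distributor service.
# Arguments: TempoStack name, namespace (falls back to "default").
# otlp_endpoint is a hypothetical helper, not part of the operator.
otlp_endpoint() {
  stack=$1
  namespace=${2:-default}
  echo "tempo-${stack}-distributor.${namespace}.svc.cluster.local:4317"
}

otlp_endpoint tempostack1            # namespace falls back to "default"
otlp_endpoint tempostack1 tracing    # TempoStack deployed in "tracing"
```

Many OpenTelemetry SDKs read the endpoint from the OTEL_EXPORTER_OTLP_ENDPOINT environment variable and expect a URL with a scheme, so you would likely prefix the address with http:// there; check your SDK's documentation.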
To generate example traces, you can create the following job:
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-traces
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tracegen
          image: ghcr.io/grafana/xk6-client-tracing:v0.0.2
          env:
            - name: ENDPOINT
              value: tempo-tempostack1-distributor.default.svc.cluster.local:4317
Let’s configure Grafana to view the traces.
Go to your Data source settings page in Grafana, click Add new data source, select Tempo, and enter http://tempo-tempostack1-query-frontend.default.svc.cluster.local:3200 in the URL field. Once you click the Save & test button, you should see a “Data source is working” info box. Now head over to the Explore page, select your newly created Tempo data source, and start querying Tempo with TraceQL!
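If you manage Grafana through provisioning files instead of the UI, the same data source can be declared in YAML. The following sketch prints such a file. It assumes Grafana's data source provisioning format and the query-frontend URL used above; tempo_datasource and the output file name are hypothetical.

```shell
# Print a Grafana data source provisioning file for a TempoStack.
# Arguments: TempoStack name, namespace (falls back to "default").
# tempo_datasource is a hypothetical helper, not part of Grafana or the operator.
tempo_datasource() {
  stack=$1
  namespace=${2:-default}
  cat <<EOF
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo-${stack}-query-frontend.${namespace}.svc.cluster.local:3200
EOF
}

# Example: write the file, then place it wherever your Grafana instance
# reads provisioning configs (the location depends on your setup):
# tempo_datasource tempostack1 > tempo-datasource.yaml
```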
But wait, there’s more!
The Tempo operator also supports the following features:
Resource limits
Overall resource requests and limits can be specified in the TempoStack CR, and the operator assigns a fraction of them to each component (for example, the ingester typically requires more CPU than the query-frontend).
Multitenancy
Traces of multiple tenants can be stored in the same Tempo cluster.
Jaeger UI
The operator can deploy a Jaeger UI container and expose it via Ingress.
Metrics
The operator exposes metrics about itself and can create ServiceMonitors for the Prometheus operator, which will scrape metrics of each Tempo component.
mTLS
Communication between the Tempo components can be secured via mTLS.
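As an illustration of two of these features, the sketch below prints a TempoStack manifest that sets overall resource limits and enables the Jaeger UI. The field names (resources.total.limits and template.queryFrontend.jaegerQuery) are assumptions based on the v1alpha1 API, and the values are arbitrary, so please confirm them with kubectl explain tempostack.spec against your installed operator version. The function name tempostack_with_features is hypothetical.

```shell
# Print a TempoStack manifest with overall resource limits and the Jaeger UI.
# Argument: TempoStack name.
# Field names under spec.resources and spec.template are assumptions about
# the v1alpha1 API; verify them against the CRD installed in your cluster.
tempostack_with_features() {
  cat <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: $1
spec:
  storage:
    secret:
      name: tempo-storage
      type: s3
  storageSize: 2Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: ingress
EOF
}

# Validate against the cluster without changing anything (commented out so
# the sketch runs anywhere):
# tempostack_with_features tempostack1 | kubectl apply --dry-run=server -f -
```

A server-side dry run is a cheap way to catch a mistyped or unsupported field before the operator ever sees the object.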
Looking ahead
I hope the Tempo operator makes it easier for system administrators and SREs to run Tempo clusters in production by delegating operational tasks such as upgrades, setting resource limits, configuring metrics and alerting, and configuring mTLS to the operator. We’re continuously improving the operator and plan to add more self-healing functionality in the future.
Want more information on the Tempo operator? Check out these resources:
- Source: https://github.com/grafana/tempo-operator
- Documentation: https://grafana.com/docs/tempo/latest/setup/operator/
- OperatorHub: https://operatorhub.io/operator/tempo-operator
- Grafana Community Slack: #tempo-operator