Simplify managing Grafana Tempo instances in Kubernetes with the Tempo Operator

• 2023-07-28 • 5 min

Andreas Gerstmayr is a Software Engineer at Red Hat. He’s working on simplifying the deployment and operations of a modern distributed tracing stack using Tempo and OpenTelemetry on OpenShift.

I’ve been working with Grafana Tempo for about half a year now, and one thing I like about it is that Tempo requires only object storage for storing traces, which is easy to set up in both cloud environments and on-premises. Another outstanding feature is TraceQL, which allows searching for relevant traces with a powerful query language.

Now, let’s imagine you’re a busy system administrator who wants to set up Tempo in your Kubernetes cluster. Even though you’ve read through the Tempo documentation and know Tempo is extremely flexible, you’re overwhelmed by the number of configuration settings and deployment options.

I’m here with good news: There’s a solution for that!

One thing my team has been working on lately is the new Tempo operator, which simplifies deploying a Tempo stack on Kubernetes. It creates and manages all required objects, exposes metrics, and supports upgrading the Tempo instance in the cluster. In this post, I’ll walk you through how to install it.

What is the Tempo operator?

If you’re familiar with Kubernetes, you probably know that a Kubernetes operator extends the Kubernetes API by creating and managing a new Custom Resource. Similarly, the Tempo operator creates a new TempoStack custom resource. In the same way a Kubernetes Deployment creates one or more Pods, a TempoStack instance creates all objects (Deployments, StatefulSets, Services, ConfigMaps, etc.) required to manage a Tempo cluster in microservices mode.

The operator continuously watches the cluster and converges the current state to match the expected state as defined in the TempoStack object. What makes an operator stand out from other deployment methods (manifest files, Helm charts) is that it can dynamically react to changes, such as high load, and perform actions (like increasing the number of replicas of a component).

Installing the operator

Note: The following instructions were tested on Kubernetes v1.26.3 and OpenShift v4.12.

One way to install the operator is via the Operator Lifecycle Manager. To do that, visit the tempo-operator page on OperatorHub and follow the installation instructions.

An alternative is to install it by applying Kubernetes manifests directly to the cluster. This requires having cert-manager installed in the cluster. If it’s not there already, please follow the cert-manager installation instructions. Once this step is completed, run the following to install the Tempo operator:

kubectl apply -f https://github.com/grafana/tempo-operator/releases/latest/download/tempo-operator.yaml

You can verify the installation by listing the pods in the operator namespace (tempo-operator-system when installed via manifests):

$ kubectl -n tempo-operator-system get pod
NAME                                        READY   STATUS    RESTARTS   AGE
tempo-operator-controller-7cd46dcd4-gs47m   2/2     Running   0          14s

Setting up object storage

Tempo supports various object storage backends for storing traces, such as Amazon S3, Azure Storage, and Google Cloud Storage. You can also use any S3-compatible object storage, for example MinIO.

In this example, we’ll use MinIO for storage. Run the following to set up a basic MinIO instance in the minio namespace. (It is intended for testing purposes only.)

kubectl apply -f https://raw.githubusercontent.com/grafana/tempo-operator/41d57e9ec1f78bc9789d3cf55241b2fed2faa269/minio.yaml

In the next step, we configure access to the object storage by creating a Secret in the namespace where Tempo will run (the default namespace in this example):

apiVersion: v1
kind: Secret
metadata:
   name: tempo-storage
type: Opaque
stringData:
   endpoint: http://minio.minio:9000
   bucket: tempo
   access_key_id: tempo
   access_key_secret: supersecret

Deploying a Tempo cluster

The final step is to configure a Tempo cluster. This manifest creates a basic, ready-to-use one:

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
   name: tempostack1
spec:
   storage:
      secret:
         name: tempo-storage
         type: s3
   storageSize: 2Gi

You can watch your brand new Tempo cluster being created:

kubectl get pod -l app.kubernetes.io/instance=tempostack1 --watch

Run the following command to confirm that all pods and services are created and ready:

$ kubectl get pod,svc -l app.kubernetes.io/instance=tempostack1
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/tempo-tempostack1-compactor-75dc75d565-jxzrh        1/1     Running   0          84s
pod/tempo-tempostack1-distributor-64d486d5b6-smwhb      1/1     Running   0          84s
pod/tempo-tempostack1-ingester-0                        1/1     Running   0          84s
pod/tempo-tempostack1-querier-7f95f8dbf5-hhvmh          1/1     Running   0          84s
pod/tempo-tempostack1-query-frontend-5c49496898-fbldg   1/1     Running   0          84s

NAME                                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/tempo-tempostack1-compactor                  ClusterIP   10.109.222.156   <none>        7946/TCP,3200/TCP            84s
service/tempo-tempostack1-distributor                ClusterIP   10.109.177.3     <none>        4317/TCP,3200/TCP            84s
service/tempo-tempostack1-gossip-ring                ClusterIP   None             <none>        7946/TCP                     84s
service/tempo-tempostack1-ingester                   ClusterIP   10.98.10.112     <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-querier                    ClusterIP   10.102.129.172   <none>        7946/TCP,3200/TCP,9095/TCP   84s
service/tempo-tempostack1-query-frontend             ClusterIP   10.110.69.170    <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-query-frontend-discovery   ClusterIP   None             <none>        3200/TCP,9095/TCP,9096/TCP   84s

Sending traces and configuring Grafana

Now it’s time to get traces into Tempo. All you have to do is point your application, Grafana Agent, or OpenTelemetry collector to send OTLP traces via gRPC to:

tempo-tempostack1-distributor.default.svc.cluster.local:4317
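If you use an OpenTelemetry Collector, a minimal configuration sketch for forwarding traces to this endpoint could look like the following. It assumes the TempoStack from this post runs in the default namespace, and it disables TLS on the exporter on the assumption that the distributor accepts plaintext gRPC in this basic setup. Adjust both for your environment.

```yaml
# Sketch of an OpenTelemetry Collector configuration (not taken from the operator docs).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # applications send OTLP/gRPC to the collector here

exporters:
  otlp:
    # Distributor service created by the TempoStack named "tempostack1"
    endpoint: tempo-tempostack1-distributor.default.svc.cluster.local:4317
    tls:
      insecure: true             # assumption: no TLS configured on the distributor

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```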

To generate example traces, you can create the following job:

apiVersion: batch/v1
kind: Job
metadata:
  name: generate-traces
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: tracegen
        image: ghcr.io/grafana/xk6-client-tracing:v0.0.2
        env:
        - name: ENDPOINT
          value: tempo-tempostack1-distributor.default.svc.cluster.local:4317

Let’s configure Grafana to view the traces.

Go to your Data source settings page in Grafana, click Add new data source, select Tempo, and enter http://tempo-tempostack1-query-frontend.default.svc.cluster.local:3200 in the URL field. Once you click the Save & test button, you should see a “Data source is working” info box. Now head over to the Explore page, select your newly created Tempo data source, and start querying Tempo with TraceQL!

Figure: Searching for traces using TraceQL in Grafana.
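If you prefer to configure the data source as code rather than through the UI, Grafana's data source provisioning can express the same settings. The following is a sketch under assumptions: the data source name is arbitrary, and the file would live in Grafana's provisioning directory for data sources, whose location depends on how your Grafana instance is deployed.

```yaml
# Sketch of a Grafana data source provisioning file.
apiVersion: 1
datasources:
  - name: Tempo                  # arbitrary display name (assumption)
    type: tempo
    access: proxy
    # Query frontend service created by the TempoStack named "tempostack1"
    url: http://tempo-tempostack1-query-frontend.default.svc.cluster.local:3200
```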

But wait, there’s more!

The Tempo operator also supports the following features:

Resource limits

Overall resource requests and limits can be specified in the TempoStack CR, and the operator will assign a fraction of them to each component (for example, the ingester typically requires more CPU than the query-frontend component).
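As a sketch, overall limits can be added to the spec of the TempoStack shown earlier. The CPU and memory values here are illustrative assumptions, not recommendations; check the operator documentation for the fields supported by your version.

```yaml
# Fragment to merge into the TempoStack manifest from this post.
spec:
  resources:
    total:
      limits:
        cpu: 2000m     # illustrative value
        memory: 2Gi    # illustrative value
```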

Multitenancy

Traces of multiple tenants can be stored in the same Tempo cluster.

Jaeger UI

The operator can deploy a Jaeger UI container and expose it via Ingress.
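A sketch of how this could be enabled in the TempoStack spec, assuming a cluster with an Ingress controller installed:

```yaml
# Fragment to merge into the TempoStack manifest from this post.
spec:
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true      # deploy the Jaeger UI alongside the query frontend
        ingress:
          type: ingress    # expose the UI through a Kubernetes Ingress
```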

Metrics

The operator exposes metrics about itself and can create ServiceMonitors for the Prometheus operator, which will scrape metrics of each Tempo component.
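A sketch of the corresponding setting in the TempoStack spec. It assumes the Prometheus operator and its ServiceMonitor custom resource definition are already installed in the cluster:

```yaml
# Fragment to merge into the TempoStack manifest from this post.
spec:
  observability:
    metrics:
      createServiceMonitors: true   # one ServiceMonitor per Tempo component
```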

mTLS

Communication between the Tempo components can be secured via mTLS.

Looking ahead

I hope the Tempo operator makes it easier for system administrators and SREs to run Tempo clusters in production by delegating operational tasks such as upgrades, setting resource limits, configuring metrics and alerting, and configuring mTLS to the operator. We’re continuously working on improving the operator, and plan to add additional self-healing functionality in the future.

Want more information on the Tempo operator? Check out these resources:

Source: https://github.com/grafana/tempo-operator
Documentation: https://grafana.com/docs/tempo/latest/setup/operator/
OperatorHub: https://operatorhub.io/operator/tempo-operator
Grafana Community Slack: #tempo-operator

Want to share your Grafana story and dashboards with the community? Drop us a note at stories@grafana.com.
