Grafana Tempo 2.6 release: performance improvements and new TraceQL features
Grafana Tempo 2.6 is here with performance improvements and buckets of new TraceQL features!
Watch the video above for an overview of the new TraceQL features, or read on for a quick tour of the latest updates in Tempo. If you’re looking for something more in-depth, don’t hesitate to jump into the Grafana Tempo 2.6 release notes or the changelog.
TraceQL features
TraceQL can’t be stopped! We have a slew of new features, all built on the vParquet4 backend, which is now the default block format in Tempo.
Events
Span events are an important way to record that something of note occurred during a span. Common uses for span events include noting when a network connection is established or a lock is obtained.
TraceQL now supports the discovery of events by name, custom attributes, or the time since the start of a span. With this simple query, you can find any events in your database:
{ event:name != "" }
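The same pattern extends to the other two lookups mentioned above. The following is a minimal sketch rather than a definitive reference, and each line is a separate query. `event:timeSinceStart` is the intrinsic for the time elapsed since the start of the span; the attribute name `event.message`, its value, and the 2ms threshold are hypothetical placeholders to swap for whatever your instrumentation actually records.

```traceql
{ event.message = "connection established" }
{ event:timeSinceStart > 2ms }
```

The first query matches spans carrying an event with that custom attribute value; the second matches spans with an event recorded more than 2ms after the span began.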
Event queries can also find exceptions in your traces and quickly surface their details.
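As a rough sketch of what an exception search could look like, assume your instrumentation follows the OpenTelemetry convention of recording exceptions as span events named `exception` with an `exception.message` attribute. The `timeout` pattern below is only an illustration and not taken from the release itself.

```traceql
{ event:name = "exception" && event.exception.message =~ ".*timeout.*" }
```

Dropping the second condition widens the search to every span that recorded an exception event.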
Links
Span links are a way for a single trace to link to others. Grafana has long supported span links, but now you can query for them directly using TraceQL. This easy query will find all span links in your database:
{ link:traceID != "" }
TraceQL supports querying for span links based on the trace and span they link to, or any custom attributes you attach to the link.
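To make those options concrete, here is a hedged sketch in which each line is a separate query. `link:traceID` and `link:spanID` are the link intrinsics; the hex IDs are made-up placeholders, and `link.opentracing.ref_type` is just one example of a custom link attribute that may or may not exist in your data.

```traceql
{ link:traceID = "1234567890abcdef1234567890abcdef" }
{ link:spanID = "0000000000000001" }
{ link.opentracing.ref_type = "child_of" }
```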
Arrays
vParquet4 natively and seamlessly supports arrays. For instance, if your OpenTelemetry instrumentation is storing HTTP headers in arrays, the following query will now correctly search all values in the array and match the span if the http.request.header.Accept-Encoding array contains the value gzip.
{ span.http.request.header.Accept-Encoding = "gzip" }
We intend to support arrays more explicitly in the future, but this is a great start that covers most use cases.
New operational patterns
We are removing RF3 (Replication Factor 3) metrics from Tempo in 2.6 and replacing them with a more performant RF1 implementation. As discussed below, we intend to productionize this implementation before we mark TraceQL metrics GA.
If you are running TraceQL metrics and would like to continue doing so in Tempo 2.6, you will need to configure the metrics-generator to generate local blocks for recent data, as well as flush these blocks to the backend for historical data. Please see the release notes for details.
Moving to RF1 TraceQL metrics is taking a bit more time, but we know the wait is worth it.
Our team’s primary focus is to settle on an operational pattern that meets our performance, durability, and availability goals. Once complete, we expect overall lower TCO, production-ready metrics, and more performant TraceQL search. Expect great things!
Oh my goodness!
There’s so much going on in this release, I wasn’t even sure what to highlight next. How about features like native histogram support in the metrics-generator, exemplars in TraceQL metrics, or the super mysterious compare function that is in no way related to a soon-to-be-announced app?
Maybe I should mention the crazy number of performance improvements that impact search and polling performance. Or, if you’re operating a multi-tenant cluster, you’d probably be interested in options to block dangerous queries or concurrent polling!
In particular, I’d like to call attention to reductions in the memory consumed by simply polling and maintaining the blocklist. That consumption had grown partly because RF1 increases the number of blocks and partly because of the additional complexity introduced by dedicated columns. Three back-to-back PRs massively reduced the steady-state consumption of blocklist polling, and we’re working on more improvements!
Check out the release notes for an in-depth summary of all the goodies in Tempo 2.6!
What’s next in Grafana Tempo?
There are two main initiatives in Tempo right now. The first is to GA TraceQL metrics, which depends on the second initiative: RF1 re-architecture.
We were unable to dedupe RF3 data in the storage layer for metrics while still hitting our performance goals. Instead, we have decided to productionize an RF1 architecture. This re-architecture will likely require a queue and may increase operational complexity in exchange for reduced TCO and increased performance. If you’d like to weigh in on this conversation, please join the monthly Tempo community call. This is our primary focus!
If you are interested in hearing more about Grafana Tempo news or search progress, please join us on the Grafana Labs Community Slack channel #tempo, post a question in our community forums, reach out on X (formerly Twitter), or join our monthly Tempo community call. See you there!
And if you want to get even closer to where the magic happens, why not have a look at our open positions at Grafana Labs?
The easiest way to get started with Grafana Tempo is with Grafana Cloud, and our free forever tier now includes 50GB of traces along with 50GB of logs and 10K series of metrics. Sign up today for free!