Accelerate TraceQL queries at scale with dedicated attribute columns in Grafana Tempo

2024-01-22 7 min

With Grafana Tempo 2.3, we introduced a new storage format (vParquet3), which enables an exciting new feature focused on the read path: dedicated attribute columns. Dedicated attribute columns offer a wide range of benefits, primarily centered around query performance and memory usage.

These columns can improve read speed across most queries, and they can have a major impact on resource utilization. In this blog post we will shine a light on the benefits and technical details of this feature, and we’ll walk through an example to show the type of use case where this feature could apply in your systems.

Understanding TraceQL query performance when searching for attributes

Before we dive into some of the details relating to dedicated attribute columns, it’s helpful to understand some context around Tempo’s architecture. With TraceQL, you can execute complex searches across vast amounts of tracing data. Grafana Tempo’s backend format is built on Apache Parquet and is well-suited for such workloads. Apache Parquet supports storing highly structured and nested data, such as traces, in a columnar format while still preserving the original structure.

If you are already familiar with distributed tracing, you might know that traces commonly contain data with very high cardinality, such as unique IDs, timestamps, and user-defined attributes. Storing such data is challenging, especially when it comes to attributes. While Tempo stores some well-known attributes defined by the OpenTelemetry semantic conventions in dedicated columns, the vast majority of them still reside in generic key-value columns.

A simplified schema of a span with generic key-value columns for its attributes

The figure above illustrates a simplified schema of a span with generic key-value columns for its attributes. In production tracing data, it is common for spans to have dozens of attributes and the generic attribute columns can constitute up to 70% of the overall block size.

To illustrate this, let’s use the following TraceQL query as an example:

{ span.network.peer.address = "192.168.1.2" }

Because network.peer.address is stored in the generic key-value columns, a substantial percentage of the block must be downloaded and searched to evaluate the query. Even with the efficient physical representation of the columnar format, this creates a bottleneck.

In fact, when we analyzed Tempo blocks before creating this feature, we found that, among generic attribute columns, the 10 most frequently used span attributes typically make up over 50% of the total attribute size. Strategically selecting only a few attributes and storing them in dedicated columns can therefore improve the performance of TraceQL queries containing assertions on the selected attributes.

Introducing dedicated attribute columns

When you enable vParquet3 in your Grafana Tempo installation, you can configure up to 10 attributes with string values at each of the resource and span levels. Tempo will read this configuration and store the selected attributes in dedicated columns, instead of using the generic attribute columns. This brings two key advantages:

  • TraceQL queries with assertions about the selected attributes exhibit significantly improved performance.
  • The reduction in the size of the generic attribute columns enhances the overall search performance for all other attributes as well. 

This enhancement is implemented by incorporating 10 spare attribute columns at each of the resource and span levels into the vParquet3 schema. Without a dedicated attribute column configuration, these columns remain empty. However, when a dedicated attribute column configuration is provided, Tempo dynamically assigns each configured attribute to one of the spare columns.
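The effect of spare columns on a search can be sketched with a toy model. The following is a minimal illustration, not Tempo's real schema or code: the class `ToySpanColumns`, its methods, and the sample spans are all hypothetical, and "values inspected" is only a rough stand-in for the amount of data a query has to read.

```python
SPARE_COLUMNS_PER_SCOPE = 10  # limit stated in the text for vParquet3


class ToySpanColumns:
    """Toy columnar layout for span attributes; not Tempo's real schema."""

    def __init__(self, dedicated_names=()):
        if len(dedicated_names) > SPARE_COLUMNS_PER_SCOPE:
            raise ValueError("at most 10 dedicated columns per scope")
        # Spare column i holds the attribute dedicated_names[i].
        self.dedicated_names = list(dedicated_names)
        self.dedicated = [[] for _ in self.dedicated_names]
        # Generic columns: one list of keys and one list of values per span.
        self.generic_keys = []
        self.generic_values = []

    def append(self, attrs):
        """Store one span, routing configured attributes to spare columns."""
        rest = {k: v for k, v in attrs.items() if k not in self.dedicated_names}
        self.generic_keys.append(list(rest.keys()))
        self.generic_values.append(list(rest.values()))
        for slot, name in enumerate(self.dedicated_names):
            self.dedicated[slot].append(attrs.get(name))  # None if absent

    def search(self, name, value):
        """Return (matching span indexes, number of values inspected)."""
        matches, inspected = [], 0
        if name in self.dedicated_names:
            column = self.dedicated[self.dedicated_names.index(name)]
            for row, stored in enumerate(column):
                inspected += 1
                if stored == value:
                    matches.append(row)
            return matches, inspected
        for row, (keys, values) in enumerate(zip(self.generic_keys, self.generic_values)):
            for key, stored in zip(keys, values):
                inspected += 1
                if key == name and stored == value:
                    matches.append(row)
        return matches, inspected


spans = [
    {"http.method": "GET", "db.system": "mysql", "network.peer.address": "192.168.1.2"},
    {"http.method": "POST", "db.system": "mysql", "network.peer.address": "10.0.0.7"},
    {"http.method": "GET", "db.system": "redis", "network.peer.address": "192.168.1.2"},
]

generic_only = ToySpanColumns()
with_dedicated = ToySpanColumns(["network.peer.address"])
for span in spans:
    generic_only.append(span)
    with_dedicated.append(span)

generic_result = generic_only.search("network.peer.address", "192.168.1.2")
dedicated_result = with_dedicated.search("network.peer.address", "192.168.1.2")
other_result = with_dedicated.search("http.method", "GET")
print(generic_result)    # ([0, 2], 9)
print(dedicated_result)  # ([0, 2], 3)
print(other_result)      # ([0, 2], 6)
```

In this toy model the query on the dedicated attribute inspects 3 values instead of 9, and a query on an attribute that stays in the generic columns also inspects fewer values (6 instead of 9), which mirrors the two advantages listed above.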

How to configure dedicated attribute columns for your workload

Beyond running Tempo v2.3 or later, the only requirement for using dedicated columns is deciding which attributes to select.

There are two main strategies to approach this:

  1. Analyze data patterns and identify attributes that use the most space
  2. Analyze query patterns and select the attributes that are searched for most often

Of the two approaches, analyzing data is the more reliable and straightforward way to create a good dedicated column configuration. Because of that, we’ve added a tempo-cli subcommand to help with this task.

The tool tempo-cli analyse blocks is a wrapper around the utility tool parquet-cli, made to work with a Tempo deployment. The command outputs the top N attributes by size for a given number of blocks in a tenant.

tempo-cli analyse blocks <args>

Top 15 span attributes by size
name: db.statement        size: 343 MB   (72.43%)
name: trace_pipeline      size: 24 MB    (5.01%)
name: db.sql.table        size: 20 MB    (4.25%)
name: http.host           size: 11 MB    (2.24%)
name: db.system           size: 10 MB    (2.19%)
...
Top 15 resource attributes by size
name: k8s.node.name                  size: 65 MB    (13.39%)
name: k8s.pod.uid                    size: 58 MB    (11.81%)
name: service.instance.id            size: 53 MB    (10.81%)
name: k8s.pod.start_time             size: 46 MB    (9.51%)
name: k8s.replicaset.uid             size: 42 MB    (8.61%)
...

This information can then be used to create an effective configuration for dedicated columns.

storage:
  trace:
    block:
      # Dedicated columns are part of the block configuration
      parquet_dedicated_columns:
        # Span-level attributes
        - scope: span
          name: db.statement
          type: string
        - scope: span
          name: trace_pipeline
          type: string
        - scope: span
          name: db.sql.table
          type: string
        - scope: span
          name: http.host
          type: string
        - scope: span
          name: db.system
          type: string
        # Resource-level attributes
        - scope: resource
          name: k8s.node.name
          type: string
        - scope: resource
          name: k8s.pod.uid
          type: string
        - scope: resource
          name: service.instance.id
          type: string
        - scope: resource
          name: k8s.pod.start_time
          type: string
        - scope: resource
          name: k8s.replicaset.uid
          type: string
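The step from analysis output to configuration entries can also be scripted. The sketch below is one possible approach, assuming the report looks exactly like the excerpt shown earlier ("Top N span/resource attributes by size" headers followed by "name: ... size: ... (x%)" lines). The function names and the `min_pct` threshold are hypothetical, and the script assumes every selected attribute holds string values, which the report itself does not confirm.

```python
import re

# Line shapes assumed from the report excerpt in this post.
HEADER = re.compile(r"^Top \d+ (?P<scope>span|resource) attributes by size$")
LINE = re.compile(r"^name:\s+(?P<name>\S+)\s+size:\s+.+?\((?P<pct>[\d.]+)%\)\s*$")

MAX_PER_SCOPE = 10  # limit stated in the text for vParquet3


def dedicated_columns(report, min_pct=1.0):
    """Pick attributes at or above min_pct, up to 10 per scope, in report order."""
    entries = []
    counts = {"span": 0, "resource": 0}
    scope = None
    for raw in report.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            scope = header["scope"]
            continue
        row = LINE.match(line)
        if row is None or scope is None:
            continue  # skips "..." and anything else that is not an attribute row
        if float(row["pct"]) < min_pct or counts[scope] >= MAX_PER_SCOPE:
            continue
        counts[scope] += 1
        # Assumption: the attribute has string values.
        entries.append({"scope": scope, "name": row["name"], "type": "string"})
    return entries


def to_yaml_list(entries):
    """Render entries as a YAML list to paste under parquet_dedicated_columns."""
    lines = []
    for entry in entries:
        lines.append(f"- scope: {entry['scope']}")
        lines.append(f"  name: {entry['name']}")
        lines.append(f"  type: {entry['type']}")
    return "\n".join(lines)


REPORT = """Top 15 span attributes by size
name: db.statement        size: 343 MB   (72.43%)
name: trace_pipeline      size: 24 MB    (5.01%)
...
Top 15 resource attributes by size
name: k8s.node.name                  size: 65 MB    (13.39%)
...
"""

selected = dedicated_columns(REPORT, min_pct=10.0)
print(to_yaml_list(selected))
```

With a 10% threshold, this sample keeps db.statement and k8s.node.name and drops trace_pipeline. Review the result by hand before deploying it, since size alone does not say whether an attribute is ever searched.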

How often should I update my config?

As we just mentioned, this new feature does require a configuration change. Dedicated attribute column configurations are not meant to be changed too frequently, as each new configuration creates a new “shard” in the block space. This means that only blocks with the same configuration can be compacted together.

A diagram shows the path for two different shards

Unless there are significant changes in the produced traces, such as new systems being instrumented, we don’t recommend changing column configurations frequently. A good rule of thumb is to leave more time between configuration changes than the compaction interval. Blocks from different intervals are never compacted together, so long-term storage isn’t affected by config changes.
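The sharding rule can be pictured as grouping blocks by time interval and by dedicated column configuration. This is a toy model of the rule described above, not Tempo's compactor: the `Block` type, its fields, and the integer interval index are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    interval: int             # index of the time interval the block belongs to
    dedicated_columns: tuple  # e.g. (("span", "db.statement"),)


def compaction_groups(blocks):
    """Group block IDs that could be compacted together in this toy model."""
    groups = defaultdict(list)
    for block in blocks:
        # Same interval AND same dedicated column configuration.
        groups[(block.interval, block.dedicated_columns)].append(block.block_id)
    return dict(groups)


old = (("span", "db.statement"),)
new = (("span", "db.statement"), ("span", "http.host"))

blocks = [
    Block("a", 0, old), Block("b", 0, old),
    Block("c", 1, old), Block("d", 1, new),  # config changed during interval 1
    Block("e", 2, new), Block("f", 2, new),
]

groups = compaction_groups(blocks)
for key, ids in groups.items():
    print(key[0], ids)
```

Only the interval in which the configuration changed is split into two shards; the intervals before and after it each form a single group, which is why infrequent changes have little effect on long-term storage.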

And remember, there’s no “bad config”: the impact of using this feature ranges from negligible to very positive. It’s safe to start with any configuration and optimize it iteratively when it becomes necessary.

A real world example

Thus far, we’ve been talking mostly about theory; it’s time we put what we’ve learned into practice.

During our first rollouts of dedicated column configurations to production Tempo deployments, we found that, on top of the faster searches we had predicted, there was also a large reduction in memory usage during queries, thanks primarily to storing high-cardinality data in its own dedicated columns.

We’ve tested this in clusters handling more than 600 MB/s and 1.2 million spans/s, with great success. During a load test in one of these clusters, we saw tail latency decrease by up to 75%, and CPU and memory usage in Tempo’s queriers decrease by up to 70% and 50%, respectively.

Grafana dashboards display resource usage, with a significant drop when vParquet3 was introduced.
Grafana dashboards display frontend and querier latency, with a significant drop when vParquet3 was introduced.

While the latency decrease was expected, the decrease in memory usage was a welcome surprise. To explain these improvements, we need to refresh our memory and briefly discuss the Parquet format.

As we saw at the beginning of this blog post, Tempo uses a generic column where attributes are stored as key-value pairs. For performance reasons, we use dictionary encoding for string values in this generic column. Combined with the high cardinality we discussed earlier, this can result in very large dictionaries, which are kept in memory.

Moving high-cardinality attributes to dedicated columns gives each of them its own dictionary, so the dictionary of the generic column stays small, which results in lower memory usage.
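A small sketch makes the dictionary effect concrete. It is a simplified model of dictionary encoding in general, not Parquet's or Tempo's implementation, and the helper names and sample spans are hypothetical. The assumption is that a query loads the dictionary only for the columns it reads.

```python
def dictionary_encode(values):
    """Return (dictionary, indexes): unique strings plus one index per value."""
    dictionary, position, indexes = [], {}, []
    for value in values:
        if value not in position:
            position[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(position[value])
    return dictionary, indexes


def dictionary_bytes(dictionary):
    """Approximate dictionary size as the sum of UTF-8 string lengths."""
    return sum(len(entry.encode("utf-8")) for entry in dictionary)


# 1,000 spans: one low-cardinality attribute, one nearly unique attribute.
spans = [
    {"http.method": "GET", "db.statement": f"SELECT * FROM orders WHERE id = {i}"}
    for i in range(1000)
]

# Everything in the generic value column: one shared dictionary.
generic_before, _ = dictionary_encode([v for span in spans for v in span.values()])

# db.statement moved to a dedicated column: two separate dictionaries.
generic_after, _ = dictionary_encode([span["http.method"] for span in spans])
dedicated, _ = dictionary_encode([span["db.statement"] for span in spans])

print(len(generic_before), dictionary_bytes(generic_before))
print(len(generic_after), dictionary_bytes(generic_after))
print(len(dedicated), dictionary_bytes(dedicated))
```

In this model the generic dictionary shrinks from 1,001 entries to a single entry once db.statement has its own column. The large dictionary does not disappear, but under the stated assumption only queries that actually read db.statement have to hold it in memory.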

And one final note: while unique IDs might seem like the primary source of cardinality, what we’ve encountered most frequently are attributes like db.statement, in which entire SQL queries are logged with all their contents. This results in very long strings that often contain highly variable data. Look out for those!

What’s next for dedicated attribute columns?

As we’ve shown in this blog post, dedicated attribute columns can lead to substantial performance improvements. Yet the current implementation does have its constraints.

The use of spare columns means that the type, encoding, and count of dedicated attribute columns are predefined by the schema. In vParquet3, users are restricted to defining 10 resource attributes and 10 span attributes with string values, all using dictionary encoding. These constraints could be addressed by an implementation based on a dynamic schema that is defined at runtime.

In addition, enabling dedicated attribute columns as described in the configuration section takes time and effort. You need to analyze data for each tenant, interpret the results, craft configurations, and deploy them. This works well for a small number of tenants. Unfortunately, it does not scale to Tempo installations with hundreds of tenants.

In future releases, we envision having incoming traffic analyzed on-the-fly and dedicated column configurations generated automatically. We want to make using dedicated attribute columns as simple as adding “automated_dedicated_attribute_columns: true” to Tempo’s configuration.
