Span metrics
The span metrics processor generates metrics from ingested tracing data, including request, error, and duration (RED) metrics.
Span metrics generate two metrics:
- A counter that counts requests
- A histogram that tracks the distribution of durations of all requests
Span metrics are of particular interest if your system is not monitored with metrics, but it has distributed tracing implemented. You get out-of-the-box metrics from your tracing pipeline.
Even if you already have metrics, span metrics can provide in-depth monitoring of your system. The generated metrics give application-level insight wherever tracing is propagated through your applications.
Last but not least, span metrics lower the entry barrier for using exemplars. An exemplar is a specific trace that is representative of a measurement taken in a given time interval. Since traces and metrics co-exist in the metrics-generator, exemplars can be automatically added, providing additional value to these metrics.
How to run
To enable span metrics in Tempo or Grafana Enterprise Traces, enable the metrics generator and add an overrides section that enables the span-metrics processor.
Refer to the configuration details.
If you want to enable metrics-generator for your Grafana Cloud account, refer to the Metrics-generator in Grafana Cloud documentation.
How it works
The span metrics processor works by inspecting every received span and computing the total count and the duration of spans for every unique combination of dimensions. Dimensions can be the service name, the operation, the span kind, the status code and any attribute present in the span.
This processor mirrored the implementation of the OpenTelemetry Collector processor with the same name. The OTel spanmetricsprocessor has since been deprecated and replaced with the span metrics connector.
Note
To learn more about cardinality and how to perform a dry run of the metrics generator, see the Cardinality documentation.
Metrics
The following metrics are exported:
Metric | Type | Labels | Description |
---|---|---|---|
traces_spanmetrics_latency | Histogram | Dimensions | Duration of the span |
traces_spanmetrics_calls_total | Counter | Dimensions | Total count of the span |
traces_spanmetrics_size_total | Counter | Dimensions | Total size of spans ingested |
Note
In Tempo 1.4 and 1.4.1, the histogram metric was called traces_spanmetrics_duration_seconds. This was changed later to be consistent with the metrics generated by Grafana Agent and the OpenTelemetry Collector.
By default, the metrics processor adds the following labels to each metric: service, span_name, span_kind, status_code, status_message, job, and instance.
- service - The name of the service that generated the span
- span_name - The unique name of the span
- span_kind - The type of span, which can be one of five values:
  - SPAN_KIND_SERVER - The span was generated by a call from another service
  - SPAN_KIND_CLIENT - The span made a call to another service
  - SPAN_KIND_INTERNAL - The span does not have interaction outside of the service it was generated in
  - SPAN_KIND_PRODUCER - The span created data that was pushed onto a bus or message broker
  - SPAN_KIND_CONSUMER - The span consumed data that was on a bus or messaging system
- status_code - The result of the span, which can be one of three values:
  - STATUS_CODE_UNSET - Result of the span was unset/unknown
  - STATUS_CODE_OK - The span operation completed successfully
  - STATUS_CODE_ERROR - The span operation completed with an error
- status_message (optionally enabled) - The message that details the reason for the status_code label
- job - The name of the job, a combination of namespace and service; only added if metrics_generator.processor.span_metrics.enable_target_info: true
- instance - The instance ID; only added if metrics_generator.processor.span_metrics.enable_target_info: true
Additional user-defined labels can be created using the dimensions configuration option.
When a configured dimension collides with one of the default labels (for example, status_code), the label for the respective dimension is prefixed with a double underscore (for example, __status_code).
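The collision rule can be sketched as follows. The labelFor function and the defaultLabels set are hypothetical names for this example, and the sketch ignores any other name sanitization the processor may apply to attribute names.

```go
package main

import "fmt"

// defaultLabels is the set of labels the processor adds by default,
// as listed above.
var defaultLabels = map[string]bool{
	"service":        true,
	"span_name":      true,
	"span_kind":      true,
	"status_code":    true,
	"status_message": true,
	"job":            true,
	"instance":       true,
}

// labelFor returns the label name used for a configured dimension.
// A dimension that collides with a default label gets a "__" prefix.
func labelFor(dimension string) string {
	if defaultLabels[dimension] {
		return "__" + dimension
	}
	return dimension
}

func main() {
	fmt.Println(labelFor("status_code")) // __status_code
	fmt.Println(labelFor("region"))      // region
}
```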
Custom labeling of dimensions is also supported using the dimension_mapping configuration option.
An optional metric called traces_target_info, which uses all resource-level attributes as dimensions, can be enabled with the enable_target_info configuration option.
If you use a ratio-based sampler, you can use the custom sampler below to avoid losing metric information. You also need to set metrics_generator.processor.span_metrics.span_multiplier_key to "X-SampleRatio".
package tracer

import (
	"go.opentelemetry.io/otel/attribute"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

// RatioBasedSampler wraps a trace-ID ratio sampler and records the
// sampling ratio on every sampled span.
type RatioBasedSampler struct {
	innerSampler        tracesdk.Sampler
	sampleRateAttribute attribute.KeyValue
}

// NewRatioBasedSampler returns a sampler that keeps the given fraction of traces.
func NewRatioBasedSampler(fraction float64) RatioBasedSampler {
	innerSampler := tracesdk.TraceIDRatioBased(fraction)
	return RatioBasedSampler{
		innerSampler:        innerSampler,
		sampleRateAttribute: attribute.Float64("X-SampleRatio", fraction),
	}
}

// ShouldSample delegates the decision to the inner sampler and, for sampled
// spans, attaches the X-SampleRatio attribute.
func (ds RatioBasedSampler) ShouldSample(parameters tracesdk.SamplingParameters) tracesdk.SamplingResult {
	sampler := ds.innerSampler
	result := sampler.ShouldSample(parameters)
	if result.Decision == tracesdk.RecordAndSample {
		result.Attributes = append(result.Attributes, ds.sampleRateAttribute)
	}
	return result
}

// Description returns a human-readable description of the sampler.
func (ds RatioBasedSampler) Description() string {
	return "Ratio Based Sampler which gives information about sampling ratio"
}
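The sketch below illustrates why the sampler records the ratio: when only a fraction of traces is kept, each kept span can be weighted by the reciprocal of that fraction so that counts approximate the unsampled totals. The spanMultiplier function and the map-based attribute type are hypothetical, and the reciprocal weighting is an assumption about how span_multiplier_key is used, not a description of the processor's code.

```go
package main

import "fmt"

// spanMultiplier returns the weight to apply to a span when it is counted.
// Assumption: the weight is 1/ratio, where ratio is the value of the
// attribute named by key; spans without a usable ratio count once.
func spanMultiplier(attrs map[string]float64, key string) float64 {
	if key == "" {
		return 1
	}
	if ratio, ok := attrs[key]; ok && ratio > 0 {
		return 1 / ratio
	}
	return 1
}

func main() {
	// A span sampled at 25% stands in for four spans.
	sampled := map[string]float64{"X-SampleRatio": 0.25}
	fmt.Println(spanMultiplier(sampled, "X-SampleRatio")) // 4

	// A span without the attribute is counted as-is.
	fmt.Println(spanMultiplier(map[string]float64{}, "X-SampleRatio")) // 1
}
```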
Filtering
In some cases, you may want to reduce the number of metrics produced by the spanmetrics processor.
You can configure the processor to use an include filter to match criteria that must be present in the span in order to be included.
Following the include filter, you can use an exclude filter to reject portions of what was previously included by the filter policy.
Currently, only filtering by resource and span attributes with the following value types is supported:
- bool
- double
- int
- string
Additionally, these intrinsic span attributes may be filtered upon:
- name
- status (code)
- kind
The following intrinsic kinds are available for filtering:
- SPAN_KIND_SERVER
- SPAN_KIND_INTERNAL
- SPAN_KIND_CLIENT
- SPAN_KIND_PRODUCER
- SPAN_KIND_CONSUMER
Intrinsic keys can be acted on directly when implementing a filter policy. For example:
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: kind
                value: SPAN_KIND_SERVER
In this example, only spans of kind “server” are included for metrics export.
When selecting spans based on non-intrinsic attributes, you must specify the scope of the attribute, similar to how it is specified in TraceQL.
For example, if the resource contains a location attribute that is to be used in a filter policy, the reference needs to be specified as resource.location.
This requires you to know and specify the scope in which an attribute is found, and avoids ambiguity when the same attribute has conflicting values at different scopes. The following example illustrates this.
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: resource.location
                value: earth
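To make the role of the scope prefix concrete, the following sketch resolves a scoped key against a span that carries the same attribute name at two scopes. The spanData type and lookup function are hypothetical, and the span. prefix is assumed here by analogy with TraceQL; the text above only shows resource.

```go
package main

import (
	"fmt"
	"strings"
)

// spanData is a hypothetical, simplified span with string attributes
// at two scopes.
type spanData struct {
	Resource map[string]string
	Span     map[string]string
}

// lookup resolves a scoped key such as "resource.location".
// A key without a known scope prefix is not resolved, which mirrors the
// requirement to state the scope explicitly.
func lookup(s spanData, key string) (string, bool) {
	scope, name, found := strings.Cut(key, ".")
	if !found {
		return "", false
	}
	switch scope {
	case "resource":
		v, ok := s.Resource[name]
		return v, ok
	case "span":
		v, ok := s.Span[name]
		return v, ok
	}
	return "", false
}

func main() {
	s := spanData{
		Resource: map[string]string{"location": "earth"},
		Span:     map[string]string{"location": "mars"},
	}
	v, _ := lookup(s, "resource.location")
	fmt.Println(v) // earth
	v, _ = lookup(s, "span.location")
	fmt.Println(v) // mars
}
```

Without the prefix, a bare location key could refer to either value, which is the ambiguity the scoped form avoids.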
The preceding configuration examples use a match_type of strict, which is a direct comparison of values.
You can use regex, an additional option for match_type, to build a regular expression to match against.
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: regex
            attributes:
              - key: resource.location
                value: eu-.*
        - exclude:
            match_type: regex
            attributes:
              - key: resource.tier
                value: dev-.*
In the example above, the include statement first selects all spans that have a resource.location beginning with eu-, and the exclude statement then rejects those whose resource.tier begins with dev-.
This gives you a flexible way to ensure that only the metrics that matter are generated.
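The include-then-exclude behavior can be sketched as a small evaluator. This is an illustration under stated assumptions, not the processor's implementation: the attr, matcher, and policy types and the keep function are hypothetical, attributes are flattened into a map keyed by their scoped name, all attributes in a filter are assumed to have to match, and regular expressions are assumed to match the whole value.

```go
package main

import (
	"fmt"
	"regexp"
)

// attr is one key/value pair of a filter.
type attr struct{ Key, Value string }

// matcher is an include or exclude filter.
type matcher struct {
	MatchType  string // "strict" or "regex"
	Attributes []attr
}

// policy pairs an optional include filter with an optional exclude filter.
type policy struct {
	Include *matcher
	Exclude *matcher
}

// matches reports whether every attribute of the filter matches the span.
func (m *matcher) matches(span map[string]string) bool {
	for _, a := range m.Attributes {
		got, ok := span[a.Key]
		if !ok {
			return false
		}
		switch m.MatchType {
		case "strict":
			if got != a.Value {
				return false
			}
		case "regex":
			// Assumption: the pattern must match the entire value.
			re, err := regexp.Compile("^(?:" + a.Value + ")$")
			if err != nil || !re.MatchString(got) {
				return false
			}
		default:
			return false
		}
	}
	return true
}

// keep reports whether a span passes all policies: it must match every
// include filter and must not match any exclude filter.
func keep(policies []policy, span map[string]string) bool {
	for _, p := range policies {
		if p.Include != nil && !p.Include.matches(span) {
			return false
		}
		if p.Exclude != nil && p.Exclude.matches(span) {
			return false
		}
	}
	return true
}

func main() {
	policies := []policy{
		{Include: &matcher{"regex", []attr{{"resource.location", "eu-.*"}}}},
		{Exclude: &matcher{"regex", []attr{{"resource.tier", "dev-.*"}}}},
	}
	prod := map[string]string{"resource.location": "eu-west", "resource.tier": "prod-1"}
	dev := map[string]string{"resource.location": "eu-west", "resource.tier": "dev-1"}
	fmt.Println(keep(policies, prod)) // true
	fmt.Println(keep(policies, dev))  // false
}
```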