Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Important: This documentation is about an older version. It's relevant only to the release noted, many of the features and functions have been updated or replaced. Please view the current version.
otelcol.connector.servicegraph
EXPERIMENTAL: This is an experimental component. Experimental components are subject to frequent breaking changes, and may be removed with no equivalent replacement.
otelcol.connector.servicegraph
accepts span data from other otelcol
components and
outputs metrics representing the relationship between various services in a system.
A metric represents an edge in the service graph.
Those metrics can then be used by a data visualization application (e.g.
Grafana)
to draw the service graph.
NOTE:
otelcol.connector.servicegraph
is a wrapper over the upstream OpenTelemetry Collectorservicegraph
connector. Bug reports or feature requests will be redirected to the upstream repository, if necessary.
Multiple otelcol.connector.servicegraph
components can be specified by giving them
different labels.
This component is based on Grafana Tempo’s service graph processor.
Service graphs are useful for a number of use-cases:
- Infer the topology of a distributed system. As distributed systems grow, they become more complex. Service graphs can help you understand the structure of the system.
- Provide a high level overview of the health of your system. Service graphs show error rates, latencies, and other relevant data.
- Provide a historic view of a system’s topology. Distributed systems change very frequently, and service graphs offer a way of seeing how these systems have evolved over time.
Since otelcol.connector.servicegraph
has to process both sides of an edge,
it needs to process all spans of a trace to function properly.
If spans of a trace are spread out over multiple Agent instances, spans cannot be paired reliably.
A solution to this problem is using otelcol.exporter.loadbalancing
in front of Agent instances running otelcol.connector.servicegraph
.
Usage
otelcol.connector.servicegraph "LABEL" {
output {
metrics = [...]
}
}
Arguments
otelcol.connector.servicegraph
supports the following arguments:
Name | Type | Description | Default | Required |
---|---|---|---|---|
latency_histogram_buckets | list(duration) | Buckets for latency histogram metrics. | ["2ms", "4ms", "6ms", "8ms", "10ms", "50ms", "100ms", "200ms", "400ms", "800ms", "1s", "1400ms", "2s", "5s", "10s", "15s"] | no |
dimensions | list(string) | A list of dimensions to add with the default dimensions. | [] | no |
cache_loop | duration | Configures how often to delete series which have not been updated. | "1m" | no |
store_expiration_loop | duration | The time to expire old entries from the store periodically. | "2s" | no |
Service graphs work by inspecting traces and looking for spans with
parent-children relationship that represent a request.
otelcol.connector.servicegraph
uses OpenTelemetry semantic conventions
to detect a myriad of requests.
The following requests are currently supported:
- A direct request between two services, where the outgoing and the incoming span
must have a Span Kind value of
client
andserver
respectively. - A request across a messaging system, where the outgoing and the incoming span
must have a Span Kind value of
producer
andconsumer
respectively. - A database request, where spans have a Span Kind with a value of
client
, as well as an attribute with a key ofdb.name
.
Every span which can be paired up to form a request is kept in an in-memory store:
- If the TTL of the span expires before it can be paired, it is deleted from the store. TTL is configured in the store block.
- If the span is paired prior to its expiration, a metric is recorded and the span is deleted from the store.
The following metrics are emitted by the processor:
Metric | Type | Labels | Description |
---|---|---|---|
traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes |
traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes |
traces_service_graph_request_server_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server |
traces_service_graph_request_client_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the client |
traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans |
traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans |
Duration is measured both from the client and the server sides.
The latency_histogram_buckets
argument controls the buckets for
traces_service_graph_request_server_seconds
and traces_service_graph_request_client_seconds
.
Each emitted metrics series have a client
and a server
label corresponding with the
service doing the request and the service receiving the request. The value of the label
is derived from the service.name
resource attribute of the two spans.
The connection_type
label may not be set. If it is set, its value will be either messaging_system
or database
.
Additional labels can be included using the dimensions
configuration option:
- Those labels will have a prefix to mark where they originate (client or server span kinds).
The
client_
prefix relates to the dimensions coming from spans with a Span Kind ofclient
. Theserver_
prefix relates to the dimensions coming from spans with a Span Kind ofserver
. - Firstly the resource attributes will be searched. If the attribute is not found, the span attributes will be searched.
Blocks
The following blocks are supported inside the definition of
otelcol.connector.servicegraph
:
Hierarchy | Block | Description | Required |
---|---|---|---|
store | store | Configures the in-memory store for spans. | no |
output | output | Configures where to send telemetry data. | yes |
store block
The store
block configures the in-memory store for spans.
Name | Type | Description | Default | Required |
---|---|---|---|---|
max_items | number | Maximum number of items to keep in the store. | 1000 | no |
ttl | duration | The time to live for spans in the store. | "2ms" | no |
output block
The output
block configures a set of components to forward resulting telemetry data to.
The following arguments are supported:
Name | Type | Description | Default | Required |
---|---|---|---|---|
metrics | list(otelcol.Consumer) | List of consumers to send metrics to. | [] | no |
You must specify the output
block, but all its arguments are optional.
By default, telemetry data is dropped.
Configure the metrics
argument accordingly to send telemetry data to other components.
Exported fields
The following fields are exported and can be referenced by other components:
Name | Type | Description |
---|---|---|
input | otelcol.Consumer | A value that other components can use to send telemetry data to. |
input
accepts otelcol.Consumer
traces telemetry data. It does not accept metrics and logs.
Component health
otelcol.connector.servicegraph
is only reported as unhealthy if given an invalid
configuration.
Debug information
otelcol.connector.servicegraph
does not expose any component-specific debug
information.
Example
The example below accepts traces, creates service graph metrics from them, and writes the metrics to Mimir. The traces are written to Tempo.
otelcol.connector.servicegraph
also adds a label to each metric with the value of the “http.method”
span/resource attribute.
otelcol.receiver.otlp "default" {
grpc {
endpoint = "0.0.0.0:4320"
}
output {
traces = [otelcol.connector.servicegraph.default.input,otelcol.exporter.otlp.grafana_cloud_tempo.input]
}
}
otelcol.connector.servicegraph "default" {
dimensions = ["http.method"]
output {
metrics = [otelcol.exporter.prometheus.default.input]
}
}
otelcol.exporter.prometheus "default" {
forward_to = [prometheus.remote_write.mimir.receiver]
}
prometheus.remote_write "mimir" {
endpoint {
url = "https://prometheus-xxx.grafana.net/api/prom/push"
basic_auth {
username = env("PROMETHEUS_USERNAME")
password = env("GRAFANA_CLOUD_API_KEY")
}
}
}
otelcol.exporter.otlp "grafana_cloud_tempo" {
client {
endpoint = "https://tempo-xxx.grafana.net/tempo"
auth = otelcol.auth.basic.grafana_cloud_tempo.handler
}
}
otelcol.auth.basic "grafana_cloud_tempo" {
username = env("TEMPO_USERNAME")
password = env("GRAFANA_CLOUD_API_KEY")
}
Some of the metrics in Mimir may look like this:
traces_service_graph_request_total{client="shop-backend",failed="false",server="article-service",client_http_method="DELETE",server_http_method="DELETE"}
traces_service_graph_request_failed_total{client="shop-backend",client_http_method="POST",failed="false",server="auth-service",server_http_method="POST"}