Caution

Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.

Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.

otelcol.connector.servicegraph

EXPERIMENTAL: This is an experimental component. Experimental components are subject to frequent breaking changes, and may be removed with no equivalent replacement.

otelcol.connector.servicegraph accepts span data from other otelcol components and outputs metrics representing the relationship between various services in a system. A metric represents an edge in the service graph. Those metrics can then be used by a data visualization application (e.g. Grafana) to draw the service graph.

NOTE: otelcol.connector.servicegraph is a wrapper over the upstream OpenTelemetry Collector servicegraph connector. Bug reports or feature requests will be redirected to the upstream repository, if necessary.

Multiple otelcol.connector.servicegraph components can be specified by giving them different labels.
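
As a minimal sketch of what distinct labels look like, the fragment below declares two instances side by side. The labels "plain" and "by_method" are illustrative, and the otelcol.exporter.prometheus "default" component they forward to is assumed to be defined elsewhere in the configuration.

```river
// Two independent instances, told apart by their labels.
otelcol.connector.servicegraph "plain" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

// Same component type with a different label and an extra dimension.
otelcol.connector.servicegraph "by_method" {
  dimensions = ["http.method"]

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}
```

Other components would then reference each instance by its label, for example otelcol.connector.servicegraph.by_method.input.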

This component is based on Grafana Tempo’s service graph processor.

Service graphs are useful for several use cases:

  • Infer the topology of a distributed system. As distributed systems grow, they become more complex. Service graphs can help you understand the structure of the system.
  • Provide a high-level overview of the health of your system. Service graphs show error rates, latencies, and other relevant data.
  • Provide a historical view of a system’s topology. Distributed systems change very frequently, and service graphs offer a way of seeing how these systems have evolved over time.

Because otelcol.connector.servicegraph has to process both sides of an edge, it needs to see all spans of a trace to function properly. If the spans of a trace are spread over multiple Agent instances, they cannot be paired reliably. One solution is to place otelcol.exporter.loadbalancing in front of the Agent instances running otelcol.connector.servicegraph, so that all spans of a trace are sent to the same instance.
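
The following fragment is a sketch of such a front tier, not a complete deployment. It assumes a static list of backends; the hostnames are hypothetical, and TLS and authentication settings for the client block are omitted and would need to match your environment. The exporter is left at its default routing, which is understood to route by trace ID.

```river
// Front-tier Agent: receives traces and fans them out to the
// Agents that run otelcol.connector.servicegraph.
otelcol.receiver.otlp "default" {
  grpc {}

  output {
    traces = [otelcol.exporter.loadbalancing.default.input]
  }
}

otelcol.exporter.loadbalancing "default" {
  resolver {
    static {
      // Hypothetical addresses of the servicegraph Agents.
      hostnames = ["agent-servicegraph-0:4317", "agent-servicegraph-1:4317"]
    }
  }

  protocol {
    otlp {
      // Client settings (TLS, auth) are intentionally left out of this sketch.
      client {}
    }
  }
}
```

Each backend Agent would then run its own otelcol.receiver.otlp that forwards traces to otelcol.connector.servicegraph.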

Usage

river
otelcol.connector.servicegraph "LABEL" {
  output {
    metrics = [...]
  }
}

Arguments

otelcol.connector.servicegraph supports the following arguments:

| Name | Type | Description | Default | Required |
|------|------|-------------|---------|----------|
| latency_histogram_buckets | list(duration) | Buckets for latency histogram metrics. | ["2ms", "4ms", "6ms", "8ms", "10ms", "50ms", "100ms", "200ms", "400ms", "800ms", "1s", "1400ms", "2s", "5s", "10s", "15s"] | no |
| dimensions | list(string) | A list of dimensions to add with the default dimensions. | [] | no |
| cache_loop | duration | Configures how often to delete series which have not been updated. | "1m" | no |
| store_expiration_loop | duration | The time to expire old entries from the store periodically. | "2s" | no |
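
As an illustration of how these arguments fit together, the sketch below overrides all four. The values are arbitrary examples rather than recommendations, and the otelcol.exporter.prometheus "default" component is assumed to be defined elsewhere.

```river
otelcol.connector.servicegraph "default" {
  // Coarser buckets than the default list; values are illustrative.
  latency_histogram_buckets = ["10ms", "100ms", "1s", "10s"]

  // Adds client_http_method and server_http_method labels where available.
  dimensions = ["http.method"]

  // Delete stale series every two minutes instead of every minute.
  cache_loop = "2m"

  // Check the store for expired entries every five seconds.
  store_expiration_loop = "5s"

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}
```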

Service graphs work by inspecting traces and looking for spans with a parent-child relationship that represents a request. otelcol.connector.servicegraph uses OpenTelemetry semantic conventions to detect several kinds of requests. The following requests are currently supported:

  • A direct request between two services, where the outgoing and the incoming span must have a Span Kind value of client and server respectively.
  • A request across a messaging system, where the outgoing and the incoming span must have a Span Kind value of producer and consumer respectively.
  • A database request, where spans have a Span Kind with a value of client, as well as an attribute with a key of db.name.

Every span which can be paired up to form a request is kept in an in-memory store:

  • If the TTL of the span expires before it can be paired, it is deleted from the store. TTL is configured in the store block.
  • If the span is paired prior to its expiration, a metric is recorded and the span is deleted from the store.

The connector emits the following metrics:

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes |
| traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes |
| traces_service_graph_request_server_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server |
| traces_service_graph_request_client_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the client |
| traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans |
| traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans |

Duration is measured both from the client and the server sides.

The latency_histogram_buckets argument controls the buckets for traces_service_graph_request_server_seconds and traces_service_graph_request_client_seconds.

Each emitted metric series has a client and a server label, corresponding to the service making the request and the service receiving it. The label values are derived from the service.name resource attribute of the two spans.

The connection_type label is optional. When it is set, its value is either messaging_system or database.

Additional labels can be included using the dimensions argument:

  • Those labels have a prefix that marks where they originate. The client_ prefix is used for dimensions coming from spans with a Span Kind of client. The server_ prefix is used for dimensions coming from spans with a Span Kind of server.
  • Resource attributes are searched first. If the attribute is not found there, the span attributes are searched.

Blocks

The following blocks are supported inside the definition of otelcol.connector.servicegraph:

| Hierarchy | Block | Description | Required |
|-----------|-------|-------------|----------|
| store | store | Configures the in-memory store for spans. | no |
| output | output | Configures where to send telemetry data. | yes |

store block

The store block configures the in-memory store for spans.

| Name | Type | Description | Default | Required |
|------|------|-------------|---------|----------|
| max_items | number | Maximum number of items to keep in the store. | 1000 | no |
| ttl | duration | The time to live for spans in the store. | "2ms" | no |
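
A minimal sketch of a store block follows. The values are illustrative only: a larger max_items and a longer ttl give slow or late-arriving spans more time to be paired, at the cost of more memory. The otelcol.exporter.prometheus "default" component is assumed to be defined elsewhere.

```river
otelcol.connector.servicegraph "default" {
  store {
    // Keep up to 5000 unpaired spans in memory (illustrative value).
    max_items = 5000

    // Wait up to 10 seconds for the matching span before dropping the entry.
    ttl = "10s"
  }

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}
```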

output block

The output block configures a set of components to forward resulting telemetry data to.

The following arguments are supported:

| Name | Type | Description | Default | Required |
|------|------|-------------|---------|----------|
| metrics | list(otelcol.Consumer) | List of consumers to send metrics to. | [] | no |

You must specify the output block, but all its arguments are optional. By default, telemetry data is dropped. Configure the metrics argument accordingly to send telemetry data to other components.

Exported fields

The following fields are exported and can be referenced by other components:

| Name | Type | Description |
|------|------|-------------|
| input | otelcol.Consumer | A value that other components can use to send telemetry data to. |

input accepts otelcol.Consumer data for traces only. It does not accept metrics or logs.

Component health

otelcol.connector.servicegraph is only reported as unhealthy if given an invalid configuration.

Debug information

otelcol.connector.servicegraph does not expose any component-specific debug information.

Example

The example below accepts traces, creates service graph metrics from them, and writes the metrics to Mimir. The traces are written to Tempo.

otelcol.connector.servicegraph also adds a label to each metric with the value of the “http.method” span/resource attribute.

river
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4320"
  }
  
  output {
    traces = [
      otelcol.connector.servicegraph.default.input,
      otelcol.exporter.otlp.grafana_cloud_tempo.input,
    ]
  }
}

otelcol.connector.servicegraph "default" {
  dimensions = ["http.method"]
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://prometheus-xxx.grafana.net/api/prom/push"
    
    basic_auth {
      username = env("PROMETHEUS_USERNAME")
      password = env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

otelcol.exporter.otlp "grafana_cloud_tempo" {
  client {
    endpoint = "https://tempo-xxx.grafana.net/tempo"
    auth     = otelcol.auth.basic.grafana_cloud_tempo.handler
  }
}

otelcol.auth.basic "grafana_cloud_tempo" {
  username = env("TEMPO_USERNAME")
  password = env("GRAFANA_CLOUD_API_KEY")
}

Some of the metrics in Mimir may look like this:

traces_service_graph_request_total{client="shop-backend",failed="false",server="article-service",client_http_method="DELETE",server_http_method="DELETE"}
traces_service_graph_request_failed_total{client="shop-backend",client_http_method="POST",failed="false",server="auth-service",server_http_method="POST"}