Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
Service graph view
Grafana’s service graph view uses metrics generated by the metrics-generator (or Grafana Agent) to display span request rates, error rates, and durations, as well as service graphs. Once the requirements are set up, this pre-configured view is immediately available.
Using the service graph view, you can:
- Discover spans which are consistently erroring and the rates at which they occur
- Get an overview of the overall rate of span calls throughout your services
- Determine how long the slowest queries in your service take to complete
- Examine all traces that contain spans of particular interest based on rate, error and duration values (RED signals)
Requirements
You have to enable span metrics and service graph generation on the Grafana backend so that metrics are generated as traces are ingested.
To use the service graph view, you need:
- Tempo or Grafana Cloud Traces with either 1) the metrics generator enabled and configured or 2) Grafana Agent or Grafana Alloy enabled and configured to send data to a Prometheus-compatible metrics store
- Service graphs, which are enabled by default in Grafana
- Span metrics enabled in your Tempo data source configuration
The service graph view can be derived from metrics generated by either the metrics-generator or by Grafana Agent or Grafana Alloy.
For information on how to configure these features, refer to the Grafana Tempo data sources documentation.
What does the service graph view show?
Using this view, you can see the top five spans with a type of server (listed in the Name column).
You can refine any of this data using the filters.
Selecting any of the data points lets you see more specific data.
The service graph view provides a span metrics visualization (table, screen section 2) and service graph (screen section 3). In addition, you can use the filters (screen section 1) to customize the data displayed.
Any information in the table that has an underline can be selected to show more detailed information.
You can also select any node in the service graph to display additional information.
In the dashboard shown below, the Ingester.QueryStream span has a request rate of 144220.22 requests per second.
The /cortex.Ingester/Query span has the highest request rate.
Error rate example
Let’s say we want to learn more about why cortex.Ingester has the highest error rates.
Selecting the second row of the Error rate column displays details about the span metrics in a new window on the right side.
The metrics query used to generate the data appears in the Metrics browser field.
Span metrics table
The span metrics shown in the table are generated by the metrics-generator or the Grafana Agent. These metrics, which include RED metrics, are created from ingested tracing data.
Span metrics generate two metrics:
- A counter that computes requests
- A histogram that tracks the distribution of durations of all requests
For information about span metrics and how they are calculated, refer to the Span metrics documentation.
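To make the two metric types concrete, here is a minimal Python sketch of how a per-second request rate can be derived from two samples of a counter, and how a p90 duration can be estimated from cumulative histogram buckets by linear interpolation. It follows the general idea behind PromQL's rate() and histogram_quantile(), but it is a simplification: the function names and sample values are hypothetical, and counter resets and other edge cases handled by Prometheus are ignored.

```python
import math


def per_second_rate(first_value, last_value, window_seconds):
    """Average per-second increase of a counter over a window (ignores resets)."""
    return (last_value - first_value) / window_seconds


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with an infinite bound,
    mirroring the `le` label of a Prometheus-style histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical samples: a counter that grew from 1000 to 1600 in 5 minutes,
# and cumulative duration buckets in seconds.
request_rate = per_second_rate(1000, 1600, 300)
duration_buckets = [(0.1, 50), (0.5, 80), (1.0, 95), (math.inf, 100)]
p90 = estimate_quantile(0.9, duration_buckets)
print(f"rate={request_rate:.2f} req/s, p90={p90:.3f} s")
```

With these sample values, the counter yields 2 requests per second, and the p90 estimate lands inside the 0.5 to 1.0 second bucket because 90 of the 100 observations fall at or below that bucket.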
Table contents
The span metrics table contains seven columns with five column headings. Selecting a heading sorts the data by ascending or descending values.
Column | Explanation | PromQL query for span |
---|---|---|
Name | The span name. OTel semantic conventions generally expect the span name to be a low-cardinality indicator of the HTTP route or database function being performed. | N/A |
Rate | LCD gauge (horizontal bar graph). Occurrences of the span per second. Clicking this field opens the corresponding metrics. | sum(rate( traces_spanmetrics_calls_total{ span_name="", <filters> }[$__range])) |
Error Rate | Number and LCD gauge (horizontal bar graph). Clicking this field shows more detailed metrics. | sum(rate( traces_spanmetrics_calls_total{ span_name="", span_status="STATUS_CODE_ERROR", <filters> }[$__range])) |
Duration | p90 duration: 90% of all occurrences of this span complete within this time. Clicking this field shows the appropriate metrics. | histogram_quantile(.9, sum(rate( traces_spanmetrics_duration_seconds_bucket{ span_name="", span_status="STATUS_CODE_ERROR", <filters> }[$__range])) by (le)) |
Links | Links to example traces for the span name and other applied filters. Each link opens a search for all spans with the same name in the same Tempo data source. | N/A |
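As an illustration of how the query templates in the table are filled in, the following Python sketch substitutes a span name and optional label filters into each template. It only builds query strings and mirrors the templates as they are written in the table. The function names and the job="demo" filter are hypothetical, and label values are assumed not to need escaping.

```python
ERROR_MATCHER = 'span_status="STATUS_CODE_ERROR"'


def build_selector(span_name, filters=(), extra=()):
    """Join the span name, fixed matchers, and user filters into a selector body."""
    matchers = [f'span_name="{span_name}"', *extra, *filters]
    return ", ".join(matchers)


def span_queries(span_name, filters=()):
    """Return the rate, error rate, and p90 duration queries for one span."""
    all_spans = build_selector(span_name, filters)
    error_spans = build_selector(span_name, filters, extra=[ERROR_MATCHER])
    return {
        "rate": (
            f"sum(rate(traces_spanmetrics_calls_total{{{all_spans}}}[$__range]))"
        ),
        "error_rate": (
            f"sum(rate(traces_spanmetrics_calls_total{{{error_spans}}}[$__range]))"
        ),
        # The duration template in the table also carries the error status matcher.
        "duration_p90": (
            "histogram_quantile(.9, sum(rate("
            f"traces_spanmetrics_duration_seconds_bucket{{{error_spans}}}[$__range]"
            ")) by (le))"
        ),
    }


# Hypothetical usage: one span name from this page plus an illustrative filter.
for name, query in span_queries("/base.Ruler/Rules", ['job="demo"']).items():
    print(f"{name}: {query}")
```

Keeping the selector construction in one helper means every query sees the same filters, which matches how the view applies the filters to all of its fixed queries.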
Service graphs
A service graph (node graph) is a visual representation of the interrelationships between various services. Service graphs help you understand the structure of a distributed system, and the connections and dependencies between its components.
Service graphs infer the topology of a distributed system, provide a high-level overview of the health of your system, and give a historic view of a system’s topology. Service graphs show error rates and latencies, among other relevant data. The service graph layout can be the default or grid.
The grid layout changes the service graph to a series of rows and columns.
If you are using the metrics-generator, then it processes traces and generates service graphs in the form of time series metrics like:
traces_service_graph_request_total{client="app", server="db"} 20
For information about service graphs and how they are calculated, refer to the Service Graphs documentation.
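To show how series like the one above describe a graph, this Python sketch parses text-format samples and collects one edge per client and server pair. It is a simplified parser written for illustration, not how Grafana or Tempo process the data: the function name and the second sample line are hypothetical, and label values containing escaped quotes or braces are not handled.

```python
import re

# Matches lines such as: metric_name{label="value", ...} 20
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_edges(text, metric="traces_service_graph_request_total"):
    """Return {(client, server): request_count} for the given metric."""
    edges = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match["name"] != metric:
            continue
        labels = dict(LABEL_RE.findall(match["labels"]))
        if "client" in labels and "server" in labels:
            key = (labels["client"], labels["server"])
            edges[key] = edges.get(key, 0.0) + float(match["value"])
    return edges


samples = """
traces_service_graph_request_total{client="app", server="db"} 20
traces_service_graph_request_total{client="lb", server="app"} 35
"""

for (client, server), count in parse_edges(samples).items():
    print(f"{client} -> {server}: {count:g} requests")
```

Each distinct client and server pair becomes a directed edge, which is the information a node graph needs in order to draw connections between services.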
Use filters to reveal details
The service graph view uses service graphs and span metrics to provide a gateway to your tracing information. This dashboard is derived from a fixed set of metrics queries. These underlying queries cannot be changed. However, you can choose which traces are included in the metrics query by filtering.
You can explore data by clicking on selectable items or by using filters.
Selecting items or nodes for more detail
Clicking on selectable items, such as underlined text in the table or nodes on the service graph, lets you reveal specific details based upon your selection.
In the table, you can select items in the Rate, Error Rate, Duration (p90), and Links columns. Choosing one of these items provides details about the span metrics.
You can view request rate, request histogram, failed request rate, and traces for any node in the service graph. To view more information, select the node in the service graph and then choose an option from the popup. For details on navigating the service graph, refer to the Node graph panel documentation.
Filter with metric queries
Using the filters at the top of the screen, you can narrow the data set based upon span attributes (key-value pairs or labels). The filters build a query to refine what is shown in the service graph and span metrics. You can add one or more label filters.
To use the filters:
1. At the top of the Service Graph, select the text box after Filter to display a list of available labels. In this case, server is selected.
2. Select or search for a value for the label. In this case, the value of server is equal to tempo-ingester. The default operator is equals (=).
3. Optional: Change the operator by selecting = and choosing a new option from the drop-down.
4. Optional: Add more key-value pairs to refine the data set. Subsequent label filters are combined with AND, so a result must match every key-value pair.
5. Select Run query.
Filters can be removed by selecting the filter drop-down and choosing – remove filter –.
In the example below, each field or label represents a key-value pair. Number 1 selects a service as the label whose value is Go-http-client (2). The second key-value pair has a client as a label whose value is 02e807.
If your metrics queries are too specific, they may not return any results.
Updating the filter to be less specific returns a result. In this case, the results show only span metrics data associated with the span_name label with a value of /base.Ruler/Rules. No service graph data was available.
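The AND behavior of label filters, and why an overly specific filter set can return nothing, can be sketched in Python as follows. This is an illustration only. It assumes the available operators correspond to the four PromQL label matchers (=, !=, =~, !~), treats a missing label as an empty string as Prometheus does, and uses Python regular expressions, which differ in syntax from the RE2 expressions Prometheus uses. The function names and series data are hypothetical.

```python
import re

# Assumed operator set, modeled on PromQL label matchers.
OPERATORS = {
    "=": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    "=~": lambda actual, pattern: re.fullmatch(pattern, actual) is not None,
    "!~": lambda actual, pattern: re.fullmatch(pattern, actual) is None,
}


def matches(labels, filters):
    """True when the label set satisfies every (label, operator, value) filter."""
    return all(
        OPERATORS[operator](labels.get(name, ""), value)
        for name, operator, value in filters
    )


def apply_filters(series, filters):
    """Keep only the label sets that match all filters (logical AND)."""
    return [labels for labels in series if matches(labels, filters)]


# Hypothetical label sets for three series.
series = [
    {"client": "app", "server": "tempo-ingester"},
    {"client": "app", "server": "db"},
    {"client": "lb", "server": "app"},
]

one_filter = [("server", "=", "tempo-ingester")]
two_filters = one_filter + [("client", "=", "02e807")]

print(len(apply_filters(series, one_filter)))   # one series matches
print(len(apply_filters(series, two_filters)))  # no series matches both filters
```

Because every filter must hold at once, each added key-value pair can only shrink the result set. Removing the most restrictive pair is usually the quickest way to get data back.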