Multidimensional SLO dashboards for Advanced SLOs
The SLO app generates dashboards that help users pinpoint where they are burning their error budget, for example in which clusters. It does this by supporting multidimensional SLOs, that is, SLOs that preserve one or more dimensions (label values). Previously, the dashboards that identify the dimension in which the SLI is underperforming were available only for ratio-type SLOs or for fairly simple SLO expressions.
For example:
sum by (cluster) (rate(http_requests_total{code!~"5.."}[$__rate_interval]))
/ sum by (cluster) (rate(http_requests_total[$__rate_interval]))
With recent changes to the parsing logic in the SLO app, these dashboard features now also work for complex SLOs, as long as the expression is a ratio and both the numerator and the denominator aggregate by the same “group by” dimensions.
For example:
(sum by (cluster, namespace) (rate(request_duration_seconds_bucket{status_code!~"5..", le="1.0", route="opentelemetry_proto_collector_trace_v1"}[$__rate_interval]))
-
(
sum by (cluster, namespace) (rate(envoy_cluster_grpc_proto_collector_trace_v1_TraceService_1[$__rate_interval]))
or
sum by (cluster, namespace) (build_info) * 0
))
/ (sum by (cluster, namespace) (rate(request_duration_seconds_count{route="opentelemetry_proto_collector_trace_v1"}[$__rate_interval])))
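As a contrast, here is a hypothetical query whose numerator and denominator aggregate by different dimensions. It is an illustrative assumption built from the metric names used above, not an example taken from the SLO app. Under the condition stated earlier, an expression like this would presumably not qualify for the per-dimension dashboards, because `namespace` is preserved in the numerator only.

```promql
# Hypothetical counter-example (assumption, not from the SLO app docs):
# the numerator is grouped by (cluster, namespace), the denominator by (cluster) only.
sum by (cluster, namespace) (rate(http_requests_total{code!~"5.."}[$__rate_interval]))
/ on (cluster) group_left
sum by (cluster) (rate(http_requests_total[$__rate_interval]))
```

Aggregating both sides with the same `sum by (cluster, namespace)` clause would bring the expression back in line with the stated requirement.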