About Asserts
Asserts provides insights into your distributed, multi-cloud, hybrid applications. With Asserts, your team no longer has to rely on disjointed dashboards that fail to keep up with frequent updates, and your engineers no longer need to spend time deciphering visualizations to find crucial information.
Your on-call team will no longer be overwhelmed by irrelevant alerts that are noisy, difficult to manage, and quickly become outdated.
Discover a living map of application and infrastructure components
Asserts collects information from your telemetry data sources and uses it to create a visual representation of your application and infrastructure components. It then organizes and indexes this representation, so you can search it to determine how the components fit together in real time.
The following Asserts Entity Graph shows the relationships among application and infrastructure components.
Asserts curates knowledge of common runtime failure patterns and potential causes, so your team doesn’t have to research and maintain these rules.
- Asserts continuously tracks resource Saturation; Amends (for example, deployments and scale events); request, resource, and latency Anomalies; systemic Failures; and Errors on your golden signals and health metrics.
- The entity graph annotates occurrences of these assertions, making it easy for you to understand and use them.
Explore with unified search
With unified search, you can combine components, relationships, configurations, and associated assertions to express your intent in a clear and simple natural language expression.
For example, this advanced query returns all Pods with assertions and their connected Nodes and all services and their connected Pods where the service name contains mysql.
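To make the shape of such a query concrete, the following is a minimal Python sketch that evaluates the same intent against a small in-memory graph. It is not the Asserts query language or implementation: the `Entity` class, the `connected` and `search` helpers, and the sample entities are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str                # for example "Service", "Pod", or "Node"
    name: str
    assertions: tuple = ()   # names of active assertions, if any


# Hypothetical sample graph: each edge links two related entities.
mysql_svc = Entity("Service", "mysql-primary")
api_svc = Entity("Service", "api")
pod_a = Entity("Pod", "mysql-0", assertions=("saturation",))
pod_b = Entity("Pod", "api-0")
node_1 = Entity("Node", "node-1")

EDGES = [(mysql_svc, pod_a), (api_svc, pod_b), (pod_a, node_1), (pod_b, node_1)]


def connected(entity, kind):
    """Return the neighbors of `entity` that have the given kind."""
    found = []
    for a, b in EDGES:
        if a == entity and b.kind == kind:
            found.append(b)
        elif b == entity and a.kind == kind:
            found.append(a)
    return found


def search(name_fragment):
    """Pods with assertions plus their Nodes, and matching services plus their Pods."""
    entities = {e for edge in EDGES for e in edge}
    result = set()
    for e in entities:
        if e.kind == "Pod" and e.assertions:
            result.add(e)
            result.update(connected(e, "Node"))
        if e.kind == "Service" and name_fragment in e.name:
            result.add(e)
            result.update(connected(e, "Pod"))
    return sorted(result, key=lambda e: (e.kind, e.name))


matches = search("mysql")
print([f"{e.kind}:{e.name}" for e in matches])
```

In this sample data, the Pod without assertions and the service whose name does not contain mysql are left out of the result, which mirrors how the query narrows the graph to the entities worth inspecting.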
Furthermore, you can use the search expression in the RCA workbench, which enables you to instantly view all the assertions correlated across time and space. This gives you quick access to the relevant data you need.
Curated rules detect service unavailability and potential causes
Asserts actively manages and organizes information on common runtime failure patterns and their potential causes. This means your team doesn’t have to spend time researching and maintaining complex PromQL recording and alerting rules for different frameworks.
Asserts continuously tracks the following types of assertions:
- Saturation - Asserts monitors software objects like client connections that come with built-in limits. When their usage is close to their limits, a saturation assertion occurs.
- Amends - Asserts captures changes to your environment. Example amend assertions include container deployments, configuration updates, and HPA scale events.
- Anomalies - Asserts detects pattern changes related to traffic. Example anomaly assertions include request rate, error rate, and latency.
- Failures - Asserts detects significant or complete application degradation. Example failure assertions include Pod crash looping and CronJob failures.
- Errors - Asserts monitors erroneous events that reflect how the software handles real-world traffic. Example error assertions include 4XX/5XX status codes and latency threshold breaches on your golden signals and health metrics.
The entity graph annotates occurrences of these assertions, making it easy for you to understand and use them. For more information about the SAAFE model, refer to About the SAAFE model.
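As a rough illustration of the saturation case described above, the following Python sketch flags a resource whose usage is close to its built-in limit. The function name, the 80% default threshold, and the sample numbers are assumptions made for this example; they are not taken from the Asserts rule set.

```python
def saturation_assertion(usage, limit, threshold=0.8):
    """Return True when usage is at or above `threshold` of the limit.

    `threshold` is a hypothetical default, not a value defined by Asserts.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return usage / limit >= threshold


# Hypothetical example: client connections against a limit of 100.
print(saturation_assertion(usage=90, limit=100))   # True
print(saturation_assertion(usage=40, limit=100))   # False
```

A curated rule would carry this kind of comparison for each resource type, so that your team does not have to write and maintain one per framework.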
Reduce mean time to resolution
Because Asserts is always checking for assertions, you don’t have to wait for SLOs to breach and alerts to fire before knowing you should act. You can identify issues quickly using the Asserts Top insights dashboards. Top insights presents a stack-ranked view of services and nodes that need attention based on their severity score. You can then quickly navigate to the RCA workbench to perform root cause analysis. For more information about Top insights, refer to Identify entities for analysis.
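The stack-ranked view can be pictured as a sort by severity score. The Python sketch below shows only that ordering step; how Asserts computes the score is not covered here, and the `EntityScore` type, the `stack_rank` helper, and the sample scores are hypothetical.

```python
from typing import NamedTuple


class EntityScore(NamedTuple):
    name: str               # entity name, for example a service or node
    kind: str               # "Service" or "Node"
    severity_score: float   # hypothetical score; higher means more urgent


def stack_rank(scores, limit=3):
    """Return the `limit` highest-scoring entities, most severe first."""
    return sorted(scores, key=lambda s: s.severity_score, reverse=True)[:limit]


# Hypothetical sample data.
sample = [
    EntityScore("shipping", "Service", 42.0),
    EntityScore("node-1", "Node", 7.5),
    EntityScore("checkout", "Service", 18.0),
    EntityScore("node-2", "Node", 0.0),
]

for entry in stack_rank(sample):
    print(f"{entry.severity_score:6.1f}  {entry.kind:<8} {entry.name}")
```

With this sample data, the entity at the top of the list is the one you would open first in the RCA workbench.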
Perform root cause analysis in RCA workbench
In RCA workbench, you can explore all potential causes for a particular issue correlated over time and dependency. You also have access to the relevant metrics, logs, and traces.
In the following example, a deployment amend on the shipping service triggered a spike in error rate on the service and a p99 latency spike on the /cities/{code} endpoint.
You can navigate to Dashboard or Logs to see contextual logs in your Loki log store.
For more information about using RCA workbench, refer to Perform root cause analysis in workbench.