Note
Fleet Management is currently in public preview. Grafana Labs offers limited support, and breaking changes might occur prior to the feature being made generally available. For bug reports or questions, fill out our feedback form.
Check the status of your fleet
Grafana Fleet Management provides a health status indicator so you can see at a glance if your collectors are healthy. You can find a collector’s health status in the Status column on the Inventory tab in the Fleet Management interface.
The health status indicator reflects the current state of the collector:
- Green (Healthy) means the collector is healthy.
- Yellow (Warning) means the collector is potentially unhealthy.
- Red (Error) means the collector is unhealthy.
- Gray (Unknown) means the collector is not reporting data.
You can filter your fleet by status by clicking on the Status dropdown on the Inventory tab.
How health status is determined
The health status is controlled by three factors:
- Has the collector made a GetConfig API request in the last 30 minutes?
- Is the collector reporting an up metric with the correct collector_id label?
- Does the collector have any active alerts?
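The first factor, the 30-minute GetConfig window, can be illustrated with a minimal Python sketch. The function name, the timestamp inputs, and the inclusive boundary are assumptions made for illustration; only the 30-minute window comes from this page, and the code does not represent how the Fleet Management service is implemented.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 30-minute window is taken from the text; every name here is hypothetical.
HEARTBEAT_WINDOW = timedelta(minutes=30)


def has_recent_get_config(last_get_config: Optional[datetime], now: datetime) -> bool:
    """Return True if the last GetConfig request falls within the window."""
    if last_get_config is None:
        # The collector has never made a GetConfig request.
        return False
    # Treating the boundary as inclusive is an assumption.
    return now - last_get_config <= HEARTBEAT_WINDOW


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(has_recent_get_config(now - timedelta(minutes=10), now))  # True
print(has_recent_get_config(now - timedelta(minutes=45), now))  # False
print(has_recent_get_config(None, now))                         # False
```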
Note
The health status does not check for configuration errors.
The Fleet Management service fetches active alerts from the Grafana Prometheus instance. Alerts must exist in your stack’s Prometheus Alertmanager to be discoverable by the health status check. Refer to Create new alerts for guidance on labeling alerts and the Alertmanager documentation for tips on using Mimirtool to configure Alertmanager.
Green status
A green health status indicates that the collector is operational. At minimum, an operational collector:
- Has no active alerts.
- Made a GetConfig API request in the last 30 minutes or reported an up metric.
Note
A false-positive healthy status can result if the collector_id label in an alert does not match the id argument in the remotecfg block of the collector. If the labels do not match, the alerts cannot be attributed to the collector. When Fleet Management checks the state of a mismatched collector, the service finds no active alerts.
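The attribution problem described in this note can be sketched in a few lines of Python. The alert shape (a dictionary with a labels mapping), the alert name, and the collector IDs are hypothetical; the sketch only shows why a mismatched collector_id label leaves a collector with no attributable alerts.

```python
from typing import Dict, List

# Hypothetical alert shape: {"labels": {"collector_id": "...", ...}}
Alert = Dict[str, Dict[str, str]]


def alerts_for_collector(alerts: List[Alert], collector_id: str) -> List[Alert]:
    """Return the alerts whose collector_id label exactly matches the collector's id."""
    return [
        alert
        for alert in alerts
        if alert.get("labels", {}).get("collector_id") == collector_id
    ]


active_alerts = [
    {"labels": {"alertname": "ExampleAlert", "collector_id": "prod-collector-1"}},
]

# The id matches the label, so the alert is attributed to the collector.
print(len(alerts_for_collector(active_alerts, "prod-collector-1")))   # 1
# The id differs from the label, so no alerts are found for the collector.
print(len(alerts_for_collector(active_alerts, "prod-collector-01")))  # 0
```

In the second call the alert is still firing, but a lookup by the collector's id returns nothing, which is how an unhealthy collector can appear green.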
Yellow status
A yellow health status warns that there might be an issue with the collector. It can have two causes:
- The collector has an active, non-critical alert; or
- The collector has not made a GetConfig request in the last 30 minutes and it is not reporting an up metric.
Note
If your collector is not self-reporting its own metrics with the collector_id label, you might see a yellow health status even if the collector is healthy. Collectors in your Fleet Management Inventory should automatically be assigned the self_monitoring_metrics pipeline. If you see a yellow health status, make sure the pipeline is active.
Red status
A red health status indicates that the collector has an active, critical alert.
Gray status
A gray health status indicates that the collector has not reported telemetry. It has never had a heartbeat, doesn’t have an up metric within the data retention period, and has no active alerts.
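Taken together, the four statuses can be read as a single decision procedure. The following Python sketch is one possible reading of the rules on this page, not the service's actual implementation. The function and parameter names are hypothetical, and the order of the checks (critical alerts first, then non-critical alerts, then reporting signals) is an assumption, because the page does not state how overlapping conditions are resolved.

```python
from enum import Enum


class Status(Enum):
    GREEN = "Healthy"
    YELLOW = "Warning"
    RED = "Error"
    GRAY = "Unknown"


def health_status(
    *,
    recent_get_config: bool,   # GetConfig request in the last 30 minutes
    reporting_up: bool,        # currently reporting an up metric
    ever_reported: bool,       # any heartbeat ever, or an up metric within retention
    critical_alert: bool,      # at least one active critical alert
    noncritical_alert: bool,   # at least one active non-critical alert
) -> Status:
    """Classify a collector; the precedence of the checks is an assumption."""
    if critical_alert:
        return Status.RED
    if noncritical_alert:
        return Status.YELLOW
    # From here on, the collector has no active alerts.
    if recent_get_config or reporting_up:
        return Status.GREEN
    if not ever_reported:
        return Status.GRAY
    # No recent GetConfig request and no up metric, but it reported in the past.
    return Status.YELLOW


status = health_status(
    recent_get_config=False,
    reporting_up=False,
    ever_reported=True,
    critical_alert=False,
    noncritical_alert=False,
)
print(status.value)  # Warning
```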
Next steps
If your collector is unhealthy or potentially unhealthy, refer to Troubleshoot an unhealthy collector for help diagnosing the problem.