Alert insights and metrics

Grafana IRM provides detailed metrics and logs to help you monitor your alert handling performance and analyze trends. These insights enable you to identify bottlenecks, measure response effectiveness, and continuously improve your alerting processes.

About alert metrics

Alert metrics in Grafana IRM track key performance indicators related to alert handling, including:

  • Alert volume across integrations
  • Response times for alert acknowledgment
  • Notification patterns
  • Team and user metrics

These metrics are exposed in Prometheus format, making them easy to query and visualize in Grafana dashboards.

Available metrics

Grafana IRM provides the following core metrics:

Metric | Type | Description
alert_groups_total | Gauge | Total count of alert groups for each integration, by state (firing, acknowledged, resolved, silenced)
alert_groups_response_time | Histogram | Mean time between alert start and first action over the last 7 days
user_was_notified_of_alert_groups_total | Counter | Total count of alert groups users were notified of

Access metrics

For Grafana Cloud customers

Alert metrics are automatically collected in the preinstalled grafanacloud-usage data source and have the prefix grafanacloud_oncall_instance, for example:

  • grafanacloud_oncall_instance_alert_groups_total
  • grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket
  • grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total

Metric details and examples

Alert groups total

This metric tracks the count of alerts in different states with the following labels:

Label | Description
id | ID of the Grafana instance (stack)
slug | Slug of the Grafana instance (stack)
org_id | ID of the Grafana organization
team | Team name
integration | Integration name
service_name | Value of the alert group's service_name label
state | Alert group state (firing, acknowledged, resolved, silenced)

Example query:

Get the number of alerts in “firing” state for “Grafana Alerting” integration:

promql
grafanacloud_oncall_instance_alert_groups_total{integration="Grafana Alerting", state="firing"}
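
To see alert volume across all integrations at once, the same metric can be aggregated. A sketch, assuming the Cloud metric prefix shown above:

```promql
sum by (integration) (
  grafanacloud_oncall_instance_alert_groups_total{state="firing"}
)
```

Grouping by state instead shows the overall distribution of firing, acknowledged, resolved, and silenced alert groups.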

Alert groups response time

This metric tracks response times with the following labels:

Label | Description
id | ID of the Grafana instance (stack)
slug | Slug of the Grafana instance (stack)
org_id | ID of the Grafana organization
team | Team name
integration | Integration name
service_name | Value of the alert group's service_name label
le | Histogram bucket upper bound in seconds (60, 300, 600, 3600, +Inf)

Example query:

Get the number of alerts with a response time of 10 minutes (600 seconds) or less:

promql
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{integration="Grafana Alerting", le="600"}
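
Because the buckets are cumulative, percentiles can be estimated with PromQL's histogram_quantile() function. The linear interpolation it performs can be illustrated with a short Python sketch; the bucket counts below are made up for illustration, but the bucket bounds match the le labels above:

```python
# Sketch of how Prometheus's histogram_quantile() estimates a quantile from
# cumulative bucket counts. Bounds mirror the `le` labels of the
# response-time metric (60, 300, 600, 3600 seconds, +Inf); the sample
# counts are illustrative.
import math

def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound_seconds, cumulative_count), sorted by bound."""
    total = buckets[-1][1]  # the +Inf bucket holds the overall total
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: return the last finite bound
                return prev_bound
            # Linear interpolation within the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

buckets = [(60, 40), (300, 70), (600, 90), (3600, 99), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))  # estimated median response time in seconds
```

In PromQL itself, the equivalent estimate would use histogram_quantile(0.5, ...) over the _bucket series, aggregated with sum by (le).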

User notification metrics

This metric tracks how many alerts each user was notified about:

Label | Description
id | ID of the Grafana instance (stack)
slug | Slug of the Grafana instance (stack)
org_id | ID of the Grafana organization
username | User's username

Example query:

Get the number of alerts a specific user was notified of:

promql
grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total{username="alex"}
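
To rank users by notification load rather than inspect a single user, topk can be applied to the counter. A sketch, assuming the Cloud metric prefix shown above:

```promql
topk(10, grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total)
```

This helps spot on-call members who receive a disproportionate share of notifications.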

Alert metrics dashboard

A pre-built “OnCall Insights” dashboard is available to visualize key alert metrics. To access it:

  1. Navigate to your dashboard list and open the General folder
  2. Find the dashboard with the tag oncall
  3. Select your Prometheus data source (for Cloud customers, use grafanacloud-usage)
  4. Filter data by Grafana instance, team, and integration

To re-import the dashboard:

  1. Go to Administration > Plugins
  2. Find OnCall in the plugins list
  3. Open the Dashboards tab
  4. Click “Re-import” next to “OnCall Metrics”

Note: Re-importing or updating the plugin will reset any customizations. To preserve changes, save a copy of the dashboard using “Save As” in dashboard settings.

You can also view insights directly in Grafana IRM by clicking Insights in the navigation menu.

Alert insight logs

Alert insight logs provide an audit trail of configuration changes and system events in your IRM environment. These logs are automatically configured in Grafana Cloud with the Usage Insights Loki data source.

Access insight logs

To retrieve all logs related to your IRM instance:

logql
{instance_type="oncall"} | logfmt | __error__=``
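
To get a quick picture of activity before drilling into individual log types, the same stream can be aggregated by action_type. A sketch counting insight-log lines per type over the last 24 hours:

```logql
sum by (action_type) (
  count_over_time({instance_type="oncall"} | logfmt | __error__=`` [24h])
)
```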

Types of insight logs

IRM captures three primary types of insight logs:

Resource logs

Track changes to resources (integrations, escalation chains, schedules, etc.):

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource`

Resource logs include the following key fields:

Field | Description
action_name | Type of action (created, updated, deleted)
action_type | Always resource for resource logs
author | Username of the user who performed the action
resource_id | ID of the modified resource
resource_name | Name of the modified resource
resource_type | Type of resource (integration, escalation chain, etc.)
team | Team the resource belongs to
prev_state | JSON representation of the resource before the update
new_state | JSON representation of the resource after the update
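
The action_name field is useful for auditing destructive changes. For example, a sketch that narrows resource logs to deletions only:

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and action_name = `deleted`
```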

Maintenance logs

Track when maintenance mode is started or finished:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance`

Maintenance logs include:

Field | Description
action_name | Maintenance action (started or finished)
action_type | Always maintenance for maintenance logs
maintenance_mode | Type of maintenance (maintenance or debug)
resource_id | ID of the integration under maintenance
resource_name | Name of the integration under maintenance
team | Team the integration belongs to
author | Username of the user who performed the action

ChatOps logs

Track configuration changes to chat integrations:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops`

ChatOps logs include:

Field | Description
action_name | Type of ChatOps action
action_type | Always chat_ops for ChatOps logs
author | Username of the user who performed the action
chat_ops_type | Type of integration (telegram, slack, msteams, mobile_app)
channel_name | Name of the linked channel
linked_user | Username linked to the ChatOps integration

Example log queries

Here are some practical log queries to analyze your alert handling configuration:

Actions by specific user:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and author="username"

Changes to schedules:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and (resource_type=`web_schedule` or resource_type=`calendar_schedule` or resource_type=`ical_schedule`)

Changes to escalation policies:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and resource_type=`escalation_policy` and escalation_chain_id=`CHAIN_ID`

Maintenance events for an integration:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance` and resource_id=`INTEGRATION_ID`

Slack chatops configuration changes:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops` and chat_ops_type=`slack`