Velero integration for Grafana Cloud

Velero is an open-source tool that helps back up and migrate Kubernetes cluster resources and persistent volumes. It allows you to create backups of your Kubernetes objects and restore them after a disaster or when moving to a different environment. Velero provides a simple and reliable way to protect your Kubernetes applications and data, ensuring continuity and portability across various platforms.

This integration supports Velero 1.13+ and Kubernetes 1.16+.

This integration includes 4 useful alerts and 3 pre-built dashboards to help monitor and visualize Velero metrics and logs.

Before you begin

1. Check prerequisites specific to the Velero integration

Metrics

Velero exposes a Prometheus metrics endpoint, /metrics, on Velero containers by default.

You can verify that this endpoint is enabled by running the following commands:

kubectl port-forward -n <namespace> <name-of-velero-pod> 8085:8085 &
curl http://localhost:8085/metrics
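The raw output is long, so it can help to reduce it to the distinct Velero metric names. The following is a minimal sketch, not part of Velero or the integration: `velero_metric_names` is a hypothetical helper name, and the usage line assumes the port-forward above is still running.

```shell
# Hypothetical helper (not part of Velero or the integration): read
# Prometheus exposition text on stdin and print each distinct velero_*
# metric name once.
velero_metric_names() {
  grep -E '^velero_' |      # keep sample lines; "# HELP"/"# TYPE" lines start with '#'
    sed -E 's/[{ ].*$//' |  # strip labels and the sample value
    sort -u
}

# Usage, assuming the port-forward from the previous step is running:
#   curl -s http://localhost:8085/metrics | velero_metric_names
```

If the helper prints nothing, the endpoint responded without any `velero_` samples, which is worth checking before you configure scraping.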

Logs

By default, Velero sends logs to stdout.

You can verify this by running the following command:

kubectl logs -n <namespace> <name-of-velero-pod>
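To get a quick picture of what the log stream contains, you can count lines per log level. This is a minimal sketch that assumes each Velero log line starts with `time="..." level=<level>`, which is the shape the regular expressions in the logs snippet of this page expect. `velero_level_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: read Velero log lines on stdin and print
# "<level> <count>" for each level found.
# Assumption: lines begin with time="..." level=<level>; lines that do
# not match this shape are ignored.
velero_level_counts() {
  sed -nE 's/^time="[^"]*" level=([a-z]+).*/\1/p' |
    sort |
    uniq -c |
    awk '{ print $2, $1 }'
}

# Usage:
#   kubectl logs -n <namespace> <name-of-velero-pod> | velero_level_counts
```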

2. Configuration & Installation

Kubernetes Monitoring Helm chart configuration

To use this integration, modify your Kubernetes Monitoring Helm chart deployment with these configuration snippets.

Metrics snippet

Copy the following and add to the .extraConfig value of the Kubernetes Monitoring Helm chart.

discovery.relabel "velero" {
  targets = discovery.kubernetes.pods.targets
  rule {
    action        = "keep"
    source_labels = ["__meta_kubernetes_pod_label_component"]
    regex         = "velero"
  }
  rule {
    action        = "keep"
    source_labels = ["__meta_kubernetes_pod_container_port_number"]
    regex         = "8085"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label = "instance"
  }
}

prometheus.scrape "velero" {
  job_name     = "integrations/velero"
  targets      = discovery.relabel.velero.output
  forward_to   = [prometheus.relabel.metrics_service.receiver]
}

Logs snippet

Copy the following and add to the .logs.extraConfig value of the Kubernetes Monitoring Helm chart.

discovery.relabel "logs_velero" {
  targets = discovery.relabel.pod_logs.output

  rule {
    action        = "keep"
    source_labels = ["__meta_kubernetes_pod_label_component"]
    regex         = "velero"
  }
  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
}

loki.source.kubernetes "logs_velero" {
  targets    = discovery.relabel.logs_velero.output
  forward_to = [loki.process.logs_velero.receiver]
}

loki.process "logs_velero" {
  forward_to = [loki.process.logs_service.receiver]
  stage.cri {}
  stage.multiline {
    firstline = "time=\"(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)\""
  }
  stage.regex {
    expression = "time=\"(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)\" level=(?P<level>\\w+)"
  }
  stage.labels {
    values = {
      level  = "",
    }
  }
}

For more information about how to set values for .extraConfig or .logs.extraConfig, see the [Helm chart documentation](https://github.com/grafana/k8s-monitoring-helm/blob/main/charts/k8s-monitoring/docs/UsingExtraConfig.md).
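One possible way to apply both snippets is to save each to a file and pass the files with Helm's `--set-file` flag. The sketch below is an illustration under several assumptions, not the documented procedure: the file names `velero-metrics.alloy` and `velero-logs.alloy`, the wrapper name `apply_velero_snippets`, and the chart reference `grafana/k8s-monitoring` are all placeholders to adapt to your setup.

```shell
# Hypothetical wrapper: pass the two snippets to the chart as files.
# Assumptions: the snippets were saved as velero-metrics.alloy and
# velero-logs.alloy in the current directory, and the chart is referenced
# as grafana/k8s-monitoring in your local Helm repositories.
apply_velero_snippets() {
  release="$1"    # name of your existing Helm release
  namespace="$2"  # namespace the release is installed in
  helm upgrade "$release" grafana/k8s-monitoring \
    --namespace "$namespace" \
    --reuse-values \
    --set-file extraConfig=velero-metrics.alloy \
    --set-file logs.extraConfig=velero-logs.alloy
}

# Usage (release and namespace names are placeholders):
#   apply_velero_snippets <release-name> <namespace>
```

Note that `--set-file` replaces the whole value, so if your deployment already sets .extraConfig or .logs.extraConfig, merge the existing content into the files first.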

Dashboards

The Velero integration installs the following dashboards in your Grafana Cloud instance to help monitor your system.

Velero cluster view
Velero logs
Velero overview

Screenshots: Velero overview (backups), Velero overview (snapshots), and Velero cluster view.

Alerts

The Velero integration includes the following useful alerts:

Alert	Description
VeleroBackupFailure	Critical: Velero backup failures detected.
VeleroHighBackupDuration	Warning: Velero backups taking longer than usual.
VeleroHighRestoreFailureRate	Critical: Velero restore failures detected.
VeleroUpStatus	Critical: Velero is down.

Metrics

The most important metrics provided by the Velero integration, which are used on the pre-built dashboards and Prometheus alerts, are as follows:

up
velero_backup_attempt_total
velero_backup_duration_seconds_bucket
velero_backup_failure_total
velero_backup_success_total
velero_backup_tarball_size_bytes
velero_backup_validation_failure_total
velero_csi_snapshot_attempt_total
velero_csi_snapshot_success_total
velero_restore_attempt_total
velero_restore_failed_total
velero_restore_success_total
velero_restore_validation_failed_total
velero_volume_snapshot_attempt_total
velero_volume_snapshot_failure_total
velero_volume_snapshot_success_total
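To check which of these metrics a given Velero pod currently exposes, you can compare a saved scrape against the names you care about. The following is a minimal sketch: `missing_metrics` is a hypothetical helper, and the temporary file path is an arbitrary choice. The `up` metric is left out of the usage example because the scraper generates it rather than the Velero endpoint, and some series may only appear after the corresponding backup or restore activity has occurred.

```shell
# Hypothetical helper: print every metric name given as an argument that
# has no sample in the exposition text stored in the file "$1".
missing_metrics() {
  file="$1"
  shift
  for name in "$@"; do
    # Match the exact name followed by '{' (labels) or ' ' (value).
    grep -q "^${name}[{ ]" "$file" || echo "$name"
  done
}

# Usage, assuming a port-forward to the Velero pod on port 8085 is running:
#   curl -s http://localhost:8085/metrics > /tmp/velero-metrics.txt
#   missing_metrics /tmp/velero-metrics.txt \
#     velero_backup_attempt_total velero_backup_success_total \
#     velero_restore_attempt_total
```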

Changelog

# 1.0.1 - November 2024

- Update status panel check queries

# 1.0.0 - April 2024

- Initial release

Cost

By connecting your Velero instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see "Active series and dpm usage" and "Cloud tier pricing".