Menu

Important: This documentation is about an older version. It's relevant only to the release noted, many of the features and functions have been updated or replaced. Please view the current version.

Open source

Configure Grafana Mimir zone-aware replication

Zone-aware replication is the replication of data across failure domains. Zone-aware replication helps to avoid data loss during a domain outage. Grafana Mimir defines failure domains as zones, which includes, but are not limited to:

  • Availability zones
  • Data centers
  • Racks

Without zone-aware replication enabled, Grafana Mimir replicates data randomly across all component replicas, regardless of whether the replicas are running within the same zone. Even with a Grafana Mimir cluster deployed across multiple zones, the replicas for any given data could reside in the same zone. If an outage affects a zone containing multiple replicas, data loss might occur.

With zone-aware replication enabled, Grafana Mimir ensures data replication to replicas across different zones.

Warning: Ensure that you configure deployment tooling so that it is also zone-aware. The deployment tooling is responsible for executing rolling updates. Rolling updates should only update replicas in a single zone at any given time.

Grafana Mimir supports zone-aware replication for the following:

Configuring Alertmanager alerts replication

Zone-aware replication in the Alertmanager ensures that Grafana Mimir replicates alerts across -alertmanager.sharding-ring.replication-factor Alertmanager replicas, with one replica located in each zone.

To enable zone-aware replication for alerts:

  1. Configure the zone of each Alertmanager replica via the -alertmanager.sharding-ring.instance-availability-zone CLI flag or its respective YAML configuration parameter.
  2. Roll out Alertmanagers so that each Alertmanager replica runs with a configured zone.
  3. Set the -alertmanager.sharding-ring.zone-awareness-enabled=true CLI flag or its respective YAML configuration parameter for Alertmanagers.

Configuring ingester time series replication

Zone-aware replication in the ingester ensures that Grafana Mimir replicates each time series to -ingester.ring.replication-factor ingester replicas, with one replica located in each zone.

To enable zone-aware replication for time series:

  1. Configure the zone of each ingester replica via the -ingester.ring.instance-availability-zone CLI flag or its respective YAML configuration parameter.
  2. Roll out ingesters so that each ingester replica runs with a configured zone.
  3. Set the -ingester.ring.zone-awareness-enabled=true CLI flag or its respective YAML configuration parameter for distributors, ingesters, and queriers.

Note: The requests that the distributors receive are usually compressed, and the requests that the distributors send to the ingesters are uncompressed by default. This can result in increased cross-zone bandwidth costs (because at least two ingesters will be in different availability zones). If this cost is a concern, you can compress those requests by setting the -ingester.client.grpc-compression CLI flag, or its respective YAML configuration parameter, to snappy or gzip in the distributors.

Configuring store-gateway blocks replication

To enable zone-aware replication for the store-gateways, refer to Zone awareness.

Minimum number of zones

To ensure zone-aware replication, deploy Grafana Mimir across a number of zones equal-to or greater-than the configured replication factor. With a replication factor of 3, which is the default, deploy the Grafana Mimir cluster across at least three zones. Deploying Grafana Mimir clusters to more zones than the configured replication factor does not have a negative impact. Deploying Grafana Mimir clusters to fewer zones than the configured replication factor can cause writes to the replica to be missed, or can cause writes to fail completely.

If there are no more than floor(replication factor / 2) zones with failing replicas, reads and writes can withstand zone failures.

Unbalanced zones

To ensure that the workload across zones is balanced, run the same number of replicas of each component in each zone. When replica counts are unbalanced, zones with fewer replicas have higher resource utilization than those with more replicas.

Costs

Most cloud providers charge for inter-availability zone networking. Deploying Grafana Mimir with zone-aware replication across multiple cloud provider availability zones likely results in additional networking costs.

Kubernetes operator for simplifying rollouts of zone-aware components

The Kubernetes Rollout Operator is a Kubernetes operator that makes it easier for you to manage multi-availability-zone rollouts. Consider using the Kubernetes Rollout Operator when you run Grafana Mimir on Kubernetes with zone awareness enabled.

Enabling zone-awareness via the Grafana Mimir Jsonnet

Instead of configuring Grafana Mimir directly, you can use the Grafana Mimir Jsonnet to enable ingester and store-gateway zone awareness. To enable ingester and store-gateway zone awareness, set the top level multi_zone_store_gateway_enabled or multi_zone_ingester_enabled Jsonnet fields to true. These flags set the required Grafana Mimir configuration parameters that support ingester and store-gateway zone awareness.