Migrate from single zone to zone-aware replication in Mimir Helm chart version 4.0
The `mimir-distributed` Helm chart version 4.0 enables zone-aware replication by default. This is a breaking change for existing installations and requires a migration.
This document explains how to migrate the stateful components from single zone to zone-aware replication with Helm. The three components in question are the alertmanager, the store-gateway, and the ingester.
The migration path for the alertmanager and the store-gateway is straightforward; migrating ingesters is more involved.
This document is applicable to both Grafana Mimir and Grafana Enterprise Metrics.
Prerequisites
Depending on which version of the `mimir-distributed` Helm chart is currently installed, make sure to meet the following requirements.
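To check which chart version is currently installed, you can list the release. A minimal sketch, assuming the hypothetical release name `mimir` and namespace `mimir-system`; adjust both to your installation:

```bash
# The CHART column shows the installed mimir-distributed chart version.
helm list --namespace mimir-system
```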
If the current version of the `mimir-distributed` Helm chart is less than 4.0.0 (version < 4.0.0):
Follow the upgrade instructions for 4.0.0 in the CHANGELOG.md. In particular, make sure to disable zone awareness before upgrading the chart:

```yaml
ingester:
  zoneAwareReplication:
    enabled: false

store_gateway:
  zoneAwareReplication:
    enabled: false

rollout_operator:
  enabled: false
```
Note: A direct upgrade from non zone-aware ingesters to zone-aware ingesters will cause data loss.
If you have modified the `mimir.config` value, either make sure to merge in the latest version from the chart, or consider using `mimir.structuredConfig` instead. For more information, see Manage the configuration of Grafana Mimir with Helm.
If the current version of the `mimir-distributed` Helm chart is 4.0.0 or greater (version >= 4.0.0):
Make sure that zone-aware replication is turned off for the component in question. For example, for the store-gateway:

```yaml
store_gateway:
  zoneAwareReplication:
    enabled: false
```
If you have modified the `mimir.config` value, either make sure to merge in the latest version from the chart, or consider using `mimir.structuredConfig` instead. For more information, see Manage the configuration of Grafana Mimir with Helm.
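For illustration, a minimal sketch of moving a customization into `mimir.structuredConfig` so that the chart keeps managing the base configuration. The override shown here, `querier.query_store_after` (a parameter referenced later in this guide), is only an example; substitute your own customizations:

```yaml
mimir:
  # Leave mimir.config unset so the chart supplies and maintains the base configuration.
  structuredConfig:
    querier:
      query_store_after: 12h # Example override; replace with your own settings.
```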
Migrate alertmanager to zone-aware replication
Using zone-aware replication for alertmanager is optional and is only available if alertmanager is deployed as a StatefulSet.
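In recent chart versions the alertmanager deployment style is, as far as the chart defaults go, controlled by the `alertmanager.statefulSet.enabled` value; verify the exact value name against your chart version's `values.yaml`. A sketch under that assumption:

```yaml
alertmanager:
  statefulSet:
    enabled: true # Required for zone-aware replication; confirm against your chart version.
```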
Configure zone-aware replication for alertmanagers
This section is about planning and configuring the availability zones defined under the `alertmanager.zoneAwareReplication` Helm value.
There are two use cases in general:
Speeding up the rollout of alertmanagers in case there are more than 3 replicas. In this case, use the default value from `small.yaml`, `large.yaml`, `capped-small.yaml`, or `capped-large.yaml`. The default value defines 3 "virtual" zones and sets affinity rules so that alertmanagers from different zones do not mix, but it allows multiple alertmanagers of the same zone on the same node:

```yaml
alertmanager:
  zoneAwareReplication:
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
```
Geographical redundancy. In this case, you need to set a suitable `nodeSelector` value to choose where the pods of each zone are to be placed. Setting `topologyKey` will instruct the Helm chart to create anti-affinity rules so that alertmanagers from different zones do not mix, but it allows multiple alertmanagers of the same zone on the same node. For example:

```yaml
alertmanager:
  zoneAwareReplication:
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-c
```
Note: Because the `zones` value is an array, you must copy the whole array and modify it to make changes; there is no way to override only parts of the array.
Set the chosen configuration in your custom values file (for example, `custom.yaml`).
Note: The number of alertmanager pods that will be started is derived from `alertmanager.replicas`. Each zone starts `alertmanager.replicas / number of zones` pods, rounded up to the nearest integer. For example, if you have 3 zones, then `alertmanager.replicas=3` yields 1 alertmanager per zone, while `alertmanager.replicas=4` yields 2 per zone, 6 in total.
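For instance, a sketch of the values that, combined with the default three zones, result in two alertmanagers per zone:

```yaml
alertmanager:
  replicas: 4 # ceil(4 / 3 zones) = 2 pods per zone, 6 pods in total
```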
Migrate alertmanagers
Before starting this procedure, set up your zones according to Configure zone-aware replication for alertmanagers.
Create a new empty YAML file called `migrate.yaml`.
Start the migration.
Copy the following into the `migrate.yaml` file:

```yaml
alertmanager:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
In this step, zone-awareness is enabled with the default zone and new StatefulSets are created for zone-aware alertmanagers, but no new pods are started.
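For example, a sketch of the upgrade command, assuming the hypothetical release name `mimir`, namespace `mimir-system`, and custom values file `custom.yaml`. Because Helm gives later `-f` flags precedence, `-f migrate.yaml` must come last:

```bash
helm upgrade mimir grafana/mimir-distributed \
  --namespace mimir-system \
  -f custom.yaml \
  -f migrate.yaml
```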
Wait until all alertmanagers are restarted and are ready.
Scale up zone-aware alertmanagers.
Replace the contents of the `migrate.yaml` file with:

```yaml
alertmanager:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      writePath: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until all new zone-aware alertmanagers are started and ready.
Set the final configuration.
Merge the following values into your custom Helm values file:
```yaml
alertmanager:
  zoneAwareReplication:
    enabled: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command using your regular command-line flags.
Ensure that the Service and StatefulSet resources of the non zone-aware alertmanagers have been deleted. The previous step also removes the Service and StatefulSet manifests of the old non zone-aware alertmanagers. In some cases, such as when using Helm from Tanka, these resources are not automatically deleted from your Kubernetes cluster even if the Helm chart no longer renders them. If the old resources still exist, delete them manually. If they are not deleted, some of the pods may be scraped multiple times when using the Prometheus operator for metamonitoring.
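A sketch of how to check for and remove leftover resources, assuming the hypothetical namespace `mimir-system`; the exact resource names depend on your release name:

```bash
# List StatefulSets and Services and look for the old, non zone-aware alertmanager entries.
kubectl --namespace mimir-system get statefulsets,services | grep alertmanager

# If old resources are still present, delete them manually, for example:
kubectl --namespace mimir-system delete statefulset <old-alertmanager-statefulset>
kubectl --namespace mimir-system delete service <old-alertmanager-headless-service>
```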
Wait until old non zone-aware alertmanagers are terminated.
Migrate store-gateways to zone-aware replication
Configure zone-aware replication for store-gateways
This section is about planning and configuring the availability zones defined under the `store_gateway.zoneAwareReplication` Helm value.
There are two use cases in general:
Speeding up the rollout of store-gateways in case there are more than 3 replicas. In this case, use the default value from `small.yaml`, `large.yaml`, `capped-small.yaml`, or `capped-large.yaml`. The default value defines 3 "virtual" zones and sets affinity rules so that store-gateways from different zones do not mix, but it allows multiple store-gateways of the same zone on the same node:

```yaml
store_gateway:
  zoneAwareReplication:
    enabled: false # Do not turn on zone-awareness without migration because of potential query errors
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
```
Geographical redundancy. In this case, you need to set a suitable `nodeSelector` value to choose where the pods of each zone are to be placed. Setting `topologyKey` will instruct the Helm chart to create anti-affinity rules so that store-gateways from different zones do not mix, but it allows multiple store-gateways of the same zone on the same node. For example:

```yaml
store_gateway:
  zoneAwareReplication:
    enabled: false # Do not turn on zone-awareness without migration because of potential query errors
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-c
```
Note: Because the `zones` value is an array, you must copy the whole array and modify it to make changes; there is no way to override only parts of the array.
Set the chosen configuration in your custom values file (for example, `custom.yaml`).
Note: The number of store-gateway pods that will be started is derived from `store_gateway.replicas`. Each zone starts `store_gateway.replicas / number of zones` pods, rounded up to the nearest integer. For example, if you have 3 zones, then `store_gateway.replicas=3` yields 1 store-gateway per zone, while `store_gateway.replicas=4` yields 2 per zone, 6 in total.
Decide which migration path to take for store-gateways
There are two ways to do the migration:
- With downtime. In this procedure the old non zone-aware store-gateways are stopped, which causes queries that look back more than 12 hours (or whatever the `querier.query_store_after` Mimir parameter is set to) to fail. Ingestion is not impacted. This is the quicker and simpler way.
- Without downtime. This is a multi-step procedure which requires additional hardware resources, because the old and new store-gateways run in parallel for some time.
Migrate store-gateways with downtime
Before starting this procedure, set up your zones according to Configure zone-aware replication for store-gateways.
Create a new empty YAML file called `migrate.yaml`.
Scale the current store-gateways to 0.
Copy the following into the `migrate.yaml` file:

```yaml
store_gateway:
  replicas: 0
  zoneAwareReplication:
    enabled: false
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until all store-gateways have terminated.
Set the final configuration.
Merge the following values into your custom Helm values file:
```yaml
store_gateway:
  zoneAwareReplication:
    enabled: true

rollout_operator:
  enabled: true
```
These values are actually the defaults, which means that removing the `store_gateway.zoneAwareReplication.enabled` and `rollout_operator.enabled` values is also a valid step.
Upgrade the installation with the `helm` command using your regular command-line flags.
Ensure that the Service and StatefulSet resources of the non zone-aware store-gateways have been deleted. The previous step also removes the Service and StatefulSet manifests of the old non zone-aware store-gateways. In some cases, such as when using Helm from Tanka, these resources are not automatically deleted from your Kubernetes cluster even if the Helm chart no longer renders them. If the old resources still exist, delete them manually. If they are not deleted, some of the pods may be scraped multiple times when using the Prometheus operator for metamonitoring.
Wait until all store-gateways are running and ready.
Migrate store-gateways without downtime
Before starting this procedure, set up your zones according to Configure zone-aware replication for store-gateways.
Create a new empty YAML file called `migrate.yaml`.
Create the new zone-aware store-gateways.
Copy the following into the `migrate.yaml` file:

```yaml
store_gateway:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait for all new store-gateways to start up and be ready.
Make the read path use the new zone-aware store-gateways.
Replace the contents of the `migrate.yaml` file with:

```yaml
store_gateway:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      readPath: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait for all queriers and rulers to restart and become ready.
Set the final configuration.
Merge the following values into your custom Helm values file:
```yaml
store_gateway:
  zoneAwareReplication:
    enabled: true

rollout_operator:
  enabled: true
```
These values are actually the defaults, which means that removing the `store_gateway.zoneAwareReplication.enabled` and `rollout_operator.enabled` values is also a valid step.
Upgrade the installation with the `helm` command using your regular command-line flags.
Ensure that the Service and StatefulSet resources of the non zone-aware store-gateways have been deleted. The previous step also removes the Service and StatefulSet manifests of the old non zone-aware store-gateways. In some cases, such as when using Helm from Tanka, these resources are not automatically deleted from your Kubernetes cluster even if the Helm chart no longer renders them. If the old resources still exist, delete them manually. If they are not deleted, some of the pods may be scraped multiple times when using the Prometheus operator for metamonitoring.
Wait for non zone-aware store-gateways to terminate.
Migrate ingesters to zone-aware replication
Configure zone-aware replication for ingesters
This section is about planning and configuring the availability zones defined under the `ingester.zoneAwareReplication` Helm value.
There are two use cases in general:
Speeding up the rollout of ingesters in case there are more than 3 replicas. In this case, use the default value from `small.yaml`, `large.yaml`, `capped-small.yaml`, or `capped-large.yaml`. The default value defines 3 "virtual" zones and sets affinity rules so that ingesters from different zones do not mix, but it allows multiple ingesters of the same zone on the same node:

```yaml
ingester:
  zoneAwareReplication:
    enabled: false # Do not turn on zone-awareness without migration because of potential data loss
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
```
Geographical redundancy. In this case, you need to set a suitable `nodeSelector` value to choose where the pods of each zone are to be placed. Setting `topologyKey` will instruct the Helm chart to create anti-affinity rules so that ingesters from different zones do not mix, but it allows multiple ingesters of the same zone on the same node. For example:

```yaml
ingester:
  zoneAwareReplication:
    enabled: false # Do not turn on zone-awareness without migration because of potential data loss
    topologyKey: "kubernetes.io/hostname" # Triggers creating anti-affinity rules
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-central1-c
```
Note: Because the `zones` value is an array, you must copy the whole array and modify it to make changes; there is no way to override only parts of the array.
Set the chosen configuration in your custom values file (for example, `custom.yaml`).
Note: The number of ingester pods that will be started is derived from `ingester.replicas`. Each zone starts `ingester.replicas / number of zones` pods, rounded up to the nearest integer. For example, if you have 3 zones, then `ingester.replicas=3` yields 1 ingester per zone, while `ingester.replicas=4` yields 2 per zone, 6 in total.
Decide which migration path to take for ingesters
There are two ways to do the migration:
- With downtime. In this procedure, ingress to the cluster is stopped while the ingesters are migrated. This is the quicker and simpler way. The time it takes to execute this migration depends on how fast the ingesters restart and upload their data to object storage, but in general it should be finished within an hour.
- Without downtime. This is a multi-step procedure which requires additional hardware resources, because the old and new ingesters run in parallel for some time. This is a complex migration that can take days and requires monitoring for increased resource utilization. The minimum time it takes to do this migration can be calculated as `querier.query_store_after` + (2h TSDB block range period + `blocks_storage.tsdb.head_compaction_idle_timeout`) * (1 + number_of_ingesters / 21). With the default values this means 12h + 3h * (1 + number_of_ingesters / 21) = 15h + 3h * (number_of_ingesters / 21). Add an extra 12 hours if shuffle sharding is enabled. A worked example follows this list.
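For example, for a hypothetical installation with 42 ingesters and default settings, the minimum migration time is 12h + 3h * (1 + 42 / 21) = 21 hours, or 33 hours if shuffle sharding is enabled.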
Migrate ingesters with downtime
Before starting this procedure, set up your zones according to Configure zone-aware replication for ingesters.
Create a new empty YAML file called `migrate.yaml`.
Enable flushing data from ingesters to storage on shutdown.
Copy the following into the `migrate.yaml` file:

```yaml
mimir:
  structuredConfig:
    blocks_storage:
      tsdb:
        flush_blocks_on_shutdown: true
    ingester:
      ring:
        unregister_on_shutdown: true

ingester:
  zoneAwareReplication:
    enabled: false
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait for all ingesters to restart and be ready.
Turn off traffic to the installation.
Replace the contents of the `migrate.yaml` file with:

```yaml
mimir:
  structuredConfig:
    blocks_storage:
      tsdb:
        flush_blocks_on_shutdown: true
    ingester:
      ring:
        unregister_on_shutdown: true

ingester:
  zoneAwareReplication:
    enabled: false

nginx:
  replicas: 0

gateway:
  replicas: 0
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until there is no nginx or gateway running.
Scale the current ingesters to 0.
Replace the contents of the `migrate.yaml` file with:

```yaml
mimir:
  structuredConfig:
    blocks_storage:
      tsdb:
        flush_blocks_on_shutdown: true
    ingester:
      ring:
        unregister_on_shutdown: true

ingester:
  replicas: 0
  zoneAwareReplication:
    enabled: false

nginx:
  replicas: 0

gateway:
  replicas: 0
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until no ingesters are running.
Start the new zone-aware ingesters.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true

nginx:
  replicas: 0

gateway:
  replicas: 0

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until all requested ingesters are running and are ready.
Enable traffic to the installation.
Merge the following values into your custom Helm values file:
```yaml
ingester:
  zoneAwareReplication:
    enabled: true

rollout_operator:
  enabled: true
```
These values are actually the defaults, which means that removing the `ingester.zoneAwareReplication.enabled` and `rollout_operator.enabled` values is also a valid step.
Upgrade the installation with the `helm` command using your regular command-line flags.
Ensure that the Service and StatefulSet resources of the non zone-aware ingesters have been deleted. The previous step also removes the Service and StatefulSet manifests of the old non zone-aware ingesters. In some cases, such as when using Helm from Tanka, these resources are not automatically deleted from your Kubernetes cluster even if the Helm chart no longer renders them. If the old resources still exist, delete them manually. If they are not deleted, some of the pods may be scraped multiple times when using the Prometheus operator for metamonitoring.
Migrate ingesters without downtime
Before starting this procedure, set up your zones according to Configure zone-aware replication for ingesters.
Double the series limits for tenants and the ingesters.
Explanation: While new ingesters are being added, some series start to be written to the new ingesters; however, those series also still exist on the old ingesters, so they count twice towards the limits. Not updating the limits might lead to writes being refused due to limit violations.
The `limits.max_global_series_per_user` Mimir configuration parameter has a non-zero default value of 150000. Double the default, or your own value, by setting:

```yaml
mimir:
  structuredConfig:
    limits:
      max_global_series_per_user: 300000 # <-- or your value doubled
```
If you have set the Mimir configuration parameter `ingester.instance_limits.max_series` via `mimir.config`, `mimir.structuredConfig`, or runtime overrides, double it for the duration of the migration.
If you have set per-tenant limits in the Mimir configuration parameters `limits.max_global_series_per_user` or `limits.max_global_series_per_metric` via `mimir.config`, `mimir.structuredConfig`, or runtime overrides, double the set limits. For example:

```yaml
runtimeConfig:
  ingester_limits:
    max_series: X # <-- double it
  overrides:
    tenantA:
      max_global_series_per_metric: Y # <-- double it
      max_global_series_per_user: Z # <-- double it
```
Create a new empty YAML file called `migrate.yaml`.
Start the migration.
Copy the following into the `migrate.yaml` file:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      replicas: 0

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
In this step, new zone-aware StatefulSets are created, but no new pods are started yet. The parameter `ingester.ring.zone_awareness_enabled: true` is set in the Mimir configuration via the `mimir.config` value. The flag `-ingester.ring.zone-awareness-enabled=false` is set on distributors, rulers, and queriers. The flags `-blocks-storage.tsdb.flush-blocks-on-shutdown` and `-ingester.ring.unregister-on-shutdown` are set to true for the ingesters.
Wait for all Mimir components to restart and be ready.
Add zone-aware ingester replicas, maximum 21 at a time.
Explanation: While new ingesters are being added, some series start to be written to the new ingesters; however, those series also still exist on the old ingesters, so they count twice towards the limits. Adding only 21 replicas at a time reduces the number of series affected and thus the likelihood of breaching the maximum series limits.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      replicas: <N>

rollout_operator:
  enabled: true
```
Note: Replace `<N>` with the number of replicas in each step, until `<N>` reaches the same number as `ingester.replicas`. Do not increase `<N>` by more than 21 in each step.
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Once the new ingesters are started and ready, wait at least 3 hours.
The 3 hours is calculated from the 2h TSDB block range period plus the `blocks_storage.tsdb.head_compaction_idle_timeout` Grafana Mimir parameter, to give the ingesters enough time to remove stale series from memory. Stale series exist because series were moved between ingesters.
If the current `<N>` above in `ingester.zoneAwareReplication.migration.replicas` is less than `ingester.replicas`, go back, increase `<N>` by at most 21, and repeat these four steps.
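For example, assuming a hypothetical installation with `ingester.replicas: 50`, you would go through these four steps three times, setting `<N>` to 21, then 42, then 50.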
If you are using shuffle sharding, it must be turned off on the read path at this point.
Update your configuration with these values and keep them until otherwise instructed.
```yaml
querier:
  extraArgs:
    "querier.shuffle-sharding-ingesters-enabled": "false"

ruler:
  extraArgs:
    "querier.shuffle-sharding-ingesters-enabled": "false"
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until queriers and rulers have restarted and are ready.
Monitor resource utilization of queriers and rulers and scale up if necessary. Turning off shuffle sharding may increase resource utilization.
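If you need to scale up, the querier and ruler replica counts are ordinary Helm values. A sketch, where the numbers are placeholders for whatever your monitoring indicates:

```yaml
querier:
  replicas: <increased count> # Placeholder; size according to observed utilization.

ruler:
  replicas: <increased count> # Placeholder; size according to observed utilization.
```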
Enable zone-awareness on the write path.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      writePath: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
In this step, the flag `-ingester.ring.zone-awareness-enabled=false` is removed from distributors and rulers.
Once all distributors and rulers have restarted and are ready, wait 12 hours.
The 12 hours is calculated from the `querier.query_store_after` Grafana Mimir parameter.
Enable zone-awareness on the read path.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      writePath: true
      readPath: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
In this step, the flag `-ingester.ring.zone-awareness-enabled=false` is removed from queriers.
Wait until all queriers have restarted and are ready.
Exclude non zone-aware ingesters from the write path.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      writePath: true
      readPath: true
      excludeDefaultZone: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
In this step, the flag `-ingester.ring.excluded-zones=zone-default` is added to distributors and rulers.
Wait until all distributors and rulers have restarted and are ready.
Scale down non zone-aware ingesters to 0.
Replace the contents of the `migrate.yaml` file with:

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    migration:
      enabled: true
      writePath: true
      readPath: true
      excludeDefaultZone: true
      scaleDownDefaultZone: true

rollout_operator:
  enabled: true
```
Upgrade the installation with the `helm` command and make sure to provide the flag `-f migrate.yaml` as the last flag.
Wait until all non zone-aware ingesters are terminated.
Delete the default zone.
Merge the following values into your custom Helm values file:
```yaml
ingester:
  zoneAwareReplication:
    enabled: true

rollout_operator:
  enabled: true
```
These values are actually the defaults, which means that removing the `ingester.zoneAwareReplication.enabled` and `rollout_operator.enabled` values from your `custom.yaml` is also a valid step.
Upgrade the installation with the `helm` command using your regular command-line flags.
Ensure that the Service and StatefulSet resources of the non zone-aware ingesters have been deleted. The previous step also removes the Service and StatefulSet manifests of the old non zone-aware ingesters. In some cases, such as when using Helm from Tanka, these resources are not automatically deleted from your Kubernetes cluster even if the Helm chart no longer renders them. If the old resources still exist, delete them manually. If they are not deleted, some of the pods may be scraped multiple times when using the Prometheus operator for metamonitoring.
Wait at least 3 hours.
The 3 hours is calculated from the 2h TSDB block range period plus the `blocks_storage.tsdb.head_compaction_idle_timeout` Grafana Mimir parameter, to give the ingesters enough time to remove stale series from memory. Stale series exist because series were moved between ingesters.
If you are using shuffle sharding:
Wait an extra 12 hours.
The 12 hours is calculated from the `querier.query_store_after` Grafana Mimir parameter. After this time, no series are stored outside their dedicated shard, meaning that shuffle sharding on the read path can be safely enabled.
Remove these values from your configuration:
```yaml
querier:
  extraArgs:
    "querier.shuffle-sharding-ingesters-enabled": "false"

ruler:
  extraArgs:
    "querier.shuffle-sharding-ingesters-enabled": "false"
```
Upgrade the installation with the `helm` command using your regular command-line flags.
Wait until queriers and rulers have restarted and are ready.
The resource utilization of queriers and rulers should return to pre-migration levels and you can scale them down to previous numbers.
Undo the doubling of series limits done in the first step.
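For example, if you doubled the default earlier, restore it; if you doubled your own values or runtime overrides, restore those instead:

```yaml
mimir:
  structuredConfig:
    limits:
      max_global_series_per_user: 150000 # <-- or remove the override to fall back to the default
```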
Upgrade the installation with the `helm` command using your regular command-line flags.