How Cortex uses the Prometheus Write-Ahead Log (WAL) to prevent data loss

• 2020-04-29 • 2 min

After five years leading the development of Cortex, Grafana Labs is no longer contributing to this project. In March 2022, we launched Grafana Mimir, an open source long-term storage backend for Prometheus that lets you scale to 1 billion metrics and beyond. To learn more, please read the TSDB announcement blog and visit the Grafana Mimir page.

30 Mar 2022

Since the beginning of the Cortex project, the ingester service has had a weakness. Ingesters hold incoming series data in memory for a while before writing it to a long-term storage backend, so if an ingester crashed, all the data it was holding was lost. Replication guards against a single failure, but a bug that could crash one ingester could just as easily crash all the other replicas, and all the data for that time window would be lost until we fixed the bug or rolled back to an older version.

We fixed this in January of this year by introducing a Write-Ahead Log (WAL), similar to the one in Prometheus’ TSDB. With the WAL, whenever an ingester receives a write request, it logs the event to a file on disk in addition to storing it in memory. If the ingester crashes, it replays these events from disk on restart and restores the in-memory state it had before the crash. We use the Prometheus WAL package to manage writing and reading these events on disk.

On heavily loaded ingesters, replaying the WAL alone is usually slower than Cortex can tolerate. So in addition to the WAL, the ingester periodically writes a snapshot of all the data in its memory to disk, called a “checkpoint.” The snapshot is stored as chunks, each of which compresses multiple samples into a single blob. A checkpoint is faster to replay because it restores a whole chunk (up to 6h of data) at a time, while the WAL is replayed one sample at a time. With checkpoints in place, recovery consists of replaying the latest checkpoint followed by the remaining WAL that the checkpoint does not cover.
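The checkpoint-plus-tail recovery can be sketched the same way. This sketch assumes each WAL record carries a sequence number and that a checkpoint remembers the last sequence number it covers; the real implementation works in terms of WAL segments and stores compressed chunks, whereas here a "chunk" is just a pair of slices. All type and function names are hypothetical.

```go
package main

import "fmt"

// Record is a hypothetical WAL entry tagged with a monotonically increasing Seq.
type Record struct {
	Seq    int
	Series string
	T      int64
	V      float64
}

// Chunk stands in for a compressed chunk; here it is an uncompressed pair of slices.
type Chunk struct {
	Series string
	Ts     []int64
	Vs     []float64
}

// Checkpoint is a snapshot of all chunks plus the last WAL Seq it includes.
type Checkpoint struct {
	LastSeq int
	Chunks  []Chunk
}

// State is the in-memory series data: one open chunk per series.
type State map[string]*Chunk

// apply adds a single sample, as WAL replay does.
func (s State) apply(r Record) {
	c, ok := s[r.Series]
	if !ok {
		c = &Chunk{Series: r.Series}
		s[r.Series] = c
	}
	c.Ts = append(c.Ts, r.T)
	c.Vs = append(c.Vs, r.V)
}

// TakeCheckpoint deep-copies the state so later writes do not alter the snapshot.
func TakeCheckpoint(s State, lastSeq int) Checkpoint {
	cp := Checkpoint{LastSeq: lastSeq}
	for _, c := range s {
		cp.Chunks = append(cp.Chunks, Chunk{
			Series: c.Series,
			Ts:     append([]int64(nil), c.Ts...),
			Vs:     append([]float64(nil), c.Vs...),
		})
	}
	return cp
}

// Recover restores whole chunks from the checkpoint, then replays only the
// WAL records newer than the checkpoint. It returns the number replayed.
func Recover(cp Checkpoint, wal []Record) (State, int) {
	s := State{}
	for _, c := range cp.Chunks {
		cc := Chunk{
			Series: c.Series,
			Ts:     append([]int64(nil), c.Ts...),
			Vs:     append([]float64(nil), c.Vs...),
		}
		s[c.Series] = &cc
	}
	replayed := 0
	for _, r := range wal {
		if r.Seq <= cp.LastSeq {
			continue // already covered by the checkpoint
		}
		s.apply(r)
		replayed++
	}
	return s, replayed
}

func main() {
	var wal []Record
	var cp Checkpoint
	live := State{}
	for seq := 1; seq <= 10; seq++ {
		series := "a"
		if seq%2 == 0 {
			series = "b"
		}
		r := Record{Seq: seq, Series: series, T: int64(seq), V: float64(seq)}
		wal = append(wal, r)
		live.apply(r)
		if seq == 8 {
			cp = TakeCheckpoint(live, seq)
		}
	}
	s, n := Recover(cp, wal)
	fmt.Println(n, len(s["a"].Ts), len(s["b"].Ts)) // prints 2 5 5
}
```

Recovery restores eight samples through two chunk copies and replays only the two records written after the checkpoint; without the checkpoint, all ten records would be replayed individually.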

When this feature was added, it was marked as experimental because it had not been battle-tested yet. After rigorous testing and several enhancements, such as spreading checkpoint writes over time so that disk writes are evenly distributed and switching to the Prometheus TSDB WAL record format, it is ready to shed its experimental tag. The tag will be officially removed in a couple of weeks.

If you would like to migrate to the WAL (or are just getting started with Cortex), have a look at the production guide on WAL for more information.

Interested in finding out more about Cortex?

Check out the on-demand recording of Grafana Labs’ recent Taking Prometheus to Scale with Cortex webinar featuring Cortex co-creator Tom Wilkie and maintainer Goutham Veeramachaneni.
