How We’ve Made It Easy to Migrate Data Using Metrictank Importer Tools
There’s a huge need among our Metrictank and Grafana Cloud customers to be able to import their existing data from Graphite, so we recently refactored the importer tools to make the process easier.
A few years ago, I wrote a utility to import Whisper data into Cassandra. At that time, Cassandra was the only store that Metrictank supported. Since then, we’ve added Bigtable, and we’ll likely add more stores. So we needed to modify these importer utilities so they can import into every type of store that Metrictank supports, now and in the future.
There are two tools that work in unison: a reader and a writer.
The reader runs where the source data lives, in the customer’s infrastructure. It currently only supports Graphite’s Whisper format. The writer runs on our side, receives the data that the reader sends, and writes it into the store, which is also where Metrictank reads it from.
When we refactored the writer, we modified it so that it uses the standard Store interface, which means it can import data into all the stores that already exist in Metrictank. And every time we add another store to Metrictank, we can automatically use it in the importer writer. A big bonus is that when we reuse stores that we have already implemented for Metrictank, we also get all of those optimizations that we added there.
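To illustrate why coding against a store interface pays off, here is a minimal Python sketch. Metrictank itself is written in Go, and every name below (`Store`, `InMemoryStore`, `ImporterWriter`, `Chunk`) is hypothetical and does not mirror its real API. The only point is that the writer depends on an interface rather than on a specific backend, so a new backend needs no changes to the writer.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical unit of already-encoded data for one series."""
    series: str
    start_ts: int   # first timestamp covered by the chunk (unix seconds)
    payload: bytes  # opaque encoded points


class Store(Protocol):
    """Anything the writer can persist chunks into."""
    def add(self, chunk: Chunk) -> None: ...


class InMemoryStore:
    """Stand-in backend; any other backend would expose the same method."""
    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int], bytes] = {}

    def add(self, chunk: Chunk) -> None:
        self.rows[(chunk.series, chunk.start_ts)] = chunk.payload


class ImporterWriter:
    """Receives chunks and hands them to whichever store it was given."""
    def __init__(self, store: Store) -> None:
        self.store = store

    def handle(self, chunks: List[Chunk]) -> int:
        for chunk in chunks:
            self.store.add(chunk)
        return len(chunks)


store = InMemoryStore()
writer = ImporterWriter(store)
written = writer.handle([
    Chunk("servers.web1.cpu", 1_600_000_000, b"\x01\x02"),
    Chunk("servers.web1.cpu", 1_600_000_600, b"\x03\x04"),
])
```

Swapping `InMemoryStore` for another class with an `add` method is the only change needed to target a different backend in this sketch.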
In some cases, the reader also converts the schema of the data. In our infrastructure, we store data in one schema. A customer’s data is very often in a different schema, so it has to be converted to match ours. The new and improved reader does that automatically by comparing the two schemas and finding the best way to convert the data from one to the other. It can also move the imported metrics into a new namespace, if desired.
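As a rough illustration of one kind of schema conversion, the Python sketch below reduces a series from a finer to a coarser resolution and renames it into a new namespace. It is not the reader's actual algorithm: the function names, the choice to stamp each bucket with its end timestamp, and the use of averaging are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, float]  # (unix timestamp, value)


def decrease_resolution(
    points: List[Point],
    dst_interval: int,
    aggregate: Callable[[List[float]], float],
) -> List[Point]:
    """Group points into dst_interval-wide buckets and aggregate each bucket.

    Each bucket is stamped with its end timestamp (an assumption made for
    this sketch, not a statement about Metrictank's behaviour).
    """
    buckets: Dict[int, List[float]] = {}
    for ts, value in points:
        # Round ts up to the next multiple of dst_interval.
        bucket_ts = -(-ts // dst_interval) * dst_interval
        buckets.setdefault(bucket_ts, []).append(value)
    return [(ts, aggregate(vals)) for ts, vals in sorted(buckets.items())]


def average(values: List[float]) -> float:
    return sum(values) / len(values)


def with_namespace(name: str, prefix: str) -> str:
    """Move a metric into a new namespace by prefixing its name."""
    return f"{prefix}.{name}" if prefix else name


# Source archive: one point every 10 seconds.
source = [(10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0), (50, 5.0), (60, 6.0), (70, 7.0)]
# Destination schema: one point every 30 seconds.
converted = decrease_resolution(source, 30, average)
renamed = with_namespace("servers.web1.cpu", "imported")
```

A real conversion also has to handle the opposite direction (a coarser source than destination) and pick the aggregation that matches how the metric was originally rolled up; both are left out here.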
We also modified the format on the wire that is used to send the data from the reader to the writer. It’s now the same format that Metrictank uses internally (gorilla-tsz encoded chunks) and in the persistent stores. This makes writing more readers for different types of data in the future easy as well, while saving significantly on bandwidth.
Since the writer is agnostic about where the data comes from (it simply receives data in that format and writes it into the store), we can now start using it for other purposes too. Data migrations between different types of stores have become much easier, which we can leverage for cluster migrations.
For example, if we want to migrate data from one type of database to another database, we can use that same writer. We would just need to point it at one of the Metrictank stores that we want to write the data to, then write a reader, which reads it from the source, converts it into that internal format that we use to pass the data into the store, and sends it over the wire to the writer. From there, it gets written into the store.
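The reader side of such a migration could look roughly like the Python sketch below. Everything in it is hypothetical: the source database is modelled as a plain dict, the "wire" is a callable, the chunk span of 600 seconds is arbitrary, and points are kept as plain lists instead of being encoded the way Metrictank encodes its chunks.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[int, float]             # (unix timestamp, value)
Chunk = Tuple[str, int, List[Point]]  # (series name, chunk start, points)

CHUNK_SPAN = 600  # seconds covered by one chunk; arbitrary value for this sketch


def to_chunks(series: str, points: Iterable[Point], span: int = CHUNK_SPAN) -> List[Chunk]:
    """Group points into chunks whose start is aligned to a multiple of span."""
    grouped: Dict[int, List[Point]] = {}
    for ts, value in sorted(points):
        start = ts - (ts % span)
        grouped.setdefault(start, []).append((ts, value))
    return [(series, start, pts) for start, pts in sorted(grouped.items())]


def migrate(source: Dict[str, List[Point]], send: Callable[[Chunk], None]) -> int:
    """Read every series from the source and send its chunks to the writer."""
    sent = 0
    for series, points in source.items():
        for chunk in to_chunks(series, points):
            send(chunk)
            sent += 1
    return sent


# Hypothetical source database, modelled here as a plain dict.
source_db = {
    "servers.web1.cpu": [(600, 0.5), (660, 0.7), (1200, 0.9)],
    "servers.web2.cpu": [(1190, 0.1)],
}
received: List[Chunk] = []  # stands in for the writer's network endpoint
sent_count = migrate(source_db, received.append)
```

Because the reader only has to produce chunks and call `send`, supporting a new source means writing a new reader while the writer stays untouched.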
If you’re a Metrictank or Grafana Cloud customer and would like to use the importer tools, contact us for help.