How PostgreSQL and Grafana Can Improve Monitoring Together
TimescaleDB is an open source database packaged as a Postgres extension that supports time series, but “it looks as if it were just Postgres,” said Timescale’s Head of Product, Diana Hsieh. “So you can actually use the entire ecosystem. You can use all of the functions that are enabled in Postgres – like JSON indexes, relational tables, post JSON – and they all work with Timescale.”
Last year, Timescale contributed the PostgreSQL query builder to Grafana v5.3. The new visual query editor for the PostgreSQL datasource makes it easier for users to explore time series data by improving the discoverability of data stored in PostgreSQL. Users can use drop-down menus to formulate their queries with valid selections and macros to express time series specific functionalities.
“The reason why we decided to build that was because it really makes it easier for people with varying levels of SQL ability to actually write SQL queries for advanced users,” Hsieh told the crowd at GrafanaCon.
The query builder allows users to auto-fill different columns so that “people who work more in devops and maybe only use PromQL can actually use a query editor that looks more familiar,” Hsieh explained.
The tool also reflects the permissions given to each Grafana instance. “Users will be able to know which table they can query within a database without having to actually know what the underlying schema is in the actual database,” said Hsieh. (For more about the query builder’s features, check out this blog post.)
But is SQL even a feasible solution for time series monitoring systems? At GrafanaCon, Hsieh debunked all the doubts around SQL during her talk.
Not a lot of people are familiar with SQL
One of the main reasons why Hsieh says SQL is “so awesome” is because it’s actually a really well-known querying language.
According to a recent Stack Overflow survey, SQL is ranked fourth among the most popular technologies for programming, scripting, and markup languages. “I also did a quick search on LinkedIn about how many people know SQL versus do not list SQL in their profiles, and it’s nine million versus a couple hundred thousand,” said Hsieh. “I think this is really telling.”
Hsieh also points out that business analysts know SQL. “I knew SQL before I started coding,” said Hsieh. “It’s actually a really well-known language that spans across an organization.”
Finally, two other distributed non-SQL databases are actually introducing SQL. “Elastic has an SQL-like language, and Kafka even has a little logo for KSQL,” said Hsieh. “This just tells you that even some of the non-SQL databases also really like SQL.”
There are new databases languages to invest in
The value of implementing SQL goes beyond technical benefits.
“It’s a cost consideration,” said Hsieh. “At the end of the day, what we’ve come to realize is that data isn’t just about being able to store it. You also have to be able to access it, and this happens at all of the layers of a given architecture.”
On a micro level, for example, KDB is a specific database that financial companies tend to use and because KDB developers are harder to find, they earn considerably more money. “If you think about the database that you’re choosing to be a foundation for your organization, consider how expensive it is to get an SQL developer compared to a KDB developer.”
And if you take a look at the bigger picture, “if you choose a language that people don’t know, you’re going to have to train your entire organization,” said Hsieh. Or else, you risk creating different silos where some people know how to access sources of data that others don’t.
“It’s not just the developers and business analysts who are touching SQL,” said Hsieh. “It’s also the people who know how to manage SQL, people who know how to build relational schemas, people who know how to operate Postgres. It’s a larger ecosystem. So it’s really an investment in choosing something that expands across an organization and lasts a little longer.”
SQL doesn’t scale
Hsieh admitted that this was “the elephant in the room.”
“I would say that I kind of agree with that statement,” said Hsieh.
Traditional SQL databases are optimized for transactional semantics, which time series databases are not.
With time series data, there’s continuously new data being ingested over time, which requires more storage. But in a Postgres server built for a transactional use case, the ratio of CPU, RAM, and storage will be dramatically different. As a data set grows, the ingestion in Postgres will decrease dramatically because the index gets so big it can’t fit in RAM anymore.
Still, Hsieh remains dedicated to the 20-year-old database language because “we think boring is awesome with databases,” said Hsieh.
So Timescale implements Postgres at the storage layer. “Postgres is so interestingly extensible that we’ve actually ripped out how schemas are managed, and we’ve provided our own abstraction layer,” explained Hsieh. “What the abstraction layer does is take a large table that has a lot of times series in it, and it will break everything up into chunks based on time intervals. So it automatically partitions data.”
The index is, therefore, not built on the entire table. “So now when you’re inserting data, all of those indexes built in a chunk can actually live within memory,” said Hsieh. “It speeds everything up, and you can write things a lot faster at the query planner level.”
With Timescale, if users write into a table, underneath the partitioning will automatically happen. And if there is a time series query, it will route it through an optimized query planner for the time series. But if you have a standard SQL, it’ll go through standard SQL queries.
Ultimately, “you can actually have relational databases and time series databases all in the same database,” said Hsieh. “You can actually get a lot better performance using Timescale, and you don’t see that performance degradation in terms of inserts … With an 8-core machine, we’re able to get 1.11 million metrics per second.”
In the end, it’s not about whether or not SQL can scale, said Hsieh. “One of the biggest challenges people have is changing the time series framework into thinking about how they would actually model it in SQL because it is a different way of thinking about SQL.”
For more from GrafanaCon 2019, check out all the talks on YouTube.