Behind the scenes of a major Grafana release: the journey to Grafana 11 and what’s next

2024-11-15 18 min

When we released Grafana 11 earlier this year, we had three main goals in mind:

  1. Make it easier to explore, visualize, and alert on your data
  2. Make it easier to manage Grafana at scale
  3. Make it easier to extend the power of Grafana via plugins and better plugin and app development tools

You may be sensing a theme here. And some of you — especially those who are just starting to explore all the new features in the latest major Grafana release — might be wondering how, exactly, we delivered on each of these goals.

In episode 7 of “Grafana’s Big Tent” podcast, a team of Grafanistas answer that very question and more. This week, hosts Mat Ryer, Grafana Labs Engineering Director, and Tom Wilkie, Grafana Labs CTO, are joined by Mitch Seaman, Senior Product Director for Grafana, and Torkel Ödegaard, co-founder of Grafana Labs and the creator of Grafana.

The group dives into the details of Grafana 11, covering everything from Explore Metrics to the arrival of the much-anticipated subfolders feature. They also offer a behind-the-scenes look at how the Grafana Labs team builds a pipeline of new Grafana features (hint: the community and hackathons play a big role), a preview of what’s in store for Grafana 12, and, importantly, their opinions on the Oasis reunion tour.

In this blog, we share some of the highlights from the show. You can also check out the full podcast episode below.

Note: The following are highlights from episode 7, season 2 of “Grafana’s Big Tent” podcast. The transcript below has been edited for length and clarity.

Mitch Seaman: Torkel, when you started Grafana, 10 and a bit years ago, did you think you’d be working on it 10 and a bit years later?

Torkel Ödegaard: No. It was like a hobby project that maybe would be somewhat useful in the team I was working in… so I had no idea.

Mat Ryer: It’s a hobby project that’s got way out of hand.

Torkel: It is kind of fun, though, in the sense that my day-to-day isn’t radically different from the early days and in the sense that I still mostly code, look at PRs, and help teams with some tricky changes maybe. But yeah, it is still looking at UX challenges and UI things, and trying to make Grafana a little bit better every day.

Grafana 11: the highlights

Tom Wilkie: Anyway, enough about memory lane. What is new in Grafana 11, Mitch?

Mitch: Yeah, so at a high level, the things that we were trying to do in Grafana 11 were to make sure we helped newer users to monitor and troubleshoot more easily. And “newer” is a pretty broad term. This could actually mean newer, or it could be just users who are a little bit less patient. We’re sort of past the early adopter stage with Grafana, where you have a lot of tinkerers who are happy to sort of get into the weeds. And so we just wanted to iron out some of the rough edges. The biggest rough edge is learning PromQL and LogQL, and executing your first query.

Then we improved managing Grafana as well — so, making improvements to auth and the way that you’d set up things like single sign-on. We also created a really nice tool to migrate from Grafana open source or Grafana Enterprise up to Grafana Cloud, which is great for a bunch of reasons, and free for most people.

We always like to say “Whatever else is happening in the Grafana world, we are always working on new data sources and new visualizations.” So we had a bunch of new data sources, and some more panels and better features in panels.

And then the big thing that happened, which we’re constantly working on, is making Grafana itself a lot more extensible, and sort of refactoring our backend to make it much easier to build applications that look and feel just like regular features in Grafana.

Ease-of-use improvements: Explore Metrics, streamlined migrations to Grafana Cloud, and more

Tom: Let’s start with the first one you mentioned, the ease of use features. I think you’re referring to the new Explore features, right? Torkel, you actually built the first prototype for Explore Metrics, didn’t you?

Torkel: Yeah, so that was the feature that I was most excited about in Grafana 11, as well, is this Explore Metrics feature, which really tackles the “getting started” experience. But also, the thing I like is that it targets both new users, but also really experienced users, in that it kind of does a lot of work for you, and you don’t have to write your Prometheus query. You can look at a metric from all its possible dimensions, and break it down or group it by all its possible dimensions, without having to build a complex dashboard, add a bunch of group-by variables, or do a lot of work.

So even if you’re an experienced PromQL user, you can actually get a lot of mileage out of this UI as a way to troubleshoot, and find out what kind of pod is causing a spike in a heavily aggregated graph.
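For readers who have not written PromQL before, the kind of query a "break down by label" view saves you from typing can be sketched as follows. This is only an illustration, not how Explore Metrics is implemented: the helper names, the metric `http_requests_total`, the labels, and the 5-minute window are all assumptions chosen for the example.

```python
# Illustrative sketch only: builds the sort of PromQL a user would otherwise
# write by hand to break a counter down by one label at a time.
# Function names, metric name, and labels are hypothetical.

def breakdown_query(metric: str, label: str, window: str = "5m") -> str:
    """Return a PromQL query that splits a counter's rate by a single label."""
    return f"sum by ({label}) (rate({metric}[{window}]))"


def breakdown_queries(metric: str, labels: list[str], window: str = "5m") -> dict[str, str]:
    """Return one breakdown query per label, keyed by label name."""
    return {label: breakdown_query(metric, label, window) for label in labels}


if __name__ == "__main__":
    # One query per dimension, e.g. to find which pod is behind a spike.
    for label, query in breakdown_queries("http_requests_total", ["pod", "namespace"]).items():
        print(f"{label}: {query}")
```

Each generated string is an ordinary PromQL aggregation, so it could be pasted into any Prometheus query editor to reproduce one breakdown by hand.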

Tom: I can vividly remember when you were working on this, because you were sitting next to me in London at the time. And you were so excited about it.

Torkel: Yeah, most of it was done during a leadership offsite, where I didn’t pay much attention to what you and others in that room were saying, because I was just so into this and saw it bridging such a huge gap in terms of getting to the end goal: what do you actually want to use when you’re troubleshooting? Like, being able to quickly go from a graph to seeing what’s actually causing a spike, and being able to do that without having to manually jump to Explore, modify the query, or build detailed, drill-down dashboards. Having Grafana do all of that for you was just… really magical.

Tom: So I consider myself to be a PromQL expert, and even I find Explore Metrics super useful. You can take a histogram and get to the heatmap of that histogram in the new Explore Metrics so much quicker and easier than you can if you try and write the correct query, and get the right group-bys, and configure the panel yourself. I’ve actually found myself using it almost on a daily basis, even though I can do the PromQL myself.
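As a rough illustration of the "correct query" and "right group-bys" Tom mentions, the sketch below builds the PromQL commonly used for a classic Prometheus histogram: per-bucket rates grouped by the `le` label (the usual input for a heatmap), and a quantile estimate on top of them. The helper names and the metric `http_request_duration_seconds` are assumptions for the example, and this is not a description of what Explore Metrics generates internally.

```python
# Illustrative sketch only: hand-written PromQL for a classic Prometheus
# histogram. Helper names and the example metric are hypothetical.

def heatmap_query(histogram: str, window: str = "5m") -> str:
    """Per-bucket rate grouped by the `le` (upper bound) label."""
    return f"sum by (le) (rate({histogram}_bucket[{window}]))"


def quantile_query(histogram: str, quantile: float = 0.95, window: str = "5m") -> str:
    """Estimate a quantile from the same per-bucket rates."""
    return f"histogram_quantile({quantile}, {heatmap_query(histogram, window)})"


if __name__ == "__main__":
    print(heatmap_query("http_request_duration_seconds"))
    print(quantile_query("http_request_duration_seconds"))
```

Grouping by `le` matters in both cases: dropping it would merge the buckets together and leave nothing for a heatmap or `histogram_quantile` to work with.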

And so this was, January, I think, when you did the first prototype. And then we had a hackathon, and you’d taken the inspiration from Explore Metrics to build Explore Logs, right, Mat?

Mat: That’s it. So it was the same idea, looking at what they’ve done in Explore Metrics and this new way of looking at it. So Cyril Tovena [Senior Software Engineer at Grafana Labs] said to me, “What if we did the same thing as Explore Metrics, but for logs?” And it just worked out really well. It’s just the same idea transposed. And that one did win the hackathon. And then there was already a team that jumped on this, and did a fantastic job to get Explore Logs out for public preview at GrafanaCON.

Tom: I think we’ve really kind of broken down a mental barrier in Grafana now, because this is also one of the first Grafana features that’s very data source-specific. This is an experience dedicated for Prometheus, dedicated for Loki, and now with Tempo and Pyroscope, as well. Torkel, am I right? We’ve not really done anything like that in the past, have we?

Torkel: No, no. I think that there have been some features that have had limited data source support, but not like a main feature where we thought “Let’s build it for Prometheus only. Let’s start there. Let’s really nail this problem. Let’s think about other data sources in the future, if ever… But let’s just see where we get to if we only focus on one data source and one problem.”

Tom: Yeah. And it turns out we get pretty far. When you narrow down the space of challenges, you can build richer solutions.

Mitch: Yeah, and behind every Grafana user who’s trying to learn PromQL for the first time, there’s a team of site reliability engineers who have set up the observability tooling. Especially in a bigger company, you’ve got a core team — maybe it’s five or six people — who know something about observability and know something about operating our stack. And actually, for those of us working on Grafana, those are the people we talk to the most often, because they’re trying to offer a really good service to their users. And we want to make sure that we’re providing them with good tools as well.

So the things that we focused on this time around had to do with ease of use. So we shipped a new UI for single sign-on.

But I guess the thing that I was the most excited to present, and definitely the most nervous to demo, was the cloud migration app. This is something we hear about over and over again… and it turns out the migration process can be a bit of a barrier. So we started making a tool. The focus was to be able to press one button and migrate my entire Grafana instance into Grafana Cloud.

Subfolders: a seemingly subtle, but mighty, feature

Tom: I think one of the most requested features in GitHub for Grafana was subfolders, as well, wasn’t it?

Mitch: That’s true. You could call that a management feature, for sure, but it’s something that affects everyone. Obviously, it took a huge amount of work to make this feature happen, because folders, as it happened — Torkel, right? — were dashboards, or are dashboards?

Torkel: Well, our first iteration of it — we had an idea that they were dashboards, in the sense that you could create a better hierarchy there. So, an overview dashboard would actually be like a container, as well. We did sort of abandon that idea, but they still kind of lived at the same dashboard. We never truly refactored away from it, so they still lived in the dashboard table, which created all sorts of headaches later on.

Mitch: Yeah. So two years and thousands of hours later, you can now put folders in folders, which sounds subtle, but is very important. And features like that are surprisingly challenging to execute. Once they’re there, it’s obvious they always should have been.

Updates to data visualizations

Mat: There’s sort of a theme around more “grown-up” features being needed, as more and more people start using Grafana. But there’s always a bunch of improvements to the “everyday” — meaning, the data visualizations — isn’t there?

Torkel: Yeah. That’s something that we always do — add new options to visualizations. But every new feature to a visualization needs to really prove itself. I can come in quite late in some PRs and say, “But is this really needed?” Because one thing that I’ve been worried about since the start of Grafana is feature creep.

It’s always a hard thing to know which new option to pursue. And, especially if it’s a UI option, I ask if it’s really needed. But many times the answer is “yes,” because it solves a very broad problem that many users and customers have.

So, the Canvas panel has a new button feature. You can add buttons and actions, which is really cool. And the XY chart is a charting/graphing panel that is much more flexible in terms of what types of data you visualize. It’s not just time series. You can have anything on the X and Y axis, and it’s geared towards non-time series graphs.

Mitch: I think those are the highlights. And I’ll note that whatever else is going on, there’s always this steady stream of teams in Grafana who are shipping and improving data sources and panels. These are the roots of the company. And it’s funny how much we talk a lot about the new stuff and the flashy stuff, but when it comes time to communicate a release, what the data visualization team is working on and what the data sources team is working on, is really the bread and butter.

What sparks ideas for new Grafana features?

Mat: So, where do the ideas come from? How does that happen? And how do you choose what’s important, and what’s going to make it in?

Mitch: The short answer is everywhere.

Torkel: Yeah, we collect feedback from GitHub, which is the most direct place. We look at how many thumbs-up a GitHub issue has, how many comments, etc. Obviously, we also get lots of feedback from customers that use Grafana through Grafana Cloud, and through Twitter. When we have conferences, we get lots of ideas from people that influence what we build. And also a lot internally. We are super heavy users of Grafana, and through our Hackathons, and teams just trying to solve their own problems and painpoints, we come up with a lot of good ideas.

Mitch: There were two features in particular in Grafana 11 that are really interesting because big customers requested them, and they ended up being really big in the community. So, Torkel, you were alluding to the button panel. We had a customer say, “Hey, we have an old IT monitoring tool that has a really handy button that allows us to start, stop, or restart the server.” We said, “That sounds really dangerous,” and they said, “Well, we really want it.” So we thought, what if in our most flexible panel — the canvas panel — we give you the option to create a button? You can use it sort of like Postman; you can hit an API, and we’ll give you even some basic auth, and things like that. And it turns out there’s something very powerful about being able to put an interactive button on a Grafana dashboard.

And then cloud migration was another one. We were hearing about this from a lot of customers who wanted to migrate to Grafana Cloud. Especially in the early iterations, it’s a really useful tool for even a hobbyist who just wants to migrate over into the free tier of Grafana Cloud.

How we deliver new features

Tom: So there’s been a lot of changes in the way we build Grafana in the last year. What kind of things have we started doing differently at Grafana Labs?

Torkel: Well, the biggest thing is really how we ship software now to the cloud. We do that regularly, every week, every day, almost, depending on what we change — and have that be the way we build software. Like, okay, this has to work on Grafana Cloud first.

Mitch: Yeah, and then for open source releases, every two months we do a minor release that introduces new features, and then in the intervening months, we do a scheduled patch release that’s just bug fixes.

Tom: And this means the quality of the software we’re releasing, the open source software — it means it’s been tested with our Grafana Cloud users beforehand, right? So it means we’re starting to move quicker, and we’re releasing higher-quality software. And as an engineer at Grafana Labs, you can discover issues with new features that you’ve built significantly quicker than waiting for a release and waiting for someone to upgrade, and then use the feature, and then report a bug.

Mitch: Exactly, yeah. There’s always been, internally, a little bit of a difference between some of the new apps that we’re building in Grafana Cloud and Grafana OSS, which was that Grafana OSS was stuck to this monthly, or every six weeks, release cycle. That made it really hard for us to get feedback and, most importantly, to test reliability before the software went out to a really broad audience.

Tom: So what does a “major” release mean these days, when we’re releasing all these features into Grafana Cloud?

Mitch: Yeah, it sort of took the wind out of the sails of the GrafanaCON keynote a little bit, because cloud users are saying, “I’ve had access to this feature for the last three or four weeks.” But it meant that the release itself was way more stable. We’re talking, like, cutting bugs by a significant margin, and just seeing faster upgrades as a result, as people gain more confidence.

Tom: That is super cool. So how are we managing that kind of flow of changes in Grafana? We’re launching stuff into Grafana Cloud on a daily or weekly basis. When does that make it into the hands of our open source users and our Grafana Enterprise users?

Mitch: So, some of the specific dates and timings are in the details, but I’ll do my best. We have release channels in Grafana Cloud. So we have a fast channel, which upgrades roughly daily; we have the stable channel, which upgrades weekly; and then the slow channel, which upgrades monthly. And then Grafana open source and Enterprise releases on that sort of bimonthly cadence, at least for new features. So the releases are sort of cycling, and we’ll get features into, for example, the stable channel that were, I think, from roughly the past week. But yeah, the minor releases will get software that has been running in Grafana Cloud for at least a week. That’s the cliff-notes version.

What to know about Angular deprecation

Tom: So there’s one feature and one kind of inside-baseball thing we did in Grafana 11 that took seven years, I think, that no one’s mentioned yet. Torkel, do you know what I’m talking about?

Torkel: Is it Angular?

Tom: Yeah, tell us about the Angular deprecation.

Torkel: Grafana, as we said earlier on, is soon 11 years old. And through that time, technologies have changed. So, in the early days we started with a JavaScript frontend framework called Angular 1. So there’s a new modern Angular that’s not the same library. This is a library that Google deprecated, and replaced with a new version of Angular that is completely different. So we started switching and rewriting Grafana to React, starting in 2018, and replaced component by component, and also started working on a new panel and plugin architecture based on React. It took a long time, because it’s like replacing all the parts of a car as it’s speeding down the track. And it also has to be compatible with all the car extensions that people put on their cars via Grafana plugins, which has been very challenging.

We try really hard to not break existing setups, and existing plugins, and existing dashboards, because we want people to upgrade, get new features, and take advantage of new capabilities. So having a really strong focus on backward compatibility has made this process take a lot longer than it would have otherwise. But yeah, this journey is now finally at an end.

What’s next?

Tom: What does the immediate future hold for Grafana? What’s going to be in Grafana 12?

Torkel: Now that we’re kind of wrapping up this new dashboard architecture and the migration to it, we’re at a point that I’m really excited about. We can explore impactful new dashboard features and changes, which we haven’t done in a long time, just because the old dashboard architecture was still too entangled in that old Angular way of working.

So that’s kind of what the new Scenes architecture does… and it makes it easier to add more dynamic features. One of the things that excites me there is making it easier to build more dynamic layouts of panels, maybe tabbed layouts, and more flexible positioning of panels that could make defining dashboards as code easier.

Mitch: For sure. We’re talking a lot internally about making Grafana easy to operate and extend in cloud, which of course means nice things, like speed and reliability in the codebase, but also, there are some cool opportunities that come out of it, like the ability to search for any resource in Grafana by a lot more attributes — so making search faster, more powerful, and a little bit more accessible.

Tom: You also alluded to a data sources roadmap. Do you happen to know which ones are coming soon?

Mitch: So, the data sources team in the background has made some improvements to the way that they ship and manage data sources, and that’s made it possible to ship a lot really quickly. So just listen to this list. These are data sources that have come out so far this year. And for reference, we usually put out four or five data sources. We’ve got PagerDuty, SurrealDB, DynamoDB, the Infinity plugin is supported, CosmosDB, Yugabyte, Catchpoint, Cloudflare, Adobe Analytics, CockroachDB, Netlify, Drone, Zendesk, and Atlassian Statuspage. We’re talking 15 data sources in the first half of the year. So there are a lot coming out.

And yeah, up next, we got some nice ones. Basically, we’re getting to the end of the long tail of observability tools. We’re spending a lot of time covering databases, like direct integrations to databases, so you can visualize application data, and then we have some really interesting forays into developer tools. There’s probably even more stuff in the public roadmap, but there’s a lot happening in data sources. It’s almost like a different podcast.

“Grafana’s Big Tent” podcast wants to hear from you. If you have a great story to share, want to join the conversation, or have any feedback, please contact the Big Tent team at bigtent@grafana.com. You can also catch up on the first and second season of “Grafana’s Big Tent” on Apple Podcasts and Spotify.