Why companies choose Grafana Cloud over self-managed OSS stacks
While we all love open source technology and the community that comes with it, we don’t always have the time or resources to stand up, maintain, update, and troubleshoot a self-hosted stack.
“To spend time carefully managing where the storage goes, what our retention period is, and to make sure that the Prometheus node is beefy enough that we can actually do queries across the last six months of data … It was all a headache,” says Andrew Burian, principal SRE and engineering manager for the IT, security, and SRE teams at Dapper Labs, which shifted from a self-hosted stack to Grafana Cloud.
With the fully managed Grafana LGTM Stack hosted in Grafana Cloud, “we have curated the open source experience into an easy-to-use, opinionated, and integrated platform,” Grafana Labs CTO Tom Wilkie said in the ObservabilityCON 2022 keynote. Plus, “it’s still powered by the same open source projects you know and love.” We also give away what we call an “actually useful” free tier that comes with generous backend storage for logs, traces, and metrics, and much more.
Considering a migration or starting from a self-managed OSS stack? It’s a position many organizations have found themselves in as their observability strategies evolved and grew. Here, five observability practitioners share their top reasons why they chose Grafana Cloud.
Support modern cloud architecture
Reduce burden of managing an observability stack and save engineering time
Reduce costs
In the end, there was one common outcome they all experienced after moving to Grafana Cloud. “It’s hard to please engineers,” summed up Carl Johnson, Director of Infrastructure and SRE at The Trade Desk. Since his team made the switch, “our engineers have been quite pleased.”
More time focused on apps, not managing an observability stack
The benefits of Grafana Cloud were almost instantaneous at The Trade Desk. “Query time immediately improved and many, many developers seemed to notice. Also, our reliability improved quite a bit,” says Site Reliability Engineer Patrick O’Brien. Today, “we have zero storage nodes, which were the most expensive piece of that stack. Now we just have three nodes and everything feeds back to Grafana Labs.”
Not only did the migration save the company money, but the shift also spared the engineering department the headaches of troubleshooting. “Metrics usage frustration improved nearly overnight once we went with the hosted platform,” says Carl Johnson. “The reason we know it was a success is the complaints and frustrations internally stopped.”
Adds Johnson: “I think most of the ROI is really coming from time and labor savings. We can all say that what was once a time-sink was removed from our radar altogether.”
Dapper Labs saw similar results. As their products experienced a 100x increase in users and their traffic shot up by 1,000x, their self-managed Prometheus-Grafana stack had to scale from 200,000 to almost 4M active series.
To eliminate the operational burden, Burian chose Grafana Cloud to run their visualization and handle data warehousing. With only six people in the observability pod supporting an engineering organization of 100, Grafana Cloud allows Burian’s team to focus on bigger projects without having to worry about maintaining and upgrading the stack every few months. Says Burian: “Anything that requires babysitting is a lost opportunity cost for us.”
Read more about Dapper Labs’ Grafana Cloud migration and check out how Grafana Cloud saves The Trade Desk valuable engineering hours.
Consolidate tools
Ultimate AI is an industry-leading customer support automation platform that helps companies improve customer satisfaction and increase efficiency with AI. Ultimate’s incident response, however, was anything but automatic prior to adopting Grafana Cloud.
Though they were already Grafana OSS users, “it wasn’t heavily used because we had dashboards and logs and on-call stuff spread across many different applications,” says Shashi Ravula, Platform Engineering Manager. The same could be said for their observability bills. “We were spreading our money across multiple different tools and [the system] was indeed doing its job, but it took a lot of cognitive load for developers to actually understand all of those tools,” says Senior Software Engineer Alexander Rösel.
They eventually centralized on-call management in Grafana IRM, which includes Grafana OnCall and Grafana Incident hosted on Grafana Cloud. Then they quickly built out their managed stack on Grafana Cloud to include Grafana Cloud Logs, Grafana Cloud k6, and soon they’ll be adding Grafana Cloud Traces. “Part of the appeal of Grafana Cloud was the idea that we can have all of those things in one suite,” says Ravula, “so it will be very easy for developers to navigate through the dashboard and OnCall and get those metrics right next to the logs and traces.”
Learn how Ultimate AI leverages Grafana Cloud IRM.
Increase security for customers
Sometimes you just want more of a good thing. That’s the case at Royal IHC, which used Grafana OSS to create dashboards for their customers as part of integrated solutions that improve operational efficiency for maritime fleets around the world.
Guus Derksen, a Royal IHC project leader, was so happy with the dashboards his team was able to create that he wanted to expand their offerings with Grafana Cloud Advanced, which provides built-in security and access features for their clients.
There are many more opportunities for growth and evolution at Royal IHC, made possible because Grafana Cloud also maintains everything for their individual clients. “It was quite user friendly,” said Derksen. “It definitely gave us the right direction to move in with the development we are going through in general.”
Find out more about Royal IHC’s observability journey with Grafana Cloud Advanced.
Migration to Prometheus
At Kambi, they had a “pretty standard” setup for Graphite that was based on Python. As the leading independent provider of premium sports betting technology and services within the global regulated betting and gaming industry, Kambi had an infrastructure that included around 500 services feeding into an HAProxy that divided the load between six instances of carbon-relay. Carbon-relay nodes then forwarded it to the carbon-cache nodes, which stored the actual data as whisper files.
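To make the data path above a little more concrete, here is a minimal sketch of what a single metric could look like on the wire in a setup like this. It assumes the services speak Graphite's plaintext protocol (one `path value timestamp` line per data point), which the article does not confirm for Kambi; the function name and the metric path are hypothetical.

```python
import time


def format_plaintext(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol.

    The format is "<metric path> <value> <unix timestamp>\n".
    If no timestamp is given, the current time is used.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


# Hypothetical metric path and value, for illustration only.
line = format_plaintext("services.checkout.requests", 42, 1700000000)
print(repr(line))
```

In an architecture like the one described, a line such as this would travel from a service through the load balancer to a carbon-relay, and then on to a carbon-cache node that writes it to a whisper file.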
Soon, however, issues began popping up. Not only were their disk space, CPU, and even RAM running out, but Kambi SRE Frank Stengård’s team also discovered that in Graphite, many metrics were being sent at more frequent intervals than they were actually stored at, and the values were zeros or mostly zeros. As Stengård put it: “The house was burning now. We needed to fix it.”
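Why does sending more often than the storage interval hurt? The sketch below is a simplified model, not Whisper's actual code: it assumes each timestamp is rounded down to the start of its storage interval and that a later point in the same slot overwrites the earlier one. The 10-second send interval and 60-second storage interval are made-up numbers, not Kambi's.

```python
def stored_points(samples, storage_interval):
    """Collapse (timestamp, value) samples into one slot per storage interval.

    Each timestamp is rounded down to the start of its interval, and a later
    sample that lands in the same slot overwrites the earlier one.
    """
    slots = {}
    for timestamp, value in samples:
        slot = timestamp - (timestamp % storage_interval)
        slots[slot] = value
    return sorted(slots.items())


def wasted_fraction(samples, storage_interval):
    """Fraction of sent samples that never survive as a stored point."""
    kept = len(stored_points(samples, storage_interval))
    return 1 - kept / len(samples)


# Hypothetical: a metric of zeros sent every 10s for two minutes,
# stored at one point per 60s.
samples = [(t, 0.0) for t in range(0, 120, 10)]
print(stored_points(samples, 60))
print(wasted_fraction(samples, 60))
```

Under these assumptions, twelve points are sent but only two are kept, so most of the network, CPU, and relay work goes into data that is discarded.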
To battle their three-alarm fire, Stengård and his team decided to modify Hadrianus, their own open source application-aware firewall load-balancer, to send Kambi data in a mirror replica to a third-party provider. Since Kambi was already a Grafana OSS fan, they decided to test sending all of the production data straight to Grafana Cloud, and to their pleasant surprise, the hosted platform was able to handle their telemetry load. It also solved another problem for Stengård: the company had decided it wanted to use Prometheus instead of Graphite because it is more widely supported and works slightly better in Kubernetes. Grafana Cloud was a good fit because it not only supports Graphite but also offers an easy migration path to Prometheus, which they followed over time.
Watch Kambi’s deep dive into their Prometheus migration with Grafana Cloud.
If you’re not already using Grafana Cloud — the easiest way to get started with observability — sign up now for a free 14-day trial of Grafana Cloud Pro, with unlimited metrics, logs, traces, and users, long-term retention, and access to all Enterprise plugins.