How Adform transformed 1,300 different data sources into one central observability system with Grafana
Tech is filled with origin stories rooted in humble beginnings. From Hewlett Packard and Amazon (garage) to Facebook (dorm room), inspiration and innovation can happen anywhere.
In 2002, three men huddled in a Copenhagen basement bent on changing the way digital advertising buyers and sellers worked: How can we make that process better? Their answer was Adform, an advertising technology platform that now powers 25,000 clients across the world.
All of Adform’s consumer-facing usability and accolades, like a spot in Gartner’s Magic Quadrant and a Red Dot award for user experience, mean a lot of behind-the-scenes effort from their development and operations teams to consistently innovate. As Louise Kloster, Adform’s SVP of Marketing, puts it: “Building an advanced ad tech platform is really a complex process.”
One that prompted Adform’s DevOps team to re-evaluate their observability solutions and wrestle with the same question that launched the company: How can we make that process better?
After years of operating in startup mode where developers were cherry-picking tools based on personal preferences and past experiences, it was time to take a step back and restructure a disparate and disorganized observability strategy into a centralized, cohesive approach. “We wanted to build monitoring as a service on the products we believe in,” says Adform’s DevOps Tech Lead Linas Daneliukas.
And they believed that Grafana could help them achieve that. By taking full advantage of Grafana’s open and composable platform and pairing it with Prometheus, Adform’s developers are able to keep their flexibility in the tools they use while the DevOps team offers a centralized monitoring experience within the organization. Today, Adform runs more than 1,400 dashboards in Grafana, connecting more than 1,300 data sources and nearly 200 active users across 86 organizations.
“We, as an operator, can have control of the instance itself, but teams have control over their own smaller spaces and can grow as they see fit,” says Daneliukas. “Developers get the control they need, while still having someone who can manage the whole instance. So it was a win-win for us and them.”
Adform’s new observability slogan: One central system
In Adform’s early days, developers “were using what they knew or what they thought was good,” says Tomas Dabašinskas, a DevOps Services Delivery Manager.
But as the company matured and more teams were created, this disconnected ecosystem caused onboarding difficulties, reporting challenges, and infrastructure inefficiency when it came to maintenance and improvement. Any dreams of optimizing existing solutions ended as they faced the reality of constant troubleshooting and ad hoc problem solving. “We weren’t even able to keep things up-to-date,” says Daneliukas.
Around 2018, the DevOps team changed their approach. “We came to the decision as a company to move to centralized services,” adds Dabašinskas.
The first focus was to create a monitoring-as-a-service solution. Initially, they tried to work with the existing software in place, such as Graphite, Nagios, Zabbix, Graylog, ELK Stack, and multiple Prometheus instances.
“We tried to consolidate and do a selection of the tools that we can develop further. However, we view ourselves as a modern company and wanted to see what the best practices were and what’s emerging. So the decision was made to scrap everything we had in regards to observability and create one central and unified system, all powered by Grafana.”
Linas Daneliukas, DevOps Tech Lead, Adform
Still, the migration was not without challenges. Change, as always, is hard, especially when it includes people revising their processes. But Grafana’s ability to manage a huge number of data sources allowed Adform to slowly transition their teams to Prometheus for metrics by using Grafana as a core part of their stack. “We knew it was difficult for developers to move to Prometheus instantly,” says Dabašinskas. “But Grafana supporting options from Graphite to Prometheus and other choices allowed for this interim period [when developers could still use their preferred tools].”
The work was worth it. Grafana not only brought together data, but teams too, even if they didn’t share the same opinions about their observability tools. With Grafana’s organizations functionality, the DevOps team provided each organization in Adform its own isolated monitoring experience within Adform’s single Grafana instance, which made the solution more cost-effective and streamlined than managing multiple instances. Users would then have view permissions for all of the other organizations’ data while maintaining admin access over their own.
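The permission layout described above can be pictured with a small model. This is only a sketch under stated assumptions: it uses Grafana's built-in organization role names (Viewer and Admin), but the `User` class, the helper functions, and the organization names are hypothetical and do not reflect Adform's actual configuration or any Grafana API.

```python
from dataclasses import dataclass

# Grafana's built-in organization role names. How they are assigned below is
# an assumption drawn from the article's description, not Adform's real setup.
VIEWER = "Viewer"
ADMIN = "Admin"


@dataclass(frozen=True)
class User:
    login: str
    home_org: str  # the organization (team space) the user belongs to


def role_in_org(user: User, org: str) -> str:
    """Admin in the user's own organization, Viewer everywhere else."""
    return ADMIN if org == user.home_org else VIEWER


def roles_for(user: User, all_orgs: list[str]) -> dict[str, str]:
    """Map every organization name to the role the user would hold in it."""
    return {org: role_in_org(user, org) for org in all_orgs}


if __name__ == "__main__":
    orgs = ["bidding", "reporting", "devops"]  # hypothetical organization names
    print(roles_for(User("alice", "reporting"), orgs))
```

In this model, every team keeps full control of its own space, while any user can still open another team's dashboards in read-only mode.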
“It was a good way to separate our teams, but still allow them that visibility so they can find each other’s dashboards and visualizations,” Daneliukas says. “It also wouldn’t impose roadblocks down the line. We didn’t want to create a solution we would have to deprecate. We raised the criteria quite high for what the tool has to do so that we’re not stuck five or ten years down the line.”
Grafana’s open source roots and growing global community also aligned with Adform’s criteria. If developers get stuck on a problem, Grafana’s online communities have become a prime resource for staying in sync on how to optimize their stack. “Everything is solved via community forums,” says Daneliukas. With the help of the active community, “we’ve never had issues of getting stuck. Everything is just simple. We don’t run into roadblocks with Grafana.”
Instead, they have a roadmap of new features and functionalities to always look forward to.
“Choosing Grafana was a no-brainer for us because we saw that the tool has a future in front of it and a community behind it. It’s constantly being developed. It was the easiest path and the best path.”
Linas Daneliukas, DevOps Tech Lead, Adform
An inventory of wins with Grafana
Having one visualization tool across environments, while still maintaining multiple data sources, has been key to the DevOps team turning Adform’s developers from hesitant participants into Grafana power users.
“The biggest impact is that people can find what they are looking for,” says Daneliukas. “When we had different tools and an alert fired, it was hard to know who to contact, where to find the dashboard. Then you needed credentials for that monitoring system, which you didn’t always have. You’re basically blind to most things that are happening. Now, with a central monitoring system, when you see an alert anywhere within the company, you can click on the alert and have access. You can go to their organization and view the corresponding dashboard. You have the data in front of you.”
This transparency, aided by built-in support for Prometheus Alertmanager in Grafana, facilitated greater communication and cross-collaboration across Adform’s teams. “Now everything is streamlined,” Daneliukas continues. “You can be sure, when you and the development team are talking about monitoring, you’re using the same language. You know Grafana is the place where we visualize metrics, and Prometheus is the tool we use to collect them. You don’t spend an hour talking to someone trying to fix a problem only to realize they are using their own monitoring solution, which is why nothing seemed to add up.”
Consolidation also meant operational efficiency. Originally, development teams averaged 1-3 full-time employees (FTEs) per team per month on infra maintenance and monitoring. With 25 teams, that adds up to as many as 75 FTEs per month tied up with upkeep rather than innovation. Now, that number is down to 1 FTE per team. The team also provides better upkeep across the board, ensuring the latest versions of software are being used.
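The staffing figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes the quoted "1 FTE per team" applies to each of the 25 teams; the variable and function names are illustrative only.

```python
TEAMS = 25                     # number of development teams (from the article)
FTES_PER_TEAM_BEFORE = (1, 3)  # low and high estimate per team, per month
FTES_PER_TEAM_AFTER = 1        # figure quoted after consolidation (assumed per team)


def total_ftes(teams: int, ftes_per_team: int) -> int:
    """Total FTEs per month across all teams."""
    return teams * ftes_per_team


before_low = total_ftes(TEAMS, FTES_PER_TEAM_BEFORE[0])
before_high = total_ftes(TEAMS, FTES_PER_TEAM_BEFORE[1])
after = total_ftes(TEAMS, FTES_PER_TEAM_AFTER)

print(f"before: {before_low} to {before_high} FTEs per month")
print(f"after:  {after} FTEs per month")
```

The 75-FTE figure is therefore the upper end of the original range, reached when every team spends three FTEs on upkeep.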
“Walking around the office, you’ll see TV screens with Grafana dashboards all around. People continuously refer to them. In the end, with Grafana, the ease of troubleshooting has increased exponentially all while we’ve been able to cut our monitoring stack CPU usage by 50%.”
Linas Daneliukas, DevOps Tech Lead, Adform
Adform flows into the future
As they continue to scale, Adform is looking forward to exploring other ways to improve using Grafana, says Dabašinskas, pointing to Grafana Alerting. “We definitely will test it since for us, it would give us the best of both worlds of managing our alerts in Prometheus format as a code and also being able to see and manage all alerts in a user-friendly way via Grafana UI,” he says.
In the team’s continued effort to “make observability very simple” at Adform, they recently ran a successful PoC of Grafana Loki during a company-wide hackathon, collecting logs from more than 3,000 virtual machines with Loki. “We wanted to implement everything from zero overnight and it worked!” says Daneliukas. “We had a single Loki instance that was collecting OS-level logs from every single virtual machine. You could just open up a dashboard, select your machine, grab your logs. That was just astonishing. Now we have it in our roadmaps to implement it.”
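To give a feel for what "select your machine, grab your logs" might look like underneath a dashboard, here is a minimal sketch that builds a request URL for Loki's `query_range` HTTP endpoint. The `host` label, the base URL, and the machine name are assumptions made for illustration; the article does not say how Adform labeled its log streams, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode


def escape_label_value(value: str) -> str:
    """Escape backslashes and double quotes for use inside a LogQL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_log_query_url(base_url: str, machine: str, limit: int = 100) -> str:
    """Build a Loki query_range URL selecting the log stream of one machine.

    Assumes each virtual machine's logs carry a `host` label (hypothetical).
    """
    logql = f'{{host="{escape_label_value(machine)}"}}'  # e.g. {host="vm-0042"}
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical Loki address and machine name.
    print(build_log_query_url("http://loki.example:3100", "vm-0042"))
```

A Grafana dashboard variable for the machine name would play the role of the `machine` argument here, so picking a machine changes the selector and therefore the logs shown.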
The Adform DevOps team is also considering doing a PoC of Grafana Mimir to continue building out an easy-to-use observability experience within their growing Grafana stack.
“We like things being a seamless, unified experience. Our developers want that too. We never decided to build everything on top of Grafana. It just happened naturally. We’re not favoring Grafana, but somehow it ends up on the top of the list time and time again.”
Linas Daneliukas, DevOps Tech Lead, Adform
What started as an effort to provide monitoring as a service naturally evolved to include logging as a service, before adding tracing. Soon, Daneliukas and Dabašinskas can realize the end vision of offering observability as a service.
With the constant ebbs and flows of business and technology, Daneliukas and Dabašinskas are reflective about how they can empower Adform’s teams to realize the company’s ambitions. “We want to make our development team’s life as easy as possible so they can focus on their daily job of building client-facing applications successfully,” says Dabašinskas. “They need to have the right tools. By us providing monitoring and logging as a service, they can focus on what’s important.”