How Snyk, TripAdvisor, and Citibank use Grafana to effectively scale observability
It’s one thing to set up an observability strategy. But what’s it like to introduce and scale observability effectively across an organization?
In a wide-ranging conversation at ObservabilityCON 2021, three technical pros from Snyk, TripAdvisor, and Citibank joined Steve Mayzak, Grafana Labs VP of Global Solutions Engineering. With more than 75 years of experience between them, they shared the triumphs and turbulence of their respective observability journeys.
Buy vs. build
When it comes to the foundation of monitoring their systems, each expert started from a different vantage point before gravitating to Grafana.
“I’m a big fan of buy vs. build where you can get away with it,” said Crystal Hirschorn, Director of Engineering at developer security platform Snyk. With a small team supporting more than 150 developers, Hirschorn focused on building one observability platform instead of adopting tools from multiple vendors. “We started using Prometheus and Grafana, and we’ve recently started experimenting with Tempo.”
At TripAdvisor, where Nathan Berkley serves as a Principal Software Engineer, the team started experimenting with Grafana in hopes of avoiding the high cost of producing reusable dashboards and adopting a tool with more interactive graphs and a user-friendly frontend.
“Since I was introduced to the stack, it’s become an increasingly central part of our strategy,” said Berkley, whose team is now shifting from on-prem to the cloud and also experimenting with Tempo and Loki. “We’re trying to modernize our stack and get out of the business of running things ourselves. That’s how we started interacting with Grafana Cloud, specifically to support some of the first adopters in our move to AWS.”
Michael Johnson, Global Head of Support Engineering and Digital Payments Support at Citibank, also reflected on the move to the cloud from the perspective of working with a legacy stack. “We’re architected in verticals. Each application team looks over their piece of the world, but from the support organization we need to look at the business process overall,” explained Johnson. “We have a lot of niche tools which support various technologies, and we’re using Grafana to collect all the data and see the end-to-end business flow in a single spot.”
Big tent vs. rip-and-repair
With Grafana’s “big tent” philosophy, it’s been easier for these three organizations to expand their monitoring capabilities as the number of technologies and stacks feeding into their ecosystems grows.
“Rather than fight to set up integrations with each of those things into one repository, it’s been a lot easier to just add that data source to Grafana and let the data live where it wants to live,” said Berkley. “We could spend effort to get [the data] all out into Prometheus, or we could save time and only pay to store it once.”
Johnson also favors a big tent approach over a rip-and-repair model: “We’ve been investing in a lot of our tools for years. There are a lot of costs to move away from them, so we’ve been looking to send a lot of our data to Grafana.”
The effects go beyond data, too. “[Grafana] definitely has changed how the teams are working with each other,” said Johnson. “Before we would have verticals for application diagrams. Now it’s more horizontal with the flow of the application itself, and folks are thinking about what they have to look at upstream and downstream. It’s been an interesting transition to see a lot of these teams work across from each other.”
How does the newfound collaboration propel innovation? What is the future of OpenTelemetry? What are the tools the panelists are most excited to explore? (Two words: machine learning.) And how do SLOs, SLAs, and SLIs play into what’s next?
The panel went on to answer all these questions and reflect on how they hope their strategies will push their businesses forward and establish observability as a first-class citizen in their enterprises.
“We’re at the front door of that right now,” said Johnson. “I think the journey is there to make it all the way to observability end-to-end.”
To learn more about how observability works at Snyk, TripAdvisor, and Citibank, and to hear what the panelists really think about shift-left observability trends, check out the full ObservabilityCON session. All our sessions from ObservabilityCON 2021 are now available on demand.