How the growing Grafana Observability team restructured themselves successfully

2022-05-13 9 min

Over the past year, Grafana Labs has grown from 300 to 700 Grafanistas. We expect to maintain a high rate of change going forward, and to sustain it, we need flexibility in how our teams* are set up.

The majority of our Engineering squads have changed in size and structure, and the same goes for the Grafana Observability team, where I work. In the past several months alone, my group has restructured twice, doubled in size from six to 12 (which also doubled our responsibilities), then split into three squads. And we’ve been onboarding new Grafanistas along the way. I am a senior engineering manager on the team, and in this story, I am a facilitator of change.

The last time we restructured our team, six months ago, many members were dissatisfied with the change process. We were left with a lot of uncertainty, and even though we were invested in the purpose of the change, the results took a while to achieve. That is why it was important for us to do better the next time. And we did!

In this post, I’m going to share the approach we took for our most recent restructure, which was three months ago, and the results that shortly followed.

*For the purpose of this post, a team is defined as a collection of people who all share the same purpose/vision; a squad is a collection of people working together on the same project.

Who we are

The Grafana Observability team’s mission is to build workflows for effective troubleshooting on top of Grafana by making it easy to correlate telemetry signals — primarily logs, metrics, and traces. We are responsible for query and results experience, linking data sources, observability core data sources (OSS and Enterprise), and Explore.

Our challenge

By the end of 2021, we had doubled in size and responsibility due to departmental restructuring. The reason behind that was to increase job satisfaction and engagement for all our colleagues so that we’d be energized and excited for the future. Members of the Observability team were focused on sharing knowledge, defining new ways of working, and establishing our new brand.

Very early on, it became obvious that the team was facing some challenges due to our size — 12-person stand-ups are no joke, even when they’re virtual — and the variety of responsibilities we had: plugins, o11y workflow, Explore, and more.

Some issues that we observed were:

  • Lengthy squad meetings
  • Less value added by working as a collective
  • More async communication, which reduced engagement
  • Increased WIP (work in progress)
  • Reduced collaboration
  • Increased context switching as a result of high variability in work

These became a common subject of conversation, so we needed to do something about them.

Our mission and approach

Our main goals in tackling our team’s challenges were:

  • Invest more in high-value observability areas, such as o11y workflow, Explore, and other high-priority roadmap items
  • Maintain a low level of dependencies and a high level of autonomy
  • Define clear purpose, roles, and responsibilities for each squad to help with job satisfaction and team performance
  • Maintain high quality in our projects
  • Maintain good customer support levels through timely resolution of issues, escalations, and requests

In the long term, we also wanted to invest more in community engagement and contribution. Working as part of the Grafana community is one of the most exciting aspects of our jobs, and the feedback is invaluable.

Rather than having one leader come up with changes on their own and then expect us to implement them, our goal was to work together as a team to thoughtfully find solutions.

Below are the steps we took. The process took about two and a half months from start to implementation.

Step 1: Define the problem and goals

As with any complex problem, we started by drafting a design doc. This way, we could ensure easy async contribution from all team members and everyone would have input. We started by defining the problem and our goals before diving deep into our solution options. 

The general theme of our proposals was to divide into two or three squads. What differed was the way to split the work. Autonomy is something we’re very passionate about at Grafana Labs, and the latest virtual offsite enabled us to explore what it meant to us. We all agreed that we wanted clear responsibilities in each squad, with well-defined focus areas to invest in and a clear purpose. It was easy to say but hard to achieve for a team with approximately 30 responsibility areas.

Step 2: Use a whiteboard

To facilitate the brainstorming, we used a virtual whiteboard to first define all our responsibilities, then generate different ways to group them. Everyone in the team was invited to contribute, and we even invited higher-level leadership to take part to ensure we analyzed the problem from all angles.

Step 3: Explore options

We ended up exploring 10 proposals, some stronger than others. Naturally, when analyzing pros and cons, we started discounting the options with a low value/effort index. We did this through async collaboration on the design doc via comments and chats. We ended up bringing three proposals into a final session.

Step 4: Take a final vote

In this synchronous voting session, we discussed each of the available solutions — which were defined on the virtual whiteboard — and each person voted for the one they preferred. We slowly made eliminations until the list was down to two remaining favorites. We collectively decided to split into three squads: Observability Metrics, Observability Logs & Traces, and Observability Experience (Explore and more).

Great! ... But now what?

What was left was to place people in squads, review and refine ways of working, refine our backlogs, review our quarter commitments, and gather feedback. Easy!

Step 5: Create squads 

For this purpose, we put together a form where each individual got to rank their squad preference. A solution-driven individual like me might say, “This will never work, you can’t please everyone” — and we didn’t. We ended up pleasing 90% of our colleagues by placing them in their top choice, and 10% in their second. 

This highlighted three things for me: I’m lucky we didn’t have too much conflict; trusting my team to come up with ideas for how to address their problems pays off; people were already gravitating towards the three focus areas we selected for our new structure. Win, win, win!

Step 6: Focus on ways of working

What I’m talking about here are team values and policies, processes, tools, and everything in between. This is a constant work in progress, but to begin, we reviewed what we had in place for the larger squad and agreed on the basics: responsibilities, ceremonies, starting processes, when to review and improve.

We acknowledged that this list is just a jumping-off point, and we expect both our ways of working and our structure to evolve over time. After all, Grafana Labs keeps growing and we are keen to grow with it. 🪴

Our results

Since we’ve reconfigured the team and taken steps to address our challenges, we’ve observed the following outcomes:

  • Higher focus and engagement in squad ceremonies
  • Freshly energized members in roadmap conversations
  • Clearer career progression for members
  • Higher performance as a result of the shared goal and opportunity to focus on what’s most important
  • Increased investment in knowledge sharing within each squad
  • Focused investment in Explore and other key areas
  • Accelerated delivery of key features (watch out for GrafanaCon 2022 announcements!)

We’ve definitely achieved our goals and learned a lot in the process.

Below you can see a diagram of our new structure. We complemented this with a design doc that records the process we followed, our analysis of the proposals, and the async discussions we had, so that this change stays transparent now and in the future. Additionally, we drafted a new squad handbook for each of our three new squads, detailing their roles, responsibilities, and ways of working.

Diagram of the new structure

When it came to our team’s work in progress, we outlined a simple strategy for transferring work between squads in cases where responsibilities changed. We agreed that work stays with its current owner, and the inter-squad handover is completed at the first natural increment. Our backlog, meanwhile, was pretty easy to refine, and within one to two weeks we had also agreed upon big-picture commitments for the next quarter. That may sound too good to be true, but the smoothness of it all demonstrates how the new structure is a natural fit for our team.

Feedback

During our restructuring process, the team members submitted feedback through online forms twice: right after the final decision session, and a few weeks after the transition to the new structure.

Team survey results

Initial feedback indicated that 90% of team members were happy/very happy with the approach. Comments included, “Companies need to remain agile as they grow, and I support Grafana’s growth” and “I love the collaborative approach to deciding our team’s structure and future.”

A few weeks after we’d applied the changes and had been working in our new squads, the data showed that 100% of our colleagues were happy/very happy with the current structure, and 100% were happy/very happy with the approach we took for this restructure.

Compared to our last restructure, which received only 60% satisfactory scores, I’d say this is pretty good :)

Where we are now

The Grafana Observability team is more energized than ever. We’ve become more collaborative in our work, and long-term roadmap conversations are much more engaging. We’ve drastically revamped our ways of working in some squads (hello, mob PR reviews and regular backlog grooming), and we’ve introduced more opportunities for leadership and growth by opening up two Technical Leader positions for the two additional squads. Our goal of being more involved in the community seems much more realistic, and we’re already planning for the next big opportunities to grow the team and invest in more key features.

Lessons learned

If your own team is running into challenges due to its size and structure — or anything else — don’t be afraid to make changes. Here are the simple tips I’d recommend you keep in mind:

  • Trust your team 👐 They know better than anyone what the problems are and what you should explore.
  • Take your time. Compared to what we did, formulating a restructuring plan at the leadership level would have been faster, but it could have impacted team member engagement and likely would not have achieved such high satisfaction scores.
  • Ask for feedback during the process
  • Keep improving

TL;DR: We’re not saying we have discovered THE solution to team restructures, but in many scenarios, working with the impacted team to define the new structure will result in less disruption, better engagement, and valuable feedback. I call it “leading with people.” ✌️

Learn more about the Grafana Labs team and our remote-first culture, and find out how you can become an official Grafanista by visiting our careers page.