How Unity migrated to Grafana Mimir and scaled its metrics backend to 150 million active series
Company: Unity
Industry: Media & Entertainment (video gaming)
Copenhagen-based Unity is a leader in real-time 3D content creation. It offers game development and multiplayer services, and applications built with its software.
Challenge
Unity lacked a centralized monitoring solution. Teams chose monitoring tools independently, resulting in the deployment of multiple observability tools, including various instances to support Prometheus and Grafana. This fragmentation led to significant operational overhead, requiring extensive training for engineers to ensure proper implementation of custom metrics.
Goal
Unity’s SRE team set out to establish a unified observability framework to streamline monitoring across all services, reduce operational overhead, and improve scalability and resource utilization.
Solution
After adopting multiple open source tools, including Kubernetes and Prometheus, Unity transitioned to Grafana Mimir in 2023, which significantly improved scalability and performance.
Impact
- Scalability and performance: The migration to Mimir helped Unity manage approximately 200 million time series and ingest 6 million samples per second across 80 tenants. The process was fairly straightforward, and any issues they ran into were quickly resolved.
- Reliability and advanced features: Mimir’s out-of-order sample ingestion, OpenTelemetry support, and native histograms for Prometheus improved data management, while transitioning to an auth-token service streamlined team onboarding across environments.
- Optimized logging and tracing: Their adoption of Grafana Loki and Grafana Tempo improved log and trace correlation, respectively. And by using their Grafana data source controller, Unity could quickly create separate data sources for each tenant, facilitating easier access to metrics, logs, and traces.
“Migrating to Mimir offered better scalability and performance based on our testing at lower resource cost, which is a really good thing to have and the migration was straightforward.”
Lukas Monkevicius, Senior Site Reliability Engineer
What’s next
Going forward, they plan to increase adoption of Grafana projects and features like Grafana Alloy and native histograms, further enhancing operational efficiency and user experience across diverse tenant environments.
Conclusion
The migration to Mimir significantly improved scalability and performance for Unity’s observability platform. And integrating Loki and Tempo has streamlined signal correlation so they can resolve issues faster and manage data more efficiently.
Speakers
Resources
Get started with the Grafana Cloud Free tier →- 10k metrics
- 50GB logs
- 50GB traces
- 50GB profiles
- and more