AI/ML

AI-based tools for better observability

Grafana Cloud’s AI-powered tools and machine learning capabilities help teams automate tedious tasks, identify anomalies faster, and optimize costs and engineering hours.
The actually useful Grafana Cloud Free plan
  • 50GB traces
  • 10k metrics
  • 50GB logs
  • 14-day retention
  • 3 active users

Reduce toil

Reduce repetitive manual tasks with machine learning to minimize the toil of maintaining healthy services.


Minimize time

Respond to incidents faster with automated checks that help identify anomalies and automated workflows that correlate issues.


Save money

Cut down on the cost of scaling with AI-powered tools like Adaptive Metrics, which aggregates unused or partially used metrics.

Reduce metrics volume to reduce costs

Easily identify whether metrics are unused or partially used, then aggregate them into lower-cardinality versions with Adaptive Metrics. On average, companies have been able to reduce metrics costs by 35%.

  • Receive daily updated recommendations that adapt to your usage patterns
  • Apply or skip aggregations as needed with rule management directly in the UI
  • Identify and isolate critical metrics that can be excluded from aggregations
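To make "lower-cardinality versions" concrete, here is a minimal Python sketch of aggregating series by dropping a label. It is not how Adaptive Metrics is implemented. It assumes a hypothetical `pod` label that no dashboard or alert queries, and it sums values, which suits counters but may not suit every metric type.

```python
from collections import defaultdict

# A series is identified by its label set; the value is its latest sample.
Labels = frozenset[tuple[str, str]]


def aggregate_drop_labels(series: dict[Labels, float], drop: set[str]) -> dict[Labels, float]:
    """Sum series after removing the labels in `drop`.

    Series that differ only in a dropped label collapse into one,
    so the output has fewer distinct series (lower cardinality).
    """
    out: defaultdict[Labels, float] = defaultdict(float)
    for labels, value in series.items():
        kept = frozenset((k, v) for k, v in labels if k not in drop)
        out[kept] += value
    return dict(out)


# Hypothetical request counters split by service and pod.
raw = {
    frozenset({("service", "checkout"), ("pod", "checkout-1")}): 120.0,
    frozenset({("service", "checkout"), ("pod", "checkout-2")}): 80.0,
    frozenset({("service", "search"), ("pod", "search-1")}): 50.0,
}

# Assumption: nothing queries by `pod`, so it can be dropped safely.
aggregated = aggregate_drop_labels(raw, {"pod"})
print(len(raw), "->", len(aggregated))
```

Three series become two, and the per-service totals are preserved; only the per-pod breakdown is lost, which is why the choice of which labels to drop has to follow actual usage.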
Architecture diagram for Adaptive Metrics

Sift investigations UI in Grafana Cloud

Improve MTTR

Sift, Grafana Cloud’s diagnostic assistant, uses Grafana Machine Learning to filter through data and identify the most critical incident details, such as error patterns or Kubernetes container failures.

  • Get automated checks (from error pattern logs and slow requests to Kubernetes crashes and noisy neighbors) that make it easy to analyze your telemetry during investigations.
  • Find explanations of log errors that can help guide incident response.
  • Automatically detect and correlate anomalies across application and infrastructure signals with Asserts.
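As an illustration of the general idea behind an error-pattern check, the following Python sketch groups error log lines into patterns by masking variable tokens (UUIDs, hex values, numbers) and counting what remains. The masking rules, the `error` substring filter, and the helper names are assumptions for this example, not Sift’s actual algorithm.

```python
import re
from collections import Counter

# Order matters: UUIDs and hex values are masked before plain numbers,
# so their digits are not replaced piecemeal.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.I), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def to_pattern(line: str) -> str:
    """Replace variable tokens so lines from the same code path match."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    return line


def top_error_patterns(lines: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count error lines by pattern and return the n most frequent."""
    errors = (to_pattern(line) for line in lines if "error" in line.lower())
    return Counter(errors).most_common(n)


# Hypothetical log lines.
logs = [
    "ERROR timeout after 3000ms calling payments (req 8812)",
    "ERROR timeout after 5000ms calling payments (req 9120)",
    "INFO request 77 served in 12ms",
    "ERROR user 42 not found",
]
patterns = top_error_patterns(logs)
for pattern, count in patterns:
    print(count, pattern)
```

Two distinct timeout lines collapse into one pattern with a count of 2, which is the kind of signal that points an investigation at a single failing dependency rather than at many unrelated errors.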

AI observability for your stack

AI Observability in Grafana Cloud helps teams monitor and optimize generative AI applications with insights into LLMs, vector databases, and GPUs.

  • Monitor user interactions with LLMs, such as prompts and completions, to better understand user intent and model performance.
  • Track and visualize token usage for LLM interactions to optimize resource allocation.
  • Monitor and analyze costs associated with LLMs in real time for effective budget management, forecasting, and cost-saving.
  • Capture metadata for each LLM request, including request parameters, response times, model versions, and other details.
  • Track the latency of each request to identify bottlenecks and resolve issues quickly.
  • Monitor vector database query response times and throughput to ensure vector data is processed and retrieved efficiently.
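The per-request data in this list (model, token counts, latency, cost) can be sketched as a small Python record. In practice this data would come from instrumentation such as the OpenLIT SDK mentioned below; this sketch is not that SDK’s API. The model name, the per-1,000-token prices, and the `fake_llm` stand-in are hypothetical.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; real prices depend on provider and model.
PRICES_PER_1K = {"example-model": {"input": 0.0005, "output": 0.0015}}


@dataclass
class LLMCallRecord:
    """Metadata captured for one LLM request."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_s: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def cost_usd(self) -> float:
        # Cost = tokens x price per 1,000 tokens, summed over input and output.
        price = PRICES_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def timed_call(model: str,
               fn: Callable[[str], tuple[str, int, int]],
               prompt: str) -> tuple[str, LLMCallRecord]:
    """Call `fn` and record its latency and token usage."""
    start = time.perf_counter()
    completion, input_tokens, output_tokens = fn(prompt)
    latency = time.perf_counter() - start
    return completion, LLMCallRecord(model, input_tokens, output_tokens, latency)


def fake_llm(prompt: str) -> tuple[str, int, int]:
    # Hypothetical stand-in for a real client: (completion, input tokens, output tokens).
    return "Paris", 1200, 400


answer, record = timed_call("example-model", fake_llm, "What is the capital of France?")
print(record.total_tokens, f"${record.cost_usd:.4f}")
```

Keeping cost as a derived property rather than a stored field means a price change only touches the price table; summing `total_tokens` and `cost_usd` across records gives the usage and budget views described above.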
Visualizations for AI Observability in Grafana Cloud

Visualization of outlier detection in Grafana Cloud

Predict and detect anomalies and outliers

Grafana Cloud’s machine learning tools simplify complex systems and improve operational efficiency.
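One common, robust way to spot an outlier among a group of similar series (for example, one replica behaving differently from its peers) is the median absolute deviation (MAD). The Python sketch below flags series whose distance from the group median exceeds a threshold measured in MAD units. It illustrates the general technique only; the threshold of 3 and the sample CPU values are assumptions, and it is not Grafana Cloud’s outlier detection implementation.

```python
from statistics import median


def mad_outliers(values: dict[str, float], threshold: float = 3.0) -> set[str]:
    """Return names of series whose value is far from the group median.

    Distance is measured in units of the median absolute deviation (MAD),
    which, unlike the standard deviation, is not inflated by the outliers
    it is trying to find.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # At least half the series equal the median; as a simplifying
        # choice, treat any deviation from it as an outlier.
        return {name for name, v in values.items() if v != med}
    return {name for name, v in values.items() if abs(v - med) / mad > threshold}


# Hypothetical CPU utilization for five replicas of the same service.
cpu = {"web-1": 0.41, "web-2": 0.39, "web-3": 0.40, "web-4": 0.42, "web-5": 0.95}
print(mad_outliers(cpu))
```

Here `web-5` sits about 54 MAD units from the median while its peers are within 2, so only it is flagged.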


Run applications better and faster

Replace tedious tasks with GenAI features that reduce toil.

Flame graph AI workflow demo

AI/ML innovation at Grafana Labs

Grafana Labs integrates observability with advanced AI and machine learning through its unique approach to innovation: developing in the open, building AI support into open source solutions, and offering no vendor lock-in. Developers can easily build AI/ML-powered experiences using Grafana’s LLM (large language model) plugin and its open source libraries and examples.

As LLMs become more prevalent in customers’ day-to-day work, Grafana Labs is also developing various ways to monitor and observe LLMs. Our engineers have explored using open source tools to observe LLMs, including the OpenLIT SDK. This SDK produces OpenTelemetry traces and metrics for LLM calls, including latency, cost, and the number of tokens generated. The data can be sent to Grafana Cloud (via an integration that is open source and available on GitHub) and visualized with the AI Observability solution.

The team also built GPU monitoring using eBPF, which gives AI developers fine-grained information about their workloads without manual instrumentation. This is currently available as a branch in Grafana Beyla, Grafana Labs’ open source eBPF project.


“When we use Adaptive Metrics, I just have to download the recommendations, run a script, apply it, and I’m done. That’s maybe five minutes that I can apply the changes compared to the few hours I was spending trying to deploy to Prometheus. It’s an instant fix that we can implement and not worry about affecting anyone.”
Lydia Clarke
DevOps Engineer Chief, SailPoint

To use the AI-based tools in Grafana Cloud, you have three options:

Grafana Cloud Free

No payment. Ever.
  • 50GB profiles.
  • 3 active monthly users.
  • Always free. Never expires. Upgrade at any time.
Easiest way to get started

Grafana Cloud Pro

Pay as you go
Best suited for growing teams that need to scale above 50 GB of profiles ingested per month, access 30 days of retention, and unlock 8x5 support.

Grafana Cloud Pro

Premium bundle
Best suited for teams that need to scale above 50 GB of profiles ingested per month, access custom retention, and unlock 24x7 support.