Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Grafana Agent Operator architecture
Grafana Agent Operator works by watching for Kubernetes custom resources that specify how to collect telemetry data from your Kubernetes cluster and where to send it. Agent Operator manages corresponding Grafana Agent deployments in your cluster by watching for changes to those custom resources.
Grafana Agent Operator works in two phases: it discovers a hierarchy of custom resources, and it reconciles that hierarchy into a Grafana Agent deployment.
Custom resource hierarchy
The root of the custom resource hierarchy is the `GrafanaAgent` resource, the primary resource Agent Operator looks for. `GrafanaAgent` is called the root because it discovers the other sub-resources, `MetricsInstance` and `LogsInstance`. The `GrafanaAgent` resource endows them with Pod attributes defined in the `GrafanaAgent` specification, for example, Pod requests, limits, affinities, and tolerations, and defines the Grafana Agent image. You can only define Pod attributes at the `GrafanaAgent` level; they are propagated to `MetricsInstance` and `LogsInstance` Pods.
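For illustration, here is a minimal sketch of a `GrafanaAgent` resource that sets these Pod attributes. The field names follow the operator's `monitoring.grafana.com/v1alpha1` CRDs, and the image tag, resource values, and toleration are placeholders:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
spec:
  image: grafana/agent:latest   # placeholder; pin a specific version in practice
  resources:                    # propagated to MetricsInstance and LogsInstance Pods
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
  tolerations:
    - key: example.com/dedicated   # hypothetical taint
      operator: Exists
      effect: NoSchedule
```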
The full hierarchy of custom resources is as follows:
- `GrafanaAgent`
  - `MetricsInstance`
    - `PodMonitor`
    - `Probe`
    - `ServiceMonitor`
  - `LogsInstance`
    - `PodLogs`
The following table describes these custom resources:
| Custom resource | Description |
|---|---|
| `GrafanaAgent` | Discovers one or more `MetricsInstance` and `LogsInstance` resources. |
| `MetricsInstance` | Defines where to ship collected metrics. This rolls out a Grafana Agent StatefulSet that scrapes and ships metrics to a `remote_write` endpoint. |
| `ServiceMonitor` | Collects cAdvisor and kubelet metrics. This configures the `MetricsInstance` / Agent StatefulSet. |
| `LogsInstance` | Defines where to ship collected logs. This rolls out a Grafana Agent DaemonSet that tails log files on your cluster nodes. |
| `PodLogs` | Collects container logs from Kubernetes Pods. This configures the `LogsInstance` / Agent DaemonSet. |
Most Grafana Agent Operator resources can reference a ConfigMap or a Secret. All referenced ConfigMaps and Secrets are added to the resource hierarchy.
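For example, a `MetricsInstance` can pull `remote_write` credentials from a Secret. A hedged sketch, where the Secret name `prometheus-credentials` and its keys are placeholders:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    instance: primary
spec:
  remoteWrite:
    - url: https://prometheus.example.com/api/prom/push
      basicAuth:
        username:
          name: prometheus-credentials   # Secret name (placeholder)
          key: username
        password:
          name: prometheus-credentials
          key: password
```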
When a hierarchy is established, each item is watched for changes. Any changed item causes a reconcile of the root `GrafanaAgent` resource, either creating, modifying, or deleting the corresponding Grafana Agent deployment.

A single resource can belong to multiple hierarchies. For example, if two `GrafanaAgents` use the same `Probe`, modifying that `Probe` causes both `GrafanaAgents` to be reconciled.
To set up monitoring, Grafana Agent Operator works in the following two phases:
- Builds (discovers) a hierarchy of custom resources.
- Reconciles that hierarchy into a Grafana Agent deployment.
Agent Operator also performs sharding and replication and adds labels to every metric.
How Agent Operator builds the custom resource hierarchy
Grafana Agent Operator builds the hierarchy using label matching on the custom resources. The following figure illustrates the matching. The `GrafanaAgent` picks up the `MetricsInstance` and `LogsInstance` resources that match the label `instance: primary`. The instances pick up the resources the same way.
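A sketch of that matching, with the selector field names taken from the operator's CRDs and the labels and resource names as placeholders:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
spec:
  metrics:
    instanceSelector:        # picks up MetricsInstances labeled instance: primary
      matchLabels:
        instance: primary
  logs:
    instanceSelector:        # picks up LogsInstances the same way
      matchLabels:
        instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  labels:
    instance: primary        # matched by the GrafanaAgent above
spec:
  serviceMonitorSelector:    # in turn picks up matching ServiceMonitors
    matchLabels:
      instance: primary
```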
To validate the Secrets
The generated configurations are saved in Secrets. To download and validate them manually, use the following commands:
```shell
$ kubectl get secrets <???>-logs-config -o json | jq -r '.data."agent.yml"' | base64 --decode
$ kubectl get secrets <???>-config -o json | jq -r '.data."agent.yml"' | base64 --decode
```
How Agent Operator reconciles the custom resource hierarchy
When a resource hierarchy is created, updated, or deleted, a reconcile occurs. When a `GrafanaAgent` resource is deleted, the corresponding Grafana Agent deployment will also be deleted.
Reconciling creates the following cluster resources:
- A Secret that holds the Grafana Agent configuration is generated.
- A Secret that holds all referenced Secrets or ConfigMaps from the resource hierarchy is generated. This ensures that Secrets referenced from a custom resource in another namespace can still be read.
- A Service is created to govern the StatefulSets that are generated.
- One StatefulSet per Prometheus shard is created.
PodMonitors, Probes, and ServiceMonitors are turned into individual scrape jobs, which all use Kubernetes Service Discovery (SD).
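As an illustration only (not exact operator output), a generated scrape job has roughly this shape, with a hypothetical job name and selector label:

```yaml
scrape_configs:
  - job_name: serviceMonitor/default/my-service-monitor/0   # hypothetical generated name
    kubernetes_sd_configs:
      - role: endpoints            # ServiceMonitors discover service endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: my-app              # keep only targets matching the monitor's selector
        action: keep
```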
Sharding and replication
The `GrafanaAgent` resource can specify a number of shards. Each shard results in the creation of a StatefulSet with a `hashmod` + `keep` `relabel_config` per job:

```yaml
# NUM_SHARDS and CURRENT_STATEFULSET_SHARD are filled in per StatefulSet.
- source_labels: [__address__]
  target_label: __tmp_hash
  modulus: NUM_SHARDS        # hash each target address into NUM_SHARDS buckets
  action: hashmod
- source_labels: [__tmp_hash]
  regex: CURRENT_STATEFULSET_SHARD   # keep only targets assigned to this shard
  action: keep
```
This enables horizontal scaling, where each shard handles roughly 1/N of the total scrape load. Note that this mechanism does not use consistent hashing, so changing the number of shards causes most targets to be reassigned to a different shard.
The sharding mechanism is borrowed from the Prometheus Operator.
The number of replicas can be defined, similarly to the number of shards. This creates duplicate shards, which must be paired with a `remote_write` system that can perform HA deduplication. Grafana Cloud and Mimir provide this out of the box, and the Grafana Agent Operator defaults support these two systems.

The total number of created metrics Pods is the product of `numShards * numReplicas`.
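A minimal sketch, assuming the `shards` and `replicas` fields on the metrics subsystem of the `GrafanaAgent` spec:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
spec:
  metrics:
    shards: 3      # one StatefulSet per shard
    replicas: 2    # each shard is duplicated for HA deduplication
    # total metrics Pods: numShards * numReplicas = 3 * 2 = 6
```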
Added labels
Two labels are added by default to every metric:

- `cluster`, representing the `GrafanaAgent` deployment. Holds the value of `<GrafanaAgent.metadata.namespace>/<GrafanaAgent.metadata.name>`.
- `__replica__`, representing the replica number of the Agent. This label works out of the box with Grafana Cloud and Cortex's HA deduplication.
The shard number is not added as a label, as sharding is designed to be transparent on the receiver end.