Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Deploy Operator resources
To start collecting telemetry data, you need to roll out Grafana Agent Operator custom resources into your Kubernetes cluster. Before you can create the custom resources, you must first apply the Agent Custom Resource Definitions (CRDs) and install Agent Operator, with or without Helm. If you haven’t yet taken these steps, follow the instructions in Install Grafana Agent Operator with Helm or Install Grafana Agent Operator.
Follow the steps in this guide to roll out the Grafana Agent Operator custom resources to:
- Scrape and ship cAdvisor and kubelet metrics to a Prometheus-compatible metrics endpoint.
- Collect and ship your Pods’ container logs to a Loki-compatible logs endpoint.
The hierarchy of custom resources is as follows:
- `GrafanaAgent`
  - `MetricsInstance`
    - `PodMonitor`
    - `Probe`
    - `ServiceMonitor`
  - `LogsInstance`
    - `PodLogs`
To learn more about the custom resources Agent Operator provides and their hierarchy, see Grafana Agent Operator architecture.
Note
Agent Operator is currently in beta and its custom resources are subject to change.
Before you begin
Before you begin, make sure that you have deployed the Grafana Agent Operator CRDs and installed Agent Operator into your cluster. See Install Grafana Agent Operator with Helm or Install Grafana Agent Operator for instructions.
Deploy the GrafanaAgent resource
In this section, you’ll roll out a `GrafanaAgent` resource. See Grafana Agent Operator architecture for a discussion of the resources in the `GrafanaAgent` resource hierarchy.
Note
Due to the variety of possible deployment architectures, the official Agent Operator Helm chart does not provide built-in templates for the custom resources described in this guide. You must configure and deploy these manually as described in this section. We recommend templating and adding the following manifests to your own in-house Helm charts and GitOps flows.
To deploy the `GrafanaAgent` resource:

Copy the following manifests to a file:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
  labels:
    app: grafana-agent
spec:
  image: grafana/agent:v0.43.4
  integrations:
    selector:
      matchLabels:
        agent: grafana-agent-integrations
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: cloud
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - nodes/metrics
      - services
      - endpoints
      - pods
      - events
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
      - /metrics/cadvisor
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
  - kind: ServiceAccount
    name: grafana-agent
    namespace: default
```
In the first manifest, the `GrafanaAgent` resource:

- Specifies an Agent image version.
- Specifies `MetricsInstance` and `LogsInstance` selectors. These search for `MetricsInstances` and `LogsInstances` in the same namespace with labels matching `agent: grafana-agent-metrics` and `agent: grafana-agent-logs`, respectively.
- Sets a `cluster: cloud` label for all metrics shipped to your Prometheus-compatible endpoint. Change this label to your cluster name.

To search for `MetricsInstances` or `LogsInstances` in a different namespace, use the `instanceNamespaceSelector` field, as sketched below. To learn more about this field, see the `GrafanaAgent` CRD specification.
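For example, a minimal sketch of cross-namespace discovery, assuming `instanceNamespaceSelector` accepts a standard Kubernetes label selector over namespaces (the `monitoring: enabled` namespace label is a hypothetical illustration):

```yaml
# Hypothetical illustration: discover MetricsInstances in any namespace
# labeled "monitoring: enabled", not only the GrafanaAgent's own namespace.
spec:
  metrics:
    instanceNamespaceSelector:
      matchLabels:
        monitoring: enabled
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
```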
Customize the manifests as needed and roll them out to your cluster using `kubectl apply -f` followed by the filename.

This step creates a `ServiceAccount`, `ClusterRole`, and `ClusterRoleBinding` for the `GrafanaAgent` resource.

Deploying a `GrafanaAgent` resource on its own does not spin up Agent Pods. Agent Operator creates Agent Pods once `MetricsInstance` and `LogsInstance` resources have been created. Follow the instructions in the Deploy a MetricsInstance resource and Deploy LogsInstance and PodLogs resources sections to create these resources.
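You can sanity-check this intermediate state by listing the custom resource and confirming that no Agent Pods exist yet. This is an illustrative check, assuming the CRD’s plural name is `grafanaagents`:

```sh
# The GrafanaAgent resource should exist...
kubectl get grafanaagents -n default
# ...but no Agent Pods are created until a MetricsInstance or
# LogsInstance resource is rolled out.
kubectl get pods -n default
```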
Disable feature flags reporting
To disable reporting of feature flag usage to Grafana, set the `disableReporting` field to `true` (see the sketch after the next subsection).
Disable support bundle generation
To disable support bundle generation, set the `disableSupportBundle` field to `true`.
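A minimal sketch showing both flags together, assuming the v1alpha1 schema exposes them as top-level `spec` fields on the `GrafanaAgent` resource from the earlier step:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
spec:
  image: grafana/agent:v0.43.4
  serviceAccountName: grafana-agent
  disableReporting: true     # stop reporting feature-flag usage to Grafana
  disableSupportBundle: true # disable support bundle generation
  # ...metrics and logs configuration as in the earlier manifest...
```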
Deploy a MetricsInstance resource
Next, you’ll roll out a `MetricsInstance` resource. `MetricsInstance` resources define a `remote_write` sink for metrics and configure one or more selectors to watch for creation and updates to `*Monitor` objects. These objects allow you to define Agent scrape targets via Kubernetes manifests.
To deploy a `MetricsInstance` resource:

Copy the following manifest to a file:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
    - url: your_remote_write_URL
      basicAuth:
        username:
          name: primary-credentials-metrics
          key: username
        password:
          name: primary-credentials-metrics
          key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary
```
Replace the `remote_write` URL and customize the namespace and label configuration as necessary.

This step associates the `MetricsInstance` resource with the `GrafanaAgent` resource deployed in the previous step: the `agent: grafana-agent-metrics` label matches that resource’s metrics `instanceSelector`. The `MetricsInstance` resource watches for creation and updates to `*Monitors` with the `instance: primary` label.

Once you’ve rolled out the manifest, create the `basicAuth` credentials using a Kubernetes Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: 'your_cloud_prometheus_username'
  password: 'your_cloud_prometheus_API_key'
```
If you’re using Grafana Cloud, you can find your hosted Prometheus endpoint username and password by clicking Details on the Prometheus tile on the Grafana Cloud Portal. If you want to base64-encode these values yourself, use `data` instead of `stringData`.
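For instance, a sketch of the equivalent pre-encoded Secret; replace the placeholders with output from a command such as `echo -n 'your_cloud_prometheus_username' | base64`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
data:
  # Replace the placeholders with your own base64-encoded values.
  username: <base64-encoded username>
  password: <base64-encoded API key>
```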
Once you’ve rolled out the `MetricsInstance` and its Secret, you can confirm that the `MetricsInstance` Agent is up and running using `kubectl get pod`. Since you haven’t defined any monitors yet, this Agent doesn’t have any scrape targets defined. In the next section, you’ll create scrape targets for the cAdvisor and kubelet endpoints exposed by the `kubelet` service in the cluster.
Create ServiceMonitors for kubelet and cAdvisor endpoints
Next, you’ll create ServiceMonitors for kubelet and cAdvisor metrics exposed by the `kubelet` service. Every Node in your cluster exposes kubelet and cAdvisor metrics at `/metrics` and `/metrics/cadvisor`, respectively. Agent Operator creates a `kubelet` service that exposes these Node endpoints so that they can be scraped using ServiceMonitors.
To scrape the kubelet and cAdvisor endpoints:
Copy the following kubelet ServiceMonitor manifest to a file, then roll it out in your cluster using `kubectl apply -f` followed by the filename.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: default
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      interval: 60s
      metricRelabelings:
        - action: keep
          regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
          sourceLabels:
            - __name__
      port: https-metrics
      relabelings:
        - sourceLabels:
            - __metrics_path__
          targetLabel: metrics_path
        - action: replace
          targetLabel: job
          replacement: integrations/kubernetes/kubelet
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```
Copy the following cAdvisor ServiceMonitor manifest to a file, then roll it out in your cluster using `kubectl apply -f` followed by the filename.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: default
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      honorTimestamps: false
      interval: 60s
      metricRelabelings:
        - action: keep
          regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
          sourceLabels:
            - __name__
      path: /metrics/cadvisor
      port: https-metrics
      relabelings:
        - sourceLabels:
            - __metrics_path__
          targetLabel: metrics_path
        - action: replace
          targetLabel: job
          replacement: integrations/kubernetes/cadvisor
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```
These two ServiceMonitors configure Agent to scrape all the kubelet and cAdvisor endpoints in your Kubernetes cluster (one of each per Node). They also define a `job` label, which you can update (it is preset here for compatibility with Grafana Cloud’s Kubernetes integration), and provide an allowlist containing a core set of Kubernetes metrics to reduce remote metrics usage. If you don’t need this allowlist, you can omit it; however, your metrics usage will increase significantly.
When you’re done, Agent should now be shipping kubelet and cAdvisor metrics to your remote Prometheus endpoint. To check this in Grafana Cloud, go to your dashboards, select Integration - Kubernetes, then select Kubernetes / Kubelet.
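As a quick check that the monitors are in place and carry the label the `MetricsInstance` selects on:

```sh
# Both ServiceMonitors should be listed with the instance=primary label.
kubectl get servicemonitors -n default -l instance=primary
```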
Deploy LogsInstance and PodLogs resources
Next, you’ll deploy a `LogsInstance` resource to collect logs from your cluster Nodes and ship them to your remote Loki endpoint. Agent Operator deploys a DaemonSet of Agents in your cluster that tail log files defined in `PodLogs` resources.
To deploy the `LogsInstance` resource into your cluster:

Copy the following manifest to a file, then roll it out in your cluster using `kubectl apply -f` followed by the filename.

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-logs
spec:
  clients:
    - url: your_remote_logs_URL
      basicAuth:
        username:
          name: primary-credentials-logs
          key: username
        password:
          name: primary-credentials-logs
          key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR.
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
```
This `LogsInstance` picks up `PodLogs` resources with the `instance: primary` label. Be sure to set the Loki URL to the correct push endpoint. For Grafana Cloud, this will look similar to `logs-prod-us-central1.grafana.net/loki/api/v1/push`; however, confirm it in the Grafana Cloud Portal by clicking Details on the Loki tile.

Also note that this example uses the `agent: grafana-agent-logs` label, which associates this `LogsInstance` with the `GrafanaAgent` resource defined earlier. This means that it will inherit requests, limits, affinities, and other properties defined in the `GrafanaAgent` custom resource.

To create the Secret for the `LogsInstance` resource, copy the following Secret manifest to a file, then roll it out in your cluster using `kubectl apply -f` followed by the filename.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: 'your_username_here'
  password: 'your_password_here'
```
If you’re using Grafana Cloud, you can find your hosted Loki endpoint username and password by clicking Details on the Loki tile on the Grafana Cloud Portal. If you want to base64-encode these values yourself, use `data` instead of `stringData`.

Copy the following `PodLogs` manifest to a file, then roll it out to your cluster using `kubectl apply -f` followed by the filename. The manifest defines your logging targets. Agent Operator turns this into Agent configuration for the logs subsystem and rolls it out to the DaemonSet of logging Agents.

Note
The following is a minimal working example which you should adapt to your production needs.

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
  namespace: default
spec:
  pipelineStages:
    - docker: {}
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels: {}
```
This example tails container logs for all Pods in the `default` namespace. You can restrict the set of matched Pods by using the `matchLabels` selector. You can also set additional `pipelineStages` and create `relabelings` to add or modify log line labels (see the sketch after the following list). To learn more about the `PodLogs` specification and available resource fields, see the PodLogs CRD.

The above `PodLogs` resource adds the following labels to log lines:

- `namespace`
- `service`
- `pod`
- `container`
- `job` (set to `PodLogs_namespace/PodLogs_name`)
- `__path__` (the path to log files, set to `/var/log/pods/*$1/*.log` where `$1` is `__meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name`)
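As an illustration of those extension points (not part of the original example), the following sketch drops debug-level lines and copies a Pod label onto the log stream. The `drop` stage expression, the `app: my-app` selector, and the relabeling are hypothetical and assume the `PodLogs` CRD mirrors Promtail’s pipeline-stage and relabel-config schemas:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: app-pods
  namespace: default
spec:
  pipelineStages:
    - docker: {}
    # Hypothetical stage: drop log lines that match a debug-level pattern.
    - drop:
        expression: '.*level=debug.*'
  relabelings:
    # Copy the Pod's "app" label onto the resulting log stream.
    - sourceLabels: ['__meta_kubernetes_pod_label_app']
      targetLabel: app
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-app # hypothetical label; restricts the matched Pods
```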
To learn more about this configuration format and other available labels, see the Promtail Scraping documentation. Agent Operator loads this configuration into the `LogsInstance` Agents automatically.
At this point, the DaemonSet of logging Agents should be tailing your container logs, applying the default labels to the log lines, and shipping them to your remote Loki endpoint.
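An illustrative way to confirm the rollout, assuming the Operator derives the DaemonSet name from the `GrafanaAgent` resource (the exact name is Operator-generated):

```sh
# Look for the logging DaemonSet and its Pods in the default namespace.
kubectl get daemonsets -n default
kubectl get pods -n default
```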
Summary
You’ve now rolled out the following into your cluster:

- A `GrafanaAgent` resource that discovers one or more `MetricsInstance` and `LogsInstance` resources.
- A `MetricsInstance` resource that defines where to ship collected metrics.
- `ServiceMonitor` resources to collect cAdvisor and kubelet metrics.
- A `LogsInstance` resource that defines where to ship collected logs.
- A `PodLogs` resource to collect container logs from Kubernetes Pods.
What’s next
You can verify that everything is working correctly by navigating to your Grafana instance and querying your Loki and Prometheus data sources.
Tip: You can deploy multiple `GrafanaAgent` resources to isolate the resources allocated to the Agent Pods. By default, the `GrafanaAgent` resource determines the resources of all deployed Agent containers. However, you might want different memory limits for metrics versus logs; a sketch follows.
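For example, a sketch of a second `GrafanaAgent` dedicated to logs with its own memory limits, assuming `spec.resources` maps to the Agent containers’ standard Kubernetes `ResourceRequirements` (the name, labels, and limit values are illustrative):

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent-logs-only # hypothetical second deployment
  namespace: default
spec:
  image: grafana/agent:v0.43.4
  serviceAccountName: grafana-agent
  resources:
    requests:
      memory: 256Mi
    limits:
      memory: 512Mi # tuned independently of the metrics deployment
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs-only # matches a correspondingly labeled LogsInstance
```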