Monitor an app with Kubernetes Monitoring
Introduction
This tutorial shows how to deploy an instrumented three-tier (data layer, app logic layer, load-balancing layer) web application into a Kubernetes cluster, and leverage Grafana Cloud’s built-in Kubernetes Monitoring feature to monitor the application.
In this tutorial, you will:
- Deploy the TNS sample app into your Kubernetes cluster.
- Deploy a prebuilt dashboard to your Grafana Cloud instance to visualize the app’s performance metrics.
- Roll out Grafana Agents to collect metrics, logs, and events from the Kubernetes cluster.
- Configure these agents to collect metrics, logs, and traces (including exemplars) from the TNS app.
- Learn how to navigate from metrics, to logs, to traces and back using Grafana’s powerful correlation features.
- Learn how to use the Kubernetes Cluster Navigator to explore your cluster’s running workloads, jumping from Pods to dashboards and logs.
Before you begin
To complete this tutorial, you need access to:
- A Kubernetes, K3s, or OpenShift cluster
- A Grafana Cloud stack, optionally with exemplar support enabled.
Deploy and configure Grafana Agent
Follow this procedure to roll out Grafana Agent into your Kubernetes cluster. To make deploying Agent easier, Grafana Cloud provides preconfigured manifests for you to download and modify.
To deploy Grafana Agent:
Navigate to your Grafana Cloud instance.
Click Observability in the left-side menu and select Kubernetes.
Click Start sending data, then click Install dashboards and alert rules to install the prebuilt set of Kubernetes dashboards and alerts.
Click Agent configuration instructions and check that the items listed under Prerequisites are met.
Under Metrics & Events, replace `default` in the Namespace field with your namespace. The Agent ConfigMap is updated with your namespace.
Click Copy to clipboard and paste the Agent ConfigMap into an editor.
By default, the ConfigMap only scrapes cluster metrics endpoints like the `/cadvisor` and `/kubelet` endpoints. You need to configure the ConfigMap to scrape the `/metrics` endpoints of Pods deployed in your cluster by following the next step.
In the ConfigMap, add the following scrape job stanza:
```yaml
. . .
        relabel_configs:
          - action: keep
            regex: kube-state-metrics
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
      # New scrape job below
      - job_name: integrations/kubernetes/pod-metrics
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - action: drop
            regex: kube-state-metrics
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: ['__meta_kubernetes_namespace', '__meta_kubernetes_pod_label_name']
            action: 'replace'
            separator: '/'
            target_label: 'job'
            replacement: '$1'
          - source_labels: ['__meta_kubernetes_pod_container_name']
            action: 'replace'
            target_label: 'container'
. . .
```
This scrape job attempts to scrape all containers (and ports) running in your cluster at `/metrics`, drops any kube-state-metrics (since you're already picking these up in another scrape job), and performs some relabeling (setting the `job`, `pod`, `namespace`, and other labels). To learn more about configuring scrape jobs, see the Prometheus scrape config documentation. You can adjust this generic catchall stanza to restrict scraping to a given namespace or workload label, drop additional metrics, and more, using different config directives, as in the sketch below.
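For example, to restrict the catchall job to a single namespace, you can add a `keep` rule to its `relabel_configs`. This is a minimal sketch: `my-app` is a placeholder namespace name, not something defined in this tutorial:

```yaml
# Sketch: keep only targets in the my-app namespace (placeholder name);
# add this rule to the pod-metrics job's relabel_configs
- source_labels: [__meta_kubernetes_namespace]
  action: keep
  regex: my-app
```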
Configure Grafana Agent to send exemplars to Grafana Cloud. To do this, add the following to the ConfigMap:
```yaml
. . .
configs:
  - name: integrations
    remote_write:
      - url: <your_prometheus_metrics_endpoint>
        basic_auth:
          username: <your_prometheus_metrics_user>
          password: <your_prometheus_metrics_api_key>
        # Add the following line
        send_exemplars: true
. . .
```
You must contact Support to enable exemplars in your Grafana Cloud instance.
Deploy the ConfigMap into your cluster.
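For example, if you saved the modified manifest as `agent-configmap.yaml` (a placeholder filename used here for illustration):

```bash
# Apply the modified Agent ConfigMap manifest (placeholder filename);
# pass -n <your_namespace> if the manifest does not set one itself
kubectl apply -f agent-configmap.yaml
```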
Follow the remaining steps in the K8s Monitoring instructions to deploy the following into the namespace that you specified:
- An Agent StatefulSet
- kube-state-metrics
- An Agent ConfigMap & DaemonSet to tail container logs
Deploy the Agent to collect traces by following the Ship Kubernetes traces using Grafana Agent guide.
Be sure to fill in the required `remote_write` credentials. Your Tempo endpoint URL should look something like `tempo-us-central1.grafana.net:443`.
Important: Without an Agent to collect traces, the demo app will not start, so be sure to complete this step.
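For reference, the traces section of the Agent configuration from that guide ends up looking something like the following sketch. The endpoint and credential values are placeholders, and the receiver configuration is omitted; use the values shown in your Grafana Cloud stack's Tempo details:

```yaml
# Sketch: traces section of the Agent config; endpoint and
# credentials are placeholders, receivers omitted for brevity
traces:
  configs:
    - name: default
      remote_write:
        - endpoint: tempo-us-central1.grafana.net:443
          basic_auth:
            username: <your_tempo_username>
            password: <your_grafana_cloud_api_key>
```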
When you have finished deploying the telemetry collectors, your running K8s Pods should look something like this:
```
NAME                                      READY   STATUS    RESTARTS   AGE
grafana-agent-0                           1/1     Running   0          3m
grafana-agent-logs-lcpjd                  1/1     Running   0          2m44s
grafana-agent-logs-pc9sp                  1/1     Running   0          2m44s
grafana-agent-logs-qtjzq                  1/1     Running   0          2m44s
grafana-agent-traces-7775575d6d-qcrmq     1/1     Running   0          21s
ksm-kube-state-metrics-58ccd7456c-487c9   1/1     Running   0          2m50s
```
Deploy the TNS app
With the telemetry collectors up and running, you can now deploy the TNS demo app into your Kubernetes cluster. The TNS GitHub repository contains Kubernetes manifests (written in Jsonnet) that deploy the app's required components. You do not need to know Jsonnet to follow this guide, but you can inspect the manifests and code before deploying the components. The repo also contains the app source code and more information about the app and how it has been instrumented to work with Grafana Cloud.
To deploy the TNS app:
Run the following command:
kubectl apply -f https://raw.githubusercontent.com/grafana/tns/main/production/k8s-yamls-cloud/app-full.yaml
You can inspect the YAML manifests before deploying the app into your cluster. This deploys the app Pods and Services into the `tns-cloud` namespace of your cluster and creates the namespace if it doesn't exist.
Inspect deployment status using `kubectl`:

```bash
kubectl get all -n tns-cloud
```
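To wait until the app finishes rolling out, you can also use `kubectl rollout status`. The Deployment names in this sketch assume the TNS manifest's three components (`app`, `db`, and `loadgen`):

```bash
# Wait for each TNS component Deployment to become ready
kubectl rollout status -n tns-cloud deployment/app
kubectl rollout status -n tns-cloud deployment/db
kubectl rollout status -n tns-cloud deployment/loadgen
```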
Forward a local port to the app Service so you can access it in your web browser:
kubectl port-forward -n tns-cloud service/app 8080:80
Navigate to `http://localhost:8080` in your web browser to see the demo app in action.
With the instrumented demo app and load generator up and running, you can now navigate to Grafana Cloud to query your app logs, visualize its metrics, and inspect its trace data.
Correlate metrics, logs, and traces
At this point, the demo app and load generator are instrumented and running, and your telemetry collectors are forwarding metric, log, trace, and event data to Grafana Cloud. Before exploring some of Grafana Cloud's built-in features, install a custom prebuilt app dashboard that demonstrates some core Grafana features.
To correlate telemetry data:
In your Grafana instance, click Dashboards.
Click New and select Import in the dropdown.
Enter `16491` in the ID field, and click Load.
Click Import to import the dashboard.
Navigate to the dashboard.
The following dashboard displays:
{{% admonition type="note" %}} If you do not see the yellow dots (exemplars), make sure that exemplars are enabled. {{% /admonition %}} For more information, refer to Configure Grafana Agent.
Click an exemplar to jump to a trace for a particularly slow request:
From here, jump to logs to view the problematic span:
Click Show context on a log line to see the surrounding log context.
Navigate between metrics, logs, and traces
The ability to jump from metrics, to traces, to logs, and back is an extremely powerful feature that helps you quickly resolve production issues and reduce mean time to recovery (MTTR). You can also jump to logs from metrics using the Explore view.
To navigate between metrics, logs, and traces:
In the Explore view, enter the following PromQL query, ensuring that you select the correct Prometheus data source. The query computes the per-second rate of requests to the TNS app, summed by HTTP status code:
sum by (status_code) (rate(tns_request_duration_seconds_count{job=~"tns-cloud/app.*"}[$__rate_interval]))
The following graph displays:
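The query above charts request throughput by status code. To look at latency from the same histogram instead, a query along the following lines computes the 99th-percentile request duration. This is a sketch assuming the standard Prometheus histogram buckets that `tns_request_duration_seconds` exposes:

```promql
histogram_quantile(
  0.99,
  sum by (le) (rate(tns_request_duration_seconds_bucket{job=~"tns-cloud/app.*"}[$__rate_interval]))
)
```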
Click Split at the top of the window, and select the correct Loki data source in the dropdown:
Grafana carries over the labels selected in the PromQL query and pre-populates a LogQL query with the same labels, so you can quickly jump from a Prometheus metrics query to the corresponding Loki log data relevant to the graph you're analyzing:
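The pre-populated query will look something like the following LogQL sketch; the exact label selector depends on the labels carried over, and you can append a line filter such as `|= "error"` to narrow the results to error lines:

```logql
{job=~"tns-cloud/app.*"}
```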
To learn more about these features, see the following videos:
- Correlate your metrics, logs, and traces with the Grafana Stack
- Correlate your metrics, logs, and traces with the curated OSS observability stack from Grafana Labs
Navigate cluster workloads
Another way to explore your Kubernetes workloads is to use the cluster navigation feature.
To navigate cluster workloads:
In Grafana, click Observability in the left-side menu and select Kubernetes.
The Namespaces view displays, where you can navigate clusters.
Click into a particular namespace to begin exploring workloads.
From here, you can see any firing alerts and the health of running Pods, and you can drill down further into a given ReplicaSet.
Click into a Pod to quickly see additional Pod info, its logs, and latest Kubernetes cluster events.
Click View compute resources on the top right of the screen to quickly jump to the Pod’s resource usage dashboard:
Summary
Instead of copying and pasting Pod names and labels from a terminal and running `kubectl get`, `kubectl describe`, and `kubectl logs`, you can now navigate and jump straight to the relevant observability data, all in Grafana Cloud.
To learn more about Grafana Cloud Kubernetes monitoring, see the Grafana Kubernetes Monitoring documentation.