About Linux server integration pre-built dashboards
The Linux server integration provides a variety of pre-built dashboards that you can use right away to begin troubleshooting issues. In this step of the journey, you’ll become familiar with these pre-built dashboards and learn how to use them to address various problems.
While this journey focuses on configuring the Linux server integration on only one machine, the following dashboard images show the kinds of data you’ll see when you monitor multiple machines.
Did you know?
If you don’t see any logs or metrics, try switching the data source using the drop-down at the top of the dashboard.
The overview dashboard summarizes the state of the system, including:
- Metadata: Displays information such as the kernel version, OS release, uptime, and more.
- CPU utilization: Displays the current usage of the CPU.
- Memory utilization: Shows how much memory is being used.
- Disk utilization: Indicates the usage level of disk space.
- Network utilization: Provides information about network usage.
Use this dashboard to:
- Gain a high-level understanding of the operational status of your Linux system.
The CPU and system clock dashboard shows CPU usage, system load, and clock synchronization for a Linux node.
Use this dashboard to:
- Examine CPU utilization issues. The dashboard shows overall CPU usage on the machine, but it doesn’t identify the specific process responsible for high usage.
- Monitor the system’s load average over three time intervals: 1 minute, 5 minutes, and 15 minutes. This information reflects the demand on the CPUs.
- Confirm that the system clock is synchronized with an NTP server.
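The load averages mentioned above are easier to interpret relative to the number of CPUs. The following is a minimal sketch, not part of the integration, that divides each load average by the CPU count. The function names `load_per_cpu` and `describe_load` are hypothetical, and treating a per-CPU value near or above 1.0 as a sign of saturation is a common rule of thumb rather than anything the dashboard enforces.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Return a load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def describe_load(load_avgs, cpu_count):
    """Map the 1-, 5-, and 15-minute load averages to per-CPU values."""
    labels = ("1m", "5m", "15m")
    return {
        label: round(load_per_cpu(value, cpu_count), 2)
        for label, value in zip(labels, load_avgs)
    }


if __name__ == "__main__":
    # os.getloadavg() exists on Unix-like systems only, so check first.
    if hasattr(os, "getloadavg"):
        cpus = os.cpu_count() or 1
        print(describe_load(os.getloadavg(), cpus))
```

For example, a 1-minute load average of 4.0 on a 4-CPU machine gives a per-CPU value of 1.0, which suggests the CPUs are roughly fully occupied.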
The filesystem and disks dashboard displays available space per filesystem, overall disk space usage, read and write activity over time, and average wait times.
Use this dashboard to:
- Monitor disk capacity and determine when disks are nearing full utilization.
- Find filesystems that have errors.
- Detect when the input/output (I/O) load on the system is excessive, potentially leading to application latency.
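To spot-check disk capacity directly on a host, a short script can compute the same kind of percentage the dashboard graphs. This is a minimal sketch using Python's standard `shutil.disk_usage`. The helper names and the 90% threshold are assumptions chosen for illustration, and the result can differ slightly from tools such as `df`, which account for reserved blocks.

```python
import shutil


def used_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def is_nearly_full(total: int, used: int, threshold_percent: float = 90.0) -> bool:
    """Return True when usage meets or exceeds the (assumed) threshold."""
    return used_percent(total, used) >= threshold_percent


if __name__ == "__main__":
    # shutil.disk_usage returns a named tuple: (total, used, free), in bytes.
    usage = shutil.disk_usage("/")
    percent = used_percent(usage.total, usage.used)
    print(f"/ is {percent:.1f}% used, nearly full: "
          f"{is_nearly_full(usage.total, usage.used)}")
```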
The fleet overview dashboard summarizes the entire fleet and includes the following information:
- Metadata, such as the integration version and data retrieval status
- List of fleet members
- Performance metrics for the fleet, including CPU, memory, disk usage, and network activity
Use this dashboard to:
- Obtain a quick overview of the environment
- Identify outliers or unusual activities, and investigate specific nodes for further details
The memory dashboard provides details about used and available memory, including:
- Overview: Displays the percentage of total memory currently in use, both in real-time and over time.
- Virtual memory statistics:
  - Pages in/out: Indicates when memory pages are moved into RAM or pushed out to disk.
  - Page faults: Shows instances when a process accesses a memory page that is not currently in RAM (typically because it is stored on disk). While some page faults are normal, a high number might indicate that the system is overloaded with memory-hungry processes competing for RAM.
  - Out of Memory (OOM) Killer: Lists processes that are terminated when the system runs out of memory, including when the swap space on disk is also depleted. Any occurrence of the OOM Killer signals a problem.
- Memory statistics: Provides an overview of overall memory usage by processes, and how memory is allocated and freed.
Use this dashboard to:
- Track the memory usage of applications, including OOM Killer activity, which signals a problem whenever it occurs.
- Analyze paging information, which can reveal whether the system is overwhelmed by applications demanding more memory than is physically available. This situation causes excessive disk access, and because disk is much slower than RAM, it can degrade application performance.
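On a Linux host, the kernel exposes cumulative paging and OOM counters in `/proc/vmstat`. The sketch below shows one way to read the counters related to the statistics described above. It parses sample text so that it runs anywhere, and the function names are hypothetical. Whether the dashboard uses exactly these counters is an assumption. Note that the values are totals since boot, so a real check would compare two readings taken some time apart.

```python
def parse_vmstat(text: str) -> dict:
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def paging_summary(counters: dict) -> dict:
    """Pick out the paging, page fault, and OOM Killer counters."""
    keys = ("pgpgin", "pgpgout", "pgfault", "pgmajfault", "oom_kill")
    # Missing counters (for example, oom_kill on older kernels) default to 0.
    return {key: counters.get(key, 0) for key in keys}


# Illustrative sample values, not real measurements.
SAMPLE = """\
pgpgin 1200
pgpgout 3400
pgfault 500000
pgmajfault 42
oom_kill 0
"""

if __name__ == "__main__":
    # On a real host, replace SAMPLE with open("/proc/vmstat").read().
    print(paging_summary(parse_vmstat(SAMPLE)))
```

Major page faults (`pgmajfault`) are the ones that require disk access, so a rapidly growing value is more telling than the total `pgfault` count.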
The network dashboard offers insights into the network, including:
- Network overview: Displays the status and usage of network devices connected to the system, including both physical devices (such as Ethernet and Wi-Fi) and virtual devices (such as Docker networks, the loopback device, network bridges, and NAT).
- Network sockets: Provides a summary of opened, closed, and in-use network connections. This includes both TCP (commonly used for larger and more reliable data transmission, like HTTP) and UDP (typically used for real-time data transmission that can handle occasional packet loss, such as audio and video streaming).
- Network netstat: Presents an overview of packets being transmitted and received, including the error rate. A high error rate may signal problems within the system or the network.
Use this dashboard to:
- Check whether network devices are working. For instance, an Ethernet interface that should be active but appears as down indicates a problem.
- Detect transmission errors. A high TCP error rate, for example, suggests that there may be a malfunction somewhere in the network.
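As a rough illustration of an error rate, the sketch below computes the ratio of errored packets to total packets per interface from text in the format of Linux's `/proc/net/dev`. This covers interface-level errors rather than TCP-specific ones, the function names are hypothetical, and the sample counters are made up for the example.

```python
def parse_net_dev(text: str) -> dict:
    """Parse '/proc/net/dev'-style text into per-interface packet counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        values = [int(field) for field in fields[:16]]
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_packets": values[1],
            "rx_errs": values[2],
            "tx_packets": values[9],
            "tx_errs": values[10],
        }
    return stats


def error_rate(iface: dict) -> float:
    """Return errored packets divided by total packets (0.0 if no traffic)."""
    packets = iface["rx_packets"] + iface["tx_packets"]
    if packets == 0:
        return 0.0
    return (iface["rx_errs"] + iface["tx_errs"]) / packets


# Illustrative sample, not real measurements.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 500000 900 5 0 0 0 0 0 300000 100 5 0 0 0 0 0
"""

if __name__ == "__main__":
    # On a real host, replace SAMPLE with open("/proc/net/dev").read().
    for name, iface in parse_net_dev(SAMPLE).items():
        print(f"{name}: error rate {error_rate(iface):.2%}")
```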
Troubleshooting options
At this point in your journey, you can explore the following paths:
Grafana fundamentals (tutorial)
5 quick ways to uplevel your use of Grafana (conference lightning talk)