A practical guide to checking memory usage in Linux. Covers essential commands (ps, top, htop), explains VSZ vs RSS, and discusses memory visibility issues inside Docker containers.
When a Linux server starts acting sluggish, the first suspect is usually memory. Whether you are debugging a standalone server or a Kubernetes pod, knowing exactly how to pinpoint the “memory hogs” is a critical skill.
This post covers the quick commands to find the culprit, explains what the metrics actually mean, and dives into why memory analysis inside Docker containers can be deceptive.
1. The Quick Fix: Finding the Culprit
If you need to identify the process consuming the most memory immediately, you don’t need to install anything.
Method 1: The Script-Friendly Way (ps)
The ps command is available on almost every Linux system. To list the top 10 memory-consuming processes:
ps aux --sort=-%mem | head -n 11
Breakdown:
- ps aux: Lists all processes for all users.
- --sort=-%mem: Sorts by memory percentage in descending order (note the minus sign).
- head -n 11: Shows the top 11 lines (1 header line + top 10 processes).
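If you want tighter, script-friendly output, ps can also print just the columns you need. A minimal sketch, assuming the procps-ng ps that most distributions ship:

# PID, command name, memory percentage, and resident memory, sorted by RSS
ps -eo pid,comm,%mem,rss --sort=-rss | head -n 11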
Method 2: The Interactive Way (htop / top)
While ps is great for snapshots, interactive tools are better for monitoring.
- htop (Recommended): If installed, use this. You can click the MEM% column header to sort, or press F6 to choose the sort column. Its meter bars and color coding give a much clearer visual overview.
- top (Built-in): If you don’t have htop, run top. By default, it sorts by CPU. Press Shift + M inside the interface to switch to Memory sorting.
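If you would rather not press keys every time, recent procps-ng versions of top also accept a sort field on the command line (treat this flag as an assumption if your top comes from BusyBox or an older procps):

# Start top already sorted by memory usage
top -o %MEM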
2. Understanding the Metrics: VSZ vs. RSS
When you run the commands above, you will see two confusing columns: VSZ and RSS. Understanding the difference is vital.
| Metric | Full Name | Definition |
|---|---|---|
| VSZ | Virtual Memory Size | The total amount of memory a process has asked for (allocated). This includes memory that has been mapped but not yet used. |
| RSS | Resident Set Size | The actual amount of physical RAM the process is currently using. This is the number you usually care about. |
The Catch: RSS can overstate a process's true footprint because it includes shared libraries. If 10 processes all use libc, the pages libc occupies are counted in full in the RSS of each of those 10 processes, so summing RSS across processes over-counts shared memory.
Pro Tip: If you need absolute precision (e.g., “How much RAM will I get back if I kill this process?”), you should look at PSS (Proportional Set Size), which divides shared memory among processes. Tools like smem can show this.
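If smem isn't installed, the kernel exposes the same per-process data under /proc. A minimal sketch, assuming a reasonably recent kernel (smaps_rollup appeared in 4.14) and using <PID> as a placeholder for the target process:

# Per-process PSS straight from the kernel
grep '^Pss:' /proc/<PID>/smaps_rollup

# On older kernels, sum the Pss entries from smaps instead
awk '/^Pss:/ {sum += $2} END {print sum " kB"}' /proc/<PID>/smaps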
3. The System-Level View: Don’t Panic
Before hunting individual processes, check the global system state:
free -h
A common source of confusion for new Linux users is the “free” column. You might see:
              total        used        free      shared  buff/cache   available
Mem:           15Gi       4.5Gi       200Mi       1.0Gi        10Gi        10Gi
- Free: 200Mi. (Panic? No.)
- Available: 10Gi. (Relax.)
Linux follows the philosophy that “unused RAM is wasted RAM.” It automatically uses free memory to cache disk files (buff/cache) to speed up I/O. If applications need more RAM, the kernel instantly reclaims it from the cache. Always look at the “available” column.
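For scripts and monitoring checks, read "available" directly rather than "free". A small sketch, assuming a kernel new enough to expose MemAvailable (3.14+) and a modern procps-ng free (which determines the column positions):

# Available memory in kB, straight from the kernel
awk '/^MemAvailable:/ {print $2 " kB"}' /proc/meminfo

# Or pull the "available" column from free, in MiB
free -m | awk '/^Mem:/ {print $7 " MiB available"}'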
When Memory Truly Runs Out: The OOM Killer
If “available” memory hits zero, the Linux kernel invokes the OOM (Out of Memory) Killer. It sacrifices a process to save the system.
To check if your application (e.g., a Python script or Java app) was a victim:
dmesg | grep -i "out of memory"
# Or check the logs
grep -i "killed" /var/log/kern.log
4. The Container “Lie”: Memory in Docker & Kubernetes
If you are running these commands inside a Docker container or a Kubernetes Pod, what you see might not be real.
The Visibility Problem
You might set a Kubernetes limit of 512MB for your pod. However, if you run free -h or top inside that pod, you will likely see 64GB (or whatever the host node has).
Why?
Containers are not Virtual Machines. They utilize Namespaces for isolation and Cgroups for resource limitation.
- Cgroups strictly enforce the limit. If the container's processes exceed 512MB, the kernel OOM-kills one of them.
- Namespaces isolate PIDs, mounts, and networks, but memory statistics in /proc (such as /proc/meminfo) are not namespaced.
Tools like free and top read from /proc/meminfo. Because those numbers are host-wide kernel statistics, the container "sees" the host's total resources, even though it can't use them.
The Consequence: This causes issues with runtimes like the JVM (Java). Older Java versions would see 64GB RAM, create a massive Heap, and immediately get killed by the container Cgroup limit.
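Modern JDKs (10+, and 8u191+) read the cgroup limit by default, and you can also cap the heap relative to that limit explicitly. A sketch, where app.jar is a placeholder for your application:

# Size the heap from the container limit instead of host RAM
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar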
How to Fix It?
- Trust Cgroups, not /proc: Check the Cgroup files (e.g., /sys/fs/cgroup/memory/ on cgroup v1, or /sys/fs/cgroup/ directly on cgroup v2) for the real usage and limits; see the commands after this list.
- LXCFS: In production Kubernetes clusters, we often use LXCFS. This is a FUSE-based filesystem that “masks” /proc files. When a process inside a container reads /proc/meminfo, LXCFS intercepts the call and returns values consistent with the container’s Cgroup limits.
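What that looks like in practice depends on the cgroup version; the paths below are the common defaults inside a container:

# cgroup v2 (most current distributions)
cat /sys/fs/cgroup/memory.max       # the limit ("max" means unlimited)
cat /sys/fs/cgroup/memory.current   # current usage in bytes

# cgroup v1 (older hosts)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/memory.usage_in_bytes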
A Note on Disk Space (df -h)
Similarly, running df -h inside a container usually shows the host’s disk size and usage.
This is because the default Docker storage driver, overlay2, operates at the file level. It shares the underlying filesystem (ext4/xfs) of the host. There is no separate “virtual disk” created for the container. While you can limit writable space using XFS project quotas, standard tools like df will essentially show you the underlying host partition.
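To see what a container is actually writing, it is usually more useful to ask Docker itself than to run df inside the container; these are standard Docker CLI commands:

# Per-container writable-layer size
docker ps --size

# Aggregate disk usage for images, containers, and volumes
docker system df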
Conclusion
- Quick Check: Use htop or ps aux --sort=-%mem.
- Analysis: Focus on RSS for physical memory usage, but remember it includes shared memory.
- Containers: Be skeptical of system tools inside Docker. The container might claim it has 64GB of RAM and 1TB of disk, but Cgroups (the enforcement layer) usually disagree.