How to Troubleshoot High Memory Usage in Linux

There are situations where your server shows unintended behaviors such as slow website responses, failing services, applications that refuse to start, or outright crashes. Depending on the applications it runs, a server must have certain minimum resources available to deliver its best performance.

However, any server can consume more than its allocated resources during periods of high traffic or high demand. Linux is an efficient operating system designed to make use of all available resources, and it exposes configuration parameters that control how the server uses memory.

Below is a discussion of how to find memory leaks on your server and how to better manage its memory.

Identifying the “Out of Memory” Scenario

You may not notice any problem with memory availability at the time of your investigation, but such an incident may have occurred earlier. When memory runs out, the Linux kernel selects tasks/processes based on certain criteria, kills them, and releases the memory footprint they occupied.

The kernel generates a log entry for every process it terminates this way, and those logs are usually available under /var/log/.

You can use the grep command below to search all log files under /var/log/ for an out-of-memory error.

grep -i -r 'out of memory' /var/log/
Mar 1 01:24:05 srv kernel: [3966348.114233] Out of memory: Kill process 48305 (php-cgi) score 21 or sacrifice child
Mar 1 01:24:08 srv kernel: [3966350.061954] Out of memory: Kill process 48318 (php-cgi) score 21 or sacrifice child

A log entry like the one above confirms that the server ran out of memory. This entry tells us that the kernel terminated a php-cgi process with process ID 48305 and an OOM score of 21.
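If you want to pull just the PID and process name out of such entries, a small sed expression works; the sample line below is copied from the log output above:

```shell
# Sample OOM kill entry (same format as the /var/log/ entries above)
line='Mar 1 01:24:05 srv kernel: [3966348.114233] Out of memory: Kill process 48305 (php-cgi) score 21 or sacrifice child'

# Extract "<pid> <process name>" from the entry
echo "$line" | sed -n 's/.*Kill process \([0-9]*\) (\([^)]*\)).*/\1 \2/p'
```

The same expression can be applied to the output of the grep command above to list every killed process at a glance.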

Next, check the current memory usage on the server with the “free” command.

root@srv:~# free -m
              total        used        free      shared  buff/cache   available
Mem:           1981         720         319         138         940         916
Swap:           524          84         440

The command shows the current RAM and swap usage in MB.
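To check usage from a script, you can compute the percentage of RAM used from the free output. The sketch below parses a captured sample matching the Mem: row above; on a live server you would pipe free -m directly into awk:

```shell
# Sample "free -m" Mem: row (total used free shared buff/cache available)
sample='Mem:     1981         720         319         138         940         916'

# Percent of total RAM currently used ($2 = total, $3 = used)
echo "$sample" | awk '{ printf "%.1f%% used\n", $3 / $2 * 100 }'
```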

The memory usage history for the day can be found with the “sar” command.

root@srv [~]# sar -r
Linux 2.6.32-754.9.1.el6.x86_64 (       03/05/2019      _x86_64_        (2 CPU)
12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
12:10:01 AM   1672764   1407160     45.69    100356    749140   1618364     44.90
12:20:01 AM   1289208   1790716     58.14    106580   1130096   1599588     44.38
12:30:01 AM   1248100   1831824     59.48    109184   1144680   1621332     44.98
12:40:01 AM   1267972   1811952     58.83    111460   1155828   1604104     44.51
12:50:01 AM   1254556   1825368     59.27    113888   1159632   1599480     44.38
01:00:01 AM   1092296   1987628     64.53    116020   1164540   1802228     50.00
01:10:01 AM   1212168   1867756     60.64    118204   1169516   1633940     45.33
...
Average:    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
Average:      1465222   1614702     52.43    179213    834889   1655342     45.93

A RAM upgrade is necessary if the server shows consistently high memory usage, or if the average usage for the day exceeds 90%: on a busy server, such high usage can deplete the available free memory at peak times.
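That 90% check can be scripted by averaging the %memused column of the sar output. The sketch below runs on three rows copied from the output above; on sysstat installations, earlier days are typically available with sar -r -f /var/log/sa/saDD:

```shell
# %memused is the 5th field of each "sar -r" data row
printf '%s\n' \
  '12:10:01 AM   1672764   1407160     45.69' \
  '12:20:01 AM   1289208   1790716     58.14' \
  '12:30:01 AM   1248100   1831824     59.48' |
awk '{ sum += $5; n++ } END { printf "average %%memused: %.2f\n", sum / n }'
```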

Another handy tool for identifying memory-consuming processes is the “top” command, which can sort the running processes by their resource usage.

root@srv [~]# top -c
top - 18:41:45 up 109 days, 18:03,  5 users,  load average: 1.30, 1.24, 1.24
Tasks: 544 total,   2 running, 541 sleeping,   0 stopped,   1 zombie
Cpu(s):  8.3%us,  1.4%sy,  0.2%ni, 89.8%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  12296096k total, 11531800k used,   764296k free,   586732k buffers
Swap: 16777212k total,   209160k used, 16568052k free,  2471072k cached
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                    
2376001 root      20   0  230m 101m 4588 R 31.4  0.8   1:22.06 spamd child                                                                                                                 
2448741 root      30  10 19472 1056  792 S  2.6  0.0   0:00.08 /usr/sbin/lveps -c 1 -p -d -n -o id:10,ep:10,pno:10,pid:15,tno:5,tid:15,cpu:7,mem:15,com:256                                
 953572 mysql     20   0 17.5g 5.3g 5336 S  1.7 44.9 102:38.10 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=web21
 844549 root      30  10  424m  59m 4532 S  1.3  0.5  52:13.20 /opt/alt/python27/bin/python2.7 /usr/share/lve-stats/ start --pidfile /var/run/               
1351883 root      20   0 1127m 638m 1768 S  1.3  5.3  37:37.24 /usr/local/cpanel/3rdparty/bin/clamd                                                                                        
 844526 root      30  10  462m  35m 1204 S  1.0  0.3  30:54.20 /opt/alt/python27/bin/python2.7 /usr/share/lve-stats/ start --pidfile /var/run/               
   3109 nscd      20   0 2725m 3956 2260 S  0.7  0.0 188:08.43 /usr/sbin/nscd                                                                                                              
2243573 nobody    20   0  189m  78m 1140 S  0.7  0.7   4:22.20 litespeed (lshttpd - #01)                                                                                                   
2448384 mailnull  20   0 78756 7736 3348 S  0.7  0.1   0:00.06 /usr/sbin/exim -bd -q1h -oP /var/spool/exim/

Check the %MEM column of the output and identify the processes that show consistently high memory usage.

You can use the following key sequence to sort the processes by their memory usage.

  1. Enter the command top.
  2. Press SHIFT+o to get the top sort-field options.
  3. Press N, then Enter.
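On newer versions of top, pressing SHIFT+M sorts by memory directly. A one-shot alternative that needs no interactive session is ps (a sketch; --sort is a procps option and may be absent on minimal systems):

```shell
# Top 5 processes by memory percentage (header plus 5 rows)
ps aux --sort=-%mem | head -n 6
```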

Identifying Memory Leaks 

Tackling an out-of-memory situation caused by a memory leak is easier if you can find what made the processes demand more memory. Note the server time at which the ‘out of memory’ error was reported; here it is “Mar 1 01:24:05”. Use grep to search for this timestamp in the log files of your application servers, such as Apache and MySQL.

On a cPanel server, you can grep the website access logs for any suspicious or abusive access to the website that could have caused this resource exhaustion.

grep -ir "01/Mar/2019:01:2" /usr/local/apache/domlogs/

Some suspicious activities identifiable from the access logs are:

  1. High access from specific IP addresses. 
  2. High access to unavailable resources/files etc. 
  3. High number of HTTP POST requests. 
  4. High number of failed access attempts like login.
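A quick way to spot the first pattern (high access from specific IPs) is to count requests per client address. The sketch below runs on three hypothetical log lines; on a real server you would read the files under /usr/local/apache/domlogs/ instead:

```shell
# Hypothetical access-log lines in common log format
printf '%s\n' \
  '203.0.113.9 - - [01/Mar/2019:01:24:01 +0000] "POST /wp-login.php HTTP/1.1" 200 512' \
  '203.0.113.9 - - [01/Mar/2019:01:24:02 +0000] "POST /wp-login.php HTTP/1.1" 200 512' \
  '198.51.100.4 - - [01/Mar/2019:01:24:03 +0000] "GET /index.html HTTP/1.1" 200 1024' |
awk '{ print $1 }' | sort | uniq -c | sort -rn   # requests per IP, busiest first
```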

Based on these observations, you can block the offending IP addresses in the firewall; blocking such invalid access saves server resources.

Use the command mysqladmin proc stat to identify any MySQL queries that have been running for a long time and are causing high memory usage.

root@srv [~]# mysqladmin proc stat
| Id     | User | Host      | db | Command | Time | State | Info             | Progress |
| 137324 | root | localhost |    | Query   | 0    | init  | show processlist | 0.000    |
Uptime: 370827  Threads: 1  Questions: 20484133  Slow queries: 0  Opens: 1456  Flush tables: 1  Open tables: 747  Queries per second avg: 55.239

Check the Time, db, and State columns, find the queries with a high “Time” value, and review them with your website developer.

Enable “Slow Query Logging” in MySQL and fix the long queries with your website developer.
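As a sketch, slow query logging can be enabled in the MySQL configuration file (my.cnf); the log file path and the 2-second threshold below are typical example values, not universal defaults:

```ini
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql-slow.log
long_query_time     = 2
```

Restart MySQL after saving the file, then watch the slow query log for queries to review with your developer.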

Memory Overcommit 

Usually, a Linux server allows more memory to be reserved for a process than it actually requires. This is based on the assumption that no process will use all the memory allowed to it, so the surplus can serve other processes. You can configure how the Linux system handles this memory overcommit, and the configuration is applied with the sysctl utility.

All sysctl control parameters can be listed with the command sysctl -a; the parameter of interest here is vm.overcommit_memory.

[root@srv ~]# sysctl -a | grep vm.overcommit_memory
vm.overcommit_memory = 0

vm.overcommit_memory can have three values: 0, 1, and 2.

  • 0 – Heuristic overcommit: allow overcommit based on an estimate of whether enough RAM is available (the default). 
  • 1 – Always allow overcommit. 
  • 2 – Deny overcommit that would exceed the commit limit (swap plus a configurable percentage of RAM).

Another sysctl parameter to consider is vm.overcommit_ratio, which defines what percentage of physical memory, in addition to the swap space, is counted toward the commit limit.

For example, if we set vm.overcommit_memory to 2, the kernel will not allow committed memory to exceed the swap space plus vm.overcommit_ratio of the total RAM:

vm.overcommit_memory=2
vm.overcommit_ratio=50
RAM=8GB
SWAP=4GB

With the above configuration, the commit limit is 4GB swap + 50% of 8GB RAM = 8GB.
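The arithmetic can be sketched with awk, using the example values above:

```shell
# Commit limit with vm.overcommit_memory=2: swap + (overcommit_ratio% of RAM)
ram_gb=8; swap_gb=4; ratio=50
awk -v r="$ram_gb" -v s="$swap_gb" -v p="$ratio" \
    'BEGIN { printf "commit limit: %.0f GB\n", s + r * p / 100 }'
```

The kernel reports the resulting limit as CommitLimit in /proc/meminfo.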

Follow the below steps to make modifications to the sysctl parameters.

sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=100

These changes will not survive a reboot; to make them permanent, add them to the sysctl configuration.

The configuration file is /etc/sysctl.conf; open it with a text editor such as vi or nano and add the entries.
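For example, to persist the overcommit settings discussed above, the entries would look like this (example values only; choose values that suit your server):

```ini
# /etc/sysctl.conf
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
```

After saving the file, load the settings with sysctl -p.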