Linux CPU usage and Process Execution History


 

Is there any way to see what process(es) caused the most CPU usage?

I have an Amazon EC2 Linux instance whose CPU utilization reaches 100 percent, which forces me to reboot the system. I cannot even log in through SSH (using PuTTY).

Is there any way to see what causes such high CPU usage and which process is responsible?

I know about the sar and top commands, but I could not find a process execution history anywhere. Here is the graph from the Amazon EC2 monitoring tool, but I would like to know which process caused it:

 

[Image: CPU utilization graph from the Amazon EC2 monitoring tool]

 

I have also tried ps -eo pcpu,args | sort -k 1 -r | head -100, but it did not show any process with such high CPU usage.

 

There are a couple of possible ways to do this. Note that it's entirely possible that many processes in a runaway scenario are causing this, not just one.

The first way is to set up pidstat to run in the background and produce data.

pidstat -u 600 >/var/log/pidstats.log & disown $!

This will give you a quite detailed outlook of the running of the system at ten minute intervals. I would suggest this be your first port of call since it produces the most valuable/reliable data to work with.

There is a problem with this: if the box goes into a runaway CPU loop and produces huge load, you're not guaranteed that pidstat itself will execute in a timely manner during the load (if at all), so you could actually miss the output!
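Once the log has some data in it, a small filter can pull out the busy rows. The following is only a sketch: the function name busy_from_pidstat is made up for illustration, and it assumes the log contains the usual pidstat header line with a %CPU column, which it uses to locate the column because the position differs between sysstat versions and time formats.

```shell
#!/bin/sh
# busy_from_pidstat THRESHOLD < pidstat-log
# Hypothetical helper: prints the rows of `pidstat -u` output whose %CPU
# value is greater than THRESHOLD.
busy_from_pidstat() {
    awk -v limit="$1" '
        {
            # Remember which field holds %CPU whenever a header line appears.
            hdr = 0
            for (i = 1; i <= NF; i++)
                if ($i == "%CPU") { col = i; hdr = 1 }
        }
        hdr { next }                                    # do not print headers
        col && NF >= col && $col + 0 > limit + 0 { print }
    '
}
```

For example, busy_from_pidstat 50 < /var/log/pidstats.log would list every sample in which a single process used more than 50 percent CPU (the value 50 is arbitrary).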

The second way to look for this is to enable process accounting. Possibly more of a long term option.

accton on

This will enable process accounting (if it is not already enabled). If it was not running before, it will need time to collect data.

Once it has been running for, say, 24 hours, you can run a command like this (which produces output like the following):

# sa --percentages --separate-times
     108  100.00%       7.84re  100.00%       0.00u  100.00%       0.00s  100.00%         0avio     19803k
       2    1.85%       0.00re    0.05%       0.00u   75.00%       0.00s    0.00%         0avio     29328k   troff
       2    1.85%       0.37re    4.73%       0.00u   25.00%       0.00s   44.44%         0avio     29632k   man
       7    6.48%       0.00re    0.01%       0.00u    0.00%       0.00s   44.44%         0avio     28400k   ps
       4    3.70%       0.00re    0.02%       0.00u    0.00%       0.00s   11.11%         0avio      9753k   ***other*
      26   24.07%       0.08re    1.01%       0.00u    0.00%       0.00s    0.00%         0avio      1130k   sa
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28544k   ksmtuned*
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     28096k   awk
      14   12.96%       0.00re    0.01%       0.00u    0.00%       0.00s    0.00%         0avio     29623k   man*
       7    6.48%       7.00re   89.26%       0.00u    0.00%       0.00s    

The columns are ordered as follows:

  1. Number of calls
  2. Percentage of calls
  3. Amount of real time spent on all the processes of this type
  4. Percentage of real time
  5. User CPU time
  6. Percentage of user CPU time
  7. System CPU time
  8. Percentage of system CPU time
  9. Average I/O calls
  10. CPU-time averaged memory usage, in 1k units
  11. Command name

What you'll be looking for is the process types that generate the most User/System CPU time.

This breaks down the data as the total amount of CPU time (the top row) and then how that CPU time has been split up. Process accounting only accounts properly when it's on when processes spawn, so it's probably best to restart the system after enabling it to ensure all services are being accounted for.

This by no means gives you a definite idea of which process is the cause of the problem, but it might give you a good feel. As it could be a 24-hour snapshot, there's a possibility of skewed results, so bear that in mind. It should also always log, since it's a kernel feature and, unlike pidstat, will produce output even during heavy load.

The last option also uses process accounting, so you can turn it on as above, but then use the program "lastcomm" to list the processes executed around the time of the problem, along with CPU statistics for each process.
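To pick out the command types with the most user plus system CPU time, the sa output can be post-processed. This is a rough sketch rather than a feature of sa itself: rank_sa_by_cpu is a made-up name, and it assumes the column layout shown above, where the user and system times are the fields ending in u and s and the command name is the last field.

```shell
#!/bin/sh
# rank_sa_by_cpu < output of `sa --percentages --separate-times`
# Hypothetical helper: prints "user+system-time command", largest first.
rank_sa_by_cpu() {
    awk '
        $NF ~ /^[0-9]+k$/ { next }       # totals row: no command name
        {
            u = s = ""
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9.]+u$/) u = $i    # user CPU time
                if ($i ~ /^[0-9.]+s$/) s = $i    # system CPU time
            }
            if (u == "" || s == "") next  # truncated or unrecognised row
            # awk uses the leading number of "0.25s", so u + s is numeric
            printf "%.2f %s\n", u + s, $NF
        }
    ' | sort -rn
}
```

It could be used as sa --percentages --separate-times | rank_sa_by_cpu | head. The times are printed in whatever unit sa reports them in.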

lastcomm | grep "May  8 22:[01234]"
kworker/1:0       F    root     __         0.00 secs Tue May  8 22:20
sleep                  root     __         0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                     root     pts/0      0.00 secs Tue May  8 22:49
sa                   X root     pts/0      0.00 secs Tue May  8 22:49
ksmtuned          F    root     __         0.00 secs Tue May  8 22:49
awk                    root     __         0.00 secs Tue May  8 22:49

This might give you some hints too as to what might be causing the problem.
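If the time window contains many entries, sorting them by CPU time makes the heavy ones easier to spot. A minimal sketch, assuming the lastcomm layout shown above; rank_lastcomm_by_cpu is a made-up name. It looks for the word "secs" instead of a fixed column, because the flags column is sometimes empty.

```shell
#!/bin/sh
# rank_lastcomm_by_cpu < lastcomm output
# Hypothetical helper: prints "cpu-seconds command", largest first.
rank_lastcomm_by_cpu() {
    awk '
        {
            # The CPU time is the field just before the word "secs".
            for (i = 2; i <= NF; i++)
                if ($i == "secs") { print $(i - 1), $1; break }
        }
    ' | sort -rn
}
```

For example: lastcomm | grep "May  8 22:[01234]" | rank_lastcomm_by_cpu | head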

 
 

Atop is a particularly handy daemon for looking at drill-downs to the process level and by default archives this data for 28 days. Besides presenting an awesome real-time monitoring interface, you can specify those log files to open and step through them.

The article gives some idea of the capabilities, and you can find more in the manpage.

It's truly a wonderful piece of software.

 
 

Programs such as psmon and monit may be helpful for you. They can monitor the processes running on your system, and if any threshold (CPU usage, memory usage…) is exceeded, you can set them to send you an e-mail report about what's going on.

It's also possible to automatically restart the misbehaving processes.

   
 

One solution is to write a script that runs from cron every minute or in a sleep loop and, the instant it finds the load average over a certain limit, sends you an email, runs an scp job, or dumps to an EBS volume the relevant output (dmesg, pstree -pa, ps aux, and probably vmstat).
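A rough sketch of such a script follows. The limit of 4.0, the output directory, and all function names are assumptions chosen for illustration. It only writes a local file, since mail or scp setup varies from system to system.

```shell
#!/bin/sh
# Hypothetical defaults: adjust the limit and the output directory.
LIMIT=${LIMIT:-4.0}
OUTDIR=${OUTDIR:-/var/log/load-snapshots}

# Succeeds when the first number is greater than the second.
load_over_limit() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Writes the diagnostics to one timestamped file and prints its path.
take_snapshot() {
    mkdir -p "$OUTDIR" || return 1
    out="$OUTDIR/load-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== uptime"; uptime
        echo "== vmstat"; vmstat 1 3
        echo "== ps aux"; ps aux
        echo "== pstree"; pstree -pa
        echo "== dmesg";  dmesg | tail -n 50
    } > "$out" 2>&1
    echo "$out"
}

# Reads the one-minute load average and snapshots if it is over the limit.
check_load() {
    read -r load1 rest < /proc/loadavg
    if load_over_limit "$load1" "$LIMIT"; then
        take_snapshot
    fi
}
```

To use it, add a call to check_load as the last line of the script and run it from cron every minute, for example with a crontab entry such as * * * * * /usr/local/sbin/load-snapshot.sh (the path is hypothetical). As with pidstat, the script may not get scheduled promptly once the box is already overloaded, so a fairly low limit is probably safer.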

 

 
