Python 线程高CPU占用排查总结

1. 背景

  • 线上进程CPU占用异常偏高。
  • 希望快速定位到是哪一个线程导致的问题,并进一步追踪到具体Python代码位置

2. 排查步骤

(1)确认CPU异常的进程

通过常规 tophtop 工具,找到高CPU占用的进程PID,比如 2608


(2)确认线程视角下的情况

执行

1
top -H -p 2608

1
ps -mp 2608 -o THREAD,tid,time

切换到线程级别视角,查看哪个TID(线程ID)CPU占用最高。

注意:

  • top -H是动态实时监控。
  • ps是静态一次性快照。

(3)将线程TID转成十六进制

因为Python解释器内部使用十六进制TID,需要转换,比如:

1
printf "%x\n" 线程号

例如线程ID是 3245,转换后是 0xca5


(4)用GDB attach到目标进程

1
gdb -p 2608

成功后,执行:

1
thread apply all bt

或者针对单一线程,切换到指定线程:

1
2
thread 线程号
bt

pstack输出的是cpython的调用,查看很不优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
[root@luffy ~]# pstack 2607
Thread 3 (Thread 0x7fc78e1a0700 (LWP 2608)):
#0 0x00000000004243ec in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2759
#1 0x0000000000420098 in function_code_fastcall (co=co@entry=0x7fc795c31300, args=<optimized out>, args@entry=0x7fc795ca6068, nargs=nargs@entry=0, globals=globals@entry=0x7fc795c5cc30) at Objects/call.c:283
#2 0x000000000043b186 in _PyFunction_FastCallDict (func=0x7fc795b8e050, args=args@entry=0x7fc795ca6068, nargs=0, kwargs=kwargs@entry=0x7fc795c1bd70) at Objects/call.c:322
#3 0x000000000043c763 in PyObject_Call (callable=callable@entry=0x7fc795b8e050, args=args@entry=0x7fc795ca6050, kwargs=kwargs@entry=0x7fc795c1bd70) at Objects/call.c:226
#4 0x0000000000422469 in do_call_core (kwdict=0x7fc795c1bd70, callargs=0x7fc795ca6050, func=0x7fc795b8e050) at Python/ceval.c:4645
#5 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3191
#6 0x0000000000420098 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#7 0x000000000043b316 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
#8 0x00000000004280e2 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#9 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#10 0x0000000000420098 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#11 0x000000000043b316 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
#12 0x00000000004280e2 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#13 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#14 0x0000000000420098 in function_code_fastcall (co=co@entry=0x7fc78e2541e0, args=<optimized out>, args@entry=0x7fc78e19fe20, nargs=nargs@entry=1, globals=globals@entry=0x7fc78e249640) at Objects/call.c:283
#15 0x000000000043b186 in _PyFunction_FastCallDict (func=func@entry=0x7fc78e1d1710, args=args@entry=0x7fc78e19fe20, nargs=nargs@entry=1, kwargs=kwargs@entry=0x0) at Objects/call.c:322
#16 0x000000000043e3e9 in _PyObject_FastCallDict (kwargs=0x0, nargs=1, args=0x7fc78e19fe20, callable=0x7fc78e1d1710) at Objects/call.c:98
#17 _PyObject_Call_Prepend (callable=0x7fc78e1d1710, obj=<optimized out>, args=0x7fc795ca6050, kwargs=0x0) at Objects/call.c:906
#18 0x000000000043c688 in PyObject_Call (callable=0x7fc795c9d320, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:245
#19 0x0000000000580bb3 in t_bootstrap (boot_raw=boot_raw@entry=0x7fc795b989c0) at ./Modules/_threadmodule.c:994
#20 0x0000000000537187 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:174
#21 0x00007fc795689ea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fc794ca9b0d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fc78d99f700 (LWP 2609)):
#0 0x00007fc794ca0b43 in select () from /lib64/libc.so.6
#1 0x000000000057f742 in pysleep (secs=<optimized out>) at ./Modules/timemodule.c:1843
#2 time_sleep (self=self@entry=0x7fc78e2578f0, obj=<optimized out>) at ./Modules/timemodule.c:371
#3 0x000000000043be56 in _PyMethodDef_RawFastCallKeywords (kwnames=0x0, nargs=0, args=0x7fc78e1c0d98, self=0x7fc78e2578f0, method=0x8e51e0 <time_methods+288>) at Objects/call.c:647
#4 _PyCFunction_FastCallKeywords (func=0x7fc78e25b0f0, args=args@entry=0x7fc78e1c0d98, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/call.c:732
#5 0x000000000042801d in call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>) at Python/ceval.c:4568
#6 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3093
#7 0x0000000000420098 in function_code_fastcall (co=co@entry=0x7fc795c25390, args=<optimized out>, args@entry=0x7fc795ca6068, nargs=nargs@entry=0, globals=globals@entry=0x7fc795c5cc30) at Objects/call.c:283
#8 0x000000000043b186 in _PyFunction_FastCallDict (func=0x7fc78e233560, args=args@entry=0x7fc795ca6068, nargs=0, kwargs=kwargs@entry=0x7fc78e2564b0) at Objects/call.c:322
#9 0x000000000043c763 in PyObject_Call (callable=callable@entry=0x7fc78e233560, args=args@entry=0x7fc795ca6050, kwargs=kwargs@entry=0x7fc78e2564b0) at Objects/call.c:226
#10 0x0000000000422469 in do_call_core (kwdict=0x7fc78e2564b0, callargs=0x7fc795ca6050, func=0x7fc78e233560) at Python/ceval.c:4645
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3191
#12 0x0000000000420098 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#13 0x000000000043b316 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
#14 0x00000000004280e2 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#15 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#16 0x0000000000420098 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#17 0x000000000043b316 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
#18 0x00000000004280e2 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#20 0x0000000000420098 in function_code_fastcall (co=co@entry=0x7fc78e2541e0, args=<optimized out>, args@entry=0x7fc78d99ee20, nargs=nargs@entry=1, globals=globals@entry=0x7fc78e249640) at Objects/call.c:283
#21 0x000000000043b186 in _PyFunction_FastCallDict (func=func@entry=0x7fc78e1d1710, args=args@entry=0x7fc78d99ee20, nargs=nargs@entry=1, kwargs=kwargs@entry=0x0) at Objects/call.c:322
#22 0x000000000043e3e9 in _PyObject_FastCallDict (kwargs=0x0, nargs=1, args=0x7fc78d99ee20, callable=0x7fc78e1d1710) at Objects/call.c:98
#23 _PyObject_Call_Prepend (callable=0x7fc78e1d1710, obj=<optimized out>, args=0x7fc795ca6050, kwargs=0x0) at Objects/call.c:906
#24 0x000000000043c688 in PyObject_Call (callable=0x7fc795bb83c0, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:245
#25 0x0000000000580bb3 in t_bootstrap (boot_raw=boot_raw@entry=0x7fc78e239ea0) at ./Modules/_threadmodule.c:994
#26 0x0000000000537187 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:174
#27 0x00007fc795689ea5 in start_thread () from /lib64/libpthread.so.0
#28 0x00007fc794ca9b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fc795ce6740 (LWP 2607)):
#0 0x00007fc794ca0b43 in select () from /lib64/libc.so.6
#1 0x000000000057f742 in pysleep (secs=<optimized out>) at ./Modules/timemodule.c:1843
#2 time_sleep (self=self@entry=0x7fc78e2578f0, obj=<optimized out>) at ./Modules/timemodule.c:371
#3 0x000000000043be56 in _PyMethodDef_RawFastCallKeywords (kwnames=0x0, nargs=0, args=0x7fc795bf35c8, self=0x7fc78e2578f0, method=0x8e51e0 <time_methods+288>) at Objects/call.c:647
#4 _PyCFunction_FastCallKeywords (func=0x7fc78e25b0f0, args=args@entry=0x7fc795bf35c8, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/call.c:732
#5 0x000000000042801d in call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>) at Python/ceval.c:4568
#6 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3093
#7 0x00000000004ed8a7 in PyEval_EvalFrameEx (throwflag=0, f=0x7fc795bf3450) at Python/ceval.c:547
#8 _PyEval_EvalCodeWithName (_co=_co@entry=0x7fc795b71660, globals=globals@entry=0x7fc795c651b0, locals=locals@entry=0x7fc795b6ce10, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0, name=name@entry=0x0, qualname=qualname@entry=0x0) at Python/ceval.c:3930
#9 0x00000000004ed9d0 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7fc795b6ce10, globals=globals@entry=0x7fc795c651b0, _co=_co@entry=0x7fc795b71660) at Python/ceval.c:3959
#10 PyEval_EvalCode (co=co@entry=0x7fc795b71660, globals=globals@entry=0x7fc795c5cc30, locals=locals@entry=0x7fc795c5cc30) at Python/ceval.c:524
#11 0x0000000000528648 in run_mod (arena=0x7fc795c73770, flags=0x7ffc619def20, locals=0x7fc795c5cc30, globals=0x7fc795c5cc30, filename=0x7fc795b6ce10, mod=0x1cb9610) at Python/pythonrun.c:1035
#12 PyRun_FileExFlags (fp=0x1d04d10, filename_str=<optimized out>, start=<optimized out>, globals=0x7fc795c5cc30, locals=0x7fc795c5cc30, closeit=1, flags=0x7ffc619def20) at Python/pythonrun.c:988
#13 0x000000000052882d in PyRun_SimpleFileExFlags (fp=0x1d04d10, filename=<optimized out>, closeit=1, flags=0x7ffc619def20) at Python/pythonrun.c:429
#14 0x000000000042fe0d in pymain_run_file (p_cf=0x7ffc619def20, filename=0x1c7e460 L"/root/vscode-work/daily_code/busy_thread.py", fp=0x1d04d10) at Modules/main.c:456
#15 pymain_run_filename (cf=0x7ffc619def20, pymain=0x7ffc619df000) at Modules/main.c:1646
#16 pymain_run_python (pymain=0x7ffc619df000) at Modules/main.c:2907
#17 pymain_main (pymain=pymain@entry=0x7ffc619df000) at Modules/main.c:3068
#18 0x00000000004300dd in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:3103
#19 0x00007fc794bcd555 in __libc_start_main () from /lib64/libc.so.6
#20 0x0000000000429de4 in _start ()

(5)加载Python调试脚本,友好查看Python堆栈

1
2
source /root/.pyenv/versions/3.7.10/lib/python3.7/python-gdb.py
py-bt

这样就能看到Python级别的调用栈了

1
2
3
4
5
6
7
8
9
10
11
12
(gdb) source /root/.pyenv/versions/3.7.10/lib/python3.7/python-gdb.py
(gdb) py-bt
Traceback (most recent call first):
File "/root/vscode-work/daily_code/busy_thread.py", line 18, in busy_thread
pass # 死循环,占满CPU
File "/root/.pyenv/versions/3.7.10/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/.pyenv/versions/3.7.10/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/root/.pyenv/versions/3.7.10/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
(gdb)

3. 本次最终结果

  • 最后通过 py-bt 成功定位到了代码位置:

    • /root/vscode-work/daily_code/busy_thread.py 第18行
    • 处于 busy_thread() 函数中
    • 正在执行 pass 死循环,占满CPU

4. 经验总结

项目 经验
问题归因 高CPU占用问题,常由死循环、频繁IO、竞争锁等引起
工具链掌握 熟练使用 top -Hps -mpgdb -ppy-bt
注意事项 记得TID需要转16进制;Python要有对应的python-gdb.py脚本
小技巧 py-bt能直接定位到Python代码层,比C层bt友好多了

补充

我自己用pyenv安装的python3.7没带 python-gdb.py

正常来说,python-gdb.py 是 Python 源码安装后在 lib-dynload/ 旁边的,像 /usr/lib/debug/.../usr/lib/python3.x/ 目录中。
但是如果你用 pyenv 裸安装,默认是不会带这个调试脚本的(除非你自己加了参数 --with-pydebug 编译)。

解决方法是:

手动下载一份 python-gdb.py 脚本过来用。

Python 3.7.10 对应的 python-gdb.py 官方地址在 GitHub源码里:
👉 https://github.com/python/cpython/blob/3.7/Tools/gdb/libpython.py

步骤:

1
2
cd /root/.pyenv/versions/3.7.10/lib/python3.7/
wget https://raw.githubusercontent.com/python/cpython/3.7/Tools/gdb/libpython.py -O python-gdb.py

然后在 gdb 里执行:

1
2
(gdb) source /root/.pyenv/versions/3.7.10/lib/python3.7/python-gdb.py
(gdb) py-bt

就可以拿到 Python 代码级的调用栈了!!


Python 线程高CPU占用排查总结
https://luffy997.github.io/2024/07/15/Python-线程高CPU占用排查总结/
作者
Luffy997
发布于
2024年7月15日
许可协议