当我们希望观察java程序运行的堆栈情况,多次执行jstack,或者通过cpu 采用发现某个线程都在同一个函数函数.
这可能是两种情况引起的:
- 在这个线程中,这个函数是高频函数.
- 线程当前阻塞在这个函数上.
Linux中没有真正的线程,java线程是由进程来模拟实现的. 这种进程又称作:轻量级进程(LWP).
因此可以通过strace -p 追踪其运行情况,来区分以上两种情况.
但是通过ps -ef
或者jps
只能看到java程序的pid. 如果需要追踪线程,可以jstack
查找线程的pid.
线程第一个行中的 nid
字段,就是该线程pid的16进制表示. 转化为10进制即可.
例如下面,线程显示在执行io上,读数据相对这个操作是比重多的. 但是是频繁执行io,还是文件系统出现问题,卡在io上,这是无法区分. 就可以这样追踪一下.
查找线程nid
[wjh@node1 ~]$ jps
1287939 AppMockRealDedupDelete
1287937 Launcher
1293889 Jps
1098227 Main
1099286 Launcher
1098463 RemoteMavenServer36
[wjh@node1 ~]$ jstack 1287939
2021-09-26 16:11:56
Full thread dump OpenJDK 64-Bit Server VM (11.0.12+7-LTS mixed mode, sharing):
Threads class SMR info:
_java_thread_list=0x00007f6570000e90, length=15, elements={
0x00007f66a8018000, 0x00007f66a8399000, 0x00007f66a839b000, 0x00007f66a83a4800,
0x00007f66a83a6800, 0x00007f66a83a8800, 0x00007f66a83b2800, 0x00007f66a83b4800,
0x00007f66a83d3000, 0x00007f66a8719800, 0x00007f66a8b66000, 0x00007f66a8bc0800,
0x00007f6648001000, 0x00007f65dc001000, 0x00007f65b0001800
}
"main" #1 prio=5 os_prio=0 cpu=2159223.33ms elapsed=2533.45s tid=0x00007f66a8018000 nid=0x13a705 runnable [0x00007f66b24d7000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.12/Native Method)
at java.io.FileInputStream.read(java.base@11.0.12/FileInputStream.java:279)
at java.io.BufferedInputStream.fill(java.base@11.0.12/BufferedInputStream.java:252)
at java.io.BufferedInputStream.read1(java.base@11.0.12/BufferedInputStream.java:292)
at java.io.BufferedInputStream.read(java.base@11.0.12/BufferedInputStream.java:351)
- locked <0x000000051d48ad88> (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(java.base@11.0.12/FilterInputStream.java:107)
at com.afonddream.algorithm.real.ReadRealData.loadData(ReadRealData.java:25)
at com.afonddream.algorithm.app.AppMockRealDedupDelete.mockDedup(Unknown Source)
at com.afonddream.algorithm.app.AppMockRealDedupDelete.mock(Unknown Source)
at com.afonddream.algorithm.app.AppMockRealDedupDelete.main(Unknown Source)
转化为10进制
[wjh@node1 ~]$ python
Python 2.7.18 (default, Jan 22 2021, 11:25:37)
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 0x13a705
1287941
>>>
追踪线程执行情况
原来程序在频繁执行io函数.
[wjh@node1 ~]$ strace -p 1287941
strace: Process 1287941 attached
read(112, "", 38797312) = 0
munmap(0x7f6561afe000, 38801408) = 0
mmap(NULL, 38801408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6561afe000
read(112, "", 38797312) = 0
munmap(0x7f6561afe000, 38801408) = 0
mmap(NULL, 38801408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6561afe000
read(112, "", 38797312) = 0
munmap(0x7f6561afe000, 38801408) = 0
mmap(NULL, 38801408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6561afe000
...