Top Ten DTrace (D) Scripts

This article introduces ten of the most useful DTrace scripts, scripts that can help you understand and solve problems on Solaris 10 servers: per-process utilization statistics, file system flush activity, I/O size distributions, and more.

Introduction

DTrace is a comprehensive and flexible dynamic tracing facility built into the Solaris Operating System. It allows dynamic instrumentation of a running Solaris system, which can help answer questions like "which process is chewing up CPU 38?", "which user is causing the cross-call activity on CPU 6?", or "which setuid binaries are being executed?"

DTrace uses a scripting language called "D," whose syntax is very similar to C and awk. Several amazing D scripts have been developed and distributed on the Internet, so I thought I would share my favorites in a Letterman-like "Top 10" format.
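
Before the countdown begins, here is a one-line taste of D (my own illustrative example, not one of the scripts below) that answers the last of those questions in spirit by printing the path of every binary exec'd on the system; exece is the name the Solaris 10 syscall provider uses for exec(2):

$ dtrace -n 'syscall::exece:entry { trace(copyinstr(arg0)); }'

Each D clause is just a probe description, an optional /predicate/, and an action block, which is all the syntax you need to follow the scripts below.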

Number 10: Print Utilization statistics per process

Brendan Gregg developed prustat to display the top processes sorted by CPU, Memory, Disk or Network utilization:

$ prustat -c -t 10 5

 PID  %CPU  %Mem  %Disk  %Net  COMM
7176  0.88  0.70   0.00  0.00  dtrace
7141  0.00  0.43   0.00  0.00  sshd
7144  0.11  0.24   0.00  0.00  sshd
   3  0.34  0.00   0.00  0.00  fsflush
7153  0.03  0.19   0.00  0.00  bash
  99  0.00  0.22   0.00  0.00  nscd
7146  0.00  0.19   0.00  0.00  bash
  52  0.00  0.17   0.00  0.00  vxconfigd
7175  0.07  0.09   0.00  0.00  sh
  98  0.00  0.16   0.00  0.00  kcfd

This script is super useful for getting a high-level understanding of what is happening on a Solaris server. Golden!
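
prustat itself is a Perl and DTrace hybrid, so there is no single D file to show, but you can approximate its CPU column with a small sampling sketch like the following (my own sketch, not part of prustat):

#!/usr/sbin/dtrace -qs
/* Sample on-CPU processes at 997 Hz and print the top 10 on Ctrl-C. */
profile-997
/pid != 0/
{
        @samples[pid, execname] = count();
}

END
{
        trunc(@samples, 10);
        printa("%8d %-16s %@d samples\n", @samples);
}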

Number 9: File System Flush Activity

On Solaris systems, the pagedaemon is responsible for scanning the page cache and adjusting the MMU reference bit of each dirty page it finds. When the fsflush daemon runs, it scans the page cache looking for pages with the MMU reference bit set, and schedules those pages to be written to disk. The fsflush.d D script provides a detailed breakdown of pages scanned, along with the time in nanoseconds required to scan them:

$ fsflush.d
SCANNED EXAMINED LOCKED MODIFIED COALESCE RELEASES TIME(ns)
4254 4255 1 1 0 0 2695024
4254 4255 1 0 0 0 1921518
4254 4255 6 0 0 0 1989044
4254 4255 1 0 0 0 2401266
4254 4255 4 1 0 0 2562138
4254 4255 89 4 0 0 2425988
4254 3744 80 25 0 0 2394895
4254 4255 28 8 0 0 1776222
4254 4255 216 8 0 0 2350826
4254 4255 108 7 0 0 2356146

Now you might be wondering why "SCANNED" is less than "EXAMINED." This is due to a bug in fsflush, and a bug report has been filed to address the anomaly. Tight!
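
fsflush.d instruments the kernel's page-scanning code with the fbt provider. If you just want a feel for how long each scan pass takes, a rough sketch in the same spirit is shown below; it assumes the kernel exposes an fsflush_do_pages() routine, which you should confirm on your release with "dtrace -l -f fsflush_do_pages":

#!/usr/sbin/dtrace -qs
/* Time each fsflush page-scan pass (fsflush_do_pages() is assumed
   to be the scanning routine; verify with dtrace -l first). */
fbt::fsflush_do_pages:entry
{
        self->ts = timestamp;
}

fbt::fsflush_do_pages:return
/self->ts/
{
        @scan["fsflush scan time (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}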

Number 8: Seek Sizes

Prior to Solaris 10, determining if an application accessed data in a sequential or random pattern required reviewing mounds of truss(1m) and vxtrace(1m) data. With the introduction of DTrace and Brendan Gregg's seeksize.d D script, this question is trivial to answer:

$ seeksize.d
Sampling... Hit Ctrl-C to end.
^C

PID CMD
7312 dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576

value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1762
1 | 0

0 sched

value ------------- Distribution ------------- count
-1048576 | 0
-524288 |@@@@ 1
-262144 | 0
-131072 | 0
-65536 | 0
-32768 | 0
-16384 | 0
-8192 | 0
-4096 | 0
-2048 | 0
-1024 | 0
-512 | 0
-256 | 0
-128 | 0
-64 | 0
-32 | 0
-16 | 0
-8 | 0
-4 | 0
-2 | 0
-1 | 0
0 | 0
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 |@@@@ 1
256 |@@@@ 1
512 |@@@@ 1
1024 |@@@@ 1
2048 | 0
4096 | 0
8192 | 0
16384 |@@@@ 1
32768 |@@@@ 1
65536 |@@@@@@@@ 2
131072 | 0
262144 | 0
524288 |@@@@ 1
1048576 | 0

This script measures the seek distance between consecutive reads and writes, and provides a histogram of the seek distances. For applications using a sequential access pattern (e.g., dd in this case), the distribution will be narrow. For applications accessing data randomly (e.g., sched in this example), you will see a wide distribution. Shibby!
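
Under the hood, seeksize.d remembers where the last I/O to each device finished and quantizes the distance to the next request. A stripped-down sketch of that idea follows (Brendan's real script also keys the histograms by process and handles more corner cases):

#!/usr/sbin/dtrace -qs
/* Quantize the seek distance, in 512-byte blocks, between consecutive
   I/Os to each device; the cast keeps backward seeks negative. */
io:::start
/last[args[0]->b_edev] != 0/
{
        @dist[execname] = quantize((int64_t)(args[0]->b_blkno -
            last[args[0]->b_edev]));
}

io:::start
{
        /* Remember where this I/O ends for the next comparison. */
        last[args[0]->b_edev] = args[0]->b_blkno +
            args[0]->b_bcount / 512;
}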

Number 7: Print Overall Paging Activity

Prior to the introduction of DTrace, it was difficult to extract data on which files and disk devices were active at a specific point in time. With the introduction of fspaging.d, you can get a detailed view of which files are being accessed:

$ fspaging.d
Event     Device  Path                                              RW  Size  Offset
get-page          /lib/sparcv9/libc.so.1                                8192
get-page          /usr/lib/sparcv9/libdtrace.so.1                       8192
get-page          /lib/sparcv9/libc.so.1                                8192
get-page          /lib/libc.so.1                                        8192
put-page          /etc/svc/volatile/system-system-log:default.log      8192
put-page          /etc/svc/volatile/svc_nonpersist.db                   8192
put-page          /etc/svc/volatile/svc_nonpersist.db                   8192
put-page          /etc/svc/volatile/init-next.state                     8192
put-page          /etc/svc/volatile/network-ssh:default.log             8192
put-page          /etc/svc/volatile/network-pfil:default.log            8192

This is a super useful script! Niiiiiiiiiiice!
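
fspaging.d hooks the kernel's page-in and page-out paths with the fbt provider. A minimal sketch of the same idea appears below; it assumes the Solaris 10 fop_getpage()/fop_putpage() VFS entry points and the cached vnode v_path member, both worth verifying with dtrace -l on your release. Press Ctrl-C and the aggregation prints automatically:

#!/usr/sbin/dtrace -qs
/* Count page-in/page-out operations per file via the VFS getpage and
   putpage entry points; vnodes without a cached path are skipped. */
fbt::fop_getpage:entry,
fbt::fop_putpage:entry
/args[0]->v_path != NULL/
{
        @pages[probefunc == "fop_getpage" ? "get-page" : "put-page",
            stringof(args[0]->v_path)] = count();
}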

Number 6: Getting System Wide errno Information

When system calls have problems executing, they usually return a value to indicate success or failure, and set the global errno variable to a value indicating what went wrong. To get a system-wide view of which system calls are erroring out, we can use Brendan Gregg's errinfo D script:

$ errinfo -c
Sampling... Hit Ctrl-C to end.
^C
       EXEC   SYSCALL  ERR  COUNT  DESC
     ttymon      read   11      1  Resource temporarily unavailable
      utmpd     ioctl   25      2  Inappropriate ioctl for device
       init     ioctl   25      4  Inappropriate ioctl for device
       nscd  lwp_kill    3     13  No such process
        fmd  lwp_park   62     48  timer expired
       nscd  lwp_park   62     48  timer expired
 svc.startd  lwp_park   62     48  timer expired
      vxesd    accept    4     49  interrupted system call
svc.configd  lwp_park   62     49  timer expired
      inetd  lwp_park   62     49  timer expired
 svc.startd    portfs   62    490  timer expired

This will display the process, the system call, and the errno value along with its description from /usr/include/sys/errno.h! Jeah!
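
Because errno is a built-in D variable, a quick-and-dirty version of errinfo fits in a one-liner; this counts every failing system call by process, call, and error number until you press Ctrl-C:

$ dtrace -n 'syscall:::return /errno != 0/ { @[execname, probefunc, errno] = count(); }'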

Number 5: I/O per process

Several Solaris utilities provide a summary of the time spent waiting for I/O (which is a meaningless metric), but fail to provide facilities to easily correlate I/O activity with a process. With the introduction of psio.pl, you can see exactly which processes are responsible for generating I/O:

$ psio.pl
 UID   PID  PPID  %I/O     STIME  TTY     TIME  CMD
root  7312  7309  70.6  16:00:59  pts/2  02:36  dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576
root     0     0   0.0  10:24:18  ?      00:02  sched
root     1     0   0.0  10:24:18  ?      00:03  /sbin/init
root     2     0   0.0  10:24:18  ?      00:00  pageout
root     3     0   0.0  10:24:18  ?      00:51  fsflush
root     7     1   0.0  10:24:20  ?      00:06  /lib/svc/bin/svc.startd
root     9     1   0.0  10:24:21  ?      00:14  /lib/svc/bin/svc.configd

Once you find I/O intensive processes, you can use fspaging, iosnoop, and rwsnoop to get additional information:

$ iosnoop -n
MAJ  MIN  UID  PID  D     BLOCK  SIZE  COMM    PATHNAME
136    8    0  990  R    341632  8192  dtrace  /lib/sparcv9/ld.so.1
136    8    0  990  R    341568  8192  dtrace  /lib/sparcv9/ld.so.1
136    8    0  990  R  14218976  8192  dtrace  /lib/sparcv9/libc.so.1
[ ... ]

$ iosnoop -e
DEVICE  UID  PID  D     BLOCK  SIZE  COMM     PATHNAME
dad1      0  404  R    481712  8192  vxsvc    /lib/librt.so.1
dad1      0    3  W    516320  3072  fsflush  /var/adm/utmpx
dad1      0    3  W  18035712  8192  fsflush  /var/adm/wtmpx
[ ... ]

$ rwsnoop
UID  PID  CMD    D  BYTES  FILE
100  902  sshd   R     42  /devices/pseudo/clone@0:ptm
100  902  sshd   W     80
100  902  sshd   R     65  /devices/pseudo/clone@0:ptm
100  902  sshd   W    112
100  902  sshd   R     47  /devices/pseudo/clone@0:ptm
100  902  sshd   W     96
  0  404  vxsvc  R   1024  /etc/inet/protocols
[ ... ]

Smooooooooooth!
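
All of these tools lean on the io and syscall providers. If you just want a running per-process tally of physical I/O bytes, a minimal sketch looks like this (attribution is approximate for asynchronous writes, which may be charged to fsflush or sched rather than the true originator):

#!/usr/sbin/dtrace -qs
/* Sum physical I/O bytes per process; Ctrl-C prints the totals. */
io:::start
{
        @bytes[pid, execname] = sum(args[0]->b_bcount);
}

END
{
        printa("%8d %-16s %@d bytes\n", @bytes);
}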

Number 4: I/O Sizes Per Process

As Solaris administrators, we are often asked to identify application I/O sizes. This information can be acquired for a single process with truss(1m), or system-wide with Brendan Gregg's bitesize.d D script:

$ bitesize.d
Sampling... Hit Ctrl-C to end.

3 fsflush

value ------------- Distribution ------------- count
512 | 0
1024 |@ 1
2048 | 0
4096 |@@ 2
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 39
16384 | 0

7312 dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576

value ------------- Distribution ------------- count
16 | 0
32 | 2
64 | 0
128 | 0
256 | 0
512 | 2
1024 | 0
2048 | 0
4096 | 0
8192 | 0
16384 | 0
32768 | 0
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 76947
262144 | 0

If only Dorothy could see this!
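
The core idea behind bitesize.d is small enough to sketch as a one-liner: quantize the size of every buffer the io provider sees, keyed by process. Something like this (a sketch of the idea, not Brendan's full script):

$ dtrace -n 'io:::start { @[pid, curpsinfo->pr_psargs] = quantize(args[0]->b_bcount); }'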

Number 3: TCP Top

snoop(1m) and Ethereal are amazing utilities, and provide a slew of options for filtering data. When you don't have time to wade through snoop data, or to download and install Ethereal, you can use tcptop to get an overview of TCP activity on a system:

$ tcptop 5

2005 Jul 19 14:09:06, load: 0.01, TCPin: 2679 Kb, TCPout: 12 Kb

UID   PID  LADDR        LPORT  RADDR          RPORT     SIZE  NAME
  0  7138  192.168.1.3  44084  192.18.108.40     21      544  ftp
  0   352  192.168.1.3     22  192.168.1.8    49805     1308  sshd
100  7134  192.168.1.3  44077  192.168.1.1       22     1618  ssh
  0  7138  192.168.1.3  44089  24.98.83.96    51731  2877524  ftp

Now this is some serious bling!
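
Solaris 10 has no stable tcp provider, so tcptop gathers its numbers indirectly. One hedged way to watch raw TCP activity yourself is the mib provider, whose probes fire as the TCP MIB counters are updated; this one-liner simply counts those updates by counter name:

$ dtrace -n 'mib:::tcp* { @[probename] = count(); }'

This won't give you bytes per connection, which is exactly the gap tcptop fills.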

Number 2: Who's paging and DTrace enhanced vmstat

With Solaris 9, the "-p" option was added to vmstat to break paging activity up into "executable," "anonymous" and "filesystem" page types:

$ vmstat -p 5
      memory            page           executable      anonymous      filesystem
    swap    free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 1738152  832320   5   9   0   0   0    0    0    0    0    0    0    1    0    0
 1683280  818800   0   2   0   0   0    0    0    0    0    0    0    0    0    0
 1683280  818800   0   0   0   0   0    0    0    0    0    0    0    0    0    0

This is super useful information, but unfortunately it doesn't identify the executable responsible for the paging activity. With the introduction of whospaging.d, you can get paging activity per process:

$ whospaging.d

Who's waiting for pagein (milliseconds):

Who's on cpu (milliseconds):
  svc.configd      0
  sendmail         0
  svc.startd       0
  sshd             0
  nscd             1
  dtrace           3
  fsflush         14
  dd            1581
  sched         3284

Once we have the name of the process responsible for the paging activity, we can use dvmstat to break down the types of pages the application is paging (similar to vmstat -p, but per process!):

$ dvmstat -p 0
 re  maj  mf     fr  epi  epo  api  apo  fpi    fpo  sy
  0    0   0  13280    0    0    0    0    0  13280   0
  0    0   0  13504    0    0    0    0    0  13504   0
  0    0   0  13472    0    0    0    0    0  13472   0
  0    0   0  13472    0    0    0    0    0  13472   0
  0    0   0  13248    0    0    0    0    0  13248   0
  0    0   0  13376    0    0    0    0    0  13376   0
  0    0   0  13664    0    0    0    0    0  13664   0

Once we have an idea of which pages are being paged in or out, we can use iosnoop, rwsnoop, and fspaging.d to find out which files or devices the application is reading from or writing to! Since these rockin' scripts go hand in hand, I am placing them together. Shizam!
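
Both whospaging.d and dvmstat build on the vminfo provider, whose probes fire as the virtual memory statistics are incremented. A minimal sketch in the same spirit, splitting page-ins by page type and executable (probe names are from the vminfo provider; press Ctrl-C for the totals):

#!/usr/sbin/dtrace -qs
/* Break page-in activity down by page type and process; arg0 is the
   number of pages the statistic was incremented by. */
vminfo:::execpgin,
vminfo:::anonpgin,
vminfo:::fspgin
{
        @pgin[execname, probename] = sum(arg0);
}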

And without further ado, number 1 goes to ... (*drum roll*)

Number 1: I/O Top

After careful thought, I decided to make iotop and rwtop #1 on my top ten list. I have long dreamed of a utility that could tell me which applications were actively generating I/O to a given file, device or file system. With the introduction of iotop and rwtop, my wish came true:

$ iotop 5

2005 Jul 19 13:33:15, load: 0.24, disk_r: 95389 Kb, disk_w: 0 Kb

UID   PID  PPID  CMD   DEVICE     MAJ    MIN  D     BYTES
  0    99     1  nscd  dad1       136      8  R     16384
  0  7037  7033  find  dad1       136      8  R   2266112
  0  7036  7033  dd    sd7         32     58  R  15794176
  0  7036  7033  dd    sd6         32     50  R  15826944
  0  7036  7033  dd    sd5         32     42  R  15826944
  0  7036  7033  dd    vxio21000  100  21000  R  47448064

$ rwtop 5
2005 Jul 24 10:47:26, load: 0.18, app_r: 9 Kb, app_w: 8 Kb

UID  PID  PPID  CMD    D  BYTES
100  922   920  bash   R      3
100  922   920  bash   W     15
100  902   899  sshd   R   1223
100  926   922  ls     R   1267
100  902   899  sshd   W   1344
100  926   922  ls     W   2742
100  920   917  sshd   R   2946
100  920   917  sshd   W   4819
  0  404     1  vxsvc  R   5120
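
rwtop itself builds on the syscall provider. A minimal sketch in the same spirit, summing logical read/write bytes per process in five-second intervals (the real script sorts and formats its output far more nicely):

#!/usr/sbin/dtrace -qs
/* Sum logical read/write bytes per process every 5 seconds. */
syscall::read:return,
syscall::write:return
/arg0 > 0/
{
        @io[pid, execname, probefunc == "read" ? "R" : "W"] = sum(arg0);
}

tick-5sec
{
        printa("%8d %-16s %s %@d bytes\n", @io);
        trunc(@io);
}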

Conclusion

This article introduced several scripts that have helped me debug and solve problems on the Solaris 10 servers I support. For additional details on using DTrace to troubleshoot problems, you can review the documentation on Brendan Gregg's website. For additional documentation on DTrace, check out the DTrace User Guide (now in hardcover!). If you have questions or comments on the article, please feel free to e-mail the author.
