For the past few days I have been working on I/O monitoring for a multithreaded program. I looked through a lot of material and tried all sorts of I/O monitoring tools, including iotop, vmstat, nmon, iopp, and dstat, but none of them was quite what I wanted (each has its strengths; they just do not cover my monitoring needs).
Linux allocates resources per process, and since the 2.6 kernel it supports per-process I/O accounting in /proc/pid/io, which contains the following fields:
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
The kernel documentation explains these fields as follows:
Description
-----------
rchar
-----
I/O counter: chars read
The number of bytes which this task has caused to be read from storage. This
is simply the sum of bytes which this process passed to read() and pread().
It includes things like tty IO and it is unaffected by whether or not actual
physical disk IO was required (the read might have been satisfied from
pagecache)

wchar
-----
I/O counter: chars written
The number of bytes which this task has caused, or shall cause to be written
to disk. Similar caveats apply here as with rchar.

syscr
-----
I/O counter: read syscalls
Attempt to count the number of read I/O operations, i.e. syscalls like read()
and pread().

syscw
-----
I/O counter: write syscalls
Attempt to count the number of write I/O operations, i.e. syscalls like
write() and pwrite().

read_bytes
----------
I/O counter: bytes read
Attempt to count the number of bytes which this process really did cause to
be fetched from the storage layer. Done at the submit_bio() level, so it is
accurate for block-backed filesystems.

write_bytes
-----------
I/O counter: bytes written
Attempt to count the number of bytes which this process caused to be sent to
the storage layer. This is done at page-dirtying time.

cancelled_write_bytes
---------------------
The big inaccuracy here is truncate. If a process writes 1MB to a file and
then deletes the file, it will in fact perform no writeout. But it will have
been accounted as having caused 1MB of write.
In other words: The number of bytes which this process caused to not happen,
by truncating pagecache. A task can cause "negative" IO too. If this task
truncates some dirty pagecache, some IO which another task has been accounted
for (in its write_bytes) will not be happening. We _could_ just subtract that
from the truncating task's write_bytes, but there is information loss in doing
that.
To monitor a process's I/O, just read this file every few seconds, take the difference between successive counter values, and divide by the sampling interval to get the I/O rate.
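As a rough illustration, here is a minimal Java sketch of that sampling loop (the class name, the 3-second interval, and taking the PID as a command-line argument are my own choices, not from any of the tools above):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class ProcIoRate {

        // Parse /proc/<pid>/io into a field -> counter map
        // (lines look like "write_bytes: 323932160").
        static Map<String, Long> readIo(String pid) throws Exception {
            Map<String, Long> counters = new HashMap<>();
            for (String line : Files.readAllLines(Paths.get("/proc", pid, "io"))) {
                String[] parts = line.split(":\\s*");
                counters.put(parts[0], Long.parseLong(parts[1]));
            }
            return counters;
        }

        public static void main(String[] args) throws Exception {
            String pid = args[0];
            long intervalMs = 3000; // sample every 3 seconds

            Map<String, Long> before = readIo(pid);
            Thread.sleep(intervalMs);
            Map<String, Long> after = readIo(pid);

            // I/O rate = counter delta divided by the sampling interval.
            double secs = intervalMs / 1000.0;
            System.out.printf("read: %.0f B/s, write: %.0f B/s%n",
                    (after.get("read_bytes") - before.get("read_bytes")) / secs,
                    (after.get("write_bytes") - before.get("write_bytes")) / secs);
        }
    }

Run it as: java ProcIoRate <pid>. Note that /proc/<pid>/io is normally readable only by the process owner or root.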
Since I need to monitor I/O per thread: under Linux, a Java thread maps to an LWP (light-weight process). To monitor the I/O of the Linux tasks backing the Java threads, I first collected the I/O of every thread and summed it, then compared the total against the numbers nmon reported; my total was far larger than nmon's.
I searched around quite a bit without finding the cause. Finally, reading the source code of iopp.c gave me the hint, and I discovered that in a multithreaded Java program all of the threads' I/O is accounted under the main thread; put differently, each child thread's I/O counters stay in sync with the main thread's, whether or not that child thread actually did any I/O.
With that conclusion the problem was quickly solved: monitor the main thread's I/O and you have the I/O of the whole multithreaded program; alternatively, sum the I/O of all the threads under the main thread and divide by the thread count, which gives the same figure. A sketch of the second approach follows.
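A minimal Java sketch of that second approach (the class name and structure are mine, not taken from iopp.c; it assumes, per the observation above, that each thread's top-level /proc/<tid>/io reports the same totals, with the thread IDs discovered under /proc/<pid>/task):

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ThreadIoAvg {
        public static void main(String[] args) throws Exception {
            String pid = args[0];
            // The thread (LWP) IDs of the process live under /proc/<pid>/task.
            File[] tids = new File("/proc/" + pid + "/task").listFiles();

            long totalWrite = 0;
            for (File tid : tids) {
                // Read each thread's counters from the top-level /proc/<tid>/io.
                for (String line : Files.readAllLines(Paths.get("/proc", tid.getName(), "io"))) {
                    if (line.startsWith("write_bytes:")) {
                        totalWrite += Long.parseLong(line.split(":\\s*")[1]);
                    }
                }
            }
            // Assuming every thread reports the same totals, dividing the sum
            // by the thread count recovers the per-process figure.
            System.out.println("write_bytes: " + totalWrite / tids.length);
        }
    }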
This article comes from a CSDN blog; please credit the source when reposting: http://blog.youkuaiyun.com/xinbobdog008/archive/2010/08/27/5843511.aspx