Analyzing Problems with iostat

November 25, 2019 | ephuizi | linux, storage

Source: How I Use Iostat and Vmstat for Performance Analysis

Note that svctm in iostat is not guaranteed to be accurate, so I/O await problems should be analyzed together with blktrace.

1. Is the I/O heavy?

Check the sum of w/s and r/s. The larger the sum, the heavier the I/O.
Also check %util. The higher it is, the heavier the I/O.
- If %util is close to 100, the I/O is definitely significant.
- Note for writes: if the disk is the bottleneck (%util stays at 100% for a long time) but the applications keep writing, then once dirty pages exceed 30% of memory (the actual limit is governed by the kernel's vm.dirty_ratio setting, so the percentage depends on configuration), the system blocks every write system call, sync or async, and focuses on writing to the disk. Once this occurs, the entire system is slow as hell.
- Check wa in the vmstat log.
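
As a rough illustration of the first two checks, the sketch below parses one device row of `iostat -x` output and reports the combined IOPS and whether %util is near saturation. The sample header and row are made up, column names differ between sysstat versions, and the 90% cut-off and the helper names `parse_iostat_row` and `io_load` are assumptions for illustration only.

```python
def parse_iostat_row(header: str, row: str) -> dict:
    """Map iostat -x column names to the numeric values of one device row."""
    names = header.split()
    fields = row.split()
    # The first column holds the device name; the remaining ones are numeric.
    stats = {name: float(value) for name, value in zip(names[1:], fields[1:])}
    stats["device"] = fields[0]
    return stats


def io_load(stats: dict, util_threshold: float = 90.0) -> dict:
    """Summarize how heavy the I/O is: total IOPS plus a saturation flag."""
    return {
        "iops": stats["r/s"] + stats["w/s"],
        "util": stats["%util"],
        # util_threshold is an arbitrary cut-off chosen for this sketch.
        "saturated": stats["%util"] >= util_threshold,
    }


# Hypothetical sample modeled on an older sysstat column layout.
HEADER = "Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util"
ROW = "sda 0.00 12.00 150.00 350.00 2400.00 5600.00 32.00 4.50 9.00 1.90 95.00"

load = io_load(parse_iostat_row(HEADER, ROW))
print(load)
```

Looking columns up by name rather than by position keeps the sketch usable when a sysstat version adds or reorders columns, as long as r/s, w/s and %util are still present.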

2. How many processes are doing I/O concurrently?

Check b in the vmstat log (the number of processes blocked in uninterruptible sleep, usually waiting on I/O). If the value is large, the concurrency is high.
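
A minimal sketch of pulling the b column (and wa, mentioned in step 1) out of a vmstat log. The sample text is invented, and `parse_vmstat` is a hypothetical helper that assumes a single header block followed by purely numeric rows.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into one dict per sample line."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Locate the line with the column names (it starts with "r b").
    header_index = next(
        i for i, line in enumerate(lines) if line.split()[:2] == ["r", "b"]
    )
    names = lines[header_index].split()
    return [
        dict(zip(names, map(int, line.split())))
        for line in lines[header_index + 1:]
    ]


# Hypothetical vmstat log with two samples.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  6      0 123456   7890 456789    0    0   120  5400  900 1500  5  3 52 40  0
 0  8      0 120000   7890 458000    0    0    90  6100  950 1600  4  3 48 45  0
"""

samples = parse_vmstat(SAMPLE)
peak_blocked = max(s["b"] for s in samples)               # most processes blocked at once
avg_iowait = sum(s["wa"] for s in samples) / len(samples)  # mean CPU time spent in I/O wait
print(peak_blocked, avg_iowait)
```

A real log where vmstat reprints its header periodically would need those repeated header lines filtered out first.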

3. Is the I/O sequential or random?

Check rrqm/s and r/s.

If rrqm/s is large, there are many sequential reads.
If r/s is large while rrqm/s is small, the reads are mostly random.
The same applies to wrqm/s and w/s for writes.
Also check avgrq-sz. The larger it is, the more likely the I/O is sequential. It would be better still to get the distribution of I/O sizes.
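
The two hints above (merge rate and average request size) can be combined into a crude classifier. This is only a sketch: both thresholds and the function names are assumptions, not values defined by iostat, and it relies on avgrq-sz being reported in 512-byte sectors.

```python
SECTOR_BYTES = 512  # iostat reports avgrq-sz in 512-byte sectors


def merge_ratio(merged_per_s: float, completed_per_s: float) -> float:
    """Fraction of submitted requests that were merged into another request."""
    total = merged_per_s + completed_per_s
    return merged_per_s / total if total else 0.0


def classify_access(merged_per_s, completed_per_s, avgrq_sectors,
                    ratio_threshold=0.5, size_threshold_kb=64.0):
    """Label a workload 'sequential' or 'random' using two rough heuristics."""
    avg_kb = avgrq_sectors * SECTOR_BYTES / 1024
    ratio = merge_ratio(merged_per_s, completed_per_s)
    # Both thresholds are guesses for this sketch and should be tuned per device.
    if ratio >= ratio_threshold or avg_kb >= size_threshold_kb:
        return "sequential"
    return "random"


# rrqm/s=900, r/s=100, avgrq-sz=256 sectors (128 KB): heavy merging, large requests.
print(classify_access(900.0, 100.0, 256.0))
# rrqm/s=0, r/s=500, avgrq-sz=8 sectors (4 KB): no merging, small requests.
print(classify_access(0.0, 500.0, 8.0))
```

The same function works for writes by passing wrqm/s and w/s instead.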

4. Are the I/O requests bursting or balanced?

Note that if the data shows bursts, they may be caused by the buffering mechanism of the OS rather than by the application's behavior.
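
The source does not say how to measure burstiness. One possible measure, offered here purely as an assumption, is the peak-to-mean ratio of per-interval IOPS taken from successive iostat samples: a value near 1 suggests balanced I/O, a much larger value suggests bursts.

```python
def peak_to_mean(iops_samples):
    """Return the busiest interval's IOPS divided by the average IOPS."""
    if not iops_samples:
        return 0.0
    mean = sum(iops_samples) / len(iops_samples)
    return max(iops_samples) / mean if mean else 0.0


# Hypothetical per-interval IOPS (r/s + w/s) from four iostat samples each.
steady = [100, 100, 100, 100]
bursty = [10, 10, 10, 370]

print(peak_to_mean(steady))
print(peak_to_mean(bursty))
```

Both sample lists have the same average load, so the ratio isolates how unevenly the requests arrive.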

5. What is the read/write ratio?

This is easy to get from w/s and r/s. It is useful when the device performs differently for reads and writes.
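
For example, the share of reads can be computed directly from the two columns. The function name `read_fraction` and the sample numbers are illustrative only.

```python
def read_fraction(reads_per_s: float, writes_per_s: float) -> float:
    """Share of all I/O requests that are reads (0.0 when the device is idle)."""
    total = reads_per_s + writes_per_s
    return reads_per_s / total if total else 0.0


# r/s=150 and w/s=350: the workload is write-heavy.
share = read_fraction(150.0, 350.0)
print(f"reads: {share:.0%}, writes: {1 - share:.0%}")
```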

6. How about latency and throughput?

Use rkB/s and wkB/s for throughput, and await for latency.
If the I/O is heavy but throughput is low, most of the I/O is probably random. Recheck that.
Bursts may also affect latency.
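
One way to recheck the "heavy I/O but low throughput" case is to divide throughput by IOPS, which gives the average amount of data moved per request. A small value at high IOPS is consistent with random I/O. The helper name and sample figures below are assumptions for illustration.

```python
def avg_kb_per_io(rkb_per_s, wkb_per_s, reads_per_s, writes_per_s):
    """Average kilobytes transferred per I/O request."""
    iops = reads_per_s + writes_per_s
    return (rkb_per_s + wkb_per_s) / iops if iops else 0.0


# Hypothetical samples: (rkB/s, wkB/s, r/s, w/s)
small_requests = avg_kb_per_io(1200.0, 800.0, 300.0, 200.0)     # 500 IOPS, 2 MB/s
large_requests = avg_kb_per_io(40000.0, 24000.0, 300.0, 200.0)  # 500 IOPS, 64 MB/s

print(small_requests, large_requests)
```

Both samples have the same IOPS, yet the second moves far more data per request, which is what a mostly sequential workload tends to look like.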

7. Find the bottleneck

The bottleneck could be:
- the device
- the CPU
- the I/O scheduler
- the file system
- the application
- or other.
If %util is approaching 100%, the disk is likely to be the bottleneck.
If the I/O is mostly random, you should also check the application. I don't think you can identify the OS as the bottleneck from this data, since iostat works at the device level, below the I/O scheduler and the file system. To get more information, you can use strace alongside it.

## CPU Burst

Syed Muhammad Muzammil
Sep 25th, 2013

CPU burst: “the amount of time the process uses the processor before it is no longer ready”.
Types of CPU burst:
1. Long burst: the process is CPU bound.
2. Short burst: the process is I/O bound.
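
The two types above can be told apart from measured burst lengths. The sketch below labels a process by its mean CPU burst; the 10 ms threshold and the function name are arbitrary assumptions, not part of any standard definition.

```python
from statistics import mean


def classify_process(cpu_bursts_ms, threshold_ms=10.0):
    """Label a process by the average length of its CPU bursts."""
    # threshold_ms is a made-up cut-off for this sketch.
    if mean(cpu_bursts_ms) >= threshold_ms:
        return "CPU bound"
    return "I/O bound"


# Hypothetical burst lengths in milliseconds.
print(classify_process([40, 55, 60]))   # long bursts
print(classify_process([1, 2, 1.5]))    # short bursts
```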

I/O burst: the time a process spends waiting for input/output to complete. Once the I/O burst finishes, the CPU resumes work on that job.
Explanation: a CPU burst is like a car and an I/O burst is like a pedestrian, because the CPU is much faster than I/O. We cannot slow the CPU down to match, but we can make the I/O faster.

Think of a “burst” as a brief stretch of “run as fast as you can go until you can’t.”

A CPU bursts when it is executing instructions; an I/O system bursts when it services requests to fetch information. The idea is that each component operates until it can’t.

A CPU can run instructions from cache until it needs to fetch more instructions or data from memory. That ends the CPU burst and starts the I/O burst. The I/O burst reads or writes data until the requested data has been read or written, or until the cache space to store it runs out. That ends the I/O burst.

The magic of an OS is the act of managing and scheduling these activities to maximize the use of the resources and minimize wait and idle time.
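
To make the scheduling point concrete, here is a toy simulation under strong assumptions: a single CPU, non-preemptive first-come-first-served scheduling, and I/O bursts from different processes that overlap freely. The function name `cpu_utilization` and the burst lengths are invented for illustration; real schedulers are far more involved.

```python
def cpu_utilization(processes):
    """Simulate one CPU with first-come-first-served scheduling.

    Each process is a list of (cpu_burst, io_burst) pairs. I/O bursts of
    different processes are assumed to run in parallel without contention.
    """
    next_burst = [0] * len(processes)   # index of each process's next burst pair
    ready_at = [0.0] * len(processes)   # time each process is next ready for the CPU
    cpu_free_at = 0.0
    busy = 0.0
    while True:
        pending = [p for p in range(len(processes))
                   if next_burst[p] < len(processes[p])]
        if not pending:
            break
        # Run whichever pending process became ready first.
        p = min(pending, key=lambda i: ready_at[i])
        cpu, io = processes[p][next_burst[p]]
        start = max(cpu_free_at, ready_at[p])
        cpu_free_at = start + cpu
        ready_at[p] = cpu_free_at + io  # the I/O burst follows the CPU burst
        busy += cpu
        next_burst[p] += 1
    makespan = max([cpu_free_at] + ready_at)
    return busy / makespan if makespan else 0.0


job = [(2, 8), (2, 8)]                  # short CPU bursts, long I/O bursts
alone = cpu_utilization([job])          # the CPU idles during every I/O burst
shared = cpu_utilization([job, job])    # a second job fills part of the idle time
print(alone, shared)
```

With one I/O-bound job the CPU sits idle for most of the run. Adding a second job lets its CPU bursts fill some of the first job's I/O waits, so utilization rises.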

In real life, a lot of this activity can be managed in parallel, but in OS design you should consider the dependencies between I/O and CPU activities and events and make sure that each subsystem can burst as effectively as possible.