The Page Cache
The page cache is the main disk cache used by the Linux kernel. In most cases, the kernel refers to the page cache when reading from or writing to disk. New pages are added to the page cache to satisfy User Mode processes’s read requests. If the page is not already in the cache, a new entry is added to the cache and filled with the data read from the disk. If there is enough free memory, the page is kept in the cache for an indefinite period of time and can then be reused by other processes without accessing the disk.
Similarly, before writing a page of data to a block device, the kernel verifies whether the corresponding page is already included in the cache; if not, a new entry is added to the cache and filled with the data to be written on disk. The I/O data transfer does not start immediately: the disk update is delayed for a few seconds, thus giving a chance to the processes to further modify the data to be written (in other words, the kernel implements deferred write operations).
页面缓存
页面缓存是Linux内核使用的主要磁盘缓存。在大多数情况下,内核在读取或写入磁盘时引用页面缓存。新页面将添加到页面缓存中,以满足用户模式进程的读取请求。如果页面尚未在缓存中,则会向缓存添加一个新条目,并填充从磁盘读取的数据。如果有足够的可用内存,页面将无限期地保存在缓存中,然后可以由其他进程重用,而无需访问磁盘。
类似地,在将数据页写入块设备之前,内核验证相应页面是否已经包含在高速缓存中;如果没有,则将新条目添加到缓存中并填充要写入磁盘的数据。 I / O数据传输不会立即启动:磁盘更新会延迟几秒钟,从而使进程有机会进一步修改要写入的数据(换句话说,内核实现延迟写入操作)。
硬盘
最终文件总还是要储存在硬盘上的嘛。
$ fdisk -l
Disk /dev/cciss/c0d0: 146.7 GB, 146778685440 bytes
255 heads, 63 sectors/track, 17844 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
可以看到几个名词:heads/sectors/cylinders,分别就是磁头/扇区/柱面,每个扇区512byte(现在新的硬盘每个扇区有4K)了。
硬盘容量就是headssectorscylinders512=2556317844512=146771896320b=146.7G
注意:硬盘的最小存储单位就是扇区了,而且硬盘本身并没有block的概念。
操作系统的文件系统
文件系统不是一个扇区一个扇区的来读数据,太慢了,所以有了block(块)的概念,它是一个块一个块的读取的,block才是文件存取的最小单位。
先来知道是哪种文件系统
$ df -T
/dev/cciss/c0d0p5 ext3 112738028 81733116 25185772 77% /
OK,ext3文件系统 。
$ tune2fs -l /dev/cciss/c0d0p5 | grep “Block size”
Block size: 4096
一个block是4K,也就是说我所使用的文件系统中1个块是由连续的8个扇区组成。
简单的说扇区是对硬盘而言,块是对文件系统而言。
从应用程序包括用户界面的角度来看,存取信息的最小单位是Byte(字节);
从磁盘的物理结构来看存取信息的最小单位是扇区,一个扇区是512字节;
从操作系统对硬盘的存取管理来看,存取信息的最小单位是簇,簇是一个逻辑概念,一个簇可以是2、4、8、16、32或64个连续的扇区。一个簇只能被一个文件占用,哪怕是只有1个字节的文件,在磁盘上存储时也要占用一个簇,这个簇里剩下的扇区是无用的。例如用NTFS文件系统格式化的时候默认是8个扇区组成一个簇,即4096字节。所以你如果保存了一个只有1字节的文件(例如字母N),它在磁盘上实际也要占用4096字节(4K),所以“簇”也可以理解为操作系统存取信息的最小单位。
文件的最小访问单位是4KB,数据库为一个1至多个文件最小访问单位
概念
____扇区:磁盘的最小存储单位;
____磁盘块:文件系统读写数据的最小单位;
____页:内存的最小存储单位;
联系
____一个磁盘块由连续几个(2^n)扇区组成;
____页的大小为磁盘块大小的2^n倍;
查看
____页大小查看: getconf PAGE_SIZE,常见为4K;
____磁盘块大小查看:stat /boot/|grep “IO Block”,常见为4K;
____扇区大小查看:fdisk -l,常见为512Byte;
Linux Page Cache Basics
Under Linux, the Page Cache accelerates many accesses to files on non volatile storage. This happens because, when it first reads from or writes to data media like hard drives, Linux also stores data in unused areas of memory, which acts as a cache. If this data is read again later, it can be quickly read from this cache in memory. This article will supply valuable background information about this page cache.
Page Cache or Buffer Cache
The term, Buffer Cache, is often used for the Page Cache. Linux kernels up to version 2.2 had both a Page Cache as well as a Buffer Cache. As of the 2.4 kernel, these two caches have been combined. Today, there is only one cache, the Page Cache.[1]
Functional Approach
Memory Usage
Under Linux, the number of megabytes of main memory currently used for the page cache is indicated in the Cached column of the report produced by the free -m command.
[root@testserver ~]# free -m total used free shared buffers cached Mem: 15976 15195 781 0 167 9153 -/+ buffers/cache: 5874 10102 Swap: 2000 0 1999
Writing
If data is written, it is first written to the Page Cache and managed as one of its dirty pages. Dirty means that the data is stored in the Page Cache, but needs to be written to the underlying storage device first. The content of these dirty pages is periodically transferred (as well as with the system calls sync or fsync) to the underlying storage device. The system may, in this last instance, be a RAID controller or the hard disk directly.
The following example will show the creation of a 10-megabyte file, which will first be written to the Page Cache. The number of dirty pages will be increased by doing so, until they have been written to the underlying solid-state drive (SSD), in this case manually in response to the sync command.
wfischer@pc:~$ dd if=/dev/zero of=testfile.txt bs=1M count=10 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 0,0121043 s, 866 MB/s wfischer@pc:~$ cat /proc/meminfo | grep Dirty Dirty: 10260 kB wfischer@pc:~$ sync wfischer@pc:~$ cat /proc/meminfo | grep Dirty Dirty: 0 kB
Up to Version 2.6.31 of the Kernel: pdflush
Up to and including the 2.6.31 version of the Linux kernel, the pdflush threads ensured that dirty pages were periodically written to the underlying storage device.
As of Version 2.6.32: per-backing-device based writeback
Since pdflush had several performance disadvantages, Jens Axboe developed a new, more effective writeback mechanism for Linux Kernel version 2.6.32. [2]
This approach provides threads for each device, as the following example of a computer with an SSD (/dev/sda) and a hard disk (/dev/sdb) shows.
root@pc:~# ls -l /dev/sda brw-rw---- 1 root disk 8, 0 2011-09-01 10:36 /dev/sda root@pc:~# ls -l /dev/sdb brw-rw---- 1 root disk 8, 16 2011-09-01 10:36 /dev/sdb root@pc:~# ps -eaf | grep -i flush root 935 2 0 10:36 ? 00:00:00 [flush-8:0] root 936 2 0 10:36 ? 00:00:00 [flush-8:16]
Reading
File blocks are written to the Page Cache not just during writing, but also when reading files. For example, when you read a 100-megabyte file twice, once after the other, the second access will be quicker, because the file blocks come directly from the Page Cache in memory and do not have to be read from the hard disk again. The following example shows that the size of the Page Cache has increased after a good, 200-megabytes video has been played.
user@adminpc:~$ free -m total used free shared buffers cached Mem: 3884 1812 2071 0 60 1328 -/+ buffers/cache: 424 3459 Swap: 1956 0 1956 user@adminpc:~$ vlc video.avi [...] user@adminpc:~$ free -m total used free shared buffers cached Mem: 3884 2056 1827 0 60 1566 -/+ buffers/cache: 429 3454 Swap: 1956 0 1956 user@adminpc:~$
If Linux needs more memory for normal applications than is currently available, areas of the Page Cache that are no longer in use will be automatically deleted.
Optimizing the Page Cache
Automatically storing file blocks in the Page Cache is generally quite advantageous. Some data, such as log files or MySQL dump files, are often no longer needed once they have been written. Therefore, such file blocks often use space in the Page Cache unnecessarily. Other file blocks, whose storage would be more advantageous, might be prematurely deleted from the Page Cache by newer log files or MySQL.[3]
Periodic execution of logrotate with gzip compression can help with log files, somewhat. When a 500-megabyte log file is compressed into 10 megabytes by logrotate and gzip, the original log file becomes invalid along with its cache space. 490 megabytes in the Page Cache will then become available by doing so. The danger of a continuously growing log file displacing more useful file blocks from the Page Cache is reduced thereby.
Therefore, it is completely reasonable, if some applications would normally not store certain files and file blocks in the cache. There is already such a patch available for rsync.[4]
ref:
https://www.oreilly.com/library/view/understanding-the-linux/0596005652/ch15s01.html
https://blog.youkuaiyun.com/my_bai/article/details/73331360
https://blog.youkuaiyun.com/ZK_J1994/article/details/72676862
https://blog.youkuaiyun.com/yangguosb/article/details/79877191