void __free_pages(struct page *page, unsigned int order)
{
        if (put_page_testzero(page)) { /* drop a reference; check whether _count fell to 0, i.e. no user of the page frame remains */
                if (order == 0)        /* a single page frame: hand it back to the per-CPU page lists */
                        free_hot_cold_page(page, false);
                else                   /* a higher-order block: hand it back to the buddy system */
                        __free_pages_ok(page, order);
        }
}
}
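For context (this sketch is not from the original post), a caller typically obtains a block with alloc_pages() and releases it with __free_pages() shown above; demo_alloc_free() below is a hypothetical helper, assuming a 3.x-era kernel API:

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical illustration: allocate an order-2 block (4 contiguous
 * page frames) and free it again. With order > 0, the free path above
 * ends up in __free_pages_ok(), i.e. the buddy system. */
static int demo_alloc_free(void)
{
        unsigned int order = 2;                 /* 2^2 = 4 contiguous page frames */
        struct page *page = alloc_pages(GFP_KERNEL, order);

        if (!page)
                return -ENOMEM;

        /* ... use page_address(page) as a 4-page buffer ... */

        __free_pages(page, order);              /* order > 0: back to the buddy system */
        return 0;
}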
__free_pages branches on order: when order is 0, the page is released through free_hot_cold_page(). The "hot page" and "cold page" concepts behind that function are explained in the LWN excerpt below; see:
- https://lwn.net/Articles/14768/
- https://blog.youkuaiyun.com/u012489236/article/details/107397096
One generally thinks of a system’s RAM as being the fastest place to keep data. But memory is slow; the real speed comes from working out of the onboard cache in the processor itself. Much effort has, over the years, gone into trying to optimize the kernel’s cache behavior and avoiding the need to go to main memory. The new page allocation system is just another step in that direction.
The processor cache contains memory which has been accessed recently. The kernel often has a good idea of which pages have seen recent accesses and are thus likely to be present in cache. The hot-n-cold patch tries to take advantage of that information by adding two per-CPU free page lists (for each memory zone). When a processor frees a page that is suspected to be “hot” (i.e. represented in that processor’s cache), it gets pushed onto the hot list; others go onto the cold list. The lists have high and low limits; after all, if the hot list grows larger than the processor’s cache, the chances of those pages actually being hot start to get pretty small.
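In 2.6-era kernels, each zone carries one of these per-CPU structures per processor; the layout below is a sketch reproduced from memory (field comments are mine), not a verbatim copy of any one kernel release:

#include <linux/list.h>

struct per_cpu_pages {
        int count;              /* number of pages currently on the list */
        int low;                /* refill from the buddy allocator below this */
        int high;               /* drain back to the buddy allocator above this */
        int batch;              /* chunk size used for refill/drain */
        struct list_head list;  /* the free pages themselves */
};

struct per_cpu_pageset {
        struct per_cpu_pages pcp[2];    /* index 0: hot list, index 1: cold list */
} ____cacheline_aligned_in_smp;

The high/low watermarks implement exactly the limits described above: once count exceeds high, a batch of pages is pushed back to the buddy allocator, since a hot list larger than the CPU cache is unlikely to stay hot.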
When the kernel needs a page of memory, the new allocator normally tries to get that page from the processor’s hot list. Even if the page is simply going to be overwritten, it’s still better to use a cache-warm page. Interestingly, though, there are times when it makes sense to use a cold page instead. If the page is to be used for DMA read operations, it will be filled by the device performing the operation and the cache will be invalidated anyway. So 2.5.45 includes a new GFP_COLD page allocation flag for the situations where using a cold page makes more sense.
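In later kernels the flag is spelled __GFP_COLD (it survived until around v4.15, when it was removed); requesting a cold page was just a matter of OR-ing it in. A hedged sketch, where alloc_dma_read_page() is a hypothetical helper:

#include <linux/gfp.h>
#include <linux/mm.h>

/* Sketch only: allocate a page that a device will fill via a DMA
 * read. The device overwrites the whole page, so cache-warmth buys
 * nothing; ask the allocator for a cold page instead. */
static struct page *alloc_dma_read_page(void)
{
        return alloc_pages(GFP_KERNEL | __GFP_COLD, 0);
}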
The use of per-CPU page lists also cuts down on lock contention, which also helps performance. When pages must be moved between the hot/cold lists and the main memory allocator, they are transferred in multi-page chunks, which also cuts down on lock contention and makes things go faster.
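The batching works roughly like the kernel's rmqueue_bulk(): one acquisition of the zone spinlock moves a whole batch of order-0 pages from the buddy allocator onto the per-CPU list. The sketch below is loosely modeled on that function, with error handling and statistics omitted; __rmqueue() and zone->lock are real kernel names, but the body is illustrative, not kernel source:

#include <linux/list.h>
#include <linux/mmzone.h>
#include <linux/spinlock.h>

static int refill_pcp_list(struct zone *zone, struct per_cpu_pages *pcp)
{
        int i;

        spin_lock(&zone->lock);                 /* one lock round-trip ... */
        for (i = 0; i < pcp->batch; i++) {      /* ... for a whole batch of pages */
                struct page *page = __rmqueue(zone, 0);

                if (!page)
                        break;
                list_add_tail(&page->lru, &pcp->list);
                pcp->count++;
        }
        spin_unlock(&zone->lock);
        return i;                               /* pages actually transferred */
}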
Andrew Morton has benchmarked this patch, and included a number of results with one of the patchsets. Performance benefits vary from a mere 1-2% on the all-important kernel compilation time to 12% on the SDET test. That was enough, apparently, to convince Linus.