The page _refcount and _mapcount fields

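This article describes the _refcount and _mapcount fields of the Linux kernel's struct page, which track how a page is being used. _refcount is the number of references to the page; when it reaches 0 the page may be freed. _mapcount is the number of times the page is mapped into process page tables. Both counters increase in scenarios such as alloc_pages, add_to_page_cache_lru, page cache use, and sharing between processes. For page cache pages created by the read/write system calls, _mapcount is normally -1, because those pages are not mapped directly into user-space page tables.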


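A Linux page has two very important reference-count fields, _refcount and _mapcount, both of type atomic_t. _refcount is the number of references the kernel holds on the page. When _refcount = 0, the page is free or about to be freed. When _refcount > 0, the page has been allocated and is in use by the kernel, so it will not be freed for now.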

_refcount

The kernel APIs commonly used to increment and decrement _refcount are get_page() and put_page():


static inline void get_page(struct page *page)
{
    page = compound_head(page);
    /*
     * Getting a normal page or the head of a compound page
     * requires to already have an elevated page->_refcount.
     */
    VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page);
    page_ref_inc(page);
}

static inline void put_page(struct page *page)
{
    page = compound_head(page);

    /*
     * For private device pages we need to catch refcount transition from
     * 2 to 1, when refcount reach one it means the private device page is
     * free and we need to inform the device driver through callback. See
     * include/linux/memremap.h and HMM for details.
     */
    if (IS_HMM_ENABLED && unlikely(is_device_private_page(page) ||
        unlikely(is_device_public_page(page)))) {
        put_zone_device_private_or_public_page(page);
        return;
    }

    if (put_page_testzero(page))
        __put_page(page);
}
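
The get/put pairing above can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct toy_page`, `toy_alloc_page`, `toy_get_page` and `toy_put_page` are hypothetical names, C11 atomics stand in for `atomic_t`, and `free()` stands in for `__put_page()`. It only illustrates that the put which takes the count from 1 to 0 is the one that releases the page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct toy_page {
    atomic_int refcount;
};

/* Like a successful alloc_pages(): the caller owns one reference. */
struct toy_page *toy_alloc_page(void)
{
    struct toy_page *page = malloc(sizeof(*page));

    if (page)
        atomic_init(&page->refcount, 1);
    return page;
}

/* Like get_page(): the caller must already hold a reference. */
void toy_get_page(struct toy_page *page)
{
    assert(atomic_load(&page->refcount) > 0);
    atomic_fetch_add(&page->refcount, 1);
}

/*
 * Like put_page(): drop one reference and free the page when the
 * count reaches zero. Returns true if this call freed the page.
 */
bool toy_put_page(struct toy_page *page)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    if (atomic_fetch_sub(&page->refcount, 1) == 1) {
        free(page);
        return true;
    }
    return false;
}
```

A caller would pair every `toy_get_page()` with exactly one `toy_put_page()` and must not touch the page after the call that returns true.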

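Scenarios that increase _refcount
1) A successful alloc_pages returns a page with _refcount = 1

2) When page->private is set

3) When the page is added to an address_space

Note that while a page is on the LRU, _refcount = 1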

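The three paths above can be analyzed in the source code by following a write. The write system call uses the page cache to speed up writes, so we use that scenario to see how page->_refcount changes. For buffered writes, the page cache page is created in mm/filemap.c : grab_cache_page_write_begin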

/*
 * Find or create a page at the given pagecache position. Return the locked
 * page. This function is specifically for buffered writes.
 */
struct page *grab_cache_page_write_begin(struct address_space *mapping,
					pgoff_t index, unsigned flags)
{
	struct page *page;
	int fgp_flags = FGP_LOCK|FGP_WRITE|FGP_CREAT;
    ...

no_page:
	if (!page && (fgp_flags & FGP_CREAT)) {
        ..
		page = __page_cache_alloc(gfp_mask);
		if (!page)
			return NULL;

		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
			fgp_flags |= FGP_LOCK;

		/* Init accessed so avoid atomic mark_page_accessed later */
		if (fgp_flags & FGP_ACCESSED)
			__SetPageReferenced(page);

		err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
		if (unlikely(err)) {
			put_page(page);
			page = NULL;
			if (err == -EEXIST)
				goto repeat;
		}

		/*
		 * add_to_page_cache_lru locks the page, and for mmap we expect
		 * an unlocked page.
		 */
		if (page && (fgp_flags & FGP_FOR_MMAP))
			unlock_page(page);
	}

	return page;
}
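  1. __page_cache_alloc allocates the page via alloc_pages; a freshly allocated page has _refcount = 1
  2. add_to_page_cache_lru adds the page to both the LRU list and the address_space. Each of these two steps increments _refcount, so after the function returns _refcount = 3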

add_to_page_cache_lru


int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
				pgoff_t offset, gfp_t gfp_mask)
{
	void *shadow = NULL;
	int ret;

	__SetPageLocked(page);
	ret = __add_to_page_cache_locked(page, mapping, offset,
					 gfp_mask, &shadow);
	if (unlikely(ret))
		__ClearPageLocked(page);
	else {
		...
		if (!(gfp_mask & __GFP_WRITE) && shadow)
			workingset_refault(page, shadow);
		lru_cache_add(page);
	}
	return ret;
}


static int __add_to_page_cache_locked(struct page *page,
				      struct address_space *mapping,
				      pgoff_t offset, gfp_t gfp_mask,
				      void **shadowp)
{
    ...
	get_page(page);
    ...
error:
	page->mapping = NULL;
	/* Leave page->index set: truncation relies upon it */
	put_page(page);
	return error;
}

void lru_cache_add(struct page *page)
{
    ...
	get_page(page);
    ...
}

Both __add_to_page_cache_locked and lru_cache_add increment _refcount.
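
To make the counting concrete, here is a small userspace model of the sequence above. It is a sketch under simplifying assumptions: every `toy_` name is hypothetical, a plain `int` replaces the atomic counter, and each function mirrors only the `get_page()` call visible in the excerpts, not the real page cache or LRU logic.

```c
#include <assert.h>

/* Hypothetical model of a page cache page; not the kernel's struct page. */
struct toy_cache_page {
    int refcount;       /* models _refcount */
    int in_page_cache;  /* set once the page is in the address_space */
    int lru_queued;     /* set once the page was handed to the LRU code */
};

/* Models __page_cache_alloc(): a new page starts with one reference. */
void toy_page_cache_alloc(struct toy_cache_page *page)
{
    page->refcount = 1;
    page->in_page_cache = 0;
    page->lru_queued = 0;
}

/* Mirrors the get_page() in __add_to_page_cache_locked(). */
int toy_add_to_page_cache_locked(struct toy_cache_page *page)
{
    assert(page->refcount > 0);
    page->refcount++;
    page->in_page_cache = 1;
    return 0; /* this model never fails */
}

/* Mirrors the get_page() in lru_cache_add(). */
void toy_lru_cache_add(struct toy_cache_page *page)
{
    assert(page->refcount > 0);
    page->refcount++;
    page->lru_queued = 1;
}

/* Same shape as add_to_page_cache_lru(): insert, then queue for the LRU. */
int toy_add_to_page_cache_lru(struct toy_cache_page *page)
{
    int ret = toy_add_to_page_cache_locked(page);

    if (ret == 0)
        toy_lru_cache_add(page);
    return ret;
}
```

Calling `toy_page_cache_alloc()` followed by `toy_add_to_page_cache_lru()` leaves `refcount` at 3 in this model, matching the count described for the write path.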

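4) When the page is mapped into another user process's PTE, _refcount is incremented by 1.

For example, when a child process is created it shares the parent's address space: the parent's PTE contents are copied into the child and the page's _refcount is incremented:

do_fork->copy_process->copy_mm->dup_mmap->copy_page_range->...->copy_pte_range->copy_one_pte.

5) For PG_swapbacked pages, __add_to_swap_cache increments _refcount

6) Some critical kernel paths that operate on pages also increment _refcount, for example follow_page and get_user_pages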

_mapcount

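_mapcount is the number of times the page is mapped by processes, that is, how many user PTEs map it.

  • _mapcount = -1: no PTE maps the page
  • _mapcount = 0: only the parent process maps the page; an anonymous page that has just been allocated and mapped has _mapcount = 0
  • _mapcount > 0: processes other than the parent also map the page. Taking the same example of a child process sharing the parent's address space, the parent's PTE is copied into the child and the page's _mapcount is incremented.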

get_page increments _refcount; page_dup_rmap increments _mapcount
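
The three _mapcount states can be traced with a toy model. This is a sketch, not kernel code: `toy_anon_page`, `toy_init`, `toy_map_first`, `toy_dup_for_fork` and `toy_mapcount` are hypothetical names. The one detail taken from the kernel is the bias: `page_mapcount()` reports `_mapcount + 1` for an ordinary page, so a raw value of -1 means zero mappings.

```c
#include <assert.h>

/* Hypothetical model of an anonymous page; not the kernel's struct page. */
struct toy_anon_page {
    int refcount;  /* models _refcount */
    int mapcount;  /* models _mapcount, which starts at -1 */
};

/* A freshly allocated page: one reference, no PTE maps it yet. */
void toy_init(struct toy_anon_page *page)
{
    page->refcount = 1;
    page->mapcount = -1;
}

/* First mapping into a process page table: -1 -> 0. */
void toy_map_first(struct toy_anon_page *page)
{
    assert(page->mapcount == -1);
    page->mapcount++;
}

/*
 * fork(): the child's PTE is copied from the parent's. Both counters
 * go up, as get_page() and page_dup_rmap() do in copy_one_pte().
 */
void toy_dup_for_fork(struct toy_anon_page *page)
{
    assert(page->refcount > 0 && page->mapcount >= 0);
    page->refcount++;
    page->mapcount++;
}

/* Number of PTEs mapping the page: the raw counter plus one. */
int toy_mapcount(const struct toy_anon_page *page)
{
    return page->mapcount + 1;
}
```

In this model a page that was initialized, mapped once, and then duplicated by one fork ends with a raw `mapcount` of 1, which `toy_mapcount()` reports as two mappings.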

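Question: what is the _mapcount of a page cache page created by the read/write system calls?

It is -1. These page cache pages are simply allocated by the kernel with alloc_pages and are never mapped into user-space page tables.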
