Linux Kernel Study -- The current Macro on ARMv8

Original post, last modified 2024-11-12 16:11:31


License: CC 4.0 BY-SA

Tags: #linux #learning #ops

First published 2024-09-16 19:24:29

Contents

Environment
The current macro

Environment

Linux 4.19

The current macro

Defined in arch/arm64/include/asm/current.h:

#define current get_current()
...
...
static __always_inline struct task_struct *get_current(void)
{
	unsigned long sp_el0;

	asm ("mrs %0, sp_el0" : "=r" (sp_el0));

	return (struct task_struct *)sp_el0;
}

The code shows that current obtains the current task's struct task_struct by reading the sp_el0 register (via mrs) and casting the value to a pointer.

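On x86, the kernel used to place thread_info at the bottom (lowest address) of the kernel stack: to get struct task_struct, it first located thread_info and then followed its pointer to the task. The root cause is that x86 has few registers, so none could be dedicated to struct task_struct, even though a structure this important and this frequently used would ideally live in a register or some other place that is quick to find. This is the method described in the book Linux Kernel Development: thread_info is obtained from the stack pointer by address arithmetic (rounding the stack pointer down to the start of the THREAD_SIZE-aligned stack). The kernel no longer works this way; the current approach is described below.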
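To make the old x86 approach concrete, here is a minimal user-space C sketch of finding thread_info by masking the stack pointer. It is a simulation, not kernel code: the 8 KiB THREAD_SIZE, the one-field thread_info and the helper names (thread_info_from_sp, current_from_sp, demo_lookup) are assumptions for illustration, and the union only mirrors the kernel's idea of overlaying thread_info on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed kernel stack size for this sketch (must be a power of two). */
#define THREAD_SIZE 8192UL

struct task_struct { int pid; };        /* simplified stand-in */

struct thread_info {                    /* simplified: only the back-pointer */
    struct task_struct *task;
};

/* thread_info overlays the lowest addresses of the kernel stack;
 * the stack itself grows down from the top of the same block. */
union thread_union {
    struct thread_info info;
    unsigned char stack[THREAD_SIZE];
};

/* The stack block must be THREAD_SIZE-aligned for the mask to work. */
static _Alignas(THREAD_SIZE) union thread_union demo_stack;

/* Round sp down to the stack base, where thread_info lives. */
static struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

/* Old-style `current`: stack pointer -> thread_info -> task_struct. */
static struct task_struct *current_from_sp(uintptr_t sp)
{
    return thread_info_from_sp(sp)->task;
}

/* Usage: install a task, then look it up from a "stack pointer"
 * that is `depth` bytes below the top of the stack. */
static struct task_struct *demo_lookup(struct task_struct *task, uintptr_t depth)
{
    assert(depth >= 1 && depth <= THREAD_SIZE);
    demo_stack.info.task = task;
    uintptr_t sp = (uintptr_t)&demo_stack + THREAD_SIZE - depth;
    return current_from_sp(sp);
}
```

Because the stack is THREAD_SIZE-aligned, any stack pointer inside it rounds down to the same base, which is why the lookup works at any call depth.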

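On ARMv8, sp_el0 can be used to hold the location of the current task's struct task_struct; see the article "The Two Relationships Between the thread_info Structure and the Kernel Stack". After reading it, we know that: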

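There are two thread_info layouts: ① as on x86, thread_info sits at the bottom (lowest address) of the kernel stack, so it can be found from the stack pointer by address arithmetic, and struct task_struct is then reached through thread_info; ② with the CONFIG_THREAD_INFO_IN macro enabled, thread_info is placed inside struct task_struct; struct task_struct …
ARMv8 uses the second layout, so the address of struct task_struct itself has to be kept somewhere that can be reached quickly.
On ARMv8, sp_el0 is used to hold the current task's struct task_struct.
The article also notes that sp_el0 is switched on every context switch, so it always holds the address of the current task's struct task_struct.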
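A small sketch of the second layout, assuming simplified struct definitions (only flags and pid; the real structures have many more fields, and thread_info_of is a hypothetical name). When thread_info is the first member of task_struct, the task pointer and the thread_info pointer are the same address, which is what the kernel's generic ((struct thread_info *)current) cast relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; fields other than thread_info are assumptions. */
struct thread_info {
    unsigned long flags;                /* TIF_* bits in the real kernel */
};

struct task_struct {
    struct thread_info thread_info;     /* must be the first member */
    int pid;
    /* ...many more fields in the real kernel... */
};

/* Compile-time guarantee the pointer trick below relies on. */
static_assert(offsetof(struct task_struct, thread_info) == 0,
              "thread_info must be the first member of task_struct");

/* With this layout a task pointer is also a valid thread_info pointer,
 * mirroring the kernel's ((struct thread_info *)current) definition. */
static struct thread_info *thread_info_of(struct task_struct *tsk)
{
    return (struct thread_info *)tsk;
}
```

This is likely also why, in the 4.19 arm64 entry code, get_thread_info can simply read sp_el0 and an offset such as TSK_TI_FLAGS is applied directly to tsk.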

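So on arm64, current is read from the sp_el0 register. One thing puzzled me, though: isn't sp_el0 the stack pointer used by user-space programs? When is it repurposed to hold struct task_struct? This raises two questions:

When is sp_el0 changed, i.e. at what point is it made to point to the current task's struct task_struct?
Before sp_el0 holds the struct task_struct pointer, where is that pointer kept?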

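First, a word on x86, which no longer works the old way. On x86 today, the Linux kernel defines a per-CPU variable named current_task, and each CPU's copy of current_task holds a pointer to the task_struct of the task running on that CPU.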
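A simplified model of the x86 per-CPU approach. The real kernel reaches this CPU's current_task with a single %gs-relative load (this_cpu_read_stable), so no CPU number is passed; here an ordinary array indexed by an explicit cpu argument stands in for that, and NR_CPUS_DEMO, switch_to_demo and get_current_demo are hypothetical names.

```c
#include <assert.h>

#define NR_CPUS_DEMO 4                  /* hypothetical CPU count */

struct task_struct { int pid; };        /* simplified stand-in */

/* One slot per CPU: a stand-in for the per-CPU variable current_task. */
static struct task_struct *current_task[NR_CPUS_DEMO];

/* Context switch on `cpu`: publish the incoming task. */
static void switch_to_demo(int cpu, struct task_struct *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    current_task[cpu] = next;
}

/* `current` on `cpu`: read that CPU's own slot. */
static struct task_struct *get_current_demo(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    return current_task[cpu];
}
```

Each CPU only ever reads its own slot, so no locking is needed to answer "who is running here".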

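To find out how ARMv8 does it, start with the first question: when is the content of sp_el0 switched? The current macro cannot be used in user mode, and at EL0 sp_el0 has a job of its own: it is the stack pointer of the user-space process. Therefore sp_el0 must be repointed when the CPU switches from user mode to kernel mode (and restored when it returns to user mode).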

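There are only a few ways to enter kernel mode from user mode (synchronous exceptions such as system calls, interrupts, and so on), and all of them go through the exception vector table. For ARMv8 system calls and interrupts, see my other post "Linux Kernel Study – Notes on System Calls in the ARMv8 Architecture". Take el0_sync, the vector entry for synchronous exceptions from EL0 (system calls among them), as an example. (My system-call analysis was based on Linux 4.19; by 5.15 this function no longer goes by this name.)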

el0_sync:
	kernel_entry 0
	mrs	x25, esr_el1			// read the syndrome register; ESR_EL1 is a system register accessible at EL1 whose fields describe why the exception was taken
	lsr	x24, x25, #ESR_ELx_EC_SHIFT	// exception class; lsr is a logical shift right: x24 = x25 >> ESR_ELx_EC_SHIFT
	cmp	x24, #ESR_ELx_EC_SVC64		// SVC in 64-bit state
	b.eq	el0_svc					// b.eq: conditional branch, taken when the preceding cmp found the operands equal
	cmp	x24, #ESR_ELx_EC_DABT_LOW	// data abort in EL0
	b.eq	el0_da
	cmp	x24, #ESR_ELx_EC_IABT_LOW	// instruction abort in EL0
	b.eq	el0_ia
	cmp	x24, #ESR_ELx_EC_FP_ASIMD	// FP/ASIMD access
	b.eq	el0_fpsimd_acc
	cmp	x24, #ESR_ELx_EC_SVE		// SVE access
	b.eq	el0_sve_acc
	cmp	x24, #ESR_ELx_EC_FP_EXC64	// FP/ASIMD exception
	b.eq	el0_fpsimd_exc
	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
	b.eq	el0_sys
	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
	b.eq	el0_sp_pc
	cmp	x24, #ESR_ELx_EC_PC_ALIGN	// pc alignment exception
	b.eq	el0_sp_pc
	cmp	x24, #ESR_ELx_EC_UNKNOWN	// unknown exception in EL0
	b.eq	el0_undef
	cmp	x24, #ESR_ELx_EC_BREAKPT_LOW	// debug exception in EL0
	b.ge	el0_dbg
	b	el0_inv
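The compare-and-branch chain in el0_sync is a dispatch on the exception class (EC) field, bits [31:26] of ESR_EL1. A C rendering, assuming the EC values below as I recall them from arch/arm64/include/asm/esr.h (check them against your tree); the enum and function names are hypothetical stand-ins for the assembly labels.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of values from arch/arm64/include/asm/esr.h used by el0_sync. */
#define ESR_ELx_EC_SHIFT       26
#define ESR_ELx_EC_MASK        (UINT64_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC_UNKNOWN     0x00
#define ESR_ELx_EC_FP_ASIMD    0x07
#define ESR_ELx_EC_SVC64       0x15
#define ESR_ELx_EC_SYS64       0x18
#define ESR_ELx_EC_SVE         0x19
#define ESR_ELx_EC_IABT_LOW    0x20
#define ESR_ELx_EC_PC_ALIGN    0x22
#define ESR_ELx_EC_DABT_LOW    0x24
#define ESR_ELx_EC_SP_ALIGN    0x26
#define ESR_ELx_EC_FP_EXC64    0x2C
#define ESR_ELx_EC_BREAKPT_LOW 0x30

/* One value per branch target in the assembly. */
enum el0_handler {
    EL0_SVC, EL0_DA, EL0_IA, EL0_FPSIMD_ACC, EL0_SVE_ACC, EL0_FPSIMD_EXC,
    EL0_SYS, EL0_SP_PC, EL0_UNDEF, EL0_DBG, EL0_INV
};

/* C rendering of the el0_sync compare-and-branch chain. */
static enum el0_handler el0_sync_dispatch(uint64_t esr)
{
    /* Mask then shift, like the kernel's ESR_ELx_EC(); the assembly's
     * plain lsr relies on the bits above the EC field being zero. */
    unsigned int ec = (unsigned int)((esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT);

    switch (ec) {
    case ESR_ELx_EC_SVC64:    return EL0_SVC;
    case ESR_ELx_EC_DABT_LOW: return EL0_DA;
    case ESR_ELx_EC_IABT_LOW: return EL0_IA;
    case ESR_ELx_EC_FP_ASIMD: return EL0_FPSIMD_ACC;
    case ESR_ELx_EC_SVE:      return EL0_SVE_ACC;
    case ESR_ELx_EC_FP_EXC64: return EL0_FPSIMD_EXC;
    case ESR_ELx_EC_SYS64:    return EL0_SYS;
    case ESR_ELx_EC_SP_ALIGN:
    case ESR_ELx_EC_PC_ALIGN: return EL0_SP_PC;
    case ESR_ELx_EC_UNKNOWN:  return EL0_UNDEF;
    }
    /* b.ge el0_dbg: ECs >= BREAKPT_LOW go to the debug handler */
    return ec >= ESR_ELx_EC_BREAKPT_LOW ? EL0_DBG : EL0_INV;
}

/* Usage: an SVC from AArch64 EL0 has EC 0x15 (plus the IL bit, bit 25). */
static void el0_sync_example(void)
{
    assert(el0_sync_dispatch(0x56000000) == EL0_SVC);
}
```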

One of the macros used here, kernel_entry, is the entry code for system calls and also for most interrupts and synchronous exceptions:

arch/arm64/kernel/entry.S:
	.macro	kernel_entry, el, regsize = 64
	.if	\regsize == 32
	mov	w0, w0				// zero upper 32 bits of x0
	.endif
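	stp	x0, x1, [sp, #16 * 0]			// sp was already switched by hardware to the stack of the target ELx (x > 0); for a system call that is the EL1 kernel stack
	stp	x2, x3, [sp, #16 * 1]			// for a system call, these stores save the user-mode x0-x29 into pt_regs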
	stp	x4, x5, [sp, #16 * 2]
	stp	x6, x7, [sp, #16 * 3]
	stp	x8, x9, [sp, #16 * 4]
	stp	x10, x11, [sp, #16 * 5]
	stp	x12, x13, [sp, #16 * 6]
	stp	x14, x15, [sp, #16 * 7]
	stp	x16, x17, [sp, #16 * 8]
	stp	x18, x19, [sp, #16 * 9]
	stp	x20, x21, [sp, #16 * 10]
	stp	x22, x23, [sp, #16 * 11]
	stp	x24, x25, [sp, #16 * 12]
	stp	x26, x27, [sp, #16 * 13]
	stp	x28, x29, [sp, #16 * 14]

	.if	\el == 0
	clear_gp_regs
	mrs	x21, sp_el0
	ldr_this_cpu	tsk, __entry_task, x20	// Ensure MDSCR_EL1.SS is clear, (__entry_task is a global per-CPU variable)
	ldr	x19, [tsk, #TSK_TI_FLAGS]	// since we can unmask debug
	disable_step_tsk x19, x20		// exceptions when scheduling.

	apply_ssbd 1, x22, x23

	.else
	add	x21, sp, #S_FRAME_SIZE
	get_thread_info tsk
	/* Save the task's original addr_limit and set USER_DS */
	ldr	x20, [tsk, #TSK_TI_ADDR_LIMIT]
	str	x20, [sp, #S_ORIG_ADDR_LIMIT]
	mov	x20, #USER_DS
	str	x20, [tsk, #TSK_TI_ADDR_LIMIT]
	/* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */
	.endif /* \el == 0 */
	mrs	x22, elr_el1
	mrs	x23, spsr_el1
	stp	lr, x21, [sp, #S_LR]

	/*
	 * In order to be able to dump the contents of struct pt_regs at the
	 * time the exception was taken (in case we attempt to walk the call
	 * stack later), chain it together with the stack frames.
	 */
	.if \el == 0
	stp	xzr, xzr, [sp, #S_STACKFRAME]
	.else
	stp	x29, x22, [sp, #S_STACKFRAME]
	.endif
	add	x29, sp, #S_STACKFRAME

#ifdef CONFIG_ARM64_SW_TTBR0_PAN
	/*
	 * Set the TTBR0 PAN bit in SPSR. When the exception is taken from
	 * EL0, there is no need to check the state of TTBR0_EL1 since
	 * accesses are always enabled.
	 * Note that the meaning of this bit differs from the ARMv8.1 PAN
	 * feature as all TTBR0_EL1 accesses are disabled, not just those to
	 * user mappings.
	 */
alternative_if ARM64_HAS_PAN
	b	1f				// skip TTBR0 PAN
alternative_else_nop_endif

	.if	\el != 0
	mrs	x21, ttbr0_el1
	tst	x21, #TTBR_ASID_MASK		// Check for the reserved ASID
	orr	x23, x23, #PSR_PAN_BIT		// Set the emulated PAN in the saved SPSR
	b.eq	1f				// TTBR0 access already disabled
	and	x23, x23, #~PSR_PAN_BIT		// Clear the emulated PAN in the saved SPSR
	.endif

	__uaccess_ttbr0_disable x21
1:
#endif

	stp	x22, x23, [sp, #S_PC]

	/* Not in a syscall by default (el0_svc overwrites for real syscall) */
	.if	\el == 0
	mov	w21, #NO_SYSCALL
	str	w21, [sp, #S_SYSCALLNO]
	.endif

	/*
	 * Set sp_el0 to current thread_info.
	 */
	.if	\el == 0
	msr	sp_el0, tsk			// tsk, i.e. this task's struct task_struct pointer, is now placed in sp_el0
	.endif

	/*
	 * Registers that may be useful after this macro is invoked:
	 *
	 * x21 - aborted SP
	 * x22 - aborted PC
	 * x23 - aborted PSTATE
	*/
	.endm
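The el == 0 path of kernel_entry can be modeled in plain C to show how sp_el0 changes role. This is a simulation under stated assumptions: sp_el0 and __entry_task are ordinary variables, pt_regs is cut down to the saved SP, all names are hypothetical, and the restore step mirrors what kernel_exit does for EL0 in 4.19 (ldr x23, [sp, #S_SP]; msr sp_el0, x23), which is not shown in the listing above.

```c
#include <assert.h>
#include <stdint.h>

struct task_struct { int pid; };        /* simplified stand-in */

/* Simulated per-CPU state: here sp_el0 is just a variable. */
struct cpu_state {
    uintptr_t sp_el0;                   /* user SP at EL0, `current` at EL1 */
    struct task_struct *entry_task;     /* stand-in for this CPU's __entry_task */
};

/* Only the pt_regs field this sketch needs. */
struct pt_regs_demo {
    uintptr_t sp;                       /* saved user SP (slot S_SP) */
};

/* kernel_entry 0: save the user SP, then repurpose sp_el0 as `current`. */
static void kernel_entry_el0(struct cpu_state *cpu, struct pt_regs_demo *regs)
{
    uintptr_t x21 = cpu->sp_el0;                  /* mrs x21, sp_el0 */
    struct task_struct *tsk = cpu->entry_task;    /* ldr_this_cpu tsk, __entry_task, x20 */
    assert(tsk != 0);
    regs->sp = x21;                               /* stp lr, x21, [sp, #S_LR] */
    cpu->sp_el0 = (uintptr_t)tsk;                 /* msr sp_el0, tsk */
}

/* get_current(): while in the kernel, sp_el0 holds the task pointer. */
static struct task_struct *get_current_demo(const struct cpu_state *cpu)
{
    return (struct task_struct *)cpu->sp_el0;
}

/* kernel_exit 0: restore the user SP before returning to EL0. */
static void kernel_exit_el0(struct cpu_state *cpu, const struct pt_regs_demo *regs)
{
    cpu->sp_el0 = regs->sp;             /* ldr x23, [sp, #S_SP]; msr sp_el0, x23 */
}
```

The user SP is never lost: it sits in pt_regs for the whole time sp_el0 is borrowed as the current pointer.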

Look at this line in particular:

	msr	sp_el0, tsk			// tsk, i.e. this task's struct task_struct pointer, is now placed in sp_el0

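msr writes to sp_el0: it stores the value of tsk (an assembler alias for register x28) into sp_el0, so at this point tsk evidently holds the current task's struct task_struct. That settles the first question: sp_el0 is loaded with the current task's struct task_struct in kernel_entry, when the CPU enters the kernel from EL0. The key remaining question is the second one: where does tsk come from? For that, look at another line:

arch/arm64/kernel/entry.S:
	ldr_this_cpu	tsk, __entry_task, x20	// Ensure MDSCR_EL1.SS is clear, (__entry_task is a global per-CPU variable)

arch/arm64/include/asm/assembler.h: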
	/*
	 * @dst: Result of READ_ONCE(per_cpu(sym, smp_processor_id()))
	 * @sym: The name of the per-cpu variable
	 * @tmp: scratch register
	 */
	.macro ldr_this_cpu dst, sym, tmp
	adr_l	\dst, \sym
alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
	mrs	\tmp, tpidr_el1			
alternative_else
	mrs	\tmp, tpidr_el2
alternative_endif
	ldr	\dst, [\dst, \tmp]			/* hard to follow at first: equivalent to ldr tsk, [__entry_task, tpidr_el1], i.e. load from the address of __entry_task plus the offset held in tpidr_el1 */
	.endm

I couldn't find much documentation on these alternative_* macros, only the official description:

Syntax of the Framework's Macro
The macro syntax is similar to an if-then-else statement and is prefixed with the word alternative_. 
For example, the alternative_if is similar to the if statement, the alternative_if_not is similar to 
the if not, the alternative_else is similar to an else statement, and so on. The if macro marks the 
beginning of a code section, and the else macro starts a new code section. Finally, an endif macro 
ends the clause.

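Roughly: inside a macro, these behave like if, else and endif. The difference from a runtime if/else is that the branch is chosen once, at boot, by patching the code according to the detected CPU features. Here the condition is ARM64_HAS_VIRT_HOST_EXTN (VHE): with VHE the kernel itself runs at EL2 and reads tpidr_el2; otherwise it runs at EL1 (kernel mode) and reads tpidr_el1. tpidr_elx is the Thread ID Register of that exception level, and its name suggests it holds an ID of the running thread.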
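An analogy in C for the "decide once at boot" behavior. Real alternatives rewrite instructions in place and leave no indirect call behind; the function pointer, the made-up register values and all names here are assumptions chosen only to show that the choice is made once, not re-checked on every execution.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two instruction variants; the values are made up. */
static unsigned long mrs_tpidr_el1_demo(void) { return 0x1000; }
static unsigned long mrs_tpidr_el2_demo(void) { return 0x2000; }

/* The "patched" site: filled in once, then used with no further checks. */
static unsigned long (*read_percpu_offset)(void);

/* Boot-time step: choose a variant from a detected CPU capability
 * (a plain flag standing in for ARM64_HAS_VIRT_HOST_EXTN). */
static void patch_alternatives_demo(bool has_vhe)
{
    read_percpu_offset = has_vhe ? mrs_tpidr_el2_demo : mrs_tpidr_el1_demo;
}

/* Later, hot-path code just uses whatever was installed. */
static unsigned long percpu_offset_demo(void)
{
    assert(read_percpu_offset != 0);    /* patching must run first */
    return read_percpu_offset();
}
```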

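At first I didn't understand the last line of the macro (the ldr), because some blog posts misled me: many of them say this register holds the ID of the thread running on the CPU, so I kept wondering what that had to do with the ldr. In fact Linux stores this CPU's per-CPU offset there; it is initialized during boot, which I will cover in another note.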

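The macro's comment says it returns READ_ONCE(per_cpu(sym, smp_processor_id())), i.e. the copy of a per-CPU variable that belongs to the current CPU; here, that is this CPU's __entry_task. **This variable holds the address of the process descriptor (struct task_struct) of the task running on this CPU**; it is updated on every context switch (entry_task_switch(), called from __switch_to()). That answers the second question: before sp_el0 takes over the task_struct pointer, the pointer is kept in the per-CPU variable __entry_task.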
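A user-space model of ldr_this_cpu and of how __entry_task gets its value. Assumptions: one "template" copy of the per-CPU data (what adr_l resolves to), a private copy per CPU, and an array standing in for each CPU's tpidr_el1; entry_task_switch_demo imitates the kernel's entry_task_switch(). All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 2                  /* hypothetical CPU count */

struct task_struct { int pid; };        /* simplified stand-in */

/* The per-CPU section as linked: `adr_l dst, __entry_task` yields an
 * address inside this template, not inside any CPU's own copy. */
struct percpu_area { struct task_struct *entry_task; };
static struct percpu_area percpu_template;

/* Each CPU's private copy, allocated at boot (a static array here). */
static struct percpu_area percpu_copy[NR_CPUS_DEMO];

/* What each CPU would keep in tpidr_el1: distance from template to its copy.
 * Unsigned wraparound makes a "negative" distance work as well. */
static uintptr_t tpidr_el1_demo[NR_CPUS_DEMO];

static void setup_percpu_offsets_demo(void)
{
    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        tpidr_el1_demo[cpu] =
            (uintptr_t)&percpu_copy[cpu] - (uintptr_t)&percpu_template;
}

/* Address of `__entry_task` for `cpu`: template address + offset. */
static struct task_struct **this_cpu_entry_task(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    uintptr_t dst = (uintptr_t)&percpu_template.entry_task; /* adr_l dst, sym */
    uintptr_t tmp = tpidr_el1_demo[cpu];                     /* mrs tmp, tpidr_el1 */
    return (struct task_struct **)(dst + tmp);               /* address ldr reads */
}

/* Context switch: publish `next` in this CPU's __entry_task. */
static void entry_task_switch_demo(int cpu, struct task_struct *next)
{
    *this_cpu_entry_task(cpu) = next;
}

/* ldr_this_cpu tsk, __entry_task, x20 */
static struct task_struct *ldr_this_cpu_entry_task(int cpu)
{
    return *this_cpu_entry_task(cpu);
}
```

The same symbol address plus a different per-CPU offset lands in a different copy, which is why one instruction sequence serves every CPU.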

That leaves questions about the __entry_task variable itself, so I wrote a separate post with my notes on per-CPU variables: "Linux Kernel Study – Per-CPU Variables".