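Welcome back to the kernel... (I have since read the 2.6.29 kernel and made some revisions concerning the it instruction.)
Continuing with the __vet_atags function, which is also defined in arch/arm/kernel/head-common.S: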
/* Determine validity of the r2 atags pointer. The heuristic requires
* that the pointer be aligned, in the first 16k of physical RAM and
* that the ATAG_CORE marker is first and present. Future revisions
* of this function may be more lenient with the physical address and
* may also be able to move the ATAGS block if necessary.
*
* r8 = machinfo
*
* Returns:
* r2 either valid atags pointer, or zero
* r5, r6 corrupted
*/
__vet_atags:
tst r2, #0x3 @ aligned?
bne 1f
ldr r5, [r2, #0] @ is first tag ATAG_CORE?
subs r5, r5, #ATAG_CORE_SIZE
bne 1f
ldr r5, [r2, #4]
ldr r6, =ATAG_CORE
cmp r5, r6
bne 1f
mov pc, lr @ atag pointer is ok
1: mov r2, #0
mov pc, lr
ENDPROC(__vet_atags)
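According to TI's chip manual, the OMAP3430's SDRAM address space starts at 0x80000000 and spans 1GB. The comment at the top of the function requires the ATAG list to lie within the first 16KB of physical RAM, i.e. in the range 0x80000000-0x80004000, and the value 0x80000100 in r2 satisfies this. The code itself only checks that the pointer is word-aligned, that the size field of the first tag equals ATAG_CORE_SIZE, and that its tag field is the ATAG_CORE marker; then it returns.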
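As a rough illustration, the three checks can be written in C. This is only a sketch: the function name vet_atags and the array-style access are mine, and the values of ATAG_CORE and ATAG_CORE_SIZE are quoted from memory of the kernel headers, so treat them as assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed values, as recalled from the kernel sources. */
#define ATAG_CORE      0x54410001u
#define ATAG_CORE_SIZE ((2 * 4 + 3 * 4) >> 2)  /* tag header + core payload, in words */

/* Hypothetical C rendering of __vet_atags: returns the pointer if it
 * passes all three checks, or NULL (the "mov r2, #0" path) otherwise. */
static const uint32_t *vet_atags(const uint32_t *atags)
{
    if (((uintptr_t)atags & 0x3) != 0)   /* tst r2, #0x3 */
        return NULL;
    if (atags[0] != ATAG_CORE_SIZE)      /* size field of the first tag */
        return NULL;
    if (atags[1] != ATAG_CORE)           /* tag field of the first tag */
        return NULL;
    return atags;
}
```

Note that, like the assembly, this sketch never tests the 16KB range mentioned in the comment.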
Next is __create_page_tables, the key function of this analysis; it sits at the end of arch/arm/kernel/head.S.
First, look at a macro definition:
.macro pgtbl, rd
ldr \rd, =(KERNEL_RAM_PADDR - 0x4000)
.endm
KERNEL_RAM_PADDR is defined as:
#define KERNEL_RAM_PADDR (PHYS_OFFSET + TEXT_OFFSET)
PHYS_OFFSET is defined as 0x80000000 in arch/arm/plat-omap/include/mach/memory.h (how do we get there? arch/arm/kernel/head.S includes arch/arm/include/asm/memory.h, which in turn includes arch/arm/plat-omap/include/mach/memory.h...), while the value of TEXT_OFFSET has to be looked up in the Makefile. arch/arm/Makefile defines:
……
textofs-y := 0x00008000
……
TEXT_OFFSET := $(textofs-y)
So TEXT_OFFSET is 0x00008000. Going back to the pgtbl macro, KERNEL_RAM_PADDR works out to 0x80000000 + 0x00008000 = 0x80008000, and the value finally loaded into the register is 0x80008000 - 0x4000 = 0x80004000.
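The same arithmetic as a small C sketch, using the OMAP3430 values quoted in the text. The function name pgtbl mirrors the macro but is otherwise my own illustration.

```c
#include <stdint.h>

/* Values quoted in the text for this platform. */
#define PHYS_OFFSET      0x80000000u
#define TEXT_OFFSET      0x00008000u
#define KERNEL_RAM_PADDR (PHYS_OFFSET + TEXT_OFFSET)

/* What the pgtbl macro loads into its register: the 16KB
 * immediately below the kernel's physical load address. */
static uint32_t pgtbl(void)
{
    return KERNEL_RAM_PADDR - 0x4000u;
}
```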
The first statement of __create_page_tables is:
pgtbl r4 @ page table address
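Per the calculation above, r4 is set to 0x80004000. This is the start address of the page table, 16KB below the kernel entry address (0x80008000).
The page table is then initialized.
Step 1: zero out these 16KB: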
/*
* Clear the 16K level 1 swapper page table
*/
mov r0, r4
mov r3, #0
add r6, r0, #0x4000
1: str r3, [r0], #4
str r3, [r0], #4
str r3, [r0], #4
str r3, [r0], #4
teq r0, r6
bne 1b
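For readers less used to post-indexed stores, here is a hedged C sketch of the loop above. The function name clear_pgd is hypothetical; the four stores per iteration mirror the four str instructions, so the loop body runs 0x4000 / 16 = 1024 times.

```c
#include <stdint.h>
#include <stddef.h>

#define PGD_BYTES 0x4000u   /* 16KB first-level table */

/* Hypothetical C equivalent of the clearing loop: r0 walks the table,
 * r6 marks the end, and each pass stores four zero words. */
static void clear_pgd(uint32_t *pgd)
{
    uint32_t *p = pgd;                                   /* mov r0, r4 */
    uint32_t *end = pgd + PGD_BYTES / sizeof(uint32_t);  /* add r6, r0, #0x4000 */

    while (p != end) {                                   /* teq r0, r6 / bne 1b */
        *p++ = 0;                                        /* str r3, [r0], #4 */
        *p++ = 0;
        *p++ = 0;
        *p++ = 0;
    }
}
```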
Step 2: fill in the page directory entry for the section the kernel is currently running in:
ldr r7, [r10, #PROCINFO_MM_MMUFLAGS] @ mm_mmuflags
Here we need to revisit the proc_info_list data structure (defined in arch/arm/include/asm/procinfo.h):
struct proc_info_list {
unsigned int cpu_val;
unsigned int cpu_mask;
unsigned long __cpu_mm_mmu_flags; /* used by head.S */
unsigned long __cpu_io_mmu_flags; /* used by head.S */
unsigned long __cpu_flush; /* used by head.S */
const char *arch_name;
const char *elf_name;
unsigned int elf_hwcap;
const char *cpu_name;
struct processor *proc;
struct cpu_tlb_fns *tlb;
struct cpu_user_fns *user;
struct cpu_cache_fns *cache;
};
The instruction ldr r7, [r10, #PROCINFO_MM_MMUFLAGS] fetches the __cpu_mm_mmu_flags field of this structure into r7. In arch/arm/mm/proc-v7.S this value is defined as follows:
.long PMD_TYPE_SECT | \
PMD_SECT_BUFFERABLE | \
PMD_SECT_CACHEABLE | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ
This works out to 0b0000110000001110, i.e. 0x0c0e.
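The value can be checked with a few lines of C. The bit positions below are what I recall from the kernel's pgtable-hwdef.h and should be treated as assumptions; the helper name mm_mmuflags is mine.

```c
#include <stdint.h>

/* Assumed bit definitions for a first-level section descriptor. */
#define PMD_TYPE_SECT       (2u << 0)   /* descriptor type: section */
#define PMD_SECT_BUFFERABLE (1u << 2)   /* B bit */
#define PMD_SECT_CACHEABLE  (1u << 3)   /* C bit */
#define PMD_SECT_AP_WRITE   (1u << 10)  /* AP[0] */
#define PMD_SECT_AP_READ    (1u << 11)  /* AP[1] */

/* OR of the five flags listed in the proc-v7 excerpt. */
static uint32_t mm_mmuflags(void)
{
    return PMD_TYPE_SECT | PMD_SECT_BUFFERABLE | PMD_SECT_CACHEABLE |
           PMD_SECT_AP_WRITE | PMD_SECT_AP_READ;
}
```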
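At this stage ARM uses section mappings: under the VMSA (Virtual Memory System Architecture), each first-level entry describes 1MB, and the 16KB table holds 4K entries, covering the 4GB virtual address space. For the bit layout of these descriptors and the page tables in general, see chapter B3 of the ARM Architecture Reference Manual, DDI0406B.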
mov r6, pc, lsr #20 @ start of kernel section
The top 12 bits of the current pc, i.e. the section index of the physical address the kernel is running at, go into r6;
orr r3, r7, r6, lsl #20 @ flags + kernel base
the index, shifted back into bits [31:20], is ORed with the flag bits in r7, and the resulting descriptor goes into r3;
str r3, [r4, r6, lsl #2] @ identity mapping
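each descriptor is 4 bytes, so the kernel's entry lives at r4 + r6 x 4, and the descriptor in r3 is stored there. Since the MMU is still off, pc is a physical address, so this entry maps the section onto itself (an identity mapping);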
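The three instructions above boil down to the following C sketch. The helper names are hypothetical, and the sample pc value used in the note below is just an address somewhere inside the first kernel section.

```c
#include <stdint.h>

/* mov r6, pc, lsr #20: section index = top 12 bits of the address. */
static uint32_t section_index(uint32_t addr)
{
    return addr >> 20;
}

/* orr r3, r7, r6, lsl #20: section base in bits [31:20], flags below. */
static uint32_t section_desc(uint32_t flags, uint32_t index)
{
    return flags | (index << 20);
}

/* str r3, [r4, r6, lsl #2]: each entry is 4 bytes wide. */
static uint32_t entry_addr(uint32_t pgd_base, uint32_t index)
{
    return pgd_base + (index << 2);
}
```

With pc = 0x80008040, flags = 0x0c0e and the table at 0x80004000, the index is 0x800, the descriptor is 0x80000c0e, and it is stored at 0x80006000.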
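Step 3: build the page table entries for the kernel's virtual address range. By default Linux places the kernel in the top 1GB, so the kernel's virtual base is usually 0xc0000000.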
add r0, r4, #(KERNEL_START & 0xff000000) >> 18
KERNEL_START is defined as:
#define KERNEL_START KERNEL_RAM_VADDR
KERNEL_RAM_VADDR is defined as:
#define KERNEL_RAM_VADDR (PAGE_OFFSET + TEXT_OFFSET)
PAGE_OFFSET, unless it has been defined elsewhere, comes from the following definition in arch/arm/include/asm/memory.h:
/*
* Page offset: 3GB
*/
#ifndef PAGE_OFFSET
#define PAGE_OFFSET UL(0xc0000000)
#endif
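So KERNEL_START evaluates to 0xc0008000, the kernel's starting virtual address. The instruction keeps the top 8 bits of that address, shifts them right by 18 to form a byte offset into the table, and adds the table base, leaving the result in r0. The shift is 18 rather than 20 because the section table holds 4K entries indexed by the top 12 bits of the virtual address, and each entry is 4 bytes, so the index is multiplied by 4, giving a 14-bit offset.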
str r3, [r0, #(KERNEL_START & 0x00f00000) >> 18]!
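r3 still holds the descriptor computed earlier, which contains the physical section index. It is stored at r0 plus the offset formed from the low 4 bits of the 12-bit section index, and the ! writes that final address back into r0.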
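The offset is built in two pieces, presumably because an ARM data-processing immediate must be an 8-bit value rotated by an even amount, which a full 12-bit index shifted left by 2 does not always fit. A hedged C sketch of the two pieces (helper names are mine):

```c
#include <stdint.h>

/* add r0, r4, #(KERNEL_START & 0xff000000) >> 18
 * Byte offset contributed by the top 8 bits of the virtual address. */
static uint32_t hi_offset(uint32_t va)
{
    return (va & 0xff000000u) >> 18;
}

/* str r3, [r0, #(KERNEL_START & 0x00f00000) >> 18]!
 * Byte offset contributed by the next 4 bits. */
static uint32_t lo_offset(uint32_t va)
{
    return (va & 0x00f00000u) >> 18;
}
```

For KERNEL_START = 0xc0008000 the pieces are 0x3000 and 0, so with the table at 0x80004000 the first kernel entry sits at 0x80007000. In general the sum of the two pieces equals (va >> 20) << 2.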
ldr r6, =(KERNEL_END - 1)
r6 is loaded with the virtual address of the last byte of the kernel image (not its length);
add r0, r0, #4
r0 now points to the next table entry;
add r6, r4, r6, lsr #18
from the kernel's end address, compute the address of the last table entry to fill;
1: cmp r0, r6
check whether all kernel entries have been filled;
add r3, r3, #1 << 20
advance the physical address in the descriptor by one section (1MB);
it ls
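a lower-or-same test. it is the If-Then instruction that opens a conditional block; for details see the RealView Compilation Tools Assembler Guide (I found that the 2.6.29 kernel has dropped this line, damn!);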
strls r3, [r0], #4
if the lower-or-same condition holds, write the entry;
bls 1b
and if it holds, branch back to label 1;
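Putting step 3 together, the loop can be modelled in C roughly as follows. This is a sketch under my own naming (map_kernel_sections and its parameters are hypothetical), and the variables r0, r3 and r6 are named after the registers they stand for.

```c
#include <stdint.h>

/* Hypothetical model of step 3. pgd is the table in memory, pgd_base its
 * (physical) address as held in r4, first_desc the descriptor already in r3.
 * Returns the number of section entries written. */
static unsigned map_kernel_sections(uint32_t *pgd, uint32_t pgd_base,
                                    uint32_t first_desc,
                                    uint32_t kernel_start, uint32_t kernel_end)
{
    uint32_t r3 = first_desc;
    uint32_t r0 = pgd_base + ((kernel_start >> 20) << 2);
    uint32_t r6 = pgd_base + ((kernel_end - 1) >> 18);  /* low 2 bits are don't-care */
    unsigned written = 0;

    pgd[(r0 - pgd_base) >> 2] = r3;    /* first kernel section */
    written++;
    r0 += 4;                           /* add r0, r0, #4 */

    for (;;) {
        int more = (r0 <= r6);         /* cmp r0, r6 (unsigned, "ls") */
        r3 += 1u << 20;                /* add r3, r3, #1 << 20 */
        if (!more)
            break;
        pgd[(r0 - pgd_base) >> 2] = r3; /* strls r3, [r0], #4 */
        r0 += 4;
        written++;
    }
    return written;
}
```

As an example with a made-up kernel end of 0xc0400000, the call writes four entries, at indices 0xc00 through 0xc03, holding 0x80000c0e, 0x80100c0e, 0x80200c0e and 0x80300c0e.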
Step 4: map the first 1MB section of RAM, because that is where the kernel boot parameters passed by the bootloader live.
/*
* Then map first 1MB of ram in case it contains our boot params.
*/
add r0, r4, #PAGE_OFFSET >> 18
orr r6, r7, #(PHYS_OFFSET & 0xff000000)
.if (PHYS_OFFSET & 0x00f00000)
orr r6, r6, #(PHYS_OFFSET & 0x00f00000)
.endif
str r6, [r0]
It is exactly one section, so no loop is needed.
Finally, the function returns...
mov pc, lr
Back in arch/arm/kernel/head.S, here is the initialization code that follows the page table setup:
/*
* The following calls CPU specific code in a position independent
* manner. See arch/arm/mm/proc-*.S for details. r10 = base of
* xxx_proc_info structure selected by __lookup_machine_type
* above. On return, the CPU will be ready for the MMU to be
* turned on, and r0 will hold the CPU control register value.
*/
ldr r13, __switch_data @ address to jump to after
@ mmu has been enabled
badr lr, __enable_mmu @ return (PIC) address
With the return address prepared, the call is made as follows:
ARM( add pc, r10, #PROCINFO_INITFUNC )
Earlier initialization guarantees that r10 points to the procinfo structure. The macro PROCINFO_INITFUNC is defined in arch/arm/kernel/asm-offsets.c as:
DEFINE(PROCINFO_INITFUNC, offsetof(struct proc_info_list, __cpu_flush));
At this point it is time to stroll back to the proc_info_list data structure in arch/arm/include/asm/procinfo.h:
struct proc_info_list {
unsigned int cpu_val;
unsigned int cpu_mask;
unsigned long __cpu_mm_mmu_flags; /* used by head.S */
unsigned long __cpu_io_mmu_flags; /* used by head.S */
unsigned long __cpu_flush; /* used by head.S */
const char *arch_name;
const char *elf_name;
unsigned int elf_hwcap;
const char *cpu_name;
struct processor *proc;
struct cpu_tlb_fns *tlb;
struct cpu_user_fns *user;
struct cpu_cache_fns *cache;
};
In arch/arm/mm/proc-v7.S, the __cpu_flush slot of this structure holds a branch instruction: b __v7_setup
So add pc, r10, #PROCINFO_INITFUNC jumps into that slot and, through it, ends up in __v7_setup.
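To see which offset the add pc instruction uses, here is a small C sketch. It mirrors only the first five fields and uses uint32_t so the layout matches 32-bit ARM even when compiled on a 64-bit host; the struct name and the _SKETCH macro names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the head of proc_info_list on 32-bit ARM,
 * where unsigned int and unsigned long are both 4 bytes. */
struct proc_info_head32 {
    uint32_t cpu_val;
    uint32_t cpu_mask;
    uint32_t cpu_mm_mmu_flags;
    uint32_t cpu_io_mmu_flags;
    uint32_t cpu_flush;       /* holds "b __v7_setup" */
};

/* Same offsetof() technique as asm-offsets.c. */
#define PROCINFO_MM_MMUFLAGS_SKETCH offsetof(struct proc_info_head32, cpu_mm_mmu_flags)
#define PROCINFO_INITFUNC_SKETCH    offsetof(struct proc_info_head32, cpu_flush)
```

Under these assumptions the flags sit at offset 8 and the branch slot at offset 16, so add pc, r10, #PROCINFO_INITFUNC lands on r10 + 16.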
/*
* __v7_setup
*
* Initialise TLB, Caches, and MMU state ready to switch the MMU
* on. Return in r0 the new CP15 C1 control register setting.
*
* We automatically detect if we have a Harvard cache, and use the
* Harvard cache control instructions insead of the unified cache
* control instructions.
*
* This should be able to cover all ARMv7 cores.
*
* It is assumed that:
* - cache type register is implemented
*/
__v7_setup:
adr r12, __v7_setup_stack @ the local stack
stmia r12, {r0-r5, r7, r9, r11, lr}
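The registers are saved. __v7_setup_stack is an 11-word local stack defined right after this function; stmia (store multiple, increment after) shows that it is filled in ascending order;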
bl v7_flush_dcache_all
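flush the data cache; this function is defined in arch/arm/mm/cache-v7.S (why does this not call v7_flush_cache_all() to flush both the data and instruction caches, instead of flushing only the data cache?);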
ldmia r12, {r0-r5, r7, r9, r11, lr}
the registers are restored;
/*
* On OMAP3 devices the auxilary control register can be accessed
* only is secure mode using SMI /PPA. The IBE bit is enabled at the
* u-boot level using SMI service. So no need to set that bit again.
*/
#ifndef CONFIG_ARCH_OMAP3430
#ifdef CONFIG_ARM_ERRATA_430973
mrc p15, 0, r10, c1, c0, 1 @ read aux control register
orr r10, r10, #(1 << 6) @ set IBE to 1
mcr p15, 0, r10, c1, c0, 1 @ write aux control register
#endif
#endif
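This sets the IBE bit in the auxiliary control register, which concerns invalidation of the branch predictor (BTB); on OMAP3 it is not needed here.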
mov r10, #0
#ifdef HARVARD_CACHE
mcr p15, 0, r10, c7, c5, 0 @ I+BTB cache invalidate
#endif
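This invalidate is for Harvard-style caches and is assembled only when HARVARD_CACHE is defined, which is not the case for v7 here.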
dsb
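A data synchronization barrier is a special kind of memory barrier. No instruction that follows it in program order executes until it completes, and it completes only when:
all explicit memory accesses before it have completed;
all cache, branch predictor and TLB maintenance operations before it have completed.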
#ifdef CONFIG_MMU
mcr p15, 0, r10, c8, c7, 0 @ invalidate I + D TLBs
invalidate the instruction and data TLBs (r10 was zeroed above);
mcr p15, 0, r10, c2, c0, 2 @ TTB control register
clear the translation table base control register (TTBCR);
orr r4, r4, #TTB_RGN_OC_WB @ mark PTWs outer cacheable, WB
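r4 still holds the page table base 0x80004000. TTB_RGN_OC_WB is defined as 3<<3 and corresponds to the RGN field of the translation table base register; it marks page table walks as outer cacheable, write-back, no write-allocate;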
mcr p15, 0, r4, c2, c0, 0 @ load TTB0
mcr p15, 0, r4, c2, c0, 1 @ load TTB1
write this value to translation table base registers 0 and 1;
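A minimal C sketch of the value written to TTBR0 and TTBR1, using the definition of TTB_RGN_OC_WB given in the text; the helper name ttbr_value is mine.

```c
#include <stdint.h>

#define TTB_RGN_OC_WB (3u << 3)  /* RGN field, bits [4:3]; value from the text */

/* orr r4, r4, #TTB_RGN_OC_WB: the table base is 16KB aligned, so its
 * low bits are free to carry the walk attributes. */
static uint32_t ttbr_value(uint32_t pgd_base)
{
    return pgd_base | TTB_RGN_OC_WB;
}
```

With the table at 0x80004000 the value written is 0x80004018.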
mov r10, #0x1f @ domains 0, 1 = manager
mcr p15, 0, r10, c3, c0, 0 @ load domain access register
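0x1f configures the domain access control register so that domains 0 and 1 are managers, whose accesses are not checked against the access permission bits in the TLB, while domain 2 is a client, whose accesses are checked (why it is configured this way, I do not yet know);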
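The domain access control register packs sixteen 2-bit fields, one per domain. A short C sketch for decoding it (the enum and function names are hypothetical; the field encodings follow the ARM architecture):

```c
#include <stdint.h>

/* 2-bit field encodings of the domain access control register. */
enum domain_access {
    DOMAIN_NO_ACCESS = 0,  /* any access faults */
    DOMAIN_CLIENT_AC = 1,  /* accesses checked against permission bits */
    DOMAIN_RESERVED  = 2,
    DOMAIN_MANAGER_AC = 3  /* accesses not checked */
};

/* Extract the access field for one of the 16 domains. */
static unsigned domain_access(uint32_t dacr, unsigned domain)
{
    return (dacr >> (2 * domain)) & 0x3u;
}
```

For 0x1f this gives manager for domains 0 and 1, client for domain 2, and no access for domain 3 and above.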
#endif
#if defined(CONFIG_ARCH_OMAP3)
@ OMAP3: L2EN bit accessed in nonsecure mode
@ L2 cache is enabled in the aux control register
mrc p15, 0, r0, c1, c0, 1
#ifdef CONFIG_CPU_L2CACHE_DISABLE
bic r0, r0, #0x2 @ disable L2 Cache
#else
orr r0, r0, #0x2 @ enable L2 Cache
#endif
mcr p15, 0, r0, c1, c0, 1 @ Enable the L2EN banked bit as well
#endif
enable the L2 cache (or disable it when CONFIG_CPU_L2CACHE_DISABLE is set);
#ifdef CONFIG_ARCH_OMAP34XX
#ifdef CONFIG_CPU_LOCKDOWN_TO_64K_L2
mov r10, #0xfc
mcr p15, 1, r10, c9, c0, 0
#endif
#ifdef CONFIG_CPU_LOCKDOWN_TO_128K_L2
mov r10, #0xf0
mcr p15, 1, r10, c9, c0, 0
#endif
#ifdef CONFIG_CPU_LOCKDOWN_TO_256K_L2
mov r10, #0x00
mcr p15, 1, r10, c9, c0, 0
#endif
The block above is OMAP-specific register setup that configures the usable L2 cache size; on the OMAP3430 it is 256K;
#ifdef CONFIG_CPU_USER_L2_PLE_ACCESS
mov r10, #0x3
mcr p15, 0, r10, c11, c1, 0
allow user applications to access PLE (preloading engine) register channels 0 and 1
#endif
#endif
adr r5, v7_crval
ldmia r5, {r5, r6}
mrc p15, 0, r0, c1, c0, 0 @ read control register
bic r0, r0, r5 @ clear bits them
orr r0, r0, r6 @ set them
mov pc, lr @ return to head.S:__ret
See the following definition:
v7_crval:
ARM( crval clear=0x0120c302, mmuset=0x00c0387d, ucset=0x00c0187c )
THUMB( crval clear=0x0120c302, mmuset=0x40c0387d, ucset=0x40c0187c )
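The control register is read into r0, some bits are cleared and some are set. The key point is that the value in r0 now has the MMU enable bit set, but it has clearly not been written back to CP15 yet; that exciting step is left to the __enable_mmu function.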
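The bic/orr pair is a plain clear-then-set on the control register value. A hedged C sketch using the ARM-mode clear and mmuset constants from v7_crval above (the function name is mine, and the starting register value passed in is arbitrary):

```c
#include <stdint.h>

#define V7_CRVAL_CLEAR  0x0120c302u  /* bits to clear, from v7_crval */
#define V7_CRVAL_MMUSET 0x00c0387du  /* bits to set when the MMU is used */

/* bic r0, r0, r5 ; orr r0, r0, r6 */
static uint32_t v7_control_value(uint32_t sctlr)
{
    return (sctlr & ~V7_CRVAL_CLEAR) | V7_CRVAL_MMUSET;
}
```

Bit 0 of the result (the M bit) is always set, which is what "the MMU is already enabled in r0" means; the register itself is untouched until the value is written back.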
ENDPROC(__v7_setup)
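That's a wrap!
To sum up: this code flushed the data cache, invalidated the TLBs, configured the translation table registers and the related control registers, and computed the initial value of the MMU control register. All that remains is to turn on the MMU and step into a virtual world.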