[ 74.474853] ACPI: Low-level resume complete
[ 74.480221] PM: Restoring platform NVS memory
[ 74.489112] Enabling non-boot CPUs ...
[ 74.493513] x86: Booting SMP configuration:
[ 74.498341] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 74.507285] cache: parent cpu1 should not be sleeping
[ 74.513420] CPU1 is up
[ 74.516196] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 74.525792] cache: parent cpu2 should not be sleeping
[ 74.532148] CPU2 is up
[ 74.534929] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 74.543873] cache: parent cpu3 should not be sleeping
[ 74.549929] CPU3 is up
[ 74.552669] smpboot: Booting Node 0 Processor 4 APIC 0x4
[ 74.562247] cache: parent cpu4 should not be sleeping
[ 74.568451] CPU4 is up
[ 74.571326] smpboot: Booting Node 0 Processor 5 APIC 0x5
[ 74.580175] cache: parent cpu5 should not be sleeping
[ 74.586215] CPU5 is up
[ 74.588939] smpboot: Booting Node 0 Processor 6 APIC 0x6
[ 74.598292] cache: parent cpu6 should not be sleeping
[ 74.604653] CPU6 is up
[ 74.607406] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 74.616139] cache: parent cpu7 should not be sleeping
[ 74.622206] CPU7 is up
[ 74.624915] smpboot: Booting Node 0 Processor 8 APIC 0x8
[ 74.634096] cache: parent cpu8 should not be sleeping
[ 74.640432] CPU8 is up
[ 74.643389] smpboot: Booting Node 0 Processor 9 APIC 0x9
[ 74.651888] cache: parent cpu9 should not be sleeping
[ 74.657977] CPU9 is up
[ 74.660680] smpboot: Booting Node 0 Processor 10 APIC 0xa
[ 74.669746] cache: parent cpu10 should not be sleeping
[ 74.676259] CPU10 is up
[ 74.679161] smpboot: Booting Node 0 Processor 11 APIC 0xb
[ 74.687636] cache: parent cpu11 should not be sleeping
[ 74.693824] CPU11 is up
[ 74.696603] smpboot: Booting Node 0 Processor 12 APIC 0xc
[ 74.702671] general protection fault: 0000 [#1] SMP NOPTI
[ 74.708739] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G O 4.19.0-desktop-amd64 #3100
[ 74.718575] Hardware name: Suma W3330H0/22DB4, BIOS CWTQ051207 06/02/2021
[ 74.726191] RIP: 0010:switch_mm_irqs_off+0x3ec/0x4f0
[ 74.731763] Code: 7d 08 49 83 c5 18 4c 89 fa 31 f6 e8 ce 56 99 00 49 8b 45 00 48 85 c0 75 e5 e9 73 ff ff ff b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e9 86 fc ff ff e9 81 00 00 00 48 c7 c2 e0 19 02 00 31 c0 65
[ 74.752755] RSP: 0018:ffffa98a4329be30 EFLAGS: 00010046
[ 74.758616] RAX: 0000000000000001 RBX: ffff8e3a8be02200 RCX: 0000000000000049
[ 74.766619] RDX: 0000000000000000 RSI: ffff8e3a8be02200 RDI: ffff8e3a8ba2d880
[ 74.774612] RBP: ffffffffbd07ca60 R08: ffff8e3a9ee22ae0 R09: 0000000000000000
[ 74.782605] R10: 0000000000000000 R11: 0000001113e35491 R12: 0000000000000008
[ 74.790598] R13: ffff8e3a8ba2d880 R14: ffffffffbd07ca60 R15: ffff8e3a8be02200
[ 74.798598] FS: 0000000000000000(0000) GS:ffff8e3a9ee00000(0000) knlGS:0000000000000000
[ 74.807662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 74.814103] CR2: 0000000000000000 CR3: 000000062da0a000 CR4: 00000000003406e0
[ 74.822096] Call Trace:
[ 74.824865] __schedule+0x260/0x840
[ 74.828794] schedule_idle+0x1e/0x40
[ 74.832812] do_idle+0x165/0x250
[ 74.836450] cpu_startup_entry+0x6f/0x80
[ 74.840854] start_secondary+0x1a4/0x200
[ 74.845264] secondary_startup_64+0xa4/0xb0
[ 74.849963] Modules linked in: dm_mod bnep fuse cfg80211 st sr_mod amd64_edac_mod joydev cdrom edac_mce_amd kvm_amd bluetooth drbg ansi_cprng ecdh_generic rfkill nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd kvm soundcore irqbypass sg pcspkr efi_pstore efivars k10temp ccp pcc_cpufreq evdev acpi_cpufreq mincores(O) i2c_dev vfs_monitor(O) binfmt_misc efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_generic zstd_decompress xxhash amdgpu chash gpu_sched hid_generic usbhid hid sd_mod radeon i2c_algo_bit ahci ttm libahci drm_kms_helper libata r8169 realtek crc32c_intel scsi_mod drm libphy button
[ 74.926832] ---[ end trace a4da77854ee63e9f ]---
[ 74.932012] RIP: 0010:switch_mm_irqs_off+0x3ec/0x4f0
[ 74.937582] Code: 7d 08 49 83 c5 18 4c 89 fa 31 f6 e8 ce 56 99 00 49 8b 45 00 48 85 c0 75 e5 e9 73 ff ff ff b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e9 86 fc ff ff e9 81 00 00 00 48 c7 c2 e0 19 02 00 31 c0 65
[ 74.958575] RSP: 0018:ffffa98a4329be30 EFLAGS: 00010046
[ 74.964434] RAX: 0000000000000001 RBX: ffff8e3a8be02200 RCX: 0000000000000049
[ 74.972419] RDX: 0000000000000000 RSI: ffff8e3a8be02200 RDI: ffff8e3a8ba2d880
[ 74.980412] RBP: ffffffffbd07ca60 R08: ffff8e3a9ee22ae0 R09: 0000000000000000
[ 74.988405] R10: 0000000000000000 R11: 0000001113e35491 R12: 0000000000000008
[ 74.996398] R13: ffff8e3a8ba2d880 R14: ffffffffbd07ca60 R15: ffff8e3a8be02200
[ 75.004391] FS: 0000000000000000(0000) GS:ffff8e3a9ee00000(0000) knlGS:0000000000000000
[ 75.013454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.019893] CR2: 0000000000000000 CR3: 000000062da0a000 CR4: 00000000003406e0
[ 75.027888] Kernel panic - not syncing: Attempted to kill the idle task!
[ 76.307847] Shutting down cpus with NMI
[ 76.312170] Kernel Offset: 0x3b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 76.324234] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
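The Code: line of the oops wraps the byte at the faulting RIP in angle brackets. The following is a minimal C sketch, assuming the space-separated hex layout seen in the dump above, that extracts the marked byte and the byte after it. The helper names `faulting_opcode` and `is_wrmsr` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: locate the byte wrapped in <..> on an oops "Code:"
 * line (the byte at the faulting RIP) and the byte that follows it.
 * Returns 0 on success, -1 if the line carries no marked byte.
 */
int faulting_opcode(const char *code_line, unsigned *first, unsigned *second)
{
    const char *mark = strchr(code_line, '<');

    if (!mark)
        return -1;
    /* "<0f> 30" yields first = 0x0f, second = 0x30 */
    if (sscanf(mark, "<%2x> %2x", first, second) != 2)
        return -1;
    return 0;
}

/* 0F 30 is the WRMSR opcode, per the instruction table quoted in these notes. */
int is_wrmsr(unsigned first, unsigned second)
{
    return first == 0x0f && second == 0x30;
}
```

Fed the Code: line from this panic, the marked bytes are 0f 30, which is why the rest of the analysis concentrates on WRMSR.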
$ cat obj.txt |grep switch_mm_irq
ffffffff8106d480 <switch_mm_irqs_off>:
ffffffff8106d4b5: 0f 84 1d 02 00 00 je ffffffff8106d6d8 <switch_mm_irqs_off+0x258>
ffffffff8106d4c1: 74 43 je ffffffff8106d506 <switch_mm_irqs_off+0x86>
ffffffff8106d4cd: 74 37 je ffffffff8106d506 <switch_mm_irqs_off+0x86>
ffffffff8106d4cf: e9 2d 00 00 00 jmpq ffffffff8106d501 <switch_mm_irqs_off+0x81>
ffffffff8106d4ec: 74 0b je ffffffff8106d4f9 <switch_mm_irqs_off+0x79>
ffffffff8106d4f3: 0f 85 67 03 00 00 jne ffffffff8106d860 <switch_mm_irqs_off+0x3e0>
ffffffff8106d521: 0f 84 89 03 00 00 je ffffffff8106d8b0 <switch_mm_irqs_off+0x430>
ffffffff8106d52e: 74 0f je ffffffff8106d53f <switch_mm_irqs_off+0xbf>
ffffffff8106d546: 74 0c je ffffffff8106d554 <switch_mm_irqs_off+0xd4>
ffffffff8106d569: 0f 85 04 03 00 00 jne ffffffff8106d873 <switch_mm_irqs_off+0x3f3>
ffffffff8106d59b: 0f 84 0a 01 00 00 je ffffffff8106d6ab <switch_mm_irqs_off+0x22b>
ffffffff8106d5a7: 75 d6 jne ffffffff8106d57f <switch_mm_irqs_off+0xff>
ffffffff8106d5be: 0f 87 1e 03 00 00 ja ffffffff8106d8e2 <switch_mm_irqs_off+0x462>
ffffffff8106d5e0: eb 1b jmp ffffffff8106d5fd <switch_mm_irqs_off+0x17d>
ffffffff8106d645: 0f 84 f0 00 00 00 je ffffffff8106d73b <switch_mm_irqs_off+0x2bb>
ffffffff8106d671: 74 0f je ffffffff8106d682 <switch_mm_irqs_off+0x202>
ffffffff8106d69a: 0f 85 27 02 00 00 jne ffffffff8106d8c7 <switch_mm_irqs_off+0x447>
ffffffff8106d6bb: 0f 87 1b ff ff ff ja ffffffff8106d5dc <switch_mm_irqs_off+0x15c>
ffffffff8106d6d3: e9 4f ff ff ff jmpq ffffffff8106d627 <switch_mm_irqs_off+0x1a7>
ffffffff8106d6df: 74 bf je ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d6ec: 72 b2 jb ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d713: 0f 84 ed fd ff ff je ffffffff8106d506 <switch_mm_irqs_off+0x86>
ffffffff8106d736: e9 cb fd ff ff jmpq ffffffff8106d506 <switch_mm_irqs_off+0x86>
ffffffff8106d761: 0f 84 1b ff ff ff je ffffffff8106d682 <switch_mm_irqs_off+0x202>
ffffffff8106d767: e9 07 ff ff ff jmpq ffffffff8106d673 <switch_mm_irqs_off+0x1f3>
ffffffff8106d77d: 0f 83 a4 fe ff ff jae ffffffff8106d627 <switch_mm_irqs_off+0x1a7>
ffffffff8106d78f: 0f 85 72 01 00 00 jne ffffffff8106d907 <switch_mm_irqs_off+0x487>
ffffffff8106d7b3: 74 1e je ffffffff8106d7d3 <switch_mm_irqs_off+0x353>
ffffffff8106d7d1: 75 e6 jne ffffffff8106d7b9 <switch_mm_irqs_off+0x339>
ffffffff8106d7e7: e9 3b fe ff ff jmpq ffffffff8106d627 <switch_mm_irqs_off+0x1a7>
ffffffff8106d7fd: 0f 83 24 fe ff ff jae ffffffff8106d627 <switch_mm_irqs_off+0x1a7>
ffffffff8106d80f: 0f 85 eb 00 00 00 jne ffffffff8106d900 <switch_mm_irqs_off+0x480>
ffffffff8106d833: 74 9e je ffffffff8106d7d3 <switch_mm_irqs_off+0x353>
ffffffff8106d859: 75 e5 jne ffffffff8106d840 <switch_mm_irqs_off+0x3c0>
ffffffff8106d85b: e9 73 ff ff ff jmpq ffffffff8106d7d3 <switch_mm_irqs_off+0x353>
ffffffff8106d86e: e9 86 fc ff ff jmpq ffffffff8106d4f9 <switch_mm_irqs_off+0x79>
ffffffff8106d88c: 74 08 je ffffffff8106d896 <switch_mm_irqs_off+0x416>
ffffffff8106d8a1: 75 de jne ffffffff8106d881 <switch_mm_irqs_off+0x401>
ffffffff8106d8ab: e9 bf fc ff ff jmpq ffffffff8106d56f <switch_mm_irqs_off+0xef>
ffffffff8106d8c2: e9 60 fc ff ff jmpq ffffffff8106d527 <switch_mm_irqs_off+0xa7>
ffffffff8106d8d1: 75 3b jne ffffffff8106d90e <switch_mm_irqs_off+0x48e>
ffffffff8106d8dd: e9 be fd ff ff jmpq ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d8f4: e9 f5 fc ff ff jmpq ffffffff8106d5ee <switch_mm_irqs_off+0x16e>
ffffffff8106d8fb: e9 6f fc ff ff jmpq ffffffff8106d56f <switch_mm_irqs_off+0xef>
ffffffff8106d902: e9 0e ff ff ff jmpq ffffffff8106d815 <switch_mm_irqs_off+0x395>
ffffffff8106d909: e9 87 fe ff ff jmpq ffffffff8106d795 <switch_mm_irqs_off+0x315>
ffffffff8106d919: 77 34 ja ffffffff8106d94f <switch_mm_irqs_off+0x4cf>
ffffffff8106d938: e9 63 fd ff ff jmpq ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d94a: e9 51 fd ff ff jmpq ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d95d: e9 3e fd ff ff jmpq ffffffff8106d6a0 <switch_mm_irqs_off+0x220>
ffffffff8106d987: e8 f4 fa ff ff callq ffffffff8106d480 <switch_mm_irqs_off>
ffffffff8106db30: e9 4b f9 ff ff jmpq ffffffff8106d480 <switch_mm_irqs_off>
ffffffff8176d59e: e8 dd fe 8f ff callq ffffffff8106d480 <switch_mm_irqs_off>
$ addr2line ffffffff8106d480 -e vmlinux-4.19.0-6-amd64
./debian/build/build_amd64_none_amd64/./arch/x86/mm/tlb.c:274 (the actual code is at line 272)
$ addr2line ffffffff8106d86c -e vmlinux-4.19.0-6-amd64
./debian/build/build_amd64_none_amd64/./arch/x86/include/asm/nospec-branch.h:274
static __always_inline
void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
{
    asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
        : : "c" (msr),
            "a" ((u32)val),
            "d" ((u32)(val >> 32)),
            [feature] "i" (feature)
        : "memory");
}
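The address ffffffff8106d86c passed to the second addr2line call is the link-time base of switch_mm_irqs_off plus the 0x3ec offset from the RIP line. A small sketch of that arithmetic follows, with the KASLR "Kernel Offset" from the panic added to get the runtime address. The function names are made up for this example.

```c
#include <stdint.h>

/* Link-time address of a "symbol+offset" location, e.g. switch_mm_irqs_off+0x3ec. */
uint64_t static_addr(uint64_t symbol_base, uint64_t offset)
{
    return symbol_base + offset;
}

/* Runtime address under KASLR: link-time address plus the panic's "Kernel Offset". */
uint64_t runtime_addr(uint64_t link_addr, uint64_t kaslr_offset)
{
    return link_addr + kaslr_offset;
}
```

With the values from this report, `static_addr(0xffffffff8106d480, 0x3ec)` gives 0xffffffff8106d86c, the address used for objdump and addr2line. Adding the Kernel Offset 0x3b000000 gives 0xffffffffbc06d86c as the address the CPU was actually executing.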
WRMSR — Write to Model Specific Register
| Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
|---|---|---|---|---|---|
| 0F 30 | WRMSR | ZO | Valid | Valid | Write the value in EDX:EAX to MSR specified by ECX. |
Instruction Operand Encoding
| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| ZO | NA | NA | NA | NA |
Writes the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to high-order 32 bits of the selected MSR and the contents of the EAX register are copied to low-order 32 bits of the MSR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an MSR should be set to values previously read.
[ 74.758616] RAX: 0000000000000001 RBX: ffff8e3a8be02200 RCX: 0000000000000049
[ 74.766619] RDX: 0000000000000000 RSI: ffff8e3a8be02200 RDI: ffff8e3a8ba2d880
[ 74.774612] RBP: ffffffffbd07ca60 R08: ffff8e3a9ee22ae0 R09: 0000000000000000
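In other words, the value in EDX:EAX (0x00000000:0x00000001) is written to MSR 0x49. EDX (0) is copied to the upper 32 bits of MSR 0x49 and EAX (1) is copied to the lower 32 bits. So writing the value 1 to MSR 0x49 is what raised the exception?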
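The EDX:EAX rule from the manual text above can be written as a short C sketch. It takes the full 64-bit register values from the oops and truncates them the way the manual describes. The function names are illustrative only.

```c
#include <stdint.h>

/* Value WRMSR writes: EDX supplies bits 63:32, EAX supplies bits 31:0. */
uint64_t wrmsr_value(uint64_t rax, uint64_t rdx)
{
    /* The upper halves of RAX and RDX are ignored, so truncate to 32 bits first. */
    return ((uint64_t)(uint32_t)rdx << 32) | (uint32_t)rax;
}

/* MSR index: only the low 32 bits of RCX are used. */
uint32_t wrmsr_index(uint64_t rcx)
{
    return (uint32_t)rcx;
}
```

For RAX=1, RDX=0, RCX=0x49 this yields the value 1 and MSR index 0x49.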

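That is the Intel manual; what about AMD?
If CPUID reports IBPB_SUPPORT or SPEC_CTRL, the IBPB function can be controlled through the PRED_CMD MSR.
When ibpb_enabled is set to 1, an IBPB barrier flushes the contents of the indirect branch predictor on context switches in guest mode or user mode, to block attacks from other virtual machines or other processes on the same host.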
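The CPUID check can be sketched as a pure function over register values that were already read. The bit positions are not given in these notes; they are assumptions taken from the vendor CPUID documentation (AMD Fn8000_0008 EBX bit 12, Intel leaf 7 subleaf 0 EDX bit 26) and should be checked against the manuals. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the AMD and Intel CPUID documentation. */
#define AMD_8000_0008_EBX_IBPB  (1u << 12) /* CPUID Fn8000_0008 EBX[12] */
#define INTEL_7_0_EDX_SPEC_CTRL (1u << 26) /* CPUID.(EAX=7,ECX=0):EDX[26] */

/* Returns 1 if either vendor-specific bit advertises the PRED_CMD MSR / IBPB. */
int cpuid_reports_ibpb(uint32_t leaf_8000_0008_ebx, uint32_t leaf_7_0_edx)
{
    return (leaf_8000_0008_ebx & AMD_8000_0008_EBX_IBPB) != 0 ||
           (leaf_7_0_edx & INTEL_7_0_EDX_SPEC_CTRL) != 0;
}
```

On GCC or Clang the two register values could be read with `__get_cpuid` and `__get_cpuid_count` from `<cpuid.h>`. If a core that has just come back from resume does not report the bit, a write to MSR 0x49 would match cause 2) in the #GP list.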
Causes of #GP:
1) This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) is generated.
2) Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.
3) The processor will also generate a general protection exception if software attempts to write to bits in a reserved MSR.
Follow-up items:
1) The meaning of MSR 0x49 in the AMD manual.
2) The path from switch_mm_irqs_off to wrmsr, and which conditions must hold before wrmsr is reached.
switch_mm_irqs_off code from the 5.12 kernel
void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
                        struct task_struct *tsk)
{
    struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
    u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
    bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
    unsigned cpu = smp_processor_id();
    u64 next_tlb_gen;
    bool need_flush;
    u16 new_asid;

    /*
     * NB: The scheduler will call us with prev == next when switching
     * from lazy TLB mode to normal mode if active_mm isn't changing.
     * When this happens, we don't assume that CR3 (and hence
     * cpu_tlbstate.loaded_mm) matches next.
     *
     * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
     */

    /* We don't want flush_tlb_func_* to run concurrently with us. */
    if (IS_ENABLED(CONFIG_PROVE_LOCKING))
        WARN_ON_ONCE(!irqs_disabled());

    /*
     * Verify that CR3 is what we think it is. This will catch
     * hypothetical buggy code that directly switches to swapper_pg_dir
     * without going through leave_mm() / switch_mm_irqs_off() or that
     * does something like write_cr3(read_cr3_pa()).
     *
     * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3()
     * isn't free.
     */
#ifdef CONFIG_DEBUG_VM
    if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
        /*
         * If we were to BUG here, we'd be very likely to kill
         * the system so hard that we don't see the call trace.
         * Try to recover instead by ignoring the error and doing
         * a global flush to minimize the chance of corruption.
         *
         * (This is far from being a fully correct recovery.
         * Architecturally, the CPU could prefetch something
         * back into an incorrect ASID slot and leave it there
         * to cause trouble down the road. It's better than
         * nothing, though.)
         */
        __flush_tlb_all();
    }
#endif
    this_cpu_write(cpu_tlbstate.is_lazy, false);

    /*
     * The membarrier system call requires a full memory barrier and
     * core serialization before returning to user-space, after
     * storing to rq->curr, when changing mm. This is because
     * membarrier() sends IPIs to all CPUs that are in the target mm
     * to make them issue memory barriers. However, if another CPU
     * switches to/from the target mm concurrently with
     * membarrier(), it can cause that CPU not to receive an IPI
     * when it really should issue a memory barrier. Writing to CR3
     * provides that full memory barrier and core serializing
     * instruction.
     */
    if (real_prev == next) {
        VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
                   next->context.ctx_id);

        /*
         * Even in lazy TLB mode, the CPU should stay set in the
         * mm_cpumask. The TLB shootdown code can figure out
         * from cpu_tlbstate.is_lazy whether or not to send an IPI.
         */
        if (WARN_ON_ONCE(real_prev != &init_mm &&
                         !cpumask_test_cpu(cpu, mm_cpumask(next))))
            cpumask_set_cpu(cpu, mm_cpumask(next));

        /*
         * If the CPU is not in lazy TLB mode, we are just switching
         * from one thread in a process to another thread in the same
         * process. No TLB flush required.
         */
        if (!was_lazy)
            return;

        /*
         * Read the tlb_gen to check whether a flush is needed.
         * If the TLB is up to date, just use it.
         * The barrier synchronizes with the tlb_gen increment in
         * the TLB shootdown code.
         */
        smp_mb();
        next_tlb_gen = atomic64_read(&next->context.tlb_gen);
        if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
                next_tlb_gen)
            return;

        /*
         * TLB contents went out of date while we were in lazy
         * mode. Fall through to the TLB switching code below.
         */
        new_asid = prev_asid;
        need_flush = true;
    } else {
        /*
         * Avoid user/user BTB poisoning by flushing the branch
         * predictor when switching between processes. This stops
         * one process from doing Spectre-v2 attacks on another.
         */
        cond_ibpb(tsk); /* this call is what ends up issuing wrmsr */

        /*
         * Stop remote flushes for the previous mm.
         * Skip kernel threads; we never send init_mm TLB flushing IPIs,
         * but the bitmap manipulation can cause cache line contention.
         */
        if (real_prev != &init_mm) {
            VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
                                              mm_cpumask(real_prev)));
            cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
        }

        /*
         * Start remote flushes and then read tlb_gen.
         */
        if (next != &init_mm)
            cpumask_set_cpu(cpu, mm_cpumask(next));
        next_tlb_gen = atomic64_read(&next->context.tlb_gen);

        choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);

        /* Let nmi_uaccess_okay() know that we're changing CR3. */
        this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
        barrier();
    }

    /* ... remainder of the function omitted from this excerpt ... */
}
// The next-level function, also in tlb.c
static void cond_ibpb(struct task_struct *next)
{
    if (!next || !next->mm)
        return;

    /*
     * Both, the conditional and the always IBPB mode use the mm
     * pointer to avoid the IBPB when switching between tasks of the
     * same process. Using the mm pointer instead of mm->context.ctx_id
     * opens a hypothetical hole vs. mm_struct reuse, which is more or
     * less impossible to control by an attacker. Aside of that it
     * would only affect the first schedule so the theoretically
     * exposed data is not really interesting.
     */
    /*
     * switch_mm_cond_ibpb is a key variable: if it is not enabled,
     * the wrmsr below is never reached.
     */
    if (static_branch_likely(&switch_mm_cond_ibpb)) {
        unsigned long prev_mm, next_mm;

        /*
         * This is a bit more complex than the always mode because
         * it has to handle two cases:
         *
         * 1) Switch from a user space task (potential attacker)
         *    which has TIF_SPEC_IB set to a user space task
         *    (potential victim) which has TIF_SPEC_IB not set.
         *
         * 2) Switch from a user space task (potential attacker)
         *    which has TIF_SPEC_IB not set to a user space task
         *    (potential victim) which has TIF_SPEC_IB set.
         *
         * This could be done by unconditionally issuing IBPB when
         * a task which has TIF_SPEC_IB set is either scheduled in
         * or out. Though that results in two flushes when:
         *
         * - the same user space task is scheduled out and later
         *   scheduled in again and only a kernel thread ran in
         *   between.
         *
         * - a user space task belonging to the same process is
         *   scheduled in after a kernel thread ran in between
         *
         * - a user space task belonging to the same process is
         *   scheduled in immediately.
         *
         * Optimize this with reasonably small overhead for the
         * above cases. Mangle the TIF_SPEC_IB bit into the mm
         * pointer of the incoming task which is stored in
         * cpu_tlbstate.last_user_mm_ibpb for comparison.
         */
        next_mm = mm_mangle_tif_spec_ib(next);
        prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb);

        /*
         * Issue IBPB only if the mm's are different and one or
         * both have the IBPB bit set.
         */
        if (next_mm != prev_mm &&
            (next_mm | prev_mm) & LAST_USER_MM_IBPB)
            indirect_branch_prediction_barrier(); /* this is the wrmsr */

        this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm);
    }

    /*
     * Likewise, if switch_mm_always_ibpb is not enabled, the wrmsr
     * below is never reached.
     */
    if (static_branch_unlikely(&switch_mm_always_ibpb)) {
        /*
         * Only flush when switching to a user space task with a
         * different context than the user space task which ran
         * last on this CPU.
         */
        if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) {
            indirect_branch_prediction_barrier(); /* this is the wrmsr */
            this_cpu_write(cpu_tlbstate.last_user_mm, next->mm);
        }
    }
}
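The two decisions in cond_ibpb() can be modelled in user space to see which task switches would reach wrmsr. This is a simplified sketch and not kernel code: mm pointers are plain integers, `LAST_USER_MM_IBPB` is assumed to be bit 0, and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAST_USER_MM_IBPB 0x1UL /* assumed: bit 0 of the mangled mm pointer */

/* Model of mm_mangle_tif_spec_ib(): fold the task's TIF_SPEC_IB flag into bit 0. */
uintptr_t mangle_mm(uintptr_t mm, bool tif_spec_ib)
{
    return mm | (tif_spec_ib ? LAST_USER_MM_IBPB : 0);
}

/* Conditional mode: barrier only if the mms differ and at least one has the bit set. */
bool cond_mode_needs_ibpb(uintptr_t prev_mm, uintptr_t next_mm)
{
    return next_mm != prev_mm && ((next_mm | prev_mm) & LAST_USER_MM_IBPB);
}

/* Always mode: barrier whenever the incoming user mm differs from the last one. */
bool always_mode_needs_ibpb(uintptr_t last_user_mm, uintptr_t next_mm)
{
    return last_user_mm != next_mm;
}
```

In conditional mode, two different processes that both leave TIF_SPEC_IB clear never trigger the barrier. In always mode, any change of user mm does.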
// The actual implementation is in arch/x86/include/asm/nospec-branch.h
static __always_inline
void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
{
    asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
        : : "c" (msr),
            "a" ((u32)val),
            "d" ((u32)(val >> 32)),
            [feature] "i" (feature)
        : "memory");
}

static inline void indirect_branch_prediction_barrier(void)
{
    u64 val = PRED_CMD_IBPB;

    alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
}
./x86/include/asm/msr-index.h:#define PRED_CMD_IBPB BIT(0)
./x86/include/asm/msr-index.h:#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
./x86/include/asm/cpufeatures.h:#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
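This code matches the panic output: RCX holds 0x49 (MSR_IA32_PRED_CMD) and RAX holds 1 (PRED_CMD_IBPB, i.e. BIT(0)).
The key pieces are switch_mm_always_ibpb and switch_mm_cond_ibpb: if neither is enabled, the wrmsr instruction is never executed.
Both are set in linux-5.12.2/arch/x86/kernel/cpu/bugs.c. Reading that code shows that they can be controlled through kernel command-line parameters.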
static void __init
spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
{
    enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE;
    bool smt_possible = IS_ENABLED(CONFIG_SMP);
    enum spectre_v2_user_cmd cmd;

    if (!boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_STIBP))
        return;

    if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
        cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
        smt_possible = false;

    cmd = spectre_v2_parse_user_cmdline(v2_cmd);
    ........
    /* Initialize Indirect Branch Prediction Barrier */
    if (boot_cpu_has(X86_FEATURE_IBPB)) { /* first check whether the CPU has IBPB */
        setup_force_cpu_cap(X86_FEATURE_USE_IBPB);

        spectre_v2_user_ibpb = mode;
        switch (cmd) {
        case SPECTRE_V2_USER_CMD_FORCE:
        case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
        case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
            /* these commands enable the "always" key */
            static_branch_enable(&switch_mm_always_ibpb);
            spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT;
            break;
        case SPECTRE_V2_USER_CMD_PRCTL:
        case SPECTRE_V2_USER_CMD_AUTO:
        case SPECTRE_V2_USER_CMD_SECCOMP:
            /* these commands enable the "conditional" key */
            static_branch_enable(&switch_mm_cond_ibpb);
            break;
        default:
            break; /* default case: neither key is enabled */
        }

        pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n",
                static_key_enabled(&switch_mm_always_ibpb) ?
                "always-on" : "conditional"); /* this message is visible in dmesg */
    }
}
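The switch statement above can be summarized as a mapping from the command-line value to the IBPB mode. The sketch below assumes the option strings of the `spectre_v2_user=` parameter as listed in the kernel's Spectre admin guide (`on`, `off`, `auto`, `prctl`, `prctl,ibpb`, `seccomp`, `seccomp,ibpb`). The enum and function are hypothetical and only mirror the case labels shown above.

```c
#include <string.h>

enum ibpb_mode { IBPB_NONE, IBPB_CONDITIONAL, IBPB_ALWAYS };

/*
 * Hypothetical model of the switch in spectre_v2_user_select_mitigation():
 * which static key would be enabled for a given spectre_v2_user= value.
 * The option strings are an assumption taken from the kernel documentation.
 */
enum ibpb_mode ibpb_mode_for(const char *opt)
{
    /* FORCE, PRCTL_IBPB, SECCOMP_IBPB -> switch_mm_always_ibpb */
    if (!strcmp(opt, "on") || !strcmp(opt, "prctl,ibpb") ||
        !strcmp(opt, "seccomp,ibpb"))
        return IBPB_ALWAYS;
    /* PRCTL, AUTO, SECCOMP -> switch_mm_cond_ibpb */
    if (!strcmp(opt, "auto") || !strcmp(opt, "prctl") ||
        !strcmp(opt, "seccomp"))
        return IBPB_CONDITIONAL;
    /* anything else, including "off": neither key is enabled */
    return IBPB_NONE;
}
```

Only the `IBPB_NONE` case leaves both static keys disabled, so it is the only one in which cond_ibpb() never reaches wrmsr.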
In fact, switch_mm_irqs_off was only introduced partway through the 4.4 kernel series: it can be found in v4.4.271 (https://elixir.bootlin.com/linux/v4.4.271/A/ident/switch_mm_irqs_off) but not in v4.4.
So why was it introduced?
1. Introduction of switch_mm_irqs_off
x86/mm, sched/core: Turn off IRQs in switch_mm() …
amluto authored and Ingo Molnar committed on 28 Apr 2016
This is the upstream Linux commit that added it, dated 2016-04-28:
https://github.com/torvalds/linux/commit/078194f8e9fe3cf54c8fd8bded48a1db5bd8eb8a#
2. Refactoring of switch_mm_irqs_off
3. Introduction of IBPB
https://lore.kernel.org/lkml/20181125185005.466447057@linutronix.de/ (interface reporting whether the CPU uses IBPB, 2018-11-25)
https://lore.kernel.org/lkml/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk/ (29 Jan 2018 22:04:47 +0000)
 static char *ibpb_state(void)
 {
-    if (boot_cpu_has(X86_FEATURE_USE_IBPB))
-        return ", IBPB";
-    else
-        return "";
+    if (boot_cpu_has(X86_FEATURE_IBPB)) {
+        switch (spectre_v2_user) {
+        case SPECTRE_V2_USER_NONE:
+            return ", IBPB: disabled";
+        case SPECTRE_V2_USER_STRICT:
+            return ", IBPB: always-on";
+        }
+    }
+    return "";
 }

Checking whether the spectre_v2 mitigation is enabled
Check with the command: grep . /sys/devices/system/cpu/vulnerabilities/*

To turn it off, add the boot parameter in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="splash quiet nospectre_v2"
Then check again with the same command:
grep . /sys/devices/system/cpu/vulnerabilities/*
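The same check can be done from a program by looking for the ", IBPB: ..." fragment that ibpb_state() appends to the spectre_v2 status string. The sketch below assumes the status is a single line in the format suggested by the diff above. The helper name is hypothetical, and the exact wording of the line varies between kernel versions.

```c
#include <string.h>

/*
 * Hypothetical helper: copy the IBPB state ("disabled", "always-on", ...)
 * from a spectre_v2 status line into buf. Returns 0 on success, -1 if the
 * line does not mention IBPB at all.
 */
int ibpb_state_from_line(const char *line, char *buf, size_t len)
{
    const char *tag = "IBPB: ";
    const char *p = strstr(line, tag);
    size_t n;

    if (!p || len == 0)
        return -1;
    p += strlen(tag);
    n = strcspn(p, ",\n"); /* the state ends at the next comma or newline */
    if (n >= len)
        n = len - 1;       /* truncate rather than overflow buf */
    memcpy(buf, p, n);
    buf[n] = '\0';
    return 0;
}
```

A caller would read the line from /sys/devices/system/cpu/vulnerabilities/spectre_v2 with `fopen` and `fgets` and pass it in. A return value of -1 means the kernel did not report any IBPB state on that line.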
indirect-branch-predictor-barrier.html
Some background on switch_mm_irqs_off: https://lwn.net/Articles/763058/
Kernel Spectre command-line parameter documentation: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/spectre.html
The WRMSR instruction, described in detail including its exceptions: https://www.felixcloutier.com/x86/wrmsr
Intel manual download: https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
AMD manual download: https://developer.amd.com/resources/developer-guides-manuals/
The WRMSR instruction: https://www.xuebuyuan.com/804478.html
A similar issue, but crashing in a different place: https://lkml.org/lkml/2019/1/3/540
Disassembly and the addr2line command: https://www.jianshu.com/p/db13bddf4bc0
Disabling the kernel mitigation patches via the command line: https://yux.im/posts/technology/security/disable-meltdown-and-spectre-patches-on-linux/