Timing Measurements (Notes)

ckernel96

License: CC 4.0 BY-SA

Category: Linux kernel study notes. Tags: timer, struct, list, data structures, user, linux

Article link: https://blog.youkuaiyun.com/ckernel96/article/details/6706142

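This article describes the Linux kernel's timing measurement mechanisms in detail, covering the Real Time Clock (RTC), the Time Stamp Counter (TSC), the Programmable Interval Timer (PIT), and the CPU local timer. It explains how each timer works, how its interrupts are handled, and the related data structures and facilities, such as the jiffies variable, dynamic timers, and interval timers. It also covers the kernel's timekeeping functions, the interrupt handlers, and clock synchronization on multiprocessor systems.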

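The Linux kernel must perform two main kinds of timing measurement:

Keep the current time and date, so that they can be returned to user programs through the time, ftime, and gettimeofday system calls.
Maintain timers that tell the kernel or a user program that a certain interval has elapsed.

Timing measurements are performed by several hardware circuits based on fixed-frequency oscillators and counters.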

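Clock and Timer Circuits

Real Time Clock (RTC)

The RTC is independent of the CPU and of all other chips.

It keeps running even when the power is cut, because it is powered by a small battery or accumulator. The CMOS RAM and the RTC are integrated on a single chip.

The RTC can issue periodic interrupts on IRQ8. Linux uses the RTC only to obtain the time and date. The kernel accesses the RTC through I/O ports 0x70 and 0x71.

Time Stamp Counter (TSC)

Every 80x86 microprocessor has a CLK input pin that receives the clock signal of an external oscillator. Processors starting with the Pentium also include a counter, implemented by the 64-bit TSC register, that is incremented at each clock signal and can be read with the rdtsc assembly instruction. Linux must determine the clock signal frequency during initialization; the frequency is not declared when the kernel is compiled, so the same kernel image can run on CPUs with different clock frequencies.

During initialization, the calibrate_tsc function works out the actual CPU frequency by counting the clock signals that occur in an interval of roughly 5 ms.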

unsigned long __init calibrate_tsc(void)
{
	mach_prepare_counter();

	{
		unsigned long startlow, starthigh;
		unsigned long endlow, endhigh;
		unsigned long count;

		rdtsc(startlow,starthigh);
		mach_countup(&count);
		rdtsc(endlow,endhigh);


		/* Error: ECTCNEVERSET */
		if (count <= 1)
			goto bad_ctc;

		/* 64-bit subtract - gcc just messes up with long longs */
		__asm__("subl %2,%0\n\t"
			"sbbl %3,%1"
			:"=a" (endlow), "=d" (endhigh)
			:"g" (startlow), "g" (starthigh),
			 "0" (endlow), "1" (endhigh));

		/* Error: ECPUTOOFAST */
		if (endhigh)
			goto bad_ctc;

		/* Error: ECPUTOOSLOW */
		if (endlow <= CALIBRATE_TIME)
			goto bad_ctc;

		__asm__("divl %2"
			:"=a" (endlow), "=d" (endhigh)
			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));

		return endlow;
	}

	/*
	 * The CTC wasn't reliable: we got a hit on the very first read,
	 * or the CPU was so fast/slow that the quotient wouldn't fit in
	 * 32 bits..
	 */
bad_ctc:
	return 0;
}
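
The arithmetic behind the calibration can be sketched in user space. This is only an illustration, not kernel code: the helper names tsc_khz and tsc_quotient are hypothetical, and the cycle count is passed in instead of being read with rdtsc. The second function models the final divl of the listing above, which divides the 64-bit value held in EDX:EAX (the interval in microseconds shifted left by 32 bits) by the measured cycle count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU frequency in kHz, given the number of clock
 * cycles counted during an interval of interval_us microseconds. */
uint64_t tsc_khz(uint64_t cycles, uint32_t interval_us)
{
    assert(interval_us != 0);
    return cycles * 1000u / interval_us;
}

/* Models the final divl in calibrate_tsc: (interval_us * 2^32) / cycles.
 * The result is 2^32 divided by the number of cycles per microsecond. */
uint32_t tsc_quotient(uint32_t cycles, uint32_t interval_us)
{
    /* Mirrors the "CPU too slow" check: the quotient must fit in 32 bits. */
    assert(cycles > interval_us);
    return (uint32_t)(((uint64_t)interval_us << 32) / cycles);
}
```

For instance, 12,000,000 cycles counted in a 5000 microsecond interval correspond to a 2.4 GHz clock. Keeping the result as a scaled reciprocal lets later code convert cycle counts to microseconds with a multiplication instead of a division.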

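Programmable Interval Timer (PIT)

IBM-compatible PCs include a third time-measuring device, the PIT. It issues a special interrupt, called the timer interrupt, to notify the kernel that one more time interval has elapsed. The PIT is usually an 8254 CMOS chip that uses I/O ports 0x40 to 0x43.

The frequency of timer interrupts depends on the hardware architecture.

In Linux, a few macros yield the constants that determine the timer interrupt frequency:

HZ yields the approximate number of timer interrupts per second, that is, the timer interrupt frequency. On IBM PCs this value is 1000.
CLOCK_TICK_RATE yields 1193182, the frequency of the 8254 chip's internal oscillator.
LATCH yields the ratio of CLOCK_TICK_RATE to HZ, rounded to the nearest integer. This value is used to program the PIT.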

void setup_pit_timer(void)
{
	extern spinlock_t i8253_lock;
	unsigned long flags;

	spin_lock_irqsave(&i8253_lock, flags);
	outb_p(0x34,PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
	udelay(10);
	outb_p(LATCH & 0xff , PIT_CH0);	/* LSB */
	udelay(10);
	outb(LATCH >> 8 , PIT_CH0);	/* MSB */
	spin_unlock_irqrestore(&i8253_lock, flags);
}

#define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ)

#define PIT_MODE		0x43
#define PIT_CH0			0x40
#define PIT_CH2			0x42
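
To make the LATCH arithmetic concrete, here is a minimal user-space sketch. The names PIT_TICK_RATE, pit_latch, pit_latch_lsb, and pit_latch_msb are made up for this example; only the formula and the 1193182 Hz figure come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_TICK_RATE 1193182u   /* 8254 oscillator frequency in Hz */

/* Reload value for the PIT: PIT_TICK_RATE / hz, rounded to nearest. */
uint32_t pit_latch(uint32_t hz)
{
    assert(hz != 0);
    return (PIT_TICK_RATE + hz / 2) / hz;
}

/* The 16-bit reload value is written to channel 0 one byte at a time. */
uint8_t pit_latch_lsb(uint32_t latch) { return (uint8_t)(latch & 0xff); }
uint8_t pit_latch_msb(uint32_t latch) { return (uint8_t)(latch >> 8); }
```

With HZ equal to 1000 the reload value is 1193, so setup_pit_timer would write the low byte 0xA9 followed by the high byte 0x04.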

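CPU Local Timer

The local APIC in recent 80x86 processors provides another time-measuring device, the CPU local timer, which can issue one-shot or periodic interrupts. It differs from the Programmable Interval Timer in these ways:

The APIC timer counter is 32 bits long, while the PIT counter is 16 bits long; the local timer can therefore be programmed to issue interrupts at very low frequencies.
The local APIC timer sends an interrupt only to its own processor, while the PIT raises a global interrupt that any CPU in the system may handle.
The APIC timer is based on the bus clock signal, while the PIT has its own internal clock oscillator and can be programmed more flexibly.

High Precision Event Timer (HPET)

The HPET is a newer timer chip developed jointly by Intel and Microsoft.

ACPI Power Management Timer

Its clock signal has a fixed frequency of roughly 3.58 MHz. To read the current value of the counter, the kernel accesses an I/O port whose address is determined by the BIOS during initialization.

The Linux Timekeeping Architecture

The timekeeping architecture of 80x86 multiprocessor machines differs slightly from that of uniprocessor machines:

On a uniprocessor system, all timekeeping activities are triggered by interrupts raised by the global timer.
On a multiprocessor system, all general activities are triggered by interrupts raised by the global timer, while CPU-specific activities are triggered by interrupts raised by the local APIC timer.

The kernel uses two basic timekeeping functions: one keeps the current time up to date, and the other counts the nanoseconds that have elapsed within the current second.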

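Data Structures of the Timekeeping Architecture

The Timer Object

It is a descriptor of type timer_opts.

struct timer_opts {
	char* name; // string identifying the timer source
	void (*mark_offset)(void); // records the exact time of the last tick; called by the timer interrupt handler
	unsigned long (*get_offset)(void); // returns the time elapsed since the last tick (do_gettimeofday treats the value as microseconds)
	unsigned long long (*monotonic_clock)(void); // returns the nanoseconds elapsed since kernel initialization
	void (*delay)(unsigned long); // waits for the given number of "loops"
};

The most important fields are mark_offset and get_offset. Thanks to these two methods, the Linux timekeeping architecture achieves sub-tick resolution: the kernel can determine the current time with a precision finer than the tick period, an operation called time interpolation. The cur_timer variable holds the address of the timer object for the "best" timer source available in the system. Initially cur_timer points to timer_none, a dummy timer source object. During kernel initialization, the select_timer function returns the address of the appropriate timer object, which is then stored in cur_timer.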

struct timer_opts *cur_timer = &timer_none;
struct timer_opts* __init select_timer(void)
{
	int i = 0;
	// Order of preference: the HPET first; otherwise the ACPI Power Management Timer; then the TSC; the PIT, which is always present, is the last resort.
	/* find most preferred working timer */
	while (timers[i]) {
		if (timers[i]->init)
			if (timers[i]->init(clock_override) == 0)
				return timers[i]->opts;
		++i;
	}
		
	panic("select_timer: Cannot find a suitable timer\n");
	return NULL;
}

void __init time_init(void)
{
#ifdef CONFIG_HPET_TIMER
	if (is_hpet_capable()) {
		/*
		 * HPET initialization needs to do memory-mapped io. So, let
		 * us do a late initialization after mem_init().
		 */
		late_time_init = hpet_time_init;
		return;
	}
#endif
	xtime.tv_sec = get_cmos_time();
	xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
	set_normalized_timespec(&wall_to_monotonic,
		-xtime.tv_sec, -xtime.tv_nsec);

	cur_timer = select_timer();
	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);

	time_init_hook();
}

time_init sets cur_timer to the value returned by select_timer.

static struct init_timer_opts* __initdata timers[] = {
#ifdef CONFIG_X86_CYCLONE_TIMER
	&timer_cyclone_init,
#endif
#ifdef CONFIG_HPET_TIMER
	&timer_hpet_init,
#endif
#ifdef CONFIG_X86_PM_TIMER
	&timer_pmtmr_init,
#endif
	&timer_tsc_init,
	&timer_pit_init,
	NULL,
};
struct init_timer_opts {
	int (*init)(char *override);
	struct timer_opts *opts;
};
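
The selection policy (walk a NULL-terminated list ordered by preference and keep the first entry whose init routine succeeds) can be modelled in plain C. The sketch below is only an illustration: struct clock_source, pick_first_working, and the demo_ entries are hypothetical and much simpler than the kernel's init_timer_opts.

```c
#include <stddef.h>

/* Hypothetical stand-in for init_timer_opts. */
struct clock_source {
    const char *name;
    int (*init)(void);   /* returns 0 when the hardware is usable */
};

/* Returns the first usable source in a NULL-terminated list that is
 * ordered from most to least preferred, or NULL if none works. */
const struct clock_source *
pick_first_working(const struct clock_source *const *list)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (list[i]->init != NULL && list[i]->init() == 0)
            return list[i];
    }
    return NULL;
}

/* Demo data: pretend the HPET is missing while the TSC and PIT work. */
static int init_fails(void) { return -1; }
static int init_ok(void)    { return 0; }

const struct clock_source demo_hpet = { "hpet", init_fails };
const struct clock_source demo_tsc  = { "tsc",  init_ok };
const struct clock_source demo_pit  = { "pit",  init_ok };
const struct clock_source *const demo_sources[] = {
    &demo_hpet, &demo_tsc, &demo_pit, NULL
};
```

With this demo list the TSC entry is chosen, because the HPET entry fails to initialize and the PIT is only the fallback.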

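The local APIC timer has no corresponding timer object, because it is used only to generate periodic interrupts and never to achieve sub-tick resolution.

The jiffies Variable

This is a counter that records the number of ticks elapsed since the system started; it is incremented by one at every timer interrupt. On the 80x86 architecture jiffies is a 32-bit variable, so with HZ set to 1000 it wraps around to 0 roughly every 50 days. The kernel handles the overflow of jiffies cleanly through four macros: time_after, time_after_eq, time_before, and time_before_eq.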

#define time_after(a,b)     \
    (typecheck(unsigned long, a) && \
     typecheck(unsigned long, b) && \
     ((long)(b) - (long)(a) < 0))
#define time_before(a,b)    time_after(b,a)

#define time_after_eq(a,b)  \
    (typecheck(unsigned long, a) && \
     typecheck(unsigned long, b) && \
     ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)

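jiffies is initialized to 0xfffb6c20, which is -300000 when read as a 32-bit signed value. The counter therefore overflows within five minutes of system boot. This is done on purpose, so that kernel code that does not check for jiffies overflow is caught during development and does not make it into a stable release.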
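
The following user-space sketch shows why the signed-difference trick used by the macros survives a wraparound while a plain comparison does not. The function names are invented for this example, and fixed 32-bit types replace the kernel's unsigned long; the conversion to int32_t assumes the usual two's complement behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when tick a is later than tick b, even across a 32-bit wraparound.
 * Same idea as time_after: look at the sign of the difference. */
bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* The naive comparison, shown only for contrast. */
bool tick_after_naive(uint32_t a, uint32_t b)
{
    return a > b;
}

/* Initial counter value: 300000 ticks before the wraparound. */
uint32_t initial_ticks(void)
{
    return (uint32_t)-300000;
}
```

Tick 5 reached just after a wraparound is correctly reported as later than tick 0xfffffff0 by tick_after, whereas the naive comparison gets it wrong. The trick only works when the two values are less than half the counter range apart.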

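Linux sometimes needs the real number of ticks elapsed since the system started. For this reason the linker maps the jiffies variable onto the low-order 32 bits of a 64-bit counter called jiffies_64. Because a 64-bit value cannot be read atomically on a 32-bit machine, get_jiffies_64 reads it under the xtime_lock seqlock: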

u64 get_jiffies_64(void)                                                            
{
    unsigned long seq;
    u64 ret;

    do {
        seq = read_seqbegin(&xtime_lock);
        ret = jiffies_64;
    } while (read_seqretry(&xtime_lock, seq));
    return ret;
}
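
The retry loop in get_jiffies_64 follows the seqlock pattern. Below is a minimal user-space sketch of that pattern built on C11 atomics. It assumes a single writer, uses made-up names (seq_counter64, seq_write, seq_read, demo_counter), and is not how the kernel implements seqlock_t.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit counter stored as two 32-bit halves, guarded by a sequence
 * number: an even value means stable, an odd value means a write is
 * in progress. */
struct seq_counter64 {
    _Atomic uint32_t seq;
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

/* Writer side (single writer assumed). */
void seq_write(struct seq_counter64 *c, uint64_t value)
{
    atomic_fetch_add(&c->seq, 1);                  /* now odd */
    atomic_store(&c->lo, (uint32_t)value);
    atomic_store(&c->hi, (uint32_t)(value >> 32));
    atomic_fetch_add(&c->seq, 1);                  /* even again */
}

/* Reader side: retry until both halves were read with no writer active. */
uint64_t seq_read(struct seq_counter64 *c)
{
    uint32_t start, lo, hi;

    do {
        start = atomic_load(&c->seq);
        lo = atomic_load(&c->lo);
        hi = atomic_load(&c->hi);
    } while ((start & 1u) || atomic_load(&c->seq) != start);

    return ((uint64_t)hi << 32) | lo;
}

/* Zero-initialized instance used for demonstration. */
struct seq_counter64 demo_counter;
```

Readers never block the writer; they simply repeat the read when the sequence number shows that an update overlapped with it.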

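The xtime Variable
The xtime variable stores the current time and date; it is a structure of type timespec.

struct timespec {
    time_t  tv_sec;     /* seconds elapsed since midnight of January 1, 1970 */
    long    tv_nsec;    /* nanoseconds elapsed within the last second */
};

Timekeeping Architecture in Uniprocessor Systems
On a uniprocessor system, all timing-related activities are triggered by interrupts raised by the Programmable Interval Timer on IRQ line 0.

Initialization Phase

During initialization, time_init sets up the timekeeping architecture. The timer interrupt handler that it installs is timer_interrupt: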

irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)         
{                                                                                
    /*                                                                           
     * Here we are in the timer irq handler. We just have irqs locally           
     * disabled but we don't know if the timer_bh is running on the other        
     * CPU. We need to avoid to SMP race with it. NOTE: we don' t need           
     * the irq version of write_lock because as just said we have irq            
     * locally disabled. -arca                                                   
     */                                                                          
    write_seqlock(&xtime_lock); // protects the timing-related kernel variables

    cur_timer->mark_offset();

    do_timer_interrupt(irq, NULL, regs);

    write_sequnlock(&xtime_lock);
    return IRQ_HANDLED;
}

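The do_timer_interrupt function performs the following operations:

Increments jiffies_64 by one; this is safe because the xtime_lock seqlock is held for writing at this point.
Calls update_times to update the system date and time.
Calls update_process_times to perform several timing-related accounting operations for the local CPU.
Calls profile_tick.
If the system clock is synchronized with an external clock, calls set_rtc_mmss once every 660 seconds to adjust the Real Time Clock.

Timekeeping Architecture in Multiprocessor Systems

Multiprocessor systems can rely on two different sources of timer interrupts: those raised by the Programmable Interval Timer or those raised by the High Precision Event Timer.

Initialization Phase

The apic_intr_init function sets up the IDT interrupt gate for LOCAL_TIMER_VECTOR with the address of the low-level interrupt handler apic_timer_interrupt. Each local APIC must be told how often to raise a local timer interrupt. The calibrate_APIC_clock function computes how many bus clock signals the local APIC of the booting CPU receives during one tick. This value is then used to program every local APIC, which is done by the setup_APIC_timer function.

The Global Timer Interrupt Handler

The Local Timer Interrupt Handler

This handler performs the timekeeping activities tied to a specific CPU, namely profiling the kernel code and checking how long the current process has been running on that CPU.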

fastcall void smp_apic_timer_interrupt(struct pt_regs *regs)                         
{
    int cpu = smp_processor_id(); // get the logical number of the CPU

    /*
     * the NMI deadlock-detector uses this.
     */
    irq_stat[cpu].apic_timer_irqs++;

    /*
     * NOTE! We'd better ACK the irq immediately,
     * because timer handling can be slow.
     */
    ack_APIC_irq(); // acknowledge the interrupt on the local APIC
    /* 
     * update_process_times() expects us to have done irq_enter().
     * Besides, if we don't timer interrupts ignore the global
     * interrupt lock, which is the WrongThing (tm) to do.
     */
    irq_enter();
    smp_local_timer_interrupt(regs);
    irq_exit();
}

inline void smp_local_timer_interrupt(struct pt_regs * regs)                         
{
    int cpu = smp_processor_id();

    profile_tick(CPU_PROFILING, regs);
    if (--per_cpu(prof_counter, cpu) <= 0) {                                                                         
        per_cpu(prof_counter, cpu) = per_cpu(prof_multiplier, cpu);
        if (per_cpu(prof_counter, cpu) !=
                    per_cpu(prof_old_multiplier, cpu)) {
            __setup_APIC_LVTT(
                    calibration_result/
                    per_cpu(prof_counter, cpu));
            per_cpu(prof_old_multiplier, cpu) =
                        per_cpu(prof_counter, cpu);
        }
 
#ifdef CONFIG_SMP                                                                    
        update_process_times(user_mode(regs)); // check how long the current process has been running and update some local CPU statistics
#endif                                                                               
    }
}

Updating the Time and Date
The global timer interrupt handler calls update_times to update the value of the xtime variable.

static inline void update_times(void)                                                
{
    unsigned long ticks;

    ticks = jiffies - wall_jiffies;
    if (ticks) {
        wall_jiffies += ticks;
        update_wall_time(ticks);
    }
    calc_load(ticks);
}
static void update_wall_time(unsigned long ticks)
{
    do {
        ticks--;
        update_wall_time_one_tick();
        if (xtime.tv_nsec >= 1000000000) {
            xtime.tv_nsec -= 1000000000;
            xtime.tv_sec++;
            second_overflow();
        }
    } while (ticks);
}
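
The carry logic of update_wall_time can be reproduced in a small user-space model. This is a simplified sketch: the type and function names are hypothetical, every tick adds a fixed number of nanoseconds, and the NTP adjustments made by update_wall_time_one_tick and second_overflow are left out.

```c
#include <assert.h>

#define NSEC_PER_SECOND 1000000000L

struct wall_clock {
    long sec;
    long nsec;   /* kept in the range [0, NSEC_PER_SECOND) */
};

/* Returns the clock value after the given number of ticks, carrying
 * whole seconds out of the nanosecond field one tick at a time. */
struct wall_clock wall_clock_after(struct wall_clock start,
                                   unsigned long ticks, long nsec_per_tick)
{
    assert(nsec_per_tick > 0 && nsec_per_tick < NSEC_PER_SECOND);

    while (ticks-- > 0) {
        start.nsec += nsec_per_tick;
        if (start.nsec >= NSEC_PER_SECOND) {
            start.nsec -= NSEC_PER_SECOND;
            start.sec++;
        }
    }
    return start;
}
```

Starting from 10 s and 999,000,000 ns, two ticks of 1,000,000 ns each first roll the clock over to 11 s and 0 ns, then advance it to 11 s and 1,000,000 ns.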

Updating System Statistics
Updating Local CPU Statistics
=========================================

void update_process_times(int user_tick)                                             
{
    struct task_struct *p = current;
    int cpu = smp_processor_id();

    /* Note: this timer irq context must be accounted for as well. */
    if (user_tick) // the tick occurred while the current process was running in User Mode
        account_user_time(p, jiffies_to_cputime(1));
    else
        account_system_time(p, HARDIRQ_OFFSET, jiffies_to_cputime(1));
    run_local_timers();
    if (rcu_pending(cpu))
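        rcu_check_callbacks(cpu, user_tick); // check whether the local CPU has gone through a quiescent state; if so, call tasklet_schedule to activate the local CPU's rcu_tasklet
    scheduler_tick(); // decrement the current process's time slice counter and check whether it has reached 0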
}

void account_user_time(struct task_struct *p, cputime_t cputime)                 
{                                                                                
    struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;                    
    cputime64_t tmp;                                                             
                                                                                 
    p->utime = cputime_add(p->utime, cputime);                                       
                                                                                     
    /* Check for signals (SIGVTALRM, SIGPROF, SIGXCPU & SIGKILL). */                 
    check_rlimit(p, cputime);                                                        
    account_it_virt(p, cputime);                                                     
    account_it_prof(p, cputime);                                                     
                                                                                     
    /* Add user time to cpustat. */                                                  
    tmp = cputime_to_cputime64(cputime);                                             
    if (TASK_NICE(p) > 0)                                                            
        cpustat->nice = cputime64_add(cpustat->nice, tmp);                           
    else                                                                             
        cpustat->user = cputime64_add(cpustat->user, tmp);                           
}

void run_local_timers(void)                                                      
{                                                                                
    raise_softirq(TIMER_SOFTIRQ); // activate the TIMER_SOFTIRQ softirq on the local CPU
}

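The account_user_time and account_system_time functions perform the following steps:

Update the utime or the stime field of the current process descriptor. The process descriptor has two additional fields, cutime and cstime, which count the CPU ticks spent by the process's children in User Mode and Kernel Mode, respectively. These two fields are not updated here; they are updated only when the parent process inquires about the state of one of its children.
Check whether the total CPU time limit has been reached; if so, send the SIGXCPU and SIGKILL signals to current.
Call account_it_virt and account_it_prof to check the process timers.
Update some kernel statistics stored in the kstat per-CPU variable.

Keeping Track of System Load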

static inline void calc_load(unsigned long ticks)                                
{ // compute the load averages
    unsigned long active_tasks; /* fixed-point */
    static int count = LOAD_FREQ;
 
    count -= ticks;
    if (count < 0) {
        count += LOAD_FREQ;
        active_tasks = count_active_tasks();
        CALC_LOAD(avenrun[0], EXP_1, active_tasks);
        CALC_LOAD(avenrun[1], EXP_5, active_tasks);
        CALC_LOAD(avenrun[2], EXP_15, active_tasks);
    }
}
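
CALC_LOAD is an exponentially weighted moving average computed in fixed-point arithmetic. The sketch below shows one averaging step. The constants (11 fractional bits and 1884 for the one-minute decay factor) are the values I recall from the 2.6 kernel headers and should be treated as assumptions; the LOAD_ prefixed names and load_step are invented for this example.

```c
#include <stdint.h>

#define LOAD_FSHIFT  11                    /* fractional bits (assumed) */
#define LOAD_FIXED_1 (1u << LOAD_FSHIFT)   /* 1.0 in fixed point */
#define LOAD_EXP_1   1884u                 /* 1-minute decay factor (assumed) */

/* One averaging step: new = old * exp + active * (1 - exp).
 * load and active are fixed-point values scaled by LOAD_FIXED_1. */
uint32_t load_step(uint32_t load, uint32_t exp, uint32_t active)
{
    uint64_t next = (uint64_t)load * exp
                  + (uint64_t)active * (LOAD_FIXED_1 - exp);
    return (uint32_t)(next >> LOAD_FSHIFT);
}
```

Starting from an idle system, one runnable task (LOAD_FIXED_1) moves the average to 164/2048, about 0.08, after a single step. A load that already equals the number of active tasks stays unchanged.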

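Profiling the Kernel Code

Linux includes a minimalist code profiler called readprofile, which identifies the kernel's hot spots, the fragments of kernel code that execute most often.

void profile_tick(int type, struct pt_regs *regs) // collects data for the code profiler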
{
    if (type == CPU_PROFILING && timer_hook)
        timer_hook(regs);
    if (!user_mode(regs) && cpu_isset(smp_processor_id(), prof_cpu_mask))
        profile_hit(type, (void *)profile_pc(regs));
}

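This function is called by do_timer_interrupt on uniprocessor systems and by smp_local_timer_interrupt on multiprocessor systems.

When oprofile is used to collect data, profile_tick calls the timer_notify function to gather the data used by this newer profiler.

Checking the NMI Watchdogs

On multiprocessor systems, Linux offers kernel developers one more feature: a watchdog system, which is very useful for detecting kernel bugs that freeze the system. To enable it, the nmi_watchdog parameter must be passed when the kernel boots.

The watchdog relies on a clever hardware feature of the I/O APIC, which can generate periodic NMI interrupts on every CPU. Because NMIs are not masked by the cli assembly instruction, the watchdog can detect deadlocks even when interrupts are disabled.

Once every tick, all CPUs start executing the NMI interrupt handler, which in turn calls do_nmi.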

fastcall void do_nmi(struct pt_regs * regs, long error_code)                     
{                                                                                
    int cpu;                                                                     
                                                                                 
    nmi_enter();                                                                 
                                                                                 
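    cpu = smp_processor_id(); // get the logical number n of the CPU
    ++nmi_count(cpu); // count this NMI for CPU n; the watchdog code reached from here examines the apic_timer_irqs field of entry n in irq_stat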

    if (!nmi_callback(regs, cpu))
        default_do_nmi(regs);

    nmi_exit();
}

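When the NMI interrupt handler detects a frozen CPU, it logs alarming messages in the system log, dumps the contents of that CPU's registers and of the kernel stack, and kills the current process.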
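
The detection idea can be sketched as follows: the local timer handler increments a per-CPU counter (apic_timer_irqs in the listings above), and the NMI handler concludes that the CPU is stuck when that counter has not moved for several consecutive NMIs. The code below is a simplified user-space model; the name looks_frozen, the sample array, and the threshold parameter are assumptions of this example, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* timer_irqs[i] is the value of the CPU's local timer interrupt counter
 * observed at the i-th NMI. Returns true once the counter has stayed
 * unchanged across `limit` consecutive NMIs. */
bool looks_frozen(const uint32_t *timer_irqs, size_t n, uint32_t limit)
{
    uint32_t stalled = 0;

    for (size_t i = 1; i < n; i++) {
        if (timer_irqs[i] == timer_irqs[i - 1]) {
            if (++stalled >= limit)
                return true;
        } else {
            stalled = 0;   /* the CPU handled a timer interrupt */
        }
    }
    return false;
}
```

A counter that keeps increasing between NMIs never trips the check, while a counter stuck at the same value for `limit` NMIs in a row does.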

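Software Timers and Delay Functions

Each timer contains a field whose initial value is the current value of jiffies plus the right number of ticks. The field does not change afterwards; every time the kernel checks the timer, it compares the field with the current value of jiffies, and the timer has expired once jiffies is greater than or equal to the stored value.

Linux considers two types of timers: dynamic timers, used by the kernel, and interval timers, created by processes in User Mode.

Dynamic Timers

Dynamic timers are stored in the timer_list structure:

struct timer_list {
    struct list_head entry; // links the timer into a doubly linked circular list
    unsigned long expires; // expiration time of the timer, expressed in ticks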
                                                                                 
    spinlock_t lock;
    unsigned long magic;
                                                                                 
    void (*function)(unsigned long); // address of the function to run when the timer expires
    unsigned long data; // argument passed to the timer function
                                                                                 
    struct tvec_t_base_s *base;
};

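To create and activate a dynamic timer, the kernel must:

If necessary, create a new timer_list object. This can be done by defining a static global variable in the code, by defining a local variable inside a function (the object is then stored on the Kernel Mode stack), or by including the object in a dynamically allocated descriptor.
Initialize the object by calling init_timer(&t).
Store in the function field the address of the function to be activated when the timer expires. If required, store in the data field the parameter value to be passed to the function.
If the timer is not yet in a list, assign a value to the expires field and call add_timer(&t).
Otherwise, if the dynamic timer is already in a list, call mod_timer to update the expires field.

Once a timer expires, the kernel automatically removes it from its list. Sometimes, however, a process needs to remove a timer from its list explicitly, using the del_timer, del_timer_sync, or del_singleshot_timer_sync functions.

Dynamic Timers and Race Conditions
On multiprocessor systems del_timer is sometimes unsafe: if the timer function is still running on another CPU, the resource it acts on may be released while the function is still using it. In that case del_timer_sync should be used instead: when removing the timer, it checks whether the timer function is still running on another CPU and, if so, waits until the function finishes.

If the kernel developer knows that the timer function never reactivates the timer, the simpler del_singleshot_timer_sync can be used to deactivate the timer and wait until the timer function finishes.

Data Structures for Dynamic Timers

The main data structure for dynamic timers is a per-CPU variable called tvec_bases, which includes NR_CPUS elements, each a structure of type tvec_base_t.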

struct tvec_t_base_s {
    spinlock_t lock;
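    unsigned long timer_jiffies; // earliest expiration time of the dynamic timers yet to be checked
    struct timer_list *running_timer; // on multiprocessor systems, points to the timer_list structure of the dynamic timer currently being handled by the local CPU
    tvec_root_t tv1; // contains a vec array of 256 list_head elements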
    tvec_t tv2;
    tvec_t tv3;
    tvec_t tv4;
    tvec_t tv5;
} ____cacheline_aligned_in_smp;

typedef struct tvec_t_base_s tvec_base_t;
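
How a timer is assigned to one of the five vectors can be illustrated with a small model. It assumes, as in the 2.6 kernels I am aware of, that tv1 is indexed by the low 8 bits of the expiration time and that tv2 to tv5 each use the next 6 bits; the 6-bit width and all names below are assumptions of this sketch, which works on 32-bit tick values.

```c
#include <stdint.h>

#define WHEEL_ROOT_BITS 8   /* tv1: 256 lists */
#define WHEEL_NODE_BITS 6   /* tv2..tv5: 64 lists each (assumed) */

struct wheel_slot {
    int level;        /* 1..5, standing for tv1..tv5 */
    unsigned index;   /* list index inside that vector */
};

/* Chooses the vector and list for a timer expiring at `expires`,
 * given the base's timer_jiffies value. */
struct wheel_slot wheel_slot_for(uint32_t expires, uint32_t timer_jiffies)
{
    uint32_t delta = expires - timer_jiffies;
    unsigned shift = WHEEL_ROOT_BITS;
    struct wheel_slot slot;

    if (delta & 0x80000000u) {
        /* Already in the past: queue it on the list checked next. */
        slot.level = 1;
        slot.index = timer_jiffies & ((1u << WHEEL_ROOT_BITS) - 1);
        return slot;
    }
    if (delta < (1u << WHEEL_ROOT_BITS)) {
        slot.level = 1;
        slot.index = expires & ((1u << WHEEL_ROOT_BITS) - 1);
        return slot;
    }
    /* Each higher vector covers 64 times the range of the previous one. */
    for (slot.level = 2; slot.level < 5; slot.level++) {
        if (delta < (1u << (shift + WHEEL_NODE_BITS)))
            break;
        shift += WHEEL_NODE_BITS;
    }
    slot.index = (expires >> shift) & ((1u << WHEEL_NODE_BITS) - 1);
    return slot;
}
```

Timers due within the next 256 ticks land directly in tv1, so the per-tick work only touches one tv1 list; farther timers wait in coarser vectors until a cascade moves them down.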

================================
Dynamic Timer Handling

The run_timer_softirq function is the deferrable function associated with the TIMER_SOFTIRQ softirq.

static void run_timer_softirq(struct softirq_action *h)
{
    tvec_base_t *base = &__get_cpu_var(tvec_bases); // store the address of the local CPU's tvec_base_t structure in the local variable base

    if (time_after_eq(jiffies, base->timer_jiffies))
        __run_timers(base);
}
static inline void __run_timers(tvec_base_t *base)
{
    struct timer_list *timer;

    spin_lock_irq(&base->lock); // acquire the lock spin lock and disable local interrupts
    while (time_after_eq(jiffies, base->timer_jiffies)) {
        struct list_head work_list = LIST_HEAD_INIT(work_list);
        struct list_head *head = &work_list;
        int index = base->timer_jiffies & TVR_MASK; // compute the index of the list in tv1 and store it in index

        /*
         * Cascade timers:
         */
        if (!index && // if index is 0, every list in tv1 has been checked, so call cascade to filter timers down from the higher vectors
            (!cascade(base, &base->tv2, INDEX(0))) &&
                (!cascade(base, &base->tv3, INDEX(1))) &&
                    !cascade(base, &base->tv4, INDEX(2)))
            cascade(base, &base->tv5, INDEX(3));
        ++base->timer_jiffies;
        list_splice_init(base->tv1.vec + index, &work_list); // move the timers of tv1.vec[index] onto work_list; the loop below runs each of them
repeat:
        if (!list_empty(head)) {
            void (*fn)(unsigned long);
            unsigned long data;

            timer = list_entry(head->next,struct timer_list,entry);
            fn = timer->function;
            data = timer->data;

            list_del(&timer->entry);
            set_running_timer(base, timer);
            smp_wmb();
            timer->base = NULL;
            spin_unlock_irq(&base->lock);
            {
                u32 preempt_count = preempt_count();
                fn(data);
                if (preempt_count != preempt_count()) {
                    printk("huh, entered %p with %08x, exited with %08x?\n", fn, preempt_count, preempt_count());
                    BUG();
                }
            }
            spin_lock_irq(&base->lock);
            goto repeat;
        }
    }
    set_running_timer(base, NULL); // set running_timer to NULL
    spin_unlock_irq(&base->lock); // release the lock spin lock and enable local interrupts
}

An Application of Dynamic Timers: the nanosleep System Call

asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
{
    struct timespec t;
    unsigned long expire;
    long ret;
 
    if (copy_from_user(&t, rqtp, sizeof(t))) // copy the values in the user's timespec structure into the local variable t
        return -EFAULT;
                                                                                 
    if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))         
        return -EINVAL;                                                          
                                                                                 
    expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec); // timespec_to_jiffies converts the interval in the timespec structure to ticks; one extra tick is added to be safe
    current->state = TASK_INTERRUPTIBLE;                                         
    expire = schedule_timeout(expire);                                           
                                                                                 
    ret = 0;                                                                     
    if (expire) {                                                                
        struct restart_block *restart;                                           
        jiffies_to_timespec(expire, &t);                                         
        if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))                           
            return -EFAULT;                                                      
                                                                                 
        restart = &current_thread_info()->restart_block;
        restart->fn = nanosleep_restart;                                         
        restart->arg0 = jiffies + expire;                                        
        restart->arg1 = (unsigned long) rmtp;                                    
        ret = -ERESTART_RESTARTBLOCK;                                            
    }                                                                            
    return ret;                                                                  
}

fastcall signed long __sched schedule_timeout(signed long timeout)               
{
    struct timer_list timer;
    unsigned long expire;

    switch (timeout)
    {
    case MAX_SCHEDULE_TIMEOUT:
        /*
         * These two special cases are useful to be comfortable
         * in the caller. Nothing more. We could take
         * MAX_SCHEDULE_TIMEOUT from one of the negative value
         * but I' d like to return a valid offset (>=0) to allow
         * the caller to do everything it want with the retval.
         */
        schedule(); // suspend the process without arming a timer; it sleeps until something else wakes it up
        goto out;                                                                
    default:                                                                     
        /*                                                                       
         * Another bit of PARANOID. Note that the retval will be                 
         * 0 since no piece of kernel is supposed to do a check                  
         * for a negative retval of schedule_timeout() (since it                 
         * should never happens anyway). You just have the printk()              
         * that will tell you if something is gone wrong and where.              
         */                                                                      
        if (timeout < 0)                                                         
        {                                                                        
            printk(KERN_ERR "schedule_timeout: wrong timeout "                   
                   "value %lx from %p\n", timeout,                               
                   __builtin_return_address(0));                                 
            current->state = TASK_RUNNING;                                       
            goto out;                                                            
        }                                                                        
    }                                                                            
                                                                                 
    expire = timeout + jiffies;                                                  
                                                                                 
    init_timer(&timer);                                                          
    timer.expires = expire;                                                      
    timer.data = (unsigned long) current; // the timer function receives the process descriptor as its argument
    timer.function = process_timeout;                                            
                                                                                 
    add_timer(&timer);                                                           
    schedule();                                                                  
    del_singleshot_timer_sync(&timer);                                           
                                                                                 
    timeout = expire - jiffies;                                                  
                                                                                 
 out:                                                                            
    return timeout < 0 ? 0 : timeout;                                            
}

static void process_timeout(unsigned long __data)
{
    wake_up_process((task_t *)__data); // runs when the delay expires and wakes up the sleeping process
}

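Once awakened, the process resumes execution of the sys_nanosleep system call. If the value returned by schedule_timeout indicates that the timeout has expired (the value is 0), the system call terminates; otherwise the system call is automatically restarted.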
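
The conversion from a requested interval to a tick count can be approximated as shown below. This is a simplified model under stated assumptions: the kernel's timespec_to_jiffies uses scaled integer arithmetic rather than this direct division, the name sleep_ticks is hypothetical, and HZ is assumed to divide one second evenly. It does keep the two properties visible in sys_nanosleep: the interval is rounded up, and one extra tick is added to any nonzero request.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SECOND 1000000000u

/* Number of ticks to sleep for a request of sec seconds plus nsec
 * nanoseconds, with a timer interrupt frequency of hz. */
uint64_t sleep_ticks(uint64_t sec, uint32_t nsec, uint32_t hz)
{
    assert(hz > 0 && hz <= NSEC_PER_SECOND && nsec < NSEC_PER_SECOND);

    uint64_t nsec_per_tick = NSEC_PER_SECOND / hz;
    uint64_t ticks = sec * hz + (nsec + nsec_per_tick - 1) / nsec_per_tick;

    /* One extra tick for any nonzero request, because the current tick
     * may be almost over when the timer is armed. */
    return ticks + ((sec || nsec) ? 1 : 0);
}
```

With HZ equal to 1000, a request of a single nanosecond therefore costs two ticks, and a request of 1.5 seconds costs 1501 ticks.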

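When the kernel needs to wait for a short interval, it uses the udelay and ndelay functions: the former takes an interval in microseconds, the latter in nanoseconds.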

void __delay(unsigned long loops)                                                
{                                                                                
    cur_timer->delay(loops);                                                     
}                                                                                
                                                                                 
inline void __const_udelay(unsigned long xloops)                                 
{                                                                                
    int d0;                                                                      
    xloops *= 4;                                                                 
    __asm__("mull %0"                                                            
        :"=d" (xloops), "=&a" (d0)                                               
        :"1" (xloops),"0" (cpu_data[_smp_processor_id()].loops_per_jiffy * (HZ/4)));
        __delay(++xloops);                                                       
}                                                                                
                                                                                 
void __udelay(unsigned long usecs)                                               
{                                                                                
    __const_udelay(usecs * 0x000010c7);  /* 2**32 / 1000000 (rounded up) */      
}                                                                                
                                                                                 
void __ndelay(unsigned long nsecs)                                               
{                                                                                
    __const_udelay(nsecs * 0x00005);  /* 2**32 / 1000000000 (rounded up) */      
}
#define udelay(n) (__builtin_constant_p(n) ? \
    ((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c7ul)) : \
    __udelay(n))

#define ndelay(n) (__builtin_constant_p(n) ? \
    ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
    __ndelay(n))
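
The fixed-point arithmetic in __udelay and __const_udelay can be followed with this user-space sketch. It models the mull instruction as a 64-bit multiplication whose upper 32 bits are kept. The names udelay_loops and USEC_SCALE are invented here, and the loops_per_jiffy value in the example below is an arbitrary illustration, not a measured figure.

```c
#include <stdint.h>

#define USEC_SCALE 0x10c7u   /* 2^32 / 1000000, rounded up */

/* Number of delay loops for a delay of usecs microseconds, given the
 * calibrated loops_per_jiffy value and the tick frequency hz. */
uint32_t udelay_loops(uint32_t usecs, uint32_t loops_per_jiffy, uint32_t hz)
{
    /* Express the delay as a fraction of a second scaled by 2^32.
     * 32-bit arithmetic, as on i386; it overflows for large delays,
     * which is why the udelay macro rejects constants above 20000. */
    uint32_t xloops = usecs * USEC_SCALE;
    xloops *= 4;

    /* Multiply by (loops per second / 4) and keep the upper 32 bits,
     * as mull does when the result is taken from EDX. */
    uint64_t product = (uint64_t)xloops * (loops_per_jiffy * (hz / 4));

    return (uint32_t)(product >> 32) + 1;
}
```

For a machine with loops_per_jiffy equal to 2,000,000 and HZ equal to 1000 (2 * 10^9 loops per second), a delay of 1000 microseconds comes out at slightly more than 2,000,000 loops; the small excess comes from rounding the scale factor up.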

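System Calls Related to Timing Measurements
The time and gettimeofday System Calls
The gettimeofday system call is implemented by the sys_gettimeofday function, which in turn calls do_gettimeofday.

asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz)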
{                                                                                
    if (likely(tv != NULL)) {                                                    
        struct timeval ktv;                                                      
        do_gettimeofday(&ktv);                                                   
        if (copy_to_user(tv, &ktv, sizeof(ktv)))                                 
            return -EFAULT;                                                      
    }                                                                            
    if (unlikely(tz != NULL)) {                                                  
        if (copy_to_user(tz, &sys_tz, sizeof(sys_tz)))                           
            return -EFAULT;                                                      
    }                                                                            
    return 0;                                                                    
}

void do_gettimeofday(struct timeval *tv)
{
    unsigned long seq;
    unsigned long usec, sec;
    unsigned long max_ntp_tick;

    do {
        unsigned long lost;

        seq = read_seqbegin(&xtime_lock); // acquire the xtime_lock seqlock for reading

        usec = cur_timer->get_offset(); // microseconds elapsed since the last timer interrupt
        lost = jiffies - wall_jiffies;

        /*
         * If time_adjust is negative then NTP is slowing the clock
         * so make sure not to go into next possible interval.
         * Better to lose some accuracy than have time go backwards..
         */
        if (unlikely(time_adjust < 0)) {
            max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
            usec = min(usec, max_ntp_tick);

            if (lost)
                usec += lost * max_ntp_tick;
        }
        else if (unlikely(lost))
            usec += lost * (USEC_PER_SEC / HZ);
 
        sec = xtime.tv_sec;
        usec += (xtime.tv_nsec / 1000);
    } while (read_seqretry(&xtime_lock, seq));

    while (usec >= 1000000) { // check for overflow of the microseconds field and, if necessary, adjust it together with the seconds field
        usec -= 1000000;
        sec++;
    }

    tv->tv_sec = sec;
    tv->tv_usec = usec;
}
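
Ignoring the NTP branch, do_gettimeofday combines three quantities: the time stored in xtime, the offset inside the current tick, and any ticks that have been counted in jiffies but not yet added to xtime. A simplified user-space model, with hypothetical names and without the seqlock retry loop, might look like this:

```c
#include <stdint.h>

struct simple_timeval {
    uint64_t sec;
    uint32_t usec;   /* always below 1000000 */
};

/* xtime_sec / xtime_nsec: the last value stored in xtime.
 * offset_usec: microseconds since the last tick (from get_offset).
 * lost_ticks: ticks not yet accounted for in xtime. */
struct simple_timeval compose_time(uint64_t xtime_sec, uint32_t xtime_nsec,
                                   uint32_t offset_usec, uint32_t lost_ticks,
                                   uint32_t hz)
{
    uint64_t usec = offset_usec;

    usec += (uint64_t)lost_ticks * (1000000u / hz);
    usec += xtime_nsec / 1000u;

    /* Normalize: move whole seconds out of the microsecond field. */
    struct simple_timeval tv = {
        xtime_sec + usec / 1000000u,
        (uint32_t)(usec % 1000000u)
    };
    return tv;
}
```

For example, with xtime at 100 s and 999,500,000 ns, an offset of 400 microseconds, and one lost tick at HZ equal to 1000, the result is 101 s and 900 microseconds.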

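When do_settimeofday modifies the value of xtime, it leaves the RTC registers unchanged, so the new time is lost when the system shuts down, unless the user runs the clock program to change the RTC value as well.

The adjtimex System Call

Systems configured to run a time synchronization protocol on a regular basis, such as the Network Time Protocol (NTP), rely on the adjtimex system call.

The setitimer and alarm System Calls

Linux allows User Mode programs to activate special timers called interval timers. Interval timers are characterized by the following two aspects: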