1 NOHZ Mode (Dynamic Tick)
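Before dynamic tick was introduced, the kernel relied on a periodic tick driven by HZ. The traditional tick keeps raising periodic interrupts even when the system is idle, and these frequent interrupts prevent the CPU from entering deeper sleep states. If this restriction is lifted, so that the tick is stopped when the system becomes idle and resumed when there is work to do, ticks are generated purely on demand. The CPU then gets more opportunities to sleep and can sleep more deeply, which saves more power. Dynamic tick was introduced to replace the periodic tick mechanism. The periodic tick is responsible for a number of jobs, such as accounting process time slices, updating profiling data, and assisting CPU load balancing; dynamic tick provides a corresponding emulation mechanism for each of them.
As described earlier, the kernel time subsystem supports both a low-resolution and a high-resolution mode, so dynamic tick also needs two corresponding handling paths.
Its core data structure is: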
/**
 * struct tick_sched - sched tick emulation and no idle tick control/stats
 * @sched_timer:      hrtimer to schedule the periodic tick in high
 *                    resolution mode
 * @idle_tick:        Store the last idle tick expiry time when the tick
 *                    timer is modified for idle sleeps. This is necessary
 *                    to resume the tick timer operation in the timeline
 *                    when the CPU returns from idle
 * @tick_stopped:     Indicator that the idle tick has been stopped
 * @idle_jiffies:     jiffies at the entry to idle for idle time accounting
 * @idle_calls:       Total number of idle calls
 * @idle_sleeps:      Number of idle calls, where the sched tick was stopped
 * @idle_entrytime:   Time when the idle call was entered
 * @idle_waketime:    Time when the idle was interrupted
 * @idle_exittime:    Time when the idle state was left
 * @idle_sleeptime:   Sum of the time slept in idle with sched tick stopped
 * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped,
 *                    with IO outstanding
 * @sleep_length:     Duration of the current idle sleep
 * @do_timer_last:    CPU was the last one doing do_timer before going idle
 */
struct tick_sched {
    struct hrtimer          sched_timer;
    unsigned long           check_clocks;
    enum tick_nohz_mode     nohz_mode;
    ktime_t                 idle_tick;
    int                     inidle;
    int                     tick_stopped;
    unsigned long           idle_jiffies;
    unsigned long           idle_calls;
    unsigned long           idle_sleeps;
    int                     idle_active;
    ktime_t                 idle_entrytime;
    ktime_t                 idle_waketime;
    ktime_t                 idle_exittime;
    ktime_t                 idle_sleeptime;
    ktime_t                 iowait_sleeptime;
    ktime_t                 sleep_length;
    unsigned long           last_jiffies;
    unsigned long           next_jiffies;
    ktime_t                 idle_expires;
    int                     do_timer_last;
};

/*
 * Per cpu nohz control structure
 */
static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);
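To make the accounting fields easier to read, here is a minimal user-space sketch of how counters such as idle_calls, idle_sleeps, idle_sleeptime and iowait_sleeptime could evolve, going only by the field descriptions in the comment above. It is not the kernel's code: struct idle_stats, model_idle_enter, model_idle_exit and the plain nanosecond type ns_t are hypothetical names introduced for illustration, and the exact points at which the kernel updates each field are assumed rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ns_t;               /* hypothetical stand-in for ktime_t */

/* Hypothetical subset of struct tick_sched, reduced to the statistics. */
struct idle_stats {
    int           tick_stopped;     /* tick is currently stopped            */
    unsigned long idle_calls;       /* every entry into idle                */
    unsigned long idle_sleeps;      /* entries where the tick was stopped   */
    ns_t          idle_entrytime;   /* timestamp of the last idle entry     */
    ns_t          idle_sleeptime;   /* tick-stopped idle time, no IO wait   */
    ns_t          iowait_sleeptime; /* tick-stopped idle time, IO pending   */
};

/* Assumed model: count the idle call and note whether the tick is stopped. */
void model_idle_enter(struct idle_stats *st, ns_t now, int can_stop_tick)
{
    st->idle_calls++;
    st->idle_entrytime = now;
    if (can_stop_tick) {
        st->tick_stopped = 1;
        st->idle_sleeps++;
    }
}

/* Assumed model: on leaving idle, add the tick-stopped time to one sum. */
void model_idle_exit(struct idle_stats *st, ns_t now, int io_outstanding)
{
    assert(now >= st->idle_entrytime);
    if (st->tick_stopped) {
        ns_t delta = now - st->idle_entrytime;

        if (io_outstanding)
            st->iowait_sleeptime += delta;
        else
            st->idle_sleeptime += delta;
        st->tick_stopped = 0;
    }
}
```

In this model an idle period during which the tick stays running increments idle_calls but leaves idle_sleeps and both sleep-time sums unchanged.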
1.1 Low-Resolution NOHZ Mode
In low-resolution mode every tick raises the TIMER_SOFTIRQ softirq, and its handler, run_timer_softirq, may switch the clock into NOHZ mode. The switch proceeds as follows:
run_timer_softirq
    hrtimer_run_pending
        tick_check_oneshot_change
            tick_nohz_switch_to_nohz();
                tick_switch_to_oneshot(tick_nohz_handler)
The precondition for this call path is that CONFIG_HIGH_RES_TIMERS is not set, that is, high-resolution mode is not enabled, while NOHZ is enabled in the kernel.
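The precondition can be summarised as a small decision function. The sketch below is a simplified user-space model, not the kernel's logic: model_choose_mode, struct model_config and the three mode constants are hypothetical names, and the assumption that a one-shot capable clock event device is required for any mode other than the periodic one is mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three tick modes discussed in the text. */
enum model_tick_mode {
    MODEL_PERIODIC,     /* classic HZ-driven periodic tick          */
    MODEL_NOHZ_LOWRES,  /* dynamic tick via tick_nohz_handler       */
    MODEL_HIGHRES       /* tick emulated by the sched_timer hrtimer */
};

/* Hypothetical inputs to the decision. */
struct model_config {
    bool oneshot_capable;  /* assumed: device can run in one-shot mode */
    bool highres_enabled;  /* high-resolution mode is in use           */
    bool nohz_enabled;     /* NOHZ is enabled in the kernel            */
};

/* Low-resolution NOHZ is chosen only when high-res is off and NOHZ is on. */
enum model_tick_mode model_choose_mode(const struct model_config *cfg)
{
    assert(cfg);
    if (!cfg->oneshot_capable)
        return MODEL_PERIODIC;
    if (cfg->highres_enabled)
        return MODEL_HIGHRES;
    if (cfg->nohz_enabled)
        return MODEL_NOHZ_LOWRES;
    return MODEL_PERIODIC;
}
```

MODEL_HIGHRES here covers high-resolution mode with or without dynamic tick, since the tick is emulated through the same hrtimer in both cases.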
The core handler for dynamic tick in low-resolution mode is tick_nohz_handler, shown below.
static void tick_nohz_handler(struct clock_event_device *dev)
{
    struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
    struct pt_regs *regs = get_irq_regs();
    int cpu = smp_processor_id();
    ktime_t now = ktime_get();

    dev->next_event.tv64 = KTIME_MAX;

    if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
        tick_do_timer_cpu = cpu;

    /* Check, if the jiffies need an update */
    if (tick_do_timer_cpu == cpu)
        tick_do_update_jiffies64(now);

    /*
     * When we are idle and the tick is stopped, we have to touch
     * the watchdog as we might not schedule for a really long
     * time. This happens on complete idle SMP systems while
     * waiting on the login prompt. We also increment the "start
     * of idle" jiffy stamp so the idle accounting adjustment we
     * do when we go busy again does not account too much ticks.
     */
    if (ts->tick_stopped) {
        touch_softlockup_watchdog();
        ts->idle_jiffies++;
    }

    update_process_times(user_mode(regs));
    profile_tick(CPU_PROFILING);

    /* Program the next expiry event */
    while (tick_nohz_reprogram(ts, now)) {
        now = ktime_get();
        tick_do_update_jiffies64(now);
    }
}
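This function first emulates the work a periodic tick device would do: if the current CPU is in charge of the global tick duty, it updates jiffies, and it also performs the process time accounting for the local CPU. If the tick had already been stopped before this interrupt, the handler calls touch_softlockup_watchdog to reset the software watchdog, so that a long period without ticks does not expire the watchdog and trigger a spurious soft lockup report. As the code comment notes, this can happen on a completely idle SMP system waiting at the login prompt after boot. Finally, the handler programs the expiry of the next tick. If more than one jiffy elapses while tick_nohz_reprogram runs, the expiry it has just computed has already passed, so the expiry must be set again and jiffies must be updated again as well. Although the next expiry event is programmed here, the tick is stopped when the system goes idle, so that event may or may not actually fire. This is the essential property of dynamic tick.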
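The retry loop at the end of the handler can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: model_reprogram, model_handler_tail, struct model_tick and struct model_dev are hypothetical, time is a plain nanosecond counter, and a one-time stall stands in for whatever delays the programming of the device.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ns_t;      /* hypothetical stand-in for ktime_t */

struct model_tick {
    ns_t expires;          /* expiry of the emulated tick            */
    ns_t period;           /* length of one tick (one jiffy)         */
};

struct model_dev {
    ns_t clock;            /* fake current time                      */
    ns_t stall;            /* one-time delay hit while programming   */
    ns_t next_event;       /* expiry accepted by the "device"        */
};

/* Returns 0 on success, -1 if the new expiry is already in the past. */
int model_reprogram(struct model_tick *t, struct model_dev *dev, ns_t now)
{
    assert(t->period > 0);
    while (t->expires <= now)       /* move the expiry past "now"    */
        t->expires += t->period;

    dev->clock += dev->stall;       /* time passes while programming */
    dev->stall = 0;
    if (t->expires <= dev->clock)
        return -1;                  /* too late, caller must retry   */

    dev->next_event = t->expires;
    return 0;
}

/* Mirrors the shape of the while loop at the end of the handler. */
int model_handler_tail(struct model_tick *t, struct model_dev *dev)
{
    int retries = 0;
    ns_t now = dev->clock;

    while (model_reprogram(t, dev, now)) {
        now = dev->clock;           /* corresponds to now = ktime_get() */
        retries++;                  /* kernel updates jiffies here      */
    }
    return retries;
}
```

With a period of 1000 ns and no stall, the first attempt succeeds. With a stall of 1500 ns the first attempt fails, and the retry programs the next tick boundary after the new current time.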
1.2 High-Resolution NOHZ Mode
The flow is:
hrtimer_switch_to_hres();
    tick_init_highres
        tick_switch_to_oneshot(hrtimer_interrupt);
    tick_setup_sched_timer();
        ts->sched_timer.function = tick_sched_timer;
The core handler in high-resolution NOHZ mode is tick_sched_timer, implemented as follows:
/*
 * We rearm the timer until we get disabled by the idle code.
 * Called with interrupts disabled and timer->base->cpu_base->lock held.
 */
static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
{
    struct tick_sched *ts =
        container_of(timer, struct tick_sched, sched_timer);
    struct pt_regs *regs = get_irq_regs();
    ktime_t now = ktime_get();
    int cpu = smp_processor_id();

#ifdef CONFIG_NO_HZ
    /*
     * Check if the do_timer duty was dropped. We don't care about
     * concurrency: This happens only when the cpu in charge went
     * into a long sleep. If two cpus happen to assign themself to
     * this duty, then the jiffies update is still serialized by
     * xtime_lock.
     */
    if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
        tick_do_timer_cpu = cpu;
#endif

    /* Check, if the jiffies need an update */
    if (tick_do_timer_cpu == cpu)
        tick_do_update_jiffies64(now);

    /*
     * Do not call, when we are not in irq context and have
     * no valid regs pointer
     */
    if (regs) {
        /*
         * When we are idle and the tick is stopped, we have to touch
         * the watchdog as we might not schedule for a really long
         * time. This happens on complete idle SMP systems while
         * waiting on the login prompt. We also increment the "start of
         * idle" jiffy stamp so the idle accounting adjustment we do
         * when we go busy again does not account too much ticks.
         */
        if (ts->tick_stopped) {
            touch_softlockup_watchdog();
            ts->idle_jiffies++;
        }
        update_process_times(user_mode(regs));
        profile_tick(CPU_PROFILING);
    }

    /* Program the next expiry event */
    hrtimer_forward(timer, now, tick_period);

    return HRTIMER_RESTART;
}
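As this simplified implementation shows, in high-resolution mode tick_sched_timer emulates a periodic tick device on top of an hrtimer. Note that tick_sched_timer is itself called from hrtimer_interrupt. Dynamic tick uses the same function, because in high-resolution mode hrtimer requires the tick device to run in one-shot mode, which is also exactly what dynamic tick needs. Although the same function is used and it appears to produce periodic tick interrupts, a system with dynamic tick stops the tick while idle, so tick interrupts are not generated strictly periodically.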
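The re-arming step, hrtimer_forward(timer, now, tick_period) followed by HRTIMER_RESTART, is what keeps the emulated tick on a fixed grid. The following user-space sketch models only the observable behaviour of that forwarding step (push the expiry past the current time in whole periods and report how many periods elapsed). model_forward and ns_t are hypothetical names, and the arithmetic is a simplification, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ns_t;      /* hypothetical stand-in for ktime_t */

/*
 * Advance *expires in whole multiples of interval until it lies
 * after now. Returns the number of intervals added, or 0 if the
 * expiry was already in the future and nothing was changed.
 */
uint64_t model_forward(ns_t *expires, ns_t now, ns_t interval)
{
    ns_t delta;
    uint64_t missed;

    assert(interval > 0);
    delta = now - *expires;
    if (delta < 0)
        return 0;                   /* not expired yet, leave as is */

    missed = (uint64_t)(delta / interval) + 1;
    *expires += (ns_t)missed * interval;
    return missed;
}
```

Because the expiry only ever moves by whole periods, a late callback does not shift later ticks: an expiry of 1000 ns serviced at 3500 ns with a 1000 ns period is moved to 4000 ns, not to 4500 ns.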
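1.3 Starting and Stopping Dynamic Tick
The best moment to engage dynamic tick, that is, to stop the tick, is when the CPU enters idle. Conversely, when the CPU leaves idle and returns to work, dynamic tick can be disengaged and the tick restarted, as shown below:
Starting and stopping dynamic tick when the CPU is idle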
void cpu_idle(void)
{
    . . . .
    while (1) {
        tick_nohz_stop_sched_tick(1);
        while (!need_resched()) {
            . . . .
        }
        tick_nohz_restart_sched_tick();
    }