1. 前言
Wakeup count是Wakeup events framework的组成部分,用于解决“system suspend和system wakeup events之间的同步问题”。本文将结合“Linux电源管理(6)_Generic PM之Suspend功能”和“Linux电源管理(7)_Wakeup events framework”两篇文章,分析wakeup count的功能、实现逻辑、背后的思考,同时也是对这两篇文章的复习和总结
2. wakeup count在电源管理中的位置
wakeup count的实现位于wakeup events framework中(drivers/base/power/wakeup.c),主要为两个模块提供接口:通过PM core向用户空间提供sysfs接口;直接向autosleep(请参考下一篇文章)提供接口。
3. wakeup count的功能
wakeup count的功能是suspend同步,实现思路是这样的:
1)任何想发起电源状态切换的实体(可以是用户空间电源管理进程,也可以是内核线程,简称C),在发起状态切换前,读取系统的wakeup counts(该值记录了当前的wakeup event总数),并将读取的counts告知wakeup events framework。
2)wakeup events framework记录该counts到一个全局变量中(saved_count)
3)随后C发起电源状态切换(如STR),执行suspend过程。
4)在suspend的过程中,wakeup events framework照旧工作(直到系统中断被关闭),上报wakeup events,增加wakeup events counts。
5)suspend执行的一些时间点(可参考“Linux电源管理(6)_Generic PM之Suspend功能”),会调用wakeup events framework提供的接口(pm_wakeup_pending),检查是否有wakeup没有处理。
6)检查逻辑很简单,就是比较当前的wakeup counts和saved wakeup counts(C发起电源状态切换时的counts),如果不同,就要终止suspend过程。
4. wakeup count的实现逻辑
4.1 一个例子
在进行代码分析之前,我们先用伪代码的形式,写出一个利用wakeup count进行suspend操作的例子,然后基于该例子,分析相关的实现。
1: do {
2: ret = read(&cnt, "/sys/power/wakeup_count");
3: if (ret) {
4: ret = write(cnt, "/sys/power/wakeup_count");
5: } else {
6: countine;
7: }
8: } while (!ret);
9:
10: write("mem", "/sys/power/state");
11:
12: /* goto here after wakeup */
例子很简单:
a)读取wakeup count值,如果成功,将读取的值回写。否则说明有正在处理的wakeup events,continue。
b)回写后,判断返回值是否成功,如果不成功(说明读、写的过程中产生了wakeup events),继续读、写,直到成功。成功后,可以触发电源状态切换。
4.2 /sys/power/wakeup_count
wakeup_count文件是在kernel/power/main.c中,利用power_attr注册的,如下(大家可以仔细研读一下那一大段注释,内核很多注释写的非常好,而好的注释,就是软件功力的体现):
#ifdef CONFIG_PM_SLEEP
/*
* The 'wakeup_count' attribute, along with the functions defined in
* drivers/base/power/wakeup.c, provides a means by which wakeup events can be
* handled in a non-racy way.
*
* If a wakeup event occurs when the system is in a sleep state, it simply is
* woken up. In turn, if an event that would wake the system up from a sleep
* state occurs when it is undergoing a transition to that sleep state, the
* transition should be aborted. Moreover, if such an event occurs when the
* system is in the working state, an attempt to start a transition to the
* given sleep state should fail during certain period after the detection of
* the event. Using the 'state' attribute alone is not sufficient to satisfy
* these requirements, because a wakeup event may occur exactly when 'state'
* is being written to and may be delivered to user space right before it is
* frozen, so the event will remain only partially processed until the system is
* woken up by another event. In particular, it won't cause the transition to
* a sleep state to be aborted.
*
* This difficulty may be overcome if user space uses 'wakeup_count' before
* writing to 'state'. It first should read from 'wakeup_count' and store
* the read value. Then, after carrying out its own preparations for the system
* transition to a sleep state, it should write the stored value to
* 'wakeup_count'. If that fails, at least one wakeup event has occurred since
* 'wakeup_count' was read and 'state' should not be written to. Otherwise, it
* is allowed to write to 'state', but the transition will be aborted if there
* are any wakeup events detected after 'wakeup_count' was written to.
*/
static ssize_t wakeup_count_show(struct kobject *kobj,
struct kobj_attribute *attr,
char *buf)
{
unsigned int val;
return pm_get_wakeup_count(&val, true) ?
sprintf(buf, "%u\n", val) : -EINTR;
}
static ssize_t wakeup_count_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t n)
{
unsigned int val;
int error;
error = pm_autosleep_lock();
if (error)
return error;
if (pm_autosleep_state() > PM_SUSPEND_ON) {
error = -EBUSY;
goto out;
}
error = -EINVAL;
if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
error = n;
else
pm_print_active_wakeup_sources();
}
out:
pm_autosleep_unlock();
return error;
}
power_attr(wakeup_count);
实现很简单:read时,直接调用pm_get_wakeup_count(注意第2个参数);write时,直接调用pm_save_wakeup_count(注意用户空间的wakeup count功能和auto sleep互斥,会在下篇文章解释原因)。这两个接口均是wakeup events framework提供的接口,跟着代码往下看吧。
4.3 pm_get_wakeup_count
pm_get_wakeup_count的实现如下:
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
* @block: Whether or not to block.
*
* Store the number of registered wakeup events at the address in @count. If
* @block is set, block until the current number of wakeup events being
* processed is zero.
*
* Return 'false' if the current number of wakeup events being processed is
* nonzero. Otherwise, return 'true'.
*/
bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr; // 声明两个无符号整数变量
if (block) {
DEFINE_WAIT(wait); // 定义一个等待队列项
for (;;) { // 进入无限循环
prepare_to_wait(&wakeup_count_wait_queue, &wait, TASK_INTERRUPTIBLE); // 将当前任务添加到等待队列中,状态设置为可中断等待
split_counters(&cnt, &inpr); // 获取唤醒事件计数器的值和正在处理的唤醒事件数量
if (inpr == 0 || signal_pending(current)) // 如果正在处理的唤醒事件数量为零或当前任务收到了信号
break; // 跳出循环
pm_print_active_wakeup_sources(); // 打印活动的唤醒源
schedule(); // 调度任务,让出处理器并进入等待状态
}
finish_wait(&wakeup_count_wait_queue, &wait); // 结束等待状态
}
split_counters(&cnt, &inpr); // 获取最终的唤醒事件计数器的值和正在处理的唤醒事件数量
*count = cnt; // 将唤醒事件计数器的值存储到提供的地址count中
return !inpr; // 返回正在处理的唤醒事件数量是否为零的布尔值
}
该接口有两个参数,一个是保存返回的count值得指针,另一个指示是否block,具体请参考代码逻辑:
a)如果block为false,直接读取registered wakeup events和wakeup events in progress两个counter值,将registered wakeup events交给第一个参数,并返回wakeup events in progress的状态(若返回false,说明当前有wakeup events正在处理,不适合suspend)。
b)如果block为true,定义一个等待队列,等待wakeup events in progress为0,再返回counter。
注1:由4.2小节可知,sysfs发起的read动作,block为true,所以如果有正在处理的wakeup events,read进程会阻塞。其它模块(如auto sleep)发起的read,则可能不需要阻塞。
4.4 pm_save_wakeup_count
pm_save_wakeup_count的实现如下:
/**
* pm_save_wakeup_count - Save the current number of registered wakeup events.
* @count: Value to compare with the current number of registered wakeup events.
*
* If @count is equal to the current number of registered wakeup events and the
* current number of wakeup events being processed is zero, store @count as the
* old number of registered wakeup events for pm_check_wakeup_events(), enable
* wakeup events detection, and return 'true'. Otherwise, disable wakeup events
* detection and return 'false'.
*/
bool pm_save_wakeup_count(unsigned int count)
{
unsigned int cnt, inpr; // 声明两个无符号整数变量
unsigned long flags; // 声明一个无符号长整型变量
events_check_enabled = false; // 禁用唤醒事件检测
raw_spin_lock_irqsave(&events_lock, flags); // 获取并保存自旋锁的状态
split_counters(&cnt, &inpr); // 获取唤醒事件计数器的值和正在处理的唤醒事件数量
if (cnt == count && inpr == 0) { // 如果唤醒事件计数器的值等于count且正在处理的唤醒事件数量为零
saved_count = count; // 将count存储为旧的注册唤醒事件数,用于pm_check_wakeup_events()
events_check_enabled = true; // 启用唤醒事件检测
}
raw_spin_unlock_irqrestore(&events_lock, flags); // 释放自旋锁并恢复中断状态
return events_check_enabled; // 返回唤醒事件检测是否启用的布尔值
}
1)注意这个变量,events_check_enabled,如果它不为真,pm_wakeup_pending接口直接返回false,意味着如果不利用wakeup count功能,suspend过程中不会做任何wakeup events检查,也就不会进行任何的同步。
2)解除当前的registered wakeup events、wakeup events in progress,保存在变量cnt和inpr中。
3)如果写入的值和cnt不同(说明读、写的过程中产生events),或者inpr不为零(说明有events正在被处理),返回false(说明此时不宜suspend)。
4)否则,events_check_enabled置位(后续的pm_wakeup_pending才会干活),返回true(可以suspend),并将当前的wakeup count保存在saved count变量中。
4.5 /sys/power/state
再回忆一下“Linux电源管理(6)_Generic PM之Suspend功能”中suspend的流程,在suspend_enter接口中,suspend前的最后一刻,会调用pm_wakeup_pending接口,代码如下:
1: static int suspend_enter(suspend_state_t state, bool *wakeup)
2: {
3: ...
4: error = syscore_suspend();
5: if (!error) {
6: *wakeup = pm_wakeup_pending();
7: if (!(suspend_test(TEST_CORE) || *wakeup)) {
8: error = suspend_ops->enter(state);
9: events_check_enabled = false;
10: }
11: syscore_resume();
12: }
13: ...
14: }
在write wakeup_count到调用pm_wakeup_pending这一段时间内,wakeup events framework会照常产生wakeup events,因此如果pending返回true,则不能“enter”,终止suspend吧!
注2:wakeup后,会清除events_check_enabled标记。
“Linux电源管理(7)_Wakeup events framework”中已经介绍过pm_wakeup_pending了,让我们再看一遍吧:
/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
* value from the past and return true if new wakeup events have been registered
* since the old value was stored. Also return true if the current number of
* wakeup events being processed is different from zero.
*/
bool pm_wakeup_pending(void)
{
unsigned long flags; // 声明一个无符号长整型变量
bool ret = false; // 声明一个布尔型变量,并初始化为false
char suspend_abort[MAX_SUSPEND_ABORT_LEN]; // 声明一个字符数组
raw_spin_lock_irqsave(&events_lock, flags); // 获取并保存自旋锁的状态
if (events_check_enabled) { // 如果唤醒事件检测已启用
unsigned int cnt, inpr;
split_counters(&cnt, &inpr); // 获取唤醒事件计数器的值和正在处理的唤醒事件数量
ret = (cnt != saved_count || inpr > 0); // 比较唤醒事件计数器的值与保存的旧值,或判断正在处理的唤醒事件数量是否大于零
events_check_enabled = !ret; // 更新唤醒事件检测的状态
}
raw_spin_unlock_irqrestore(&events_lock, flags); // 释放自旋锁并恢复中断状态
if (ret) { // 如果有唤醒事件待处理
pm_pr_dbg("Wakeup pending, aborting suspend\n"); // 打印调试信息
pm_print_active_wakeup_sources(); // 打印活动的唤醒源
pm_get_active_wakeup_sources(suspend_abort, MAX_SUSPEND_ABORT_LEN); // 获取活动的唤醒源信息
log_suspend_abort_reason(suspend_abort); // 记录挂起中止的原因
pr_info("PM: %s\n", suspend_abort); // 打印挂起中止的原因
}
return ret || atomic_read(&pm_abort_suspend) > 0; // 返回唤醒事件是否待处理或挂起中止的标志
}
EXPORT_SYMBOL_GPL(pm_wakeup_pending);
a)首先会判断events_check_enabled是否有效,无效直接返回false。有效的话:
b)获得cnt和inpr,如果cnt不等于saved_count(说明这段时间内有events产生),或者inpr不为0(说明有events正在被处理),返回true(告诉调用者,放弃吧,时机不到)。同时清除events_check_enabled的状态。
c)否则,返回false(放心睡吧),同时保持events_check_enabled的置位状态(以免pm_wakeup_pending再次调用)。
参考文献:http://www.wowotech.net/pm_subsystem/wakeup_count.html/comment-page-2#comments