cgroup Scheduling: cgroup autogroup - 优快云 Blog

Article link: https://blog.youkuaiyun.com/u012441962/article/details/121903516

I. Introduction

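A scheduling group treats a group of processes as a single scheduling entity. This matters in environments such as desktop systems, where CFS divides CPU time fairly among the entities it sees: grouping keeps a workload made of many individually scheduled processes from taking more than its share and leaving the desktop's processes with less CPU. For example, with 10 independent processes, each gets about 1/10 of the CPU. If 9 of them are placed in one group, that group splits the CPU evenly with the remaining process, so the lone process gets about 1/2 and each grouped process about 1/18. cgroup builds its resource isolation and control on the same mechanism.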

II. Data Structures
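1. The scheduling-group structure, struct task_group, is defined in sched.h (kernel/sched/sched.h). Once created, every group is added to the global list headed by task_groups. sched_init adds root_task_group to this list, and sched_online_group adds each newly created group.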
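2. When a group is actually created, a sched_entity (se) and a cfs_rq are allocated for it on every CPU. Groups form a tree rooted at root_task_group, linked through parent, siblings and children. On each CPU, the group's se is queued on its parent's cfs_rq, whose runnable entities are kept in a red-black tree. Walking the group hierarchy during scheduling therefore means walking these per-CPU runqueues.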

struct task_group {
	struct cgroup_subsys_state css; //cgroup-related state

#ifdef CONFIG_FAIR_GROUP_SCHED
	/* schedulable entities of this group on each CPU */
	struct sched_entity **se; //the scheduler checks whether this se is a group or a task: a task is run, a group is descended into
	/* runqueue "owned" by this group on each CPU */
	struct cfs_rq **cfs_rq; //this group's per-CPU CFS runqueues; each keeps its entities in a red-black tree
	unsigned long shares; //weight of the whole group, shared by all of its members

#ifdef CONFIG_SMP
	/*
	 * load_avg can be heavily contended at clock tick time, so put
	 * it in its own cacheline separated from the fields above which
	 * will also be accessed at each tick.
	 */
	atomic_long_t load_avg ____cacheline_aligned;
#endif
#endif

#ifdef CONFIG_RT_GROUP_SCHED //real-time group scheduling members, handled by the RT scheduler
	struct sched_rt_entity **rt_se;
	struct rt_rq **rt_rq;

	struct rt_bandwidth	rt_bandwidth;

#endif

	struct rcu_head		rcu;
	struct list_head	list;

	struct task_group	*parent;
	struct list_head	siblings;
	struct list_head	children;

#ifdef CONFIG_SCHED_AUTOGROUP
	struct autogroup *autogroup;
#endif

	struct cfs_bandwidth	cfs_bandwidth; //CFS bandwidth control

};

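3. Taking cgroup as an example, creating a group essentially extends a per-CPU hierarchy of runqueues rooted at root_task_group. On every CPU, the new group gets its own sched_entity, which is queued on the parent group's cfs_rq, and its own cfs_rq, which holds the group's members.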

III. How autogroup Relates to Group Scheduling
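autogroup creates groups the same way cgroup does: both ultimately call sched_create_group, and the resulting groups are scheduled identically. The difference is that cgroup exposes many more group-control interfaces to user space. autogroup only offers a per-group renice interface (/proc/[pid]/autogroup), and an autogroup can only be created automatically through the setsid system call, which happens, for example, when a terminal is opened.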

3.1 How autogroup creates a group (based on ftrace tracing results)
Opening a terminal --> ksys_setsid() --> sched_autogroup_create_attach --> sched_create_group --> sched_online_group --> autogroup_move_group --> sched_move_task --> sched_change_group --> enqueue_task_fair --> set_curr_task_fair

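When a program is run from this terminal, the kernel checks at fork time whether autogroup is enabled (the kernel.sched_autogroup_enabled sysctl). If it is, the new process is placed in the autogroup created for the terminal. Otherwise, it stays under root_task_group and no grouping takes place.
autogroup_move_group moves the process, which until then was under root_task_group, into the autogroup, so that it forms its own group.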

This is done in sched_autogroup_create_attach:
void sched_autogroup_create_attach(struct task_struct *p)
{
	struct autogroup *ag = autogroup_create(); //create a new group whose parent is root_task_group

	autogroup_move_group(p, ag); //move p's thread group (every thread sharing p->signal) into the new autogroup

	/* Drop extra reference added by autogroup_create(): */
	autogroup_kref_put(ag);
}

3.2 How groups are handled during scheduling
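In pick_next_task_fair, if the picked entity turns out to be a group, the scheduler keeps descending until it reaches a runnable task. init_tg_cfs_entry initializes se->my_q for a group's entity, and this field distinguishes a task (my_q is NULL) from a task group.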

do {
	/* … */
	se = pick_next_entity(cfs_rq, curr);
	cfs_rq = group_cfs_rq(se); //NULL once se is a task, i.e. se->my_q == NULL
} while (cfs_rq);

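When the previously running task is not in fair_sched_class, pick_next_task_fair instead takes its simple path. That path puts the previous task and then performs the same top-down descent.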

3.3 Performance of group scheduling
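As I currently understand it, enabling group scheduling (CONFIG_FAIR_GROUP_SCHED) makes the task lookup in pick_next_task_fair more expensive than without it, because an entity must be picked, and checked for being a group, at every level of the hierarchy. From a resource-isolation standpoint, however, grouping lets resources be allocated and controlled deliberately.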
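On the performance side, I found in testing that with autogroup enabled and the benchmark running inside one group, the unixbench context switch score dropped. This is a consequence of grouping: the test would otherwise have received a larger share of resources, and grouping reduces that share.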
For example:
Without grouping, unixbench ideally gets 1/3 of the time slices.

With grouping, the group containing the shell and unixbench ideally gets 1/2 of the time slices.

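The rough tree structure: the root runqueue holds process1 and autogroup1, and autogroup1's runqueue holds unixbench and shell cmd.
Assuming equal weights, process1 and autogroup1 receive the same share of time slices (1/2 each). unixbench and shell cmd then split autogroup1's half, so each gets 1/4, which is less than the 1/3 unixbench gets without grouping.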
