Documentation_scheduler_sched-rt-group.txt

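This article takes an in-depth look at real-time group scheduling: its concepts, interface, settings, and future direction, including how CPU resources are allocated, how run time and limits are set, and how non-uniform periods could be made usable. It also covers the challenges facing real-time scheduling and how they might be addressed, in particular the use of EDF scheduling to handle real-time tasks more efficiently. Finally, it describes how resources are divided between real-time and non-real-time tasks and how sensible settings keep the system stable and performant.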

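If you would like to comment on or update the content of this document, please contact the maintainer of the original document directly.

If you have difficulty communicating in English, you can also ask the Chinese version maintainer for help.

If this translation is out of date or contains errors, please contact the Chinese version maintainer.

Chinese version maintainer: 陶莹莉 tyl18768122426@163.com

Chinese version translator: 陶莹莉 tyl18768122426@163.com

Chinese version proofreader: 陶莹莉 tyl18768122426@163.com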

    Real-Time group scheduling
    --------------------------

CONTENTS
========

0. WARNING
1. Overview
1.1 The problem
1.2 The solution
2. The interface
2.1 System-wide settings
2.2 Default behaviour
2.3 Basis for grouping tasks
3. Future plans

0. WARNING
==========

Fiddling with these settings can result in an unstable system. The knobs are
root only and assume root knows what he is doing.

Most notable:

* very small values in sched_rt_period_us can result in an unstable
   system when the period is smaller than either the available hrtimer
   resolution, or the time it takes to handle the budget refresh itself.

* very small values in sched_rt_runtime_us can result in an unstable
   system when the runtime is so small the system has difficulty making
   forward progress (NOTE: the migration thread and kstopmachine both
   are real-time processes).

1. Overview
===========

1.1 The problem
---------------

Realtime scheduling is all about determinism: a group has to be able to rely on
the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
multiple groups of realtime tasks, each group must be assigned a fixed portion
of the CPU time available. Without a minimum guarantee a realtime group can
obviously fall short. A fuzzy upper limit is of no use since it cannot be
relied upon. That leaves us with just the single fixed portion.

1.2 The solution
----------------

CPU time is divided by means of specifying how much time can be spent running
in a given period. We allocate this "run time" for each realtime group which
the other realtime groups will not be permitted to use.

Any time not allocated to a realtime group will be used to run normal priority
tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
SCHED_OTHER.

Let's consider an example: a fixed-frame-rate realtime renderer must deliver 25
frames a second, which yields a period of 0.04s per frame. Now say it will also
have to play some music and respond to input, leaving it with around 80% CPU
time dedicated for the graphics. We can then give this group a run time of 0.8
* 0.04s = 0.032s.

This way the graphics group will have a 0.04s period with a 0.032s run time
limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
0.00015s. So this group can be scheduled with a period of 0.005s and a run time
of 0.00015s.
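
The arithmetic for both groups can be reproduced with a short Python sketch.
The function name runtime_us is an illustrative choice, not a kernel
interface, and the values are expressed in whole microseconds to match the
unit used by the settings described later in this document.

```python
def runtime_us(period_us: int, cpu_percent: int) -> int:
    """Return the run time budget, in microseconds, for one period."""
    return period_us * cpu_percent // 100


# Graphics group: 25 frames/s gives a 0.04s period, with an 80% CPU share.
graphics = runtime_us(40_000, 80)

# Audio group: DMA buffer refill every 0.005s, with a 3% CPU share.
audio = runtime_us(5_000, 3)

print(graphics, audio)  # 32000 150
```

32000us and 150us correspond to the 0.032s and 0.00015s worked out above.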

The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.

NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.

2. The Interface
================

2.1 System-wide settings
------------------------

The system-wide settings are configured under the /proc virtual file system:

/proc/sys/kernel/sched_rt_period_us:
The scheduling period that is equivalent to 100% CPU bandwidth

/proc/sys/kernel/sched_rt_runtime_us:

A global limit on how much time realtime scheduling may use. Even without
CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
available to all realtime groups.

* Time is specified in us because the interface is s32. This gives an
  operating range from 1us to about 35 minutes.
* sched_rt_period_us takes values from 1 to INT_MAX.
* sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
* A run time of -1 specifies runtime == period, ie. no limit.
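
The ranges listed above can be checked before a value is written. The
following is a minimal Python sketch. The names INT_MAX and
validate_rt_settings are chosen for this illustration and are not part of any
kernel interface. It covers only the ranges stated above; the kernel may apply
further checks that are not described here.

```python
INT_MAX = 2**31 - 1  # largest value a signed 32-bit integer (s32) can hold


def validate_rt_settings(period_us: int, runtime_us: int) -> None:
    """Raise ValueError if either value is outside the documented range."""
    if not 1 <= period_us <= INT_MAX:
        raise ValueError("sched_rt_period_us must be in 1..INT_MAX")
    if not -1 <= runtime_us <= INT_MAX - 1:
        raise ValueError("sched_rt_runtime_us must be in -1..INT_MAX - 1")


# INT_MAX microseconds expressed in minutes: the "about 35 minutes" above.
max_minutes = INT_MAX / 1_000_000 / 60

validate_rt_settings(1_000_000, 950_000)  # in range, returns silently
validate_rt_settings(1_000_000, -1)       # -1 means no limit, also in range
```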

2.2 Default behaviour
---------------------

The default values are 1000000 (1s) for sched_rt_period_us and 950000 (0.95s)
for sched_rt_runtime_us. This gives 0.05s of every period to be used by
SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
realtime task will not lock up the machine but leave a little time to recover
it. By setting runtime to -1 you'd get the old behaviour back.
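
As a rough illustration, the two sysctl files can be read and the time left
for SCHED_OTHER computed as below. This is a sketch in Python. The helper
names read_rt_sysctls and non_rt_time_us are made up for this example, and the
base directory is a parameter so the reader function can be pointed at a copy
of the files.

```python
from pathlib import Path


def read_rt_sysctls(base: Path = Path("/proc/sys/kernel")) -> tuple[int, int]:
    """Return (sched_rt_period_us, sched_rt_runtime_us) read from base."""
    period_us = int((base / "sched_rt_period_us").read_text())
    runtime_us = int((base / "sched_rt_runtime_us").read_text())
    return period_us, runtime_us


def non_rt_time_us(period_us: int, runtime_us: int) -> int:
    """Time per period that realtime tasks cannot claim, in microseconds."""
    if runtime_us == -1:
        # -1 means runtime == period: nothing is held back for non-RT tasks.
        return 0
    return period_us - runtime_us


# With the default values this is the 0.05s mentioned above.
print(non_rt_time_us(1_000_000, 950_000))  # 50000
```

On a running Linux system, non_rt_time_us(*read_rt_sysctls()) would report
the figure for the current settings.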

By default all bandwidth is assigned to the root group and new groups get the
period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
want to assign bandwidth to another group, reduce the root group's bandwidth
and assign some or all of the difference to another group.

Realtime group scheduling means you have to assign a portion of total CPU
bandwidth to the group before it will accept realtime tasks. Therefore you will
not be able to run realtime tasks as any user other than root until you have
done that, even if the user has the rights to run processes with realtime
priority!

2.3 Basis for grouping tasks
----------------------------

Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
CPU bandwidth to task groups.

This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group.
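
The sketch below shows one way the step described in section 2.2 (reduce one
group's bandwidth and hand the difference to another) might be scripted in
Python against the cpu.rt_runtime_us files. The function names are
hypothetical, the directories are passed in by the caller, and where the
cgroup file system is mounted is an assumption that depends on the system.

```python
from pathlib import Path

RT_RUNTIME_FILE = "cpu.rt_runtime_us"  # file name taken from the text above


def read_us(path: Path) -> int:
    """Read a microsecond value from a cgroup control file."""
    return int(path.read_text().strip())


def move_rt_runtime(parent: Path, child: Path, amount_us: int) -> None:
    """Take amount_us of run time from parent and give it to child."""
    parent_file = parent / RT_RUNTIME_FILE
    child_file = child / RT_RUNTIME_FILE
    parent_runtime = read_us(parent_file)
    if amount_us < 0 or amount_us > parent_runtime:
        raise ValueError("parent group does not have that much run time")
    # Shrink the parent first so the total never exceeds the old total.
    parent_file.write_text(f"{parent_runtime - amount_us}\n")
    child_file.write_text(f"{read_us(child_file) + amount_us}\n")
```

A call such as move_rt_runtime(Path("/sys/fs/cgroup/cpu"),
Path("/sys/fs/cgroup/cpu/render"), 100_000) would need root privileges. The
mount point and the group name "render" in that call are assumptions made for
illustration only.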

For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.

Group settings are checked against the following limits in order to keep the
configuration schedulable:

\Sum_{i} runtime_{i} / global_period <= global_runtime / global_period

For now, this can be simplified to just the following (but see Future plans):

\Sum_{i} runtime_{i} <= global_runtime
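
The simplified check can be written out as a small Python sketch. The
function name is_schedulable and the run time values are hypothetical. As in
the simplified formula, every group is assumed to use the global period.

```python
def is_schedulable(group_runtimes_us: list[int], global_runtime_us: int) -> bool:
    """True if the groups' run times fit within the global run time."""
    return sum(group_runtimes_us) <= global_runtime_us


global_runtime_us = 950_000  # the default sched_rt_runtime_us

# Three hypothetical groups that together use 900000us: accepted.
print(is_schedulable([400_000, 300_000, 200_000], global_runtime_us))  # True

# Adding a fourth group of 100000us exceeds the limit: rejected.
print(is_schedulable([400_000, 300_000, 200_000, 100_000], global_runtime_us))  # False
```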

3. Future plans
===============

There is work in progress to make the scheduling period for each group
("<cgroup>/cpu.rt_period_us") configurable as well.

The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically it's not very useful _yet_
as it's prone to starvation without deadline scheduling.

Consider two sibling groups A and B; both have 50% bandwidth, but A's
period is twice the length of B's.

* group A: period=100000us, runtime=50000us
  - this runs for 0.05s once every 0.1s

* group B: period= 50000us, runtime=25000us
  - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).

This means that currently a while (1) loop in A will run for the full period of
B and can starve B's tasks (assuming they are of lower priority) for a whole
period.
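
The starvation condition can be made concrete with a short Python sketch. It
takes the stated 50% bandwidth at face value, which gives run times of 50000us
for A and 25000us for B. These figures, the Group class and the helper name
are assumptions of this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    period_us: int
    runtime_us: int

    @property
    def bandwidth(self) -> float:
        """Fraction of CPU time the group may use."""
        return self.runtime_us / self.period_us


def can_starve_for_full_period(hog: Group, victim: Group) -> bool:
    """True if hog's budget alone covers one whole period of victim.

    Assumes hog's tasks have higher priority and never sleep, and that
    no deadline scheduling is in place.
    """
    return hog.runtime_us >= victim.period_us


a = Group("A", period_us=100_000, runtime_us=50_000)
b = Group("B", period_us=50_000, runtime_us=25_000)

print(a.bandwidth, b.bandwidth)          # 0.5 0.5
print(can_starve_for_full_period(a, b))  # True
print(can_starve_for_full_period(b, a))  # False
```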

The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
full deadline scheduling to the Linux kernel. Deadline scheduling the above
groups and treating the end of the period as a deadline will ensure that they
both get their allocated time.
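
To illustrate the idea, here is a toy earliest-deadline-first simulation in
Python. It is not the kernel's scheduler and not a proposed SCHED_EDF
implementation. It assumes two always-busy groups with 50% bandwidth each
(run times of 50000us and 25000us, derived from that percentage), a fixed
tick that divides both run times, and the end of each period as the deadline.
All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdfGroup:
    name: str
    period_us: int
    runtime_us: int
    budget_us: int = 0    # run time left in the current period
    deadline_us: int = 0  # end of the current period
    ran_us: int = 0       # total time the group has run


def simulate_edf(groups: list[EdfGroup], horizon_us: int, tick_us: int) -> dict[str, int]:
    """Run groups for horizon_us, always picking the earliest deadline."""
    for now in range(0, horizon_us, tick_us):
        for g in groups:
            if now % g.period_us == 0:
                # New period: refill the budget and move the deadline.
                g.budget_us = g.runtime_us
                g.deadline_us = now + g.period_us
        runnable = [g for g in groups if g.budget_us > 0]
        if runnable:
            chosen = min(runnable, key=lambda g: g.deadline_us)
            chosen.budget_us -= tick_us
            chosen.ran_us += tick_us
    return {g.name: g.ran_us for g in groups}


groups = [EdfGroup("A", 100_000, 50_000), EdfGroup("B", 50_000, 25_000)]
result = simulate_edf(groups, horizon_us=100_000, tick_us=5_000)
print(result)  # {'A': 50000, 'B': 50000}
```

Over 0.1s, A receives its 0.05s once and B receives its 0.025s in each of its
two periods, so neither group is starved in this simplified model.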

Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current Linux PI infrastructure is geared towards
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance, since priority is inversely proportional to the
deadline delta (deadline - now).

This means the whole PI machinery will have to be reworked - and that is one of
the most complex pieces of code we have.
