rust, smol
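smol is a lightweight asynchronous I/O runtime library. It implements an efficient multi-threaded Executor and a Reactor, and it supports timers. smol also provides a good wrapper around asynchronous socket I/O, and it uses a separate thread pool to run synchronous operations asynchronously (the Blocking Executor). async-std currently uses smol as its underlying runtime.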
Dependencies
- async-task, as the abstraction of a task or runnable
- blocking, which runs blocking I/O on its own dedicated thread pool, independent of the smol Executor
- concurrent-queue, a concurrent multi-producer multi-consumer queue
- fastrand, a simple random number generator
- futures-io
- futures-util
- libc
- once_cell, cells that are initialized only once; used to initialize global variables
- scoped-tls, scoped_thread_local!() macro
- slab, similar to Vec, but returns the index of each inserted value
- socket2, provides direct access to the system’s socket functionality
Cross-platform, I/O multiplexing
Reactor is built on top of:
- Linux, [epoll]: https://en.wikipedia.org/wiki/Epoll
- BSD, [kqueue]: https://en.wikipedia.org/wiki/Kqueue
- Windows, [wepoll]: https://github.com/piscisaureus/wepoll
Executors
- Thread-local executor for tasks created by Task::local().
- Work-stealing executor for tasks created by Task::spawn().
- Blocking executor for tasks created by Task::blocking(), blocking!, iter(), reader() and writer().
Task
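Task is a wrapper around async-task. Spawning a task creates an async-task and pushes it onto the current thread's Worker queue, or onto the global queue if no Worker exists, and returns a Task handle. The Executor then schedules the task for execution.
Creating a Task requires a Future. Internally, the Future is wrapped in a RawTask that is referenced by both the runnable and the handle. RawTask holds the basic definition of the task, such as its state, waker and output. The runnable is the basic scheduling unit: it is pushed onto a queue and run when its turn comes, and runnable.run() is what finally polls the Future. The handle is itself a Future that returns Pending or Ready depending on the state of the RawTask, which drives the Executor forward. This part is fairly intricate and is worth studying in the source code.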
pub struct Task<T>(pub(crate) Option<async_task::JoinHandle<T, ()>>);
/// Raw pointers to the fields inside a task.
pub(crate) struct RawTask<F, R, S, T> {
/// The task header.
pub(crate) header: *const Header,
/// The schedule function.
pub(crate) schedule: *const S,
/// The tag inside the task.
pub(crate) tag: *mut T,
/// The future.
pub(crate) future: *mut F,
/// The output of the future.
pub(crate) output: *mut R,
}
...
pub fn spawn(future: impl Future<Output = T> + Send + 'static) -> Task<T> {
QUEUE.spawn(future)
}
...
pub fn QUEUE::spawn<T: Send + 'static>(
&self,
future: impl Future<Output = T> + Send + 'static,
) -> Task<T> {
let global = self.global.clone();
// The function that schedules a runnable task when it gets woken up.
let schedule = move |runnable| {
if WORKER.is_set() {
WORKER.with(|w| {
if Arc::ptr_eq(&global, &w.global) {
if let Err(err) = w.shard.push(runnable) {
global.queue.push(err.into_inner()).unwrap();
}
} else {
global.queue.push(runnable).unwrap();
}
});
} else {
global.queue.push(runnable).unwrap();
}
global.notify();
};
// Create a task, push it into the queue by scheduling it, and return its `Task` handle.
let (runnable, handle) = async_task::spawn(future, schedule, ());
runnable.schedule();
Task(Some(handle))
}
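The schedule closure above prefers the worker's own shard and falls back to the global queue when the shard is full or when no worker is set on the current thread. A minimal std-only sketch of that fallback follows. `Shard`, `Placed` and the free function `schedule` are hypothetical stand-ins for illustration, not smol's types, and the sketch ignores concurrency and notification.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded per-worker queue (a stand-in for the worker's shard).
struct Shard<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> Shard<T> {
    fn new(cap: usize) -> Self {
        Shard { cap, items: VecDeque::new() }
    }

    /// Pushes an item, handing it back to the caller if the shard is full.
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.items.len() >= self.cap {
            Err(item)
        } else {
            self.items.push_back(item);
            Ok(())
        }
    }
}

/// Records where an item ended up.
#[derive(Debug, PartialEq)]
enum Placed {
    Local,
    Global,
}

/// Prefers the local shard; overflow, or the absence of a worker, goes to the global queue.
fn schedule<T>(local: Option<&mut Shard<T>>, global: &mut VecDeque<T>, item: T) -> Placed {
    match local {
        Some(shard) => match shard.push(item) {
            Ok(()) => Placed::Local,
            Err(item) => {
                global.push_back(item);
                Placed::Global
            }
        },
        None => {
            global.push_back(item);
            Placed::Global
        }
    }
}
```

Passing `None` for `local` models a wake-up that happens on a thread without a worker, which is the `else` branch of `WORKER.is_set()` in the original closure.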
Blocking Executor
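The Blocking Executor is fairly simple; its logic is roughly that of block_on. Its job is to run synchronous operations on a thread pool. This is an important feature: as async-std shows, ordinary file operations are all implemented this way.
The basic idea is to create a task and push it onto a global queue. The thread pool polls this queue and takes tasks from it to run. The result of a task is returned to the Blocking Executor through the JoinHandle future of async-task.
/// The blocking executor. This is a global singleton, lazily initialized.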
pub(crate) struct BlockingExecutor {
/// The current state of the executor.
state: Mutex<State>,
/// Used to put idle threads to sleep and wake them up when new work comes in.
cvar: Condvar,
}
/// Current state of the blocking executor. It also records the state of the thread pool in use.
struct State {
/// Number of idle threads in the pool.
///
/// Idle threads are sleeping, waiting to get a task to run.
idle_count: usize,
/// Total number of thread in the pool.
///
/// This is the number of idle threads + the number of active threads.
thread_count: usize,
/// The queue of blocking tasks.
queue: VecDeque<Runnable>,
}
...
impl BlockingExecutor {
/// Returns a reference to the blocking executor.
pub fn get() -> &'static BlockingExecutor {
static EXECUTOR: Lazy<BlockingExecutor> = Lazy::new(|| BlockingExecutor {
state: Mutex::new(State {
idle_count: 0,
thread_count: 0,
queue: VecDeque::new(),
}),
cvar: Condvar::new(),
});
&EXECUTOR
}
// Note how the task is created here: task::spawn is not used, and the schedule function is specific to the blocking executor.
pub fn spawn<T: Send + 'static>(
&'static self,
future: impl Future<Output = T> + Send + 'static,
) -> Task<T> {
// Create a task, schedule it, and return its `Task` handle.
let (runnable, handle) = async_task::spawn(future, move |r| self.schedule(r), ());
runnable.schedule();
Task(Some(handle))
}
// The main loop of a pool thread, which runs the queued tasks.
fn main_loop(&'static self) {
let mut state = self.state.lock().unwrap();
loop {
// This thread is not idle anymore because it's going to run tasks.
state.idle_count -= 1;
// Run tasks in the queue.
while let Some(runnable) = state.queue.pop_front() {
// We have found a task - grow the pool if needed.
self.grow_pool(state);
// Run the task.
let _ = panic::catch_unwind(|| runnable.run());
// Re-lock the state and continue.
state = self.state.lock().unwrap();
}
// This thread is now becoming idle.
state.idle_count += 1;
// Put the thread to sleep until another task is scheduled.
let timeout = Duration::from_millis(500);
let (s, res) = self.cvar.wait_timeout(state, timeout).unwrap();
state = s;
// If there are no tasks after a while, stop this thread.
if res.timed_out() && state.queue.is_empty() {
state.idle_count -= 1;
state.thread_count -= 1;
break;
}
}
}
/// Schedules a runnable task for execution. The task is pushed onto the global queue.
fn schedule(&'static self, runnable: Runnable) {
let mut state = self.state.lock().unwrap();
state.queue.push_back(runnable);
// Notify a sleeping thread and spawn more threads if needed.
self.cvar.notify_one();
self.grow_pool(state);
}
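The queue-plus-condition-variable pattern used by the blocking executor can be sketched with the standard library alone. The example below is a simplified illustration under stated assumptions, not smol's implementation: `MiniPool` and `Job` are hypothetical names, the pool has a fixed number of threads (no `grow_pool`, no idle timeout), and results come back over an `mpsc` channel instead of a `JoinHandle` future.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical miniature of a blocking pool: one shared queue plus a condvar.
struct MiniPool {
    queue: Mutex<VecDeque<Job>>,
    cvar: Condvar,
}

impl MiniPool {
    /// Starts `threads` workers that sleep on the condvar until work arrives.
    fn new(threads: usize) -> Arc<MiniPool> {
        let pool = Arc::new(MiniPool {
            queue: Mutex::new(VecDeque::new()),
            cvar: Condvar::new(),
        });
        for _ in 0..threads {
            let pool = pool.clone();
            thread::spawn(move || loop {
                let mut queue = pool.queue.lock().unwrap();
                // Sleep until a job is available; the loop guards against spurious wakeups.
                let job = loop {
                    match queue.pop_front() {
                        Some(job) => break job,
                        None => queue = pool.cvar.wait(queue).unwrap(),
                    }
                };
                // Release the lock before running the (possibly slow) job.
                drop(queue);
                job();
            });
        }
        pool
    }

    /// Queues a synchronous closure and returns a receiver for its result.
    fn spawn<T: Send + 'static>(
        &self,
        f: impl FnOnce() -> T + Send + 'static,
    ) -> mpsc::Receiver<T> {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The receiver may already be gone; ignore the send error in that case.
            let _ = tx.send(f());
        });
        self.queue.lock().unwrap().push_back(job);
        self.cvar.notify_one();
        rx
    }
}
```

A caller would write something like `MiniPool::new(2).spawn(|| expensive_sync_call()).recv()`, where `expensive_sync_call` is a placeholder for any blocking function. The worker threads in this sketch never exit, which is acceptable for a demonstration but is exactly what the 500 ms idle timeout in `main_loop` avoids.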
Local Executor
Task::spawn_local creates a task that runs on the local executor, using the worker's local queue.
pub fn spawn_local<T: 'static>(&self, future: impl Future<Output = T> + 'static) -> Task<T> {
let queue = self.local.queue.clone();
let callback = self.callback.clone();
let id = thread_id();
// The function that schedules a runnable task when it gets woken up.
let schedule = move |runnable| {
if thread_id() == id && WORKER.is_set() {
WORKER.with(|w| {
if Arc::ptr_eq(&queue, &w.local.queue) {
w.local.push(runnable).unwrap();
} else {
queue.push(runnable).unwrap();
}
});
} else {
queue.push(runnable).unwrap();
}
callback.call();
};
// Create a task, push it into the queue by scheduling it, and return its `Task` handle.
let (runnable, handle) = async_task::spawn_local(future, schedule, ());
runnable.schedule();
Task(Some(handle))
}
Work-stealing Executor
Task::spawn creates a task that runs on the multi-threaded work-stealing executor.
Multi-thread
To start a multi-threaded executor, we have to create the OS threads explicitly:
for _ in 0..num_threads {
thread::spawn(|| smol::run(future::pending::<u8>()));
}
smol::run
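Runs the executor and polls the reactor. It must be called on at least one thread. If the top-level future returns Ready, the function returns and the executor thread exits. For that reason, an executor is usually started with future::pending() as the top-level future, which stays Pending forever. In that case the executor tries to take spawned tasks from the queues and run them. If no task is currently runnable, it parks on the Reactor and waits for I/O or timer events.
QUEUE.worker(move || unparker.unpark()) creates a worker for the Executor. The worker maintains the queues of all running tasks. Note that the worker is placed in scoped TLS for the duration of the closure that contains the loop: WORKER.set(&worker ...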
pub fn run<T>(future: impl Future<Output = T>) -> T {
let parker = Parker::new();
let unparker = parker.unparker();
// Create a worker.
let worker = QUEUE.worker(move || unparker.unpark());
// Create a waker that triggers an I/O event in the thread-local scheduler.
let unparker = parker.unparker();
let waker = async_task::waker_fn(move || unparker.unpark());
let cx = &mut Context::from_waker(&waker);
futures_util::pin_mut!(future);
// Set up tokio if enabled.
context::enter(|| {
WORKER.set(&worker, || {
'start: loop {
// Poll the main future.
if let Poll::Ready(val) = future.as_mut().poll(cx) {
return val;
}
for _ in 0..200 {
// tick() tries to run one task.
if !worker.tick() {
// No tasks for now; park on the reactor.
parker.park();
continue 'start;
}
}
// Process ready I/O events without blocking.
parker.park_timeout(Duration::from_secs(0));
}
})
})
}
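Stripped of task queues and the reactor, the loop in `smol::run` is a `block_on`: poll the main future, and sleep until its waker fires if it is still pending. The sketch below shows only that core, assuming plain thread parking in place of smol's `Parker`. `ThreadWaker` and `YieldOnce` are hypothetical helper names introduced for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `future` to completion on the current thread, parking between polls.
fn block_on<T>(future: impl Future<Output = T>) -> T {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            // smol would run queued tasks and poll the reactor at this point;
            // this sketch simply sleeps until the waker fires.
            Poll::Pending => thread::park(),
        }
    }
}

/// A future that wakes itself and returns `Pending` once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 {
            Poll::Ready("resumed")
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Because `unpark` leaves a token behind when it is called before `park`, the self-wake in `YieldOnce` cannot be lost. The EMPTY/NOTIFIED states in smol's parker serve the same purpose.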
If there is no task to run, the Executor calls parker.park() and thereby enters the reactor.
Call stack:
ntdll!ZwRemoveIoCompletionEx 0x00007ffec7e6db84
KERNELBASE!GetQueuedCompletionStatusEx 0x00007ffec5b07414
port__poll wepoll.c:1235
port_wait wepoll.c:1292
epoll_wait wepoll.c:680
smol::reactor::sys::Reactor::wait reactor.rs:820
smol::reactor::ReactorLock::react reactor.rs:241
smol::parking::Inner::park parking.rs:203
smol::parking::Parker::park parking.rs:40
smol::run::run::{{closure}}::{{closure}} run.rs:125
scoped_tls::ScopedKey<T>::set lib.rs:137
smol::run::run::{{closure}} run.rs:116
smol::context::enter context.rs:8
smol::run::run run.rs:115
second::main::{{closure}} main.rs:110
Queue::worker()
A Worker maintains several task queues and implements the task scheduling policy.
pub fn worker(&self, notify: impl Fn() + Send + Sync + 'static) -> Worker {
let mut shards = self.global.shards.write().unwrap();
let vacant = shards.vacant_entry();
// Create a worker and put its stealer handle into the executor.
let worker = Worker {
key: vacant.key(),
global: Arc::new(self.global.clone()),
shard: SlotQueue {
slot: Cell::new(None),
queue: Arc::new(ConcurrentQueue::bounded(512)),
},
local: SlotQueue {
slot: Cell::new(None),
queue: Arc::new(ConcurrentQueue::unbounded()),
},
callback: Callback::new(notify),
sleeping: Cell::new(false),
ticker: Cell::new(0),
};
vacant.insert(worker.shard.queue.clone());
worker
}
worker.tick()
Takes one task from the queues and runs it. self.search() implements the work-stealing mechanism.
/// Runs a single task and returns `true` if one was found.
pub fn tick(&self) -> bool {
loop {
match self.search() {
None => {
// Move to sleeping and unnotified state.
if !self.sleep() {
// If already sleeping and unnotified, return.
return false;
}
}
Some(r) => {
// Wake up.
if !self.wake() {
// If already woken, notify another worker.
self.global.notify();
}
// Bump the ticker.
let ticker = self.ticker.get();
self.ticker.set(ticker.wrapping_add(1));
// Flush slots to ensure fair task scheduling.
if ticker % 16 == 0 {
if let Err(err) = self.shard.flush() {
self.global.queue.push(err.into_inner()).unwrap();
self.global.notify();
}
self.local.flush().unwrap();
}
// Steal tasks from the global queue to ensure fair task scheduling.
if ticker % 64 == 0 {
self.shard.steal(&self.global.queue);
}
// Run the task.
if WORKER.set(self, || r.run()) {
// The task was woken while it was running, which means it got
// scheduled the moment running completed. Therefore, it is now inside
// the slot and would be the next task to run.
//
// Instead of re-running the task in the next iteration, let's flush
// the slot in order to give other tasks a chance to run.
//
// This is a necessary step to ensure task yielding works as expected.
// If a task wakes itself and returns `Poll::Pending`, we don't want it
// to run immediately after that because that'd defeat the whole
// purpose of yielding.
if let Err(err) = self.shard.flush() {
self.global.queue.push(err.into_inner()).unwrap();
self.global.notify();
}
self.local.flush().unwrap();
}
return true;
}
}
}
}
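The comments in `tick()` refer to a "slot" that holds the next task to run and is flushed to give other tasks a turn. The sketch below models that behaviour in a single thread, under the assumption that a push places the newest item in the slot and moves the previous occupant to the queue. This `SlotQueue` is a hypothetical simplification built on `VecDeque`, not smol's concurrent type.

```rust
use std::collections::VecDeque;

/// Hypothetical single-threaded model of a "slot + queue" pair.
struct SlotQueue<T> {
    slot: Option<T>,
    queue: VecDeque<T>,
}

impl<T> SlotQueue<T> {
    fn new() -> Self {
        SlotQueue { slot: None, queue: VecDeque::new() }
    }

    /// The newest item takes the slot; the previous occupant moves to the queue.
    fn push(&mut self, item: T) {
        if let Some(prev) = self.slot.replace(item) {
            self.queue.push_back(prev);
        }
    }

    /// The slot is served first, then the queue in FIFO order.
    fn pop(&mut self) -> Option<T> {
        self.slot.take().or_else(|| self.queue.pop_front())
    }

    /// Moves the slot's item to the back of the queue so that older items run first.
    fn flush(&mut self) {
        if let Some(item) = self.slot.take() {
            self.queue.push_back(item);
        }
    }
}
```

Without `flush`, a task that wakes itself would land in the slot and be popped again immediately. Flushing after such a run, and every 16 ticks, sends it to the back of the queue instead.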
parker.park()
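Parker implements the transition from the executor into the reactor.
park calls into the Reactor, which waits in epoll for I/O events or timers. The reactor is protected by a mutex, so when a multi-threaded executor is running, only one executor thread holds the reactor at any moment. An executor thread that fails to acquire the Reactor blocks on a condition variable and sleeps until it is unparked, typically as the result of an I/O event.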
pub(crate) struct Parker {
key: Cell<Option<usize>>,
unparker: Unparker,
}
pub(crate) struct Unparker {
inner: Arc<Inner>,
}
fn park(&self, timeout: Option<Duration>) -> bool {
...
// Try to acquire the reactor here; if acquired, the lock is held for the rest of this call.
let mut reactor_lock = Reactor::get().try_lock();
let state = match reactor_lock {
None => PARKED,
Some(_) => POLLING,
};
let mut m = self.lock.lock().unwrap();
match self.state.compare_exchange(EMPTY, state, SeqCst, SeqCst) {
Ok(_) => {}
// Consume this notification to avoid spurious wakeups in the next park.
Err(NOTIFIED) => {
// We must read `state` here, even though we know it will be `NOTIFIED`. This is
// because `unpark` may have been called again since we read `NOTIFIED` in the
// `compare_exchange` above. We must perform an acquire operation that synchronizes
// with that `unpark` to observe any writes it made before the call to `unpark`. To
// do that we must read from the write it made to `state`.
let old = self.state.swap(EMPTY, SeqCst);
assert_eq!(old, NOTIFIED, "park state changed unexpectedly");
return true;
}
Err(n) => panic!("inconsistent park_timeout state: {}", n),
}
match timeout {
None => {
loop {
// Block the current thread on the conditional variable.
match &mut reactor_lock {
// An executor thread that did not get the lock blocks on this condition variable.
None => m = self.cvar.wait(m).unwrap(),
Some(reactor_lock) => {
drop(m);
//println!("lock on tid={:?}", std::thread::current().id());
reactor_lock.react(None).expect("failure while polling I/O");
m = self.lock.lock().unwrap();
}
}
match self.state.compare_exchange(NOTIFIED, EMPTY, SeqCst, SeqCst) {
Ok(_) => return true, // got a notification
Err(_) => {} // spurious wakeup, go back to sleep
}
}
}
Some(timeout) => {
// Wait with a timeout, and if we spuriously wake up or otherwise wake up from a
// notification we just want to unconditionally set `state` back to `EMPTY`, either
// consuming a notification or un-flagging ourselves as parked.
let _m = match reactor_lock.as_mut() {
None => self.cvar.wait_timeout(m, timeout).unwrap().0,
Some(reactor_lock) => {
drop(m);
let deadline = Instant::now() + timeout;
loop {
reactor_lock
.react(Some(deadline.saturating_duration_since(Instant::now())))
.expect("failure while polling I/O");
if Instant::now() >= deadline {
break;
}
}
self.lock.lock().unwrap()
}
};
match self.state.swap(EMPTY, SeqCst) {
NOTIFIED => true, // got a notification
PARKED | POLLING => false, // no notification
n => panic!("inconsistent park_timeout state: {}", n),
}
}
}
}
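The state machine in `park` is easier to follow without the reactor branch. The sketch below keeps only the condition-variable path, with three states instead of four (there is no POLLING state because nothing polls I/O). `MiniParker` is a hypothetical name, and the code is a simplified illustration rather than smol's `Inner`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::{Arc, Condvar, Mutex};

const EMPTY: usize = 0;
const PARKED: usize = 1;
const NOTIFIED: usize = 2;

/// Hypothetical condvar-only parker (no reactor, so no POLLING state).
struct MiniParker {
    state: AtomicUsize,
    lock: Mutex<()>,
    cvar: Condvar,
}

impl MiniParker {
    fn new() -> Arc<MiniParker> {
        Arc::new(MiniParker {
            state: AtomicUsize::new(EMPTY),
            lock: Mutex::new(()),
            cvar: Condvar::new(),
        })
    }

    /// Blocks until a notification arrives, then consumes it.
    fn park(&self) {
        // Fast path: a notification is already pending.
        if self.state.compare_exchange(NOTIFIED, EMPTY, SeqCst, SeqCst).is_ok() {
            return;
        }
        let mut guard = self.lock.lock().unwrap();
        if self.state.compare_exchange(EMPTY, PARKED, SeqCst, SeqCst).is_err() {
            // `unpark` raced with us; consume the notification and return.
            self.state.swap(EMPTY, SeqCst);
            return;
        }
        loop {
            guard = self.cvar.wait(guard).unwrap();
            if self.state.compare_exchange(NOTIFIED, EMPTY, SeqCst, SeqCst).is_ok() {
                return;
            }
            // Spurious wakeup: go back to sleep.
        }
    }

    /// Stores a notification and wakes the parked thread, if any.
    fn unpark(&self) {
        if self.state.swap(NOTIFIED, SeqCst) == PARKED {
            // Taking the lock ensures the parked thread is already inside `wait`
            // before the notification is sent, so the wakeup cannot be lost.
            drop(self.lock.lock().unwrap());
            self.cvar.notify_one();
        }
    }
}
```

Calling `unpark` before `park` leaves the state at NOTIFIED, so the next `park` returns immediately. This corresponds to the `Err(NOTIFIED)` branch in the original code.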
Reactor.react
Handles timers and I/O events. When an I/O event occurs, the corresponding waker is called.
/// Processes new events, blocking until the first event or the timeout.
pub fn react(&mut self, timeout: Option<Duration>) -> io::Result<()> {
// Fire timers.
let next_timer = self.reactor.fire_timers();
// compute the timeout for blocking on I/O events.
let timeout = match (next_timer, timeout) {
(None, None) => None,
(Some(t), None) | (None, Some(t)) => Some(t),
(Some(a), Some(b)) => Some(a.min(b)),
};
// Bump the ticker before polling I/O.
let tick = self
.reactor
.ticker
.fetch_add(1, Ordering::SeqCst)
.wrapping_add(1);
// Block on I/O events.
match self.reactor.sys.wait(&mut self.events, timeout) {
// No I/O events occurred.
Ok(0) => {
if timeout != Some(Duration::from_secs(0)) {
// The non-zero timeout was hit so fire ready timers.
self.reactor.fire_timers();
}
Ok(())
}
// At least one I/O event occurred.
Ok(_) => {
// Iterate over sources in the event list.
let sources = self.reactor.sources.lock().unwrap();
let mut ready = Vec::new();
for ev in self.events.iter() {
// Check if there is a source in the table with this key.
if let Some(source) = sources.get(ev.key) {
let mut wakers = source.wakers.lock().unwrap();
// Wake readers if a readability event was emitted.
if ev.readable {
wakers.tick_readable = tick;
ready.append(&mut wakers.readers);
}
// Wake writers if a writability event was emitted.
if ev.writable {
wakers.tick_writable = tick;
ready.append(&mut wakers.writers);
}
// Re-register if there are still writers or
// readers. This can happen if e.g. we were
// previously interested in both readability and
// writability, but only one of them was emitted.
if !(wakers.writers.is_empty() && wakers.readers.is_empty()) {
self.reactor.sys.reregister(
source.raw,
source.key,
!wakers.readers.is_empty(),
!wakers.writers.is_empty(),
)?;
}
}
}
// Drop the lock before waking.
drop(sources);
// Wake up tasks waiting on I/O.
for waker in ready {
waker.wake();
}
Ok(())
}
...
}
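One detail of `react` worth isolating is that wakers are collected while the lock is held and only invoked after it is released, so a woken task can touch the same source without deadlocking. The sketch below shows that pattern with the standard library. `Wakers`, `CountingWaker` and `dispatch` are hypothetical names, and there is no real I/O polling or re-registration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Demo waker that counts how many times it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical I/O source: tasks waiting for readability or writability.
#[derive(Default)]
struct Wakers {
    readers: Vec<Waker>,
    writers: Vec<Waker>,
}

/// Collects the wakers that match the event while holding the lock, then
/// wakes them after the lock is released. Returns how many were woken.
fn dispatch(source: &Mutex<Wakers>, readable: bool, writable: bool) -> usize {
    let mut ready = Vec::new();
    {
        let mut wakers = source.lock().unwrap();
        if readable {
            ready.append(&mut wakers.readers);
        }
        if writable {
            ready.append(&mut wakers.writers);
        }
    } // The lock is dropped here, before any waker runs.
    let woken = ready.len();
    for waker in ready {
        waker.wake();
    }
    woken
}
```

`Vec::append` empties the source list, so each waker fires at most once per registration. A task that is still interested has to register again on its next poll.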
Wake up
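When idle, one worker of the multi-threaded Executor blocks in the Reactor waiting for I/O, while the other workers block on the Parker condition variable. When an I/O event arrives, the worker that holds the Reactor calls the corresponding Waker, which unparks the relevant parked worker so that the Task resumes. The Task then gets a chance to poll its Future again.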
smol::parking::Inner::unpark parking.rs:268
smol::parking::Unparker::unpark parking.rs:111
smol::run::run::{{closure}} run.rs:110
async_task::waker_fn::Helper<F>::wake waker_fn.rs:32
core::task::wake::Waker::wake wake.rs:241
smol::reactor::ReactorLock::react reactor.rs:294
smol::parking::Inner::park parking.rs:203
smol::parking::Parker::park parking.rs:40
smol::run::run::{{closure}}::{{closure}} run.rs:125
scoped_tls::ScopedKey<T>::set lib.rs:137
smol::run::run::{{closure}} run.rs:116
smol::context::enter context.rs:8
smol::run::run run.rs:115
second::main::{{closure}} main.rs:110
Reactor
TBD
Async IO
TBD
Timer
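The timer implementation relies on the timeout of the Reactor's epoll wait. The Reactor keeps all timers in a BTreeMap. Each time it enters epoll wait, it passes the time remaining until the next timer expires as the wait timeout. If an I/O event arrives in the meantime, the Reactor handles it and then waits again with the updated timer timeout. Otherwise, the return of wait means that a timer has expired: the timer is removed and its waker is called.
A timer is essentially a Future. On its first poll it inserts itself into the Reactor and waits to be woken. The code is as follows: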
impl Future for Timer {
type Output = Instant;
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// Check if the timer has already fired.
if Instant::now() >= self.when {
if let Some(id) = self.id.take() {
// Deregister the timer from the reactor.
Reactor::get().remove_timer(self.when, id);
}
Poll::Ready(self.when)
} else {
if self.id.is_none() {
// Register the timer in the reactor.
self.id = Some(Reactor::get().insert_timer(self.when, cx.waker()));
}
Poll::Pending
}
}
}
Timer Btree
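A BTreeMap, a sorted data structure keyed by the tuple (time, ID). The time is the instant at which the timer expires, and the ID is a unique value that increases in timer creation order.
BTreeMap.split_off(key) makes it easy to move all expired timers out of the BTreeMap.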
/// An ordered map of registered timers.
///
/// Timers are in the order in which they fire. The `usize` in this type is a timer ID used to
/// distinguish timers that fire at the same time. The `Waker` represents the task awaiting the
/// timer.
timers: Mutex<BTreeMap<(Instant, usize), Waker>>,
Reactor timer expiry handling
Expired timers are removed from the map, and their wakers are called in turn.
fn fire_timers(&self) -> Option<Duration> {
let mut timers = self.timers.lock().unwrap();
// Process timer operations, but no more than the queue capacity because otherwise we could
// keep popping operations forever.
for _ in 0..self.timer_ops.capacity().unwrap() {
match self.timer_ops.pop() {
Ok(TimerOp::Insert(when, id, waker)) => {
timers.insert((when, id), waker);
}
Ok(TimerOp::Remove(when, id)) => {
timers.remove(&(when, id));
}
Err(_) => break,
}
}
let now = Instant::now();
// Split timers into ready and pending timers.
let pending = timers.split_off(&(now, 0));
let ready = mem::replace(&mut *timers, pending);
// Calculate the duration until the next event.
let dur = if ready.is_empty() {
// Duration until the next timer.
timers
.keys()
.next()
.map(|(when, _)| when.saturating_duration_since(now))
} else {
// Timers are about to fire right now.
Some(Duration::from_secs(0))
};
// Drop the lock before waking.
drop(timers);
// Wake up tasks waiting on timers.
for (_, waker) in ready {
waker.wake();
}
dur
}
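The `split_off` / `mem::replace` pair in `fire_timers` can be tried in isolation. In the sketch below, plain `u64` ticks stand in for `Instant` and string labels stand in for wakers, so the result is deterministic. `take_expired` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

/// Removes and returns the timers whose deadline is strictly before `now`,
/// leaving the rest in `timers`. Keys are (deadline, id), as in the reactor.
fn take_expired(
    timers: &mut BTreeMap<(u64, usize), &'static str>,
    now: u64,
) -> Vec<&'static str> {
    // `split_off` returns every entry with key >= (now, 0): the pending timers.
    let pending = timers.split_off(&(now, 0));
    // Swap the pending map back in and keep the expired entries.
    let ready = std::mem::replace(timers, pending);
    ready.into_iter().map(|(_, label)| label).collect()
}
```

With timers at ticks 5, 5, 10 and 20 and `now = 10`, only the two timers at tick 5 are returned. Because the split key is `(now, 0)`, a timer due exactly at `now` stays pending until a later call.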