OSv—Optimizing the Operating System for Virtual Machines 3

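This article describes how OSv optimizes network channel processing and thread scheduling. It covers the lock and cache-line contention caused by concurrent access to shared data, and how OSv reduces that contention with single-producer/single-consumer queues and simpler locking. It also presents the core design of the OSv thread scheduler, including its lock-free, preemptive, and tick-less properties, and how it achieves fair and efficient scheduling.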




2.3 Network channels
Every cloud operating system must provide a high-quality TCP/IP stack. OSv builds its stack on Van Jacobson's net channel ideas [10].


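As Figure (a) shows, in a traditional design both the interrupt contexts (hard and soft interrupts) and the application context process packets across all layers of the network stack. With many socket connections and heavy traffic, these contexts contend for locks and cache lines on the data they share, and performance degrades.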


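To remove this contention, OSv performs all packet processing in a single thread. When a packet is received, a simple classifier assigns it to a channel, which is a single-producer/single-consumer queue that hands the packet over to the corresponding application.
Each channel serves a single flow, such as one TCP connection or one UDP path from the network interface straight to a socket.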
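The classifier-plus-channel arrangement can be sketched in a few lines. This is only an illustration under stated assumptions, not OSv's code: the names `FlowKey`, `Channel`, and `Classifier` are hypothetical, and a `collections.deque` stands in for the lock-free ring buffer a real single-producer/single-consumer queue would use.

```python
from collections import deque
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Identifies one flow, e.g. a TCP connection (hypothetical layout)."""
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


class Channel:
    """Queue with one producer (the receive path) and one consumer (the socket's thread)."""

    def __init__(self) -> None:
        # A deque stands in for a lock-free ring buffer in this sketch.
        self._queue: deque = deque()

    def push(self, packet: bytes) -> None:
        # Called only by the producer.
        self._queue.append(packet)

    def pop(self) -> Optional[bytes]:
        # Called only by the consumer; returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None


class Classifier:
    """Maps each incoming packet to the channel of its flow."""

    def __init__(self) -> None:
        self._channels: Dict[FlowKey, Channel] = {}

    def register(self, key: FlowKey) -> Channel:
        channel = Channel()
        self._channels[key] = channel
        return channel

    def dispatch(self, key: FlowKey, packet: bytes) -> bool:
        # Returns False when no channel matches, so the caller could fall
        # back to a slower path (not modelled here).
        channel = self._channels.get(key)
        if channel is None:
            return False
        channel.push(packet)
        return True


classifier = Classifier()
tcp_flow = FlowKey("tcp", "10.0.0.2", 40000, "10.0.0.1", 80)
udp_flow = FlowKey("udp", "10.0.0.3", 5353, "10.0.0.1", 53)
tcp_channel = classifier.register(tcp_flow)

delivered = classifier.dispatch(tcp_flow, b"GET / HTTP/1.1\r\n")
unmatched = classifier.dispatch(udp_flow, b"query")
first_packet = tcp_channel.pop()
```

Because each channel has exactly one writer and one reader, the handoff itself needs no lock, which is the property the design relies on.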


In addition, because only one thread now accesses the data, locking becomes much simpler, which reduces both run-time and maintenance overhead.


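Figure (c) shows the resulting locking scheme:
1. The socket receive buffer and send buffer are now handled by the same thread, so they no longer need two separate locks.
2. The exclusive lock that kept concurrent writes from interleaving has been removed and replaced by a wait queue for synchronization.
3. The TCP lock and the socket lock have been merged, so TCP processing now always happens together with socket processing.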


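2.4 Thread scheduling
The OSv thread scheduler is designed to be lock-free, preemptive, tick-less, fair, scalable, and efficient.
Lock-free: As section 2.2 explained, the OSv scheduler should not use spin-locks, and it obviously cannot use a sleeping mutex either.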


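Preemptive: OSv fully supports preemptive multitasking. A thread can cause a reschedule voluntarily (by waiting, yielding, or waking up another thread), but a reschedule can also happen at any time.
Preemption is triggered by an interrupt. All threads can be preempted, with no distinction between application and kernel threads. A thread can temporarily avoid being preempted by using a preempt-disable counter.
This is useful in several cases, such as maintaining per-CPU variables or RCU locks. An interrupt that arrives while the running thread has preemption disabled does not cause a reschedule, but once the thread re-enables preemption, the deferred reschedule takes place.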
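The deferred-reschedule behaviour of a preempt-disable counter can be illustrated as follows. This is a minimal sketch with hypothetical names (`PreemptState`, `on_interrupt`, `reschedule_pending`), assuming the counter may be nested and that a reschedule requested while the counter is non-zero is remembered rather than dropped.

```python
class PreemptState:
    """Per-thread preemption bookkeeping (hypothetical names)."""

    def __init__(self) -> None:
        self.disable_count = 0           # > 0 means preemption is disabled
        self.reschedule_pending = False  # an interrupt asked for a reschedule
        self.reschedules = 0             # how many reschedules actually ran

    def preempt_disable(self) -> None:
        self.disable_count += 1

    def preempt_enable(self) -> None:
        assert self.disable_count > 0, "unbalanced preempt_enable"
        self.disable_count -= 1
        # Only the outermost enable runs the deferred reschedule.
        if self.disable_count == 0 and self.reschedule_pending:
            self.reschedule_pending = False
            self._reschedule()

    def on_interrupt(self) -> None:
        if self.disable_count > 0:
            self.reschedule_pending = True   # defer until preemption is back on
        else:
            self._reschedule()

    def _reschedule(self) -> None:
        self.reschedules += 1


state = PreemptState()
state.preempt_disable()
state.preempt_disable()             # nested critical section
state.on_interrupt()                # arrives while preemption is disabled
deferred_count = state.reschedules
state.preempt_enable()              # still inside the outer section
still_deferred = state.reschedules
state.preempt_enable()              # outermost enable: reschedule happens now
after_enable = state.reschedules
```

A counter rather than a boolean lets critical sections nest without the inner one accidentally re-enabling preemption for the outer one.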


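Tick-less: Most classic kernels, and even many modern ones, rely on a periodic timer interrupt called a tick. The tick causes a reschedule to happen periodically. The kernel accounts each thread's time in time slices and hands those slices out to the threads.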

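Ticks are convenient, but they also have several drawbacks. The most important one is that handling timer interrupts wastes CPU time. This is worse in a virtual machine, where interrupts are slower than on physical hardware because they involve exits to the hypervisor.
Because of these drawbacks under virtualization, OSv implements a tick-less design. Using a high-resolution clock, the scheduler accounts the precise time each thread has run instead of approximating it with ticks. Timer interrupts are still used in some cases, however.
Whenever the fair scheduling algorithm decides which thread to run, it also calculates when that thread should be switched out for the next one, and a timer is set for that moment. The scheduler also applies hysteresis to avoid switching too frequently between two busy threads. With the default hysteresis of 2 ms, two busy threads alternate in 4 ms time slices, and the scheduler never causes more than 500 timer interrupts per second. The number is much lower when there are not several threads constantly competing for the CPU.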
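The relationship between the 2 ms hysteresis, the 4 ms slices, and the bound of 500 timer interrupts per second can be checked with a small simulation. This is a simplified model and an assumption on my part: it tracks plain accumulated run time for two always-busy threads and lets the running thread continue until it is one hysteresis interval ahead of the other. The function name `simulate` is hypothetical.

```python
HYSTERESIS_MS = 2.0  # default hysteresis from the text


def simulate(duration_ms: float, hysteresis_ms: float = HYSTERESIS_MS):
    """Alternate two always-busy threads; return the list of slice lengths in ms."""
    runtime = [0.0, 0.0]   # simplification: plain accumulated run time per thread
    current = 0
    now = 0.0
    slices = []
    while now < duration_ms:
        other = 1 - current
        # Run `current` until it is `hysteresis_ms` ahead of the other thread.
        slice_ms = runtime[other] + hysteresis_ms - runtime[current]
        runtime[current] += slice_ms
        now += slice_ms
        slices.append(slice_ms)
        current = other
    return slices


slices = simulate(1000.0)
steady_state_slice = slices[-1]          # slice length once the pattern settles
timer_interrupts = len(slices)           # one timer expiry per slice
upper_bound = 1000.0 / HYSTERESIS_MS     # no slice is shorter than the hysteresis
```

In this model a thread starts each slice 2 ms behind and stops 2 ms ahead, which gives the 4 ms slices. Since no slice can be shorter than the hysteresis itself, one second can never contain more than 1000 / 2 = 500 timer expiries.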


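Fair: On each reschedule, the scheduler must decide which thread the CPU will run next and for how long. A fair scheduler has to track the run time of each thread, and either equalize it or, for threads with different priorities, achieve the desired ratio between them.
Scheduling by each thread's total run time, however, quickly leads to imbalance. For example, if a thread has been off the CPU for 10 seconds and then becomes runnable, it would be given 10 uninterrupted seconds of CPU time to catch up, which is not what fairness should mean.
Instead, fairness is based on the recent history of run time.
The OSv scheduler computes an exponentially decaying average of each thread's recent run time. It picks the thread with the lowest average to run next, and also calculates how long that thread may run.
This average run time is a floating-point value. Interestingly, some kernels forbid floating-point use in kernel code. OSv has no choice but to allow it, because OSv does not separate kernel and user space.
The biggest obstacle to implementing this moving-average run time is scalability: updating the average run time of every thread on each scheduler invocation would be impractical.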
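The effect of an exponentially decaying average, compared with total run time, can be seen in a short sketch. The formula, the time constant `TAU_MS`, and the helper names `decay_update` and `pick_next` are assumptions chosen for illustration and are not taken from the text. The sketch also updates every thread on every step, which is exactly the naive approach the scalability remark above warns against.

```python
import math

TAU_MS = 200.0  # assumed decay time constant; not taken from the text


def decay_update(avg: float, ran: bool, dt_ms: float, tau_ms: float = TAU_MS) -> float:
    """Exponentially decaying average of the fraction of time the thread ran."""
    keep = math.exp(-dt_ms / tau_ms)   # weight left on the older history
    return avg * keep + (1.0 if ran else 0.0) * (1.0 - keep)


def pick_next(averages: dict) -> str:
    """Choose the thread with the lowest recent run time."""
    return min(averages, key=averages.get)


averages = {"a": 0.0, "b": 0.0}

# Thread "a" runs alone for 10 s while "b" is off the CPU.
for name in averages:
    averages[name] = decay_update(averages[name], ran=(name == "a"), dt_ms=10_000.0)
first_choice = pick_next(averages)

# "b" then runs for 1 s; its old deficit has mostly been forgotten by now.
for name in averages:
    averages[name] = decay_update(averages[name], ran=(name == "b"), dt_ms=1_000.0)
second_choice = pick_next(averages)
```

With total run time, thread "b" would be owed roughly 10 seconds after its absence. With the decaying average, "b" is preferred at first, but after about a second of running its average already exceeds that of "a", so the two threads go back to sharing the CPU.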




