OSv—Optimizing the Operating System for Virtual Machines 2

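OSv's core is written in C++11. It comprises the OSv loader, dynamic linker, memory management, thread scheduler, synchronization mechanisms, and virtual-hardware drivers.
A few of OSv's drivers are for traditional devices emulated by the hypervisor (VGA, SATA, ...); for others (clock, NIC, block) OSv ships its own drivers to improve performance.
OSv has a virtual file system (VFS) layer and uses ZFS as its main file system.
OSv has rewritten the kernel network protocol stack.
The following parts cover:
1: OSv's memory management
2: How and why OSv avoids spin-locks completely
3: The design of OSv's network protocol stack
4: OSv's thread scheduler: lock-free algorithms and floating-point based fair accounting of run-time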




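2.1: Memory management
 A library OS could in principle get by with a flat physical memory mapping, but OSv uses virtual memory, as general-purpose operating systems do. Applications map and unmap memory through the mmap API, which requires page-level mappings.
 For sufficiently large mappings, OSv backs the mapping with huge pages (2 MB). The larger page size improves application performance by reducing TLB misses.
 Because an existing mapping can be partially unmapped, a huge page sometimes has to be broken into smaller pages. OSv does not support swapping pages out; it provides other memory-management mechanisms instead (see section 3).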
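
To make the two points above concrete, here is a minimal sketch using the ordinary POSIX mmap/munmap calls on a Linux-like host. It is not OSv's internal code, and the helper names pages_needed and map_and_punch_hole are made up for illustration. The first helper shows why 2 MB pages need far fewer TLB entries than 4 KB pages for the same region; the second maps a region and then unmaps a single page from its middle, which is the kind of partial unmap that forces a huge page to be split.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Number of pages (and therefore page-table/TLB entries) needed to cover `bytes`.
constexpr std::size_t pages_needed(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Map `len` bytes of anonymous memory, touch both ends, then unmap one base
// page from the middle. Returns true when every step succeeds.
bool map_and_punch_hole(std::size_t len) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(len >= 3 * page && len % page == 0);

    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;

    char* base = static_cast<char*>(p);
    base[0] = 1;        // first page is usable
    base[len - 1] = 2;  // last page is usable

    // Partial unmap: remove one page in the middle of the mapping.
    const std::size_t middle = (len / page / 2) * page;
    if (munmap(base + middle, page) != 0) {
        munmap(base, len);
        return false;
    }

    // The pages on both sides of the hole are still mapped and keep their data.
    const bool ok = (base[0] == 1) && (base[len - 1] == 2);

    // Release the two remaining pieces.
    munmap(base, middle);
    munmap(base + middle + page, len - middle - page);
    return ok;
}
```

For a 1 GB region, pages_needed gives 262144 entries with 4 KB pages but only 512 with 2 MB pages, which is where the TLB savings come from.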


2.2: No Spinlocks
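On SMP systems, spin-locks control concurrent access to data by multiple CPUs. Today's virtual machines also support SMP, so guest operating systems use spin-locks as well.
One CPU acquires the lock with an atomic operation, and any other CPU that wants it busy-waits until the holder releases it. SMP operating systems use spin-locks to build higher-level locking primitives such as sleeping mutexes,
and also use them directly where sleeping is forbidden, such as in the scheduler and in interrupt-handling context.
Spin-locks suit physical SMP machines well, but in a virtualized environment they have one serious drawback: "lock-holder preemption".
On a physical machine a CPU keeps running for as long as the OS wants it, but a virtual CPU may be paused at unknown times and for unknown durations, for example on an exit to the hypervisor, or when the hypervisor decides to run another guest or its own processes on that CPU.
If a virtual CPU is paused while it holds a spin-lock, every other CPU that wants the same lock has to spin, which wastes CPU time.
When a mutex is built on a spin-lock, a thread waiting for the lock can end up spinning instead of going to sleep immediately and letting other threads run. This is the lock-holder preemption problem. With several guests sharing a CPU the cost can be very high, and in extreme cases the waiting thread makes almost no progress.
None of this happens on a physical host; it arises because a guest OS does not fully own its CPUs.
The problem can be mitigated with help from the hypervisor, but a kernel designed for virtual machines does better to avoid it entirely. OSv has no spin-locks at all, and it achieves this without giving up lock-based algorithms or restricting itself to single-processor environments.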
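
As a reminder of what is being avoided, here is a minimal test-and-set spin-lock sketch. It is a generic textbook construction, not code from OSv or any particular kernel, and the names spin_lock and wasted_polls are made up for illustration. wasted_polls simulates, in a single thread, a lock holder that is paused while a waiter keeps polling: every poll during the pause is wasted work.

```cpp
#include <atomic>
#include <cassert>

// A textbook test-and-set spin-lock: waiters burn CPU until the holder releases.
class spin_lock {
    std::atomic<bool> locked_{false};
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void lock() { while (!try_lock()) { /* busy-wait */ } }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Single-threaded simulation of lock-holder preemption: the "holder" takes the
// lock and is then paused for `holder_paused_for` polls of the "waiter".
// Returns how many of the waiter's polls failed, i.e. were pure waste.
unsigned wasted_polls(unsigned holder_paused_for) {
    spin_lock l;
    l.lock();                       // holder vCPU acquires, then gets paused
    unsigned wasted = 0;
    for (unsigned i = 0; i < holder_paused_for; ++i) {
        if (!l.try_lock()) ++wasted;  // waiter vCPU spins and gets nowhere
    }
    l.unlock();                     // holder resumes and releases
    const bool acquired = l.try_lock();  // only now can the waiter proceed
    assert(acquired);
    (void)acquired;
    l.unlock();
    return wasted;
}
```

The longer the hypervisor keeps the holder off the CPU, the larger the returned count; a sleeping lock would have let the waiter yield instead.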


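OSv gets rid of spin-locks through per-CPU queues and lock-free algorithms.
OSv runs almost everything in ordinary threads. An interrupt handler does little more than wake the thread that services the interrupting device, and kernel code runs in ordinary threads just like application code. The design therefore puts its emphasis on cheap context switches to stay efficient.
OSv's mutex is itself lock-free, built on a lock-free data structure, so a paused CPU never leaves other CPUs spinning. Kernel and application code alike thereby avoid the lock-holder preemption problem.
Finally, the scheduler keeps a separate run queue for each CPU, so most scheduling decisions are local to that CPU and need no lock. Lock-free algorithms are needed only when scheduling crosses CPUs, for example to wake a thread that belongs to a different CPU.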
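
One common way to do such a cross-CPU hand-off without a lock is a multi-producer, single-consumer list: remote CPUs push wake-up requests with a compare-and-swap loop, and the owning CPU detaches the whole list with one atomic exchange. The sketch below shows that general technique only; it is an assumption about the shape of the problem, not OSv's actual scheduler code, and wake_request, wakeup_inbox, and thread_id are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// A request to wake one thread; `thread_id` is a hypothetical identifier.
struct wake_request {
    int thread_id;
    wake_request* next;
    explicit wake_request(int id) : thread_id(id), next(nullptr) {}
};

// Per-CPU inbox: any CPU may push, only the owning CPU drains.
class wakeup_inbox {
    std::atomic<wake_request*> head_{nullptr};
public:
    // Lock-free push: retry the compare-and-swap until our node becomes the head.
    void push(wake_request* r) {
        assert(r != nullptr);
        wake_request* old = head_.load(std::memory_order_relaxed);
        do {
            r->next = old;  // on CAS failure `old` was refreshed; relink and retry
        } while (!head_.compare_exchange_weak(old, r,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Detach the whole list in one atomic step and return ids in arrival order.
    std::vector<int> drain() {
        wake_request* r = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<int> ids;
        for (; r != nullptr; r = r->next) ids.push_back(r->thread_id);
        std::reverse(ids.begin(), ids.end());  // the list is newest-first
        return ids;
    }
};
```

No pusher ever waits for another CPU to finish a critical section, so a paused virtual CPU cannot stall the others; at worst a pusher repeats its compare-and-swap.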


2.3: Network channels
Every cloud operating system must provide a high-quality TCP/IP stack. OSv builds its stack on Van Jacobson's net channel ideas [10].
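
The notes stop here, so the following is only a rough illustration of the kind of structure a net channel relies on: a single-producer, single-consumer ring that hands packets from the receiving side to the thread that consumes them without any lock. It is a generic sketch under that assumption, not OSv's implementation; spsc_channel and sum_of_delivered are hypothetical names, and plain int values stand in for packets.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Fixed-size single-producer/single-consumer ring. Exactly one thread may call
// try_push and exactly one (other) thread may call try_pop.
template <typename T, std::size_t N>
class spsc_channel {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "capacity must be a power of two");
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read; written by consumer only
    std::atomic<std::size_t> tail_{0};  // next slot to write; written by producer only
public:
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }
};

// Usage sketch: the producer side pushes packet ids, the consumer side pops
// them and adds them up. Both roles run in one thread here for simplicity.
int sum_of_delivered(int count) {
    spsc_channel<int, 8> ch;
    int sum = 0;
    for (int id = 1; id <= count; ++id) {
        const bool pushed = ch.try_push(id);
        assert(pushed);
        (void)pushed;
        int got = 0;
        if (ch.try_pop(got)) sum += got;
    }
    return sum;
}
```

Because each index has exactly one writer, neither side ever needs a lock or a compare-and-swap loop.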




































































