Spinlock internals

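This article examines how spinlocks and mutex locks are implemented. It explains the two main spinlock implementations, the test_and_set instruction and the MCS queue lock, and walks through how each one works using concrete code.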

A spinlock synchronizes CPUs that share memory in a multiprocessor environment. A processor locks the data bus, reads a shared variable of type spinlock_t, and compares it with a predefined value that means "locked". If they are not equal, the processor sets spinlock_t to that value to mark the lock as taken. If they are equal, it repeats the procedure until spinlock_t is no longer locked (it spins on spinlock_t). The procedure relies on hardware primitives such as the x86 lock prefix.
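The spin-until-free procedure described above can be sketched in portable C11. This is only an illustration, not the pthread code: the names demo_spinlock_t, demo_test_and_set, demo_spin_lock and demo_spin_unlock are hypothetical, and atomic_exchange_explicit stands in for the hardware primitive.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical values and type for illustration; not the pthread definitions. */
enum { SPIN_FREE = 0, SPIN_LOCKED = 1 };

typedef struct {
    atomic_int state;   /* SPIN_FREE or SPIN_LOCKED */
} demo_spinlock_t;

/* One attempt: atomically write SPIN_LOCKED and return the previous value.
   SPIN_FREE means the caller now owns the lock; SPIN_LOCKED means someone
   else already held it. */
static int demo_test_and_set(demo_spinlock_t *l)
{
    return atomic_exchange_explicit(&l->state, SPIN_LOCKED,
                                    memory_order_acquire);
}

/* Repeat the attempt until the previous value was SPIN_FREE. */
static void demo_spin_lock(demo_spinlock_t *l)
{
    while (demo_test_and_set(l) == SPIN_LOCKED)
        ;   /* spin */
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == SPIN_LOCKED);
    atomic_store_explicit(&l->state, SPIN_FREE, memory_order_release);
}
```

A caller would wrap its critical section between demo_spin_lock(&l) and demo_spin_unlock(&l), with l initialized to { SPIN_FREE }.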

pthread is a widely used multithreading library, and it implements a mutual-exclusion lock called a mutex. The mutex is itself built on a spinlock. Let's look at how the pthread library implements it.

//code snippet extracted from spinlock.c in pthread library
struct _pthread_fastlock
{
  long int __status;   /* "Free" or "taken" or head of waiting list */
  int __spinlock;      /* Used by compare_and_swap emulation. Also,
              adaptive SMP lock stores spin count here. */
};
void __pthread_alt_lock(struct _pthread_fastlock * lock,
                pthread_descr self)
{
#if defined HAS_COMPARE_AND_SWAP
  long oldstatus, newstatus;
#endif
  struct wait_node wait_node;

#if defined TEST_FOR_COMPARE_AND_SWAP
  if (!__pthread_has_cas)
#endif
#if !defined HAS_COMPARE_AND_SWAP || defined TEST_FOR_COMPARE_AND_SWAP
  {
    int suspend_needed = 0;
    __pthread_acquire(&lock->__spinlock);

    if (lock->__status == 0)
      lock->__status = 1;
    else {
      if (self == NULL)
        self = thread_self();

      wait_node.abandoned = 0;
      wait_node.next = (struct wait_node *) lock->__status;
      wait_node.thr = self;
      lock->__status = (long) &wait_node;
      suspend_needed = 1;
    }

    __pthread_release(&lock->__spinlock);

    if (suspend_needed)
      suspend (self);
    return;
  }
#endif

#if defined HAS_COMPARE_AND_SWAP
  do {
    oldstatus = lock->__status;
    if (oldstatus == 0) {
      newstatus = 1;
    } else {
      if (self == NULL)
        self = thread_self();
      wait_node.thr = self;
      newstatus = (long) &wait_node;
    }
    wait_node.abandoned = 0;
    wait_node.next = (struct wait_node *) oldstatus;
    /* Make sure the store in wait_node.next completes before performing
       the compare-and-swap */
    MEMORY_BARRIER();
  } while(! __compare_and_swap(&lock->__status, oldstatus, newstatus));

  /* Suspend. Note that unlike in __pthread_lock, we don't worry
     here about spurious wakeup. That's because this lock is not
     used in situations where that can happen; the restart can
     only come from the previous lock owner. */

  if (oldstatus != 0)
    suspend(self);

  READ_MEMORY_BARRIER();
#endif
}
From the code above we can see that pthread implements the lock in two ways. One is based on test_and_set; the other is an MCS-style list-based queuing lock built on compare-and-swap.

Let us look at the first one.

//To acquire the spin lock, pthread_mutex_lock ends up calling the function "testandset"
static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  READ_MEMORY_BARRIER();

//The loop keeps spinning on spinlock; under contention it can issue hundreds
//of writes, which is very expensive.
  while (testandset(spinlock)) { 
    if (cnt < MAX_SPIN_COUNT) {
      sched_yield();
      cnt++;
    } else {
     tm.tv_sec = 0;
     tm.tv_nsec = SPIN_SLEEP_DURATION;
     nanosleep(&tm, NULL);
     cnt = 0;
    }
  }
}

//In x86 architecture:
//"XCHG exchanges two operands. The operands can be in either order. 
//If a memory operand is involved, BUS LOCK is asserted for the duration of the exchange,
//regardless of the presence or absence of the LOCK prefix or of the value of the IOPL." 
//(extracted from 80386 programmer's reference manual.)

PT_EI long int
testandset (int *spinlock)
{
  long int ret;

  __asm__ __volatile__(
      /*swap %0 and %1. if spinlock = 1 then ret =  1.
         if spinlock=0 then ret =0;*/
       "xchgl %0, %1"
       : "=r"(ret), "=m"(*spinlock)
       : "0"(1), "m"(*spinlock)
       : "memory");

      /*return 0 means lock has not been taken and we can acquire it. 
         otherwise it means lock has been taken.*/  
      return ret;
}

//On the Power architecture:
/*
lwarx and stwcx. perform an atomic update of shared storage (load-and-reserve and store-conditional).
lwarx loads a word and creates a reservation.
stwcx. stores a word only if the reservation created by lwarx is still valid.
A processor can hold only one reservation at a time.
*/

//typical compare-and-swap emulation.
PT_EI long int
testandset (int *p)
{
  long int ret, val = 1;
  MEMORY_BARRIER ();

  __asm__ __volatile__ (
       "0:    lwarx %0,0,%1 ;"//load and reserve: load the value *p into register %0
       "      cmpwi  0,%0,0;"//compare register %0 with 0
       "      bne 1f;"//if not equal, the lock has already been taken: jump to 1:
       "      stwcx. %2,0,%1;"//if equal, the lock is free: store 1 into *p if the reservation is still valid
       "      bne- 0b;"//if the stwcx. store succeeds, the EQ bit in cr0 is 1.
                       //if the store fails, the EQ bit in cr0 is 0, so jump back to 0:
                       //and try again to acquire the lock.
       "1:    "
    : "=&r"(ret)
    : "r"(p), "r" (val)
    : "cr0", "memory");//cr0 is modified by cmpwi and stwcx.; the "memory" clobber tells the compiler that memory has changed, so cached values must be reloaded.
  MEMORY_BARRIER ();
  return ret != 0;
}

In the end, every lock operation must be backed by a hardware atomic primitive. test_and_set behaves like an emulated compare-and-swap: it atomically reads the shared variable and marks it as locked. As the code shows, it returns a nonzero value if the lock was already taken and 0 if the caller has just acquired it. On x86, xchgl writes 1 to the variable "spinlock" on every attempt, whether or not the lock is free. In the Power version above, the store is skipped when the lock is already held, because bne branches past stwcx.

The problem with test_and_set, at least in its xchg form, is that every attempt writes to spinlock even though only one processor can acquire the lock. On a cache-coherent architecture each write invalidates the whole cache line in the other processors' caches, which hurts performance badly.
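One common way to reduce that write traffic is test-and-test-and-set: read the lock word first, and attempt the atomic write only when the lock looks free. The sketch below illustrates the idea in C11. It is not what the pthread code does, and the names ttas_lock_t, ttas_trylock, ttas_lock and ttas_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration. */
typedef struct {
    atomic_bool held;   /* true while some thread owns the lock */
} ttas_lock_t;

/* One attempt. Returns true if the caller acquired the lock. */
static bool ttas_trylock(ttas_lock_t *l)
{
    /* Test: a plain read. While the lock is held this does not write,
       so the cache line can stay shared among the spinning processors. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;

    /* Test-and-set: the lock looked free, so now try the atomic write.
       exchange returns the previous value; false means we won the race. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void ttas_lock(ttas_lock_t *l)
{
    while (!ttas_trylock(l))
        ;   /* spin, mostly on the read */
}

static void ttas_unlock(ttas_lock_t *l)
{
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The write still happens whenever the lock looks free, so contention at release time remains; the gain is only that waiters stop writing while the lock is held.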

The second implementation is an MCS list-based queuing lock, proposed in the paper "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors".
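For comparison with the pthread code, the queue lock from that paper can be sketched in C11 roughly as follows. This is written from the general description of the MCS algorithm, not taken from pthread; the names mcs_node_t, mcs_lock_t, mcs_acquire and mcs_release are hypothetical. Each waiter spins on a flag in its own node instead of on the shared lock word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread queue node. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the queue, if any */
    atomic_bool waiting;               /* true until the lock is handed over */
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;        /* last node in the queue, NULL if free */
} mcs_lock_t;

static void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me)
{
    atomic_store_explicit(&me->next, (mcs_node_t *)NULL, memory_order_relaxed);
    atomic_store_explicit(&me->waiting, true, memory_order_relaxed);

    /* Swap ourselves in as the new tail; the old tail is our predecessor. */
    mcs_node_t *prev = atomic_exchange_explicit(&lock->tail, me,
                                                memory_order_acq_rel);
    if (prev == NULL)
        return;                        /* queue was empty: lock acquired */

    /* Link behind the predecessor, then spin on our own node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->waiting, memory_order_acquire))
        ;
}

static void mcs_release(mcs_lock_t *lock, mcs_node_t *me)
{
    /* Releasing a free lock would be a caller bug. */
    assert(atomic_load_explicit(&lock->tail, memory_order_relaxed) != NULL);

    mcs_node_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, (mcs_node_t *)NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A new waiter swapped the tail but has not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock to the successor by clearing its private flag. */
    atomic_store_explicit(&succ->waiting, false, memory_order_release);
}
```

Each thread passes its own node, typically a local variable, to both calls, and the node must stay alive until mcs_release returns.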

void __pthread_alt_lock(struct _pthread_fastlock * lock,
                pthread_descr self)
{
#if defined HAS_COMPARE_AND_SWAP
  long oldstatus, newstatus;
#endif
  struct wait_node wait_node;

#if defined HAS_COMPARE_AND_SWAP
  do {
    oldstatus = lock->__status;
    if (oldstatus == 0) {
      newstatus = 1;
    } else {
      if (self == NULL)
        self = thread_self();
      wait_node.thr = self;
      newstatus = (long) &wait_node;
    }
    wait_node.abandoned = 0;
    wait_node.next = (struct wait_node *) oldstatus;
    /* Make sure the store in wait_node.next completes before performing
       the compare-and-swap */
    MEMORY_BARRIER();
  } while(! __compare_and_swap(&lock->__status, oldstatus, newstatus));        
  //For the thread that gets the lock, lock->__status is set to 1.
  //For threads that cannot get the lock, lock->__status is set to the address of their wait_node.
  //The retry loop works on the local variable oldstatus, which is very cheap.

  /* Suspend. Note that unlike in __pthread_lock, we don't worry
  here about spurious wakeup. That's because this lock is not
  used in situations where that can happen; the restart can
  only come from the previous lock owner. */

  if (oldstatus != 0)
    suspend(self);//oldstatus != 0 means the thread did not acquire the lock,
                  //so it suspends itself and waits for the lock.
  READ_MEMORY_BARRIER();
#endif
}
/* __pthread_alt_lock, which is called by pthread_mutex_lock, takes the lock if it is free.
Otherwise it puts the current thread on the waiting list and suspends it with a thread signal (pthread_sigsuspend). */

/* __pthread_alt_unlock releases the lock and wakes up the highest-priority thread on the waiting list.
(Exercise: are all the threads on the waiting list removed?)
The trick is that lock->__status stores the address of the wait node of a thread that could not get the lock and had to suspend. */

//function "__compare_and_swap" in __pthread_alt_lock in ppc architecture:
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
  long int ret;

  __asm__ __volatile__ (
       "0:    ldarx %0,0,%1 ;"
       "      xor. %0,%3,%0;"    //xor. tests whether *p == oldval: the result is 0 (EQ set in cr0) only when they match
       "      bne 1f;"
       "      stdcx. %2,0,%1;"
       "      bne- 0b;"
       "1:    "
    : "=&r"(ret)
    : "r"(p), "r"(newval), "r"(oldval)
    : "cr0", "memory");
  /* This version of __compare_and_swap is to be used when acquiring
     a lock, so we don't need to worry about whether other memory
     operations have completed, but we do need to be sure that any loads
     after this point really occur after we have acquired the lock.  */
  __asm__ __volatile__ ("isync" : : : "memory");
  return (int)(ret == 0);
}

//I think __compare_and_swap on i386 may have to run twice per loop: when *p
//does not equal eax (oldval), cmpxchgl only loads the current *p into eax.
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
  char ret;
  long int readval;

  __asm__ __volatile__ ("lock; cmpxchgl %3, %1; sete %0"
            : "=q" (ret), "=m" (*p), "=a" (readval)
            : "r" (newval), "m" (*p), "a" (oldval)
            : "memory");
  return ret;
}
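Both versions above implement the same contract: store newval into *p only if *p still equals oldval, and report whether the store happened. A portable C11 sketch of that contract follows. The names demo_compare_and_swap and demo_fetch_add are hypothetical, and the extra seen parameter is an addition for illustration: it exposes the value that atomic_compare_exchange_strong writes back on failure, much as cmpxchgl leaves the current value in eax.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper. Returns 1 if *p was oldval and has been replaced by
   newval, 0 otherwise. *seen receives the value that was observed in *p. */
static int demo_compare_and_swap(atomic_long *p, long oldval, long newval,
                                 long *seen)
{
    assert(p != NULL && seen != NULL);
    long expected = oldval;
    int ok = atomic_compare_exchange_strong(p, &expected, newval);
    *seen = expected;   /* oldval on success, the current *p on failure */
    return ok;
}

/* A typical retry loop built on the helper: atomically add delta to *p
   and return the value *p held just before the addition. */
static long demo_fetch_add(atomic_long *p, long delta)
{
    long old = atomic_load(p);
    long seen;
    while (!demo_compare_and_swap(p, old, old + delta, &seen))
        old = seen;     /* retry with the value the failed attempt observed */
    return old;
}
```

Reusing the observed value on failure means the loop does not need a separate reload of *p before the next attempt.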

In this method, the algorithm puts every thread waiting for the lock on a queue in FIFO order, so it does not have to keep writing to the variable "spinlock". It just links the thread's node into the queue, and lock->__status holds the head node of the queue.

So on average the algorithm calls compare_and_swap only once, writes the thread's node address into the queue, and suspends the thread until the lock is handed over. This improves efficiency considerably and also serves requests in FIFO order.
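The status-word trick used by __pthread_alt_lock can be sketched in portable C11 as follows. This is a simplified illustration under stated assumptions, not the library code: demo_fastlock_t, demo_wait_node and demo_lock_or_enqueue are hypothetical names, the thread is reduced to an integer id, and suspending the caller is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wait node; the real one also carries an "abandoned" flag. */
struct demo_wait_node {
    struct demo_wait_node *next;   /* previous status: 1 or an older waiter */
    int thread_id;                 /* stand-in for the thread descriptor */
};

/* status is 0 (free), 1 (taken, no waiters), or the address of the most
   recently added wait node. */
typedef struct {
    atomic_intptr_t status;
} demo_fastlock_t;

/* Returns 1 if the lock was free and is now owned by the caller.
   Returns 0 if the node was pushed onto the waiting list, in which case
   the real code would suspend the calling thread. */
static int demo_lock_or_enqueue(demo_fastlock_t *lock,
                                struct demo_wait_node *node)
{
    assert(lock != NULL && node != NULL);
    intptr_t oldstatus = atomic_load(&lock->status);
    intptr_t newstatus;

    do {
        if (oldstatus == 0) {
            newstatus = 1;                       /* take the free lock */
        } else {
            node->next = (struct demo_wait_node *)oldstatus;
            newstatus = (intptr_t)node;          /* become the list head */
        }
        /* On failure the CAS refreshes oldstatus with the current value. */
    } while (!atomic_compare_exchange_weak(&lock->status, &oldstatus,
                                           newstatus));

    return oldstatus == 0;
}
```

In this sketch each new waiter becomes the head of the list, so the order in which waiters are woken is decided by whatever walks the list at unlock time.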

All of the methods above depend on hardware atomic operations, including the "lock" prefix on x86 and "ldarx"/"stdcx." on PowerPC. Optimizing the algorithm is mostly a matter of reducing the number of writes and hardware lock operations.


References:

1. http://www.cs.ucla.edu/~kohler/class/04f-aos/l14.txt (really good notes)

2. Paper: J. M. Mellor-Crummey and M. L. Scott, "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors"

3. pthread library source code: http://ftp.gnu.org/gnu/glibc/

