CUDA Programming: __syncthreads()

__syncthreads() synchronizes the threads within a single thread block.

B.6. Synchronization Functions

void __syncthreads();

waits until all threads in the thread block have reached this point and all global and shared memory accesses made by these threads prior to __syncthreads() are visible to all threads in the block.

__syncthreads() is used to coordinate communication between the threads of the same block. When some threads within a block access the same addresses in shared or global memory, there are potential read-after-write, write-after-read, or write-after-write hazards for some of these memory accesses. These data hazards can be avoided by synchronizing threads in-between these accesses.
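As a minimal sketch of avoiding a read-after-write hazard, the kernel below reverses an array through shared memory. The kernel name reverseInBlock, the array size N, and the single-block launch are assumptions made for illustration; they do not come from the programming guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // hypothetical size: one block of N threads

// Reverses an N-element array using a shared-memory staging buffer.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];
    int t = threadIdx.x;

    tile[t] = data[t];          // each thread writes one shared-memory slot
    __syncthreads();            // every write to tile is visible block-wide after this
    data[t] = tile[N - 1 - t];  // safe to read a slot written by another thread
}

int main()
{
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, N>>>(dev);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %d, host[%d] = %d\n", host[0], N - 1, host[N - 1]);
    return 0;
}
```

Without the barrier, a thread could read tile[N - 1 - t] before the thread that owns that slot has written it. With the barrier in place, and assuming a working CUDA device, the program should report host[0] = 63 and host[63] = 0.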

__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
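One common way to respect this rule is to guard the memory accesses instead of the barrier, so that every thread of the block reaches __syncthreads() even when the array length is not a multiple of the block size. The sketch below assumes a hypothetical kernel neighbourSum, a block size BLOCK of 128, and managed memory for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // hypothetical block size; the launch must match it

// Each thread adds its element to its right-hand neighbour within the block's tile.
__global__ void neighbourSum(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Problematic form (do not use): in the last block, threads with i >= n
    // would skip the barrier while the others wait at it.
    //   if (i < n) { tile[threadIdx.x] = in[i]; __syncthreads(); ... }

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // guard the access only
    __syncthreads();                              // reached by every thread in the block

    if (i < n) {
        float right = (threadIdx.x + 1 < blockDim.x) ? tile[threadIdx.x + 1] : 0.0f;
        out[i] = tile[threadIdx.x] + right;
    }
}

int main()
{
    const int n = 1000;  // deliberately not a multiple of BLOCK
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    neighbourSum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();  // wait before reading managed memory on the host

    printf("out[0] = %.1f\n", out[0]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The condition i < n differs between threads of the last block, which is why it wraps only the loads and stores and never the barrier itself.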

Devices of compute capability 2.x and higher support three variations of __syncthreads() described below.

int __syncthreads_count(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns the number of threads for which predicate evaluates to non-zero.

int __syncthreads_and(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for all of them.

int __syncthreads_or(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for any of them.
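The three variants can be compared side by side with a small block-wide vote. This is only a sketch: the kernel name blockVotes, the results layout, and the input pattern are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 128;  // hypothetical size: one block of N threads

// Thread 0 records the block-wide count / and / or of the predicate "data > 0".
__global__ void blockVotes(const int *data, int *results)
{
    int pred = data[threadIdx.x] > 0;

    int count = __syncthreads_count(pred);  // number of threads with pred != 0
    int all   = __syncthreads_and(pred);    // non-zero only if pred != 0 for all threads
    int any   = __syncthreads_or(pred);     // non-zero if pred != 0 for any thread

    if (threadIdx.x == 0) {
        results[0] = count;
        results[1] = (all != 0);
        results[2] = (any != 0);
    }
}

int main()
{
    int *data = nullptr, *results = nullptr;
    cudaMallocManaged(&data, N * sizeof(int));
    cudaMallocManaged(&results, 3 * sizeof(int));
    for (int i = 0; i < N; ++i) data[i] = i % 4;  // every fourth element is zero

    blockVotes<<<1, N>>>(data, results);
    cudaDeviceSynchronize();  // wait before reading managed memory on the host

    printf("count=%d all=%d any=%d\n", results[0], results[1], results[2]);
    cudaFree(data);
    cudaFree(results);
    return 0;
}
```

With this input, 96 of the 128 elements are positive, so the expected report is count=96 all=0 any=1. Each call is also a full block barrier, so all three must be reached by every thread of the block, just like __syncthreads().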

void __syncwarp(unsigned mask=0xffffffff);

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.
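The store, __syncwarp(), read pattern described above can be sketched as a single-warp sum through shared memory. The kernel name warpSum and the one-warp launch are assumptions for illustration, and the example presumes a toolkit that provides __syncwarp() (CUDA 9 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int WARP = 32;  // one full warp

// Sums WARP integers with a tree reduction in shared memory.
__global__ void warpSum(const int *in, int *out)
{
    __shared__ int buf[WARP];
    int lane = threadIdx.x;

    buf[lane] = in[lane];
    __syncwarp();  // stores from all lanes are ordered before the reads below

    for (int offset = WARP / 2; offset > 0; offset /= 2) {
        if (lane < offset) {
            buf[lane] += buf[lane + offset];  // reads a slot written by another lane
        }
        __syncwarp();  // executed by all 32 lanes with the default full mask
    }

    if (lane == 0) *out = buf[0];
}

int main()
{
    int *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, WARP * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < WARP; ++i) in[i] = i;

    warpSum<<<1, WARP>>>(in, out);
    cudaDeviceSynchronize();  // wait before reading managed memory on the host

    printf("sum = %d\n", *out);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The barrier sits outside the if so that every lane named in the mask executes it. For the inputs 0 through 31 the expected sum is 496.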

