CUDA stream programming is an effective way to improve GPU efficiency. Streams serve the following six purposes:
1. Stream programming (pipeline) is a useful parallel pattern.
2. Data transfer from host to device is a major performance bottleneck in GPU programming.
3. CUDA supports asynchronous data transfers and kernel execution.
4. A stream is simply a sequence of operations that are performed in order on the device.
5. Streams allow concurrent execution of kernels: operations issued to different streams may overlap with one another.
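6. The maximum number of kernels that can execute concurrently depends on the device's compute capability: it is 16 on early devices such as compute capability 2.x, and higher (up to 128) on many newer GPUs.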
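The pipeline pattern described above can be sketched as a minimal CUDA example. It assumes a device with at least one copy engine. The example splits an array into chunks and issues each chunk's host-to-device copy, kernel launch, and device-to-host copy into its own stream, so the transfer for one chunk can overlap with computation on another. The kernel `scaleKernel`, the helper `runPipeline`, and the choice of four streams are hypothetical illustrations, not part of the original text. Host memory is allocated with `cudaMallocHost` (pinned memory) because `cudaMemcpyAsync` from pageable memory is not guaranteed to overlap with other work.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

// Hypothetical example kernel: multiply each element by `factor`.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: process n floats in numStreams chunks, one stream per chunk.
// Within a stream, copy -> kernel -> copy run in order; across streams they may overlap.
bool runPipeline(int n, int numStreams) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* h_data = nullptr;
    float* d_data = nullptr;
    CUDA_CHECK(cudaMallocHost((void**)&h_data, bytes));  // pinned host memory
    CUDA_CHECK(cudaMalloc((void**)&d_data, bytes));
    for (int i = 0; i < n; ++i) h_data[i] = static_cast<float>(i);

    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) CUDA_CHECK(cudaStreamCreate(&s));

    const int chunk = (n + numStreams - 1) / numStreams;
    for (int s = 0; s < numStreams; ++s) {
        const int offset = s * chunk;
        const int count = std::min(chunk, n - offset);
        if (count <= 0) break;  // fewer elements than streams
        const size_t chunkBytes = static_cast<size_t>(count) * sizeof(float);

        // All three operations are asynchronous with respect to the host.
        CUDA_CHECK(cudaMemcpyAsync(d_data + offset, h_data + offset, chunkBytes,
                                   cudaMemcpyHostToDevice, streams[s]));
        const int threads = 256;
        const int blocks = (count + threads - 1) / threads;
        scaleKernel<<<blocks, threads, 0, streams[s]>>>(d_data + offset, count, 2.0f);
        CUDA_CHECK(cudaGetLastError());  // catch launch errors
        CUDA_CHECK(cudaMemcpyAsync(h_data + offset, d_data + offset, chunkBytes,
                                   cudaMemcpyDeviceToHost, streams[s]));
    }
    CUDA_CHECK(cudaDeviceSynchronize());  // wait for every stream to finish

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f * static_cast<float>(i)) { ok = false; break; }
    }

    for (auto& s : streams) CUDA_CHECK(cudaStreamDestroy(s));
    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFreeHost(h_data));
    return ok;
}

int main() {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    // concurrentKernels: nonzero if the device can run multiple kernels at once.
    // asyncEngineCount: number of copy engines (0 means copies cannot overlap kernels).
    std::printf("%s (cc %d.%d): concurrentKernels=%d, asyncEngineCount=%d\n",
                prop.name, prop.major, prop.minor,
                prop.concurrentKernels, prop.asyncEngineCount);

    const int n = 1 << 20;
    const int numStreams = 4;  // hypothetical choice; tune per workload
    const bool ok = runPipeline(n, numStreams);
    std::printf("pipeline result: %s\n", ok ? "correct" : "WRONG");
    return ok ? 0 : 1;
}
```

Whether the chunks actually overlap depends on the hardware (the `concurrentKernels` and `asyncEngineCount` values printed above); a profiler timeline, for example from Nsight Systems, is the usual way to confirm it. The results are correct either way, because the operations within each stream still execute in order.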