early z optimization

This post covers ZCull (early Z optimization), an optimization technique in NVIDIA GeForce FX and later GPUs. The technique culls invisible pixels before they reach the pixel shader, cutting out a significant amount of unnecessary computation and improving rendering efficiency. Below I go through how ZCull works, how it is implemented in hardware, and how to take full advantage of it via double-speed depth/stencil rendering.

This material comes from Section 3.6 of NVIDIA's GPU Programming Guide.

Starting with the GeForce FX, and on the 6 series and later, NVIDIA added a feature called early z optimization, also known as ZCull.

Normally the depth test happens after the pixel shader, in the pixel rendering stage, so even when a fragment ends up discarded, the pixel-shader computation time has already been spent on it.

ZCull sits between rasterization and the pixel shader: if a fragment fails the ZCull test, it is culled right there and never enters the pixel shader at all.

In hardware, ZCull is typically implemented with its own memory bound to the conventional depth/stencil memory. When a fragment comes out of the pixel shader and passes the depth/stencil test, it goes through a further ZCull test; if that passes, the ZCull memory is updated. Note that this update-time test is not the same thing as the ZCull test performed before the pixel shader.

This ZCull memory is precisely what the pre-pixel-shader stage uses as its basis for culling fragments.
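To make the ordering concrete, here is a minimal conceptual sketch in C++. This is not driver or hardware code: the tile size, the per-tile max-z layout, and the brute-force tile rescan are all simplifying assumptions.

```cpp
#include <algorithm>
#include <vector>

struct Fragment { int x, y; float z; };

// Coarse ZCull memory: one conservative "farthest z" per screen tile,
// initialized to 1.0f (the far plane), bound to the real depth buffer.
struct ZCull {
    int tileSize = 16;
    int tilesX   = 0;
    std::vector<float> maxZ;

    float& tile(int x, int y) {
        return maxZ[(y / tileSize) * tilesX + (x / tileSize)];
    }
};

void runPixelShader(const Fragment&) { /* expensive shading work */ }

void processFragment(const Fragment& f, ZCull& zc,
                     std::vector<float>& depth, int width) {
    // 1) ZCull test, BEFORE the pixel shader: if f is farther than every
    //    pixel its tile can still contain, cull it without shading.
    if (f.z > zc.tile(f.x, f.y)) return;

    // 2) Only surviving fragments pay for the pixel shader.
    runPixelShader(f);

    // 3) Fine-grained depth test, AFTER the pixel shader. On success the
    //    ZCull memory is also updated so later fragments cull earlier;
    //    this update step is distinct from the test in (1).
    float& stored = depth[f.y * width + f.x];
    if (f.z < stored) {
        stored = f.z;
        // Recompute the tile's conservative max (assumes the buffer size
        // is a multiple of tileSize). Real hardware does this
        // incrementally; the rescan just keeps the sketch obviously correct.
        const int tx = (f.x / zc.tileSize) * zc.tileSize;
        const int ty = (f.y / zc.tileSize) * zc.tileSize;
        float m = 0.0f;
        for (int y = ty; y < ty + zc.tileSize; ++y)
            for (int x = tx; x < tx + zc.tileSize; ++x)
                m = std::max(m, depth[y * width + x]);
        zc.tile(f.x, f.y) = m;
    }
}
```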

If we swap in a different depth/stencil surface via SetRenderTarget, i.e. use a depth buffer that ZCull is not bound to, ZCull neither takes effect nor gets updated.

---------------------------------------------------------------------------------------------------------------------------------------------------

To get the most out of ZCull, it is best to build a depth buffer first.

This is where double-speed depth/stencil rendering comes in.

With color-buffer writes, alpha test, user clip planes, multisampling, texkill, and color key all disabled, rendering becomes very fast.

In practice, a double-speed z-only pass also skips lighting, texture sampling, and so on, so it is extremely fast.

Using the resulting depth buffer to drive ZCull then saves a large amount of shading work.
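As an illustration, here is a hedged Direct3D 9 sketch of such a depth pre-pass. DrawSceneGeometry() is a hypothetical stand-in for the application's draw calls; texkill/clip() and color-keyed textures must also be avoided in the shaders and texture setup for the fast path to apply.

```cpp
#include <d3d9.h>

void DrawSceneGeometry(IDirect3DDevice9* dev);  // assumed, app-specific

void RenderWithDepthPrepass(IDirect3DDevice9* dev) {
    // --- Pass 1: z-only, eligible for double-speed depth rendering ---
    dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0);          // no color writes
    dev->SetRenderState(D3DRS_ALPHATESTENABLE, FALSE);       // no alpha test
    dev->SetRenderState(D3DRS_CLIPPLANEENABLE, 0);           // no user clip planes
    dev->SetRenderState(D3DRS_MULTISAMPLEANTIALIAS, FALSE);  // no multisampling
    dev->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
    DrawSceneGeometry(dev);  // trivial shaders: no lighting, no sampling

    // --- Pass 2: full shading against the pre-built depth buffer ---
    dev->SetRenderState(D3DRS_COLORWRITEENABLE,
        D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
        D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);  // depth is already final
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_EQUAL);  // shade only visible pixels
    DrawSceneGeometry(dev);  // expensive pixel shaders, mostly ZCull'd away
}
```

Note that pass 2 relies on D3DCMP_EQUAL, which requires both passes to submit exactly the same geometry with the same transforms; D3DCMP_LESSEQUAL is the safer choice when that cannot be guaranteed.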

One more note: this double-speed z-only rendering is also very good for clearing the depth buffer, and in some cases it is even faster than the hardware-provided depth-clear call.
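For reference, here is a sketch of what such a quad-based depth "clear" might look like in Direct3D 9. Whether it actually beats IDirect3DDevice9::Clear() is hardware-dependent and should be measured.

```cpp
#include <d3d9.h>

struct ScreenVertex { float x, y, z, rhw; };  // pre-transformed (XYZRHW)

void ClearDepthWithQuad(IDirect3DDevice9* dev, float width, float height) {
    // Full-screen quad at the far plane (z = 1); the -0.5 offset maps
    // pre-transformed vertices onto pixel centers.
    const ScreenVertex quad[4] = {
        { -0.5f,         -0.5f,          1.0f, 1.0f },
        { width - 0.5f,  -0.5f,          1.0f, 1.0f },
        { -0.5f,         height - 0.5f,  1.0f, 1.0f },
        { width - 0.5f,  height - 0.5f,  1.0f, 1.0f },
    };
    dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0);   // z-only write
    dev->SetRenderState(D3DRS_CULLMODE, D3DCULL_NONE);
    dev->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_ALWAYS);  // overwrite every pixel
    dev->SetFVF(D3DFVF_XYZRHW);
    dev->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, quad, sizeof(ScreenVertex));
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);  // restore depth func
}
```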

---------------------------------------------------------------------------------------------------------------------------------------------------

This culling optimization is quite aggressive: it can largely make up for inadequate CPU-side visibility culling and save a great deal of computation.

cool feature
