Tips:
1. CUDA memory for lookup tables:
It may be best not to use lookup tables on the GPU at all and to compute the values directly instead (see also the CUDA math library), since FLOPS are increasing faster than memory bandwidth across GPU generations.
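To make the trade-off concrete, here is a minimal sketch that evaluates a sine both ways: once by reading a precomputed table from global memory, and once by calling `sinf` from the CUDA math library. The kernel names, `TABLE_SIZE`, and the choice of sine as the tabulated function are assumptions made for illustration only; which variant is faster depends on the GPU and the access pattern, so it is worth profiling both rather than taking either for granted.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int   TABLE_SIZE = 1024;        // hypothetical table resolution
constexpr float TWO_PI     = 6.283185307f;

// Variant A: read the sine from a precomputed table in global memory.
__global__ void sine_from_table(const float* __restrict__ table,
                                const float* __restrict__ x,
                                float* __restrict__ y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Map x in [0, 2*pi) to the nearest lower table entry.
        int idx = static_cast<int>(x[i] * (TABLE_SIZE / TWO_PI)) % TABLE_SIZE;
        y[i] = table[idx];
    }
}

// Variant B: no table, compute the value with the CUDA math library.
// (__sinf is a faster, less accurate intrinsic that could be tried as well.)
__global__ void sine_computed(const float* __restrict__ x,
                              float* __restrict__ y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = sinf(x[i]);
    }
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host data: inputs in [0, 2*pi) and the lookup table itself.
    std::vector<float> h_x(n), h_table(TABLE_SIZE), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_x[i] = TWO_PI * i / n;
    for (int k = 0; k < TABLE_SIZE; ++k)
        h_table[k] = std::sin(TWO_PI * k / TABLE_SIZE);

    float *d_x, *d_table, *d_a, *d_b;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_table, TABLE_SIZE * sizeof(float));
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_table, h_table.data(), TABLE_SIZE * sizeof(float),
               cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    sine_from_table<<<blocks, threads>>>(d_table, d_x, d_a, n);
    sine_computed<<<blocks, threads>>>(d_x, d_b, n);

    // Minimal error handling: only the final synchronization is checked.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);

    // Report how far the table result is from the computed one.
    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = std::fmax(max_diff, std::fabs(h_a[i] - h_b[i]));
    std::printf("max |table - computed| = %g\n", max_diff);

    cudaFree(d_x); cudaFree(d_a); cudaFree(d_b); cudaFree(d_table);
    return 0;
}
```

Built with `nvcc`, the two kernels can be timed separately in a profiler. The table variant also loses accuracy, because it snaps each input to the nearest lower table entry, whereas the computed variant needs no extra memory traffic beyond reading `x` and writing `y`.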