CUDA - shared memory and constants
Is there a benefit to storing a constant value in shared memory? For example:

a[tid] = constant * b[tid]

where a and b are arrays, constant is a constant value (e.g. 4), and tid is the thread index (one array element per thread).

Every thread has to read the value of constant, so shared memory should be useful, right?
Here is how I think it works: reading from global memory consumes a lot of time, so if I read the constant value from global memory into shared memory once, all the threads can then read it fast. Since there are many threads (the constant value has to be read many times), shared memory should speed things up.
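The approach described in the question might be sketched as follows (the kernel and variable names are made up for illustration, and the constant is assumed to start out in global memory as the question implies):

```cuda
// Sketch of the shared-memory approach from the question:
// one thread per block copies the constant from global memory
// into shared memory, then all threads read it from there.
__global__ void scaleShared(float *a, const float *b,
                            const float *d_constant, int n)
{
    __shared__ float c;                  // one copy per block

    if (threadIdx.x == 0)
        c = *d_constant;                 // one global-memory read per block
    __syncthreads();                     // make c visible to the whole block

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        a[tid] = c * b[tid];             // every thread reads c from shared memory
}
```

Note the `__syncthreads()` barrier: without it, threads in other warps could read `c` before thread 0 has written it.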
Some CPU instruction sets, such as x86, support storing full-sized constants as operands interleaved with the opcodes themselves. In that case, the constants are read in as part of the stream of instructions the CPU is already executing, and it seems unlikely that storing them anywhere else can be faster.
Other architectures, such as ARM, support storing small constants and shift values within the opcodes. Most constants typically needed in a program can be represented as a small constant plus a shift value and can therefore be stored directly within the opcodes.
I don't know if SASS (the native instruction set of NVIDIA GPUs) supports such "embedded" constants.
Consider, though, that if you store the constant in shared memory, you need a reference to that constant in order to read it, and that reference is itself a constant or a derived constant (such as a base address), so you have not eliminated the need to encode a constant somewhere.
Also, there is a cache for values designated as constants. You can take advantage of this cache by setting values in constant memory before calling the kernel.
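A minimal sketch of using constant memory for this (the symbol name `c_scale` is hypothetical, not from the question):

```cuda
// The value lives in constant memory and reads are served
// by the constant cache, broadcast to all threads in a warp.
__constant__ float c_scale;              // hypothetical symbol name

__global__ void scaleConst(float *a, const float *b, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        a[tid] = c_scale * b[tid];       // read through the constant cache
}

// Host side: set the value once, before launching the kernel.
// float h_scale = 4.0f;
// cudaMemcpyToSymbol(c_scale, &h_scale, sizeof(float));
// scaleConst<<<gridDim, blockDim>>>(d_a, d_b, n);
```

Because every thread in a warp reads the same address, the constant cache can broadcast the value in a single transaction, with no per-block setup code in the kernel.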
Further, consider the overhead of setting the constant in shared memory in the first place. Values in shared memory can only be shared among the threads in a block, so each block would have to set the constant again. Because threads run in groups of 32, called warps, your kernel would tie up a warp of 32 threads in setting the constant each time processing starts on a new block.
To conclude, I think it's best to let the compiler handle single constants such as the one in your example, and to use constant memory for storing any constant arrays you may have.
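For the example in the question, letting the compiler handle it would simply mean writing the literal in the kernel; the compiler typically turns it into an immediate operand or places it in a compiler-managed constant bank, with no setup code at all:

```cuda
// The literal 4.0f needs no shared-memory staging and no
// explicit constant-memory symbol; the compiler encodes it.
__global__ void scaleLiteral(float *a, const float *b, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        a[tid] = 4.0f * b[tid];
}
```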