CUDA

Warp Execution

All threads of a warp are executed by the SIMD hardware as a bundle, where the same instruction is run for all threads.

A warp is the unit of thread scheduling in an SM (streaming multiprocessor).
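To make the grouping concrete, here is a minimal host-side C++ sketch (not GPU code) of how a linear thread index within a block maps to a warp and to a position inside that warp. It assumes a warp size of 32, which is the value on current NVIDIA GPUs; the function names are made up for illustration.

```cpp
#include <cassert>

// Assumption: 32 threads per warp (the value on current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Which warp of its block a thread with linear index `tid` belongs to.
int warp_index(int tid) { return tid / kWarpSize; }

// Position of the thread inside its warp (often called the lane).
int lane_index(int tid) { return tid % kWarpSize; }

// Number of warps the SM has to schedule for a block of `block_size` threads.
// A partially filled last warp still counts as a whole warp.
int warps_per_block(int block_size) {
    assert(block_size > 0);
    return (block_size + kWarpSize - 1) / kWarpSize;
}
```

For example, a block of 100 threads is scheduled as 4 warps, and the last warp has only 4 active threads.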

When is it good?

When all threads within a warp follow the same control flow.

For example, for an if-else construct, the execution works well when either all threads execute the if part or all execute the else part.
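Whether a given if-else is "good" in this sense depends on how the branch condition relates to the warp boundaries. The host-side C++ sketch below (a toy model, not GPU code) checks whether all 32 threads of one warp evaluate a condition the same way. The warp size of 32 and the two example conditions are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <functional>

// Assumption: 32 threads per warp (the value on current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Returns true when every thread of the warp that starts at `first_tid`
// evaluates `cond` to the same value, i.e. the whole warp takes one path.
bool warp_is_uniform(int first_tid, const std::function<bool(int)>& cond) {
    assert(first_tid % kWarpSize == 0);  // must be the first thread of a warp
    const bool first = cond(first_tid);
    for (int lane = 1; lane < kWarpSize; ++lane) {
        if (cond(first_tid + lane) != first) return false;
    }
    return true;
}

// Hypothetical condition that depends only on the warp index:
// uniform in every warp.
bool by_warp(int tid) { return (tid / kWarpSize) % 2 == 0; }

// Hypothetical condition with a cutoff at thread 48: uniform in every warp
// except the one that contains the cutoff (threads 32..63).
bool below_48(int tid) { return tid < 48; }
```

With these definitions, `warp_is_uniform(0, by_warp)` and `warp_is_uniform(32, by_warp)` are both true, while `below_48` is uniform for the warps starting at threads 0 and 64 but not for the one starting at thread 32.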

When is it bad?

When threads within a warp take different control flow paths, the SIMD hardware makes multiple passes, one for each divergent path. During each pass, only the threads on that path are active; the threads that follow the other path are masked off, so the instructions have no effect for them.

These passes run one after another, so their execution times add up.
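The cost of divergence can be sketched with a toy host-side C++ model (not GPU code): for one warp and one if-else, count how many passes are needed and add up the cost of each pass. The warp size of 32, the `BranchCost` struct, and the per-path "cycle" costs are all assumptions for illustration; real hardware has additional overheads that this model ignores.

```cpp
#include <cassert>
#include <functional>

// Assumption: 32 threads per warp (the value on current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Hypothetical result type: number of passes and total cost in made-up cycles.
struct BranchCost {
    int passes;
    int cycles;
};

// Models one if-else executed by the warp that starts at `first_tid`.
// Each path that at least one thread takes costs one sequential pass.
BranchCost simulate_if_else(int first_tid,
                            const std::function<bool(int)>& cond,
                            int if_cycles, int else_cycles) {
    assert(first_tid % kWarpSize == 0);  // must be the first thread of a warp
    bool any_if = false;
    bool any_else = false;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (cond(first_tid + lane)) any_if = true;
        else any_else = true;
    }
    BranchCost cost{0, 0};
    if (any_if)   { ++cost.passes; cost.cycles += if_cycles; }
    if (any_else) { ++cost.passes; cost.cycles += else_cycles; }
    return cost;
}
```

In this model, a condition such as `tid % 2 == 0` splits every warp, so an if part costing 10 and an else part costing 20 take 2 passes and 30 cycles in total. A condition such as `(tid / 32) % 2 == 0` keeps each warp on a single path, so each warp needs 1 pass and pays for only the path it takes.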
