- It’s the GPU itself. GPUs work by grouping multiple threads into a single thread group, which NVIDIA calls a warp and AMD/Intel call a wave. Every thread in a given warp/wave has to take the exact same path through the code, so heavily divergent branches, where some threads in a warp/wave take one path and others take another, can kill performance: the hardware ends up executing the code once per path, with the threads that didn’t take that path masked off each time.
- GPUs typically don’t operate on strings. Strings aren’t even a supported data type in most GPU-oriented languages and frameworks, especially graphics ones. If you need to process strings on a GPU, you typically break them up into individual characters and treat each character as a plain integer, which is totally fine so long as you can wrap your head around dealing with strings at such a low level.
For what it’s worth, GPUs are kind of getting SMT now with the advent of dedicated asynchronous queues for compute and memory transfers. Modern GPUs can draw primitives, run compute shaders, and perform DMA transfers from CPU to GPU all at the same time, with the first two even sharing the same hardware. You do need to be careful about where you issue this work, since there can be hardware resource conflicts, but it lets you do something kind of like SMT: putting otherwise idle hardware to use while the GPU is busy with other work.