Investigation into Nvidia GPU workloads reveals that Tensor cores are being hammered, just incredibly briefly.
…
An intrepid Reddit poster, going under the handle Bluedot55, leveraged Nvidia’s Nsight Systems GPU metrics tooling to drill down into the workloads running on various parts of an Nvidia RTX 4090 GPU.
Bluedot55 ran both DLSS and third-party scalers on an Nvidia RTX 4090 and measured Tensor core utilisation. Looking at average Tensor core usage, the figures under DLSS were extremely low: less than 1%.
Initial measurements suggested that even peak utilisation registered in the 4-9% range, implying that while the Tensor cores were being used, they probably weren’t actually essential. However, increasing the polling rate revealed that peak utilisation is in fact in excess of 90%, but only for brief periods measured in microseconds.
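The arithmetic behind that discrepancy is worth spelling out. Here is a minimal Python sketch, using illustrative numbers rather than Bluedot55’s actual measurements, of how a burst of near-peak Tensor core activity lasting mere microseconds averages out to well under 1% over a whole frame, and why it only shows up when the polling interval is shorter than the burst:

```python
# Sketch: why a slow polling rate hides brief Tensor core bursts.
# All numbers below are illustrative assumptions, not measured values.

frame_time_s = 1 / 120          # ~8.3 ms per frame at 120 fps
burst_time_s = 50e-6            # hypothetical 50 microsecond DLSS burst
burst_util = 0.95               # ~95% Tensor core utilisation during the burst

# A counter averaged over the whole frame only sees the time-weighted mean:
avg_util = burst_util * (burst_time_s / frame_time_s)
print(f"average utilisation over the frame: {avg_util:.3%}")   # ~0.570%

# Sampling faster than the burst duration catches the true peak:
poll_interval_s = 10e-6         # 100 kHz polling, shorter than the burst
peak_seen = burst_util if poll_interval_s <= burst_time_s else avg_util
print(f"peak utilisation seen at 100 kHz polling: {peak_seen:.0%}")  # 95%
```

Run with these assumed figures, the frame-averaged number lands around 0.57%, squarely in the "less than 1%" territory the initial measurements reported, while the fast-polled peak sits at 95%.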
I’m sure this wasn’t exactly unknown. Running on the Tensor cores asynchronously is almost free, while running on shaders is less efficient and uses up time they could spend, well, shading.
This is why DLSS will always be better than FSR: FSR has to be way more efficient in order not to cost more performance than it gains.
This isn’t about the efficiency of one vs the other. FSR is a simple upscaling algorithm that can be implemented on any hardware and run quickly; it’s so efficient that it runs well on 10-year-old GPUs (a rough sketch of that class of algorithm is below). DLSS isn’t efficient enough to do that and, as the article states, it hammers the tensor cores when it runs. DLSS is just using dedicated hardware, which offloads some of the cost from the rest of the GPU.
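To give a feel for why a purely spatial upscaler can run on basically any hardware, here is a minimal Python/NumPy stand-in: a bilinear upscale followed by a crude sharpening pass. This is not AMD’s actual FSR code (FSR 1.0 pairs an edge-adaptive upsample with RCAS sharpening), just a sketch of the same class of algorithm, and it needs nothing beyond ordinary arithmetic:

```python
import numpy as np

def bilinear_upscale(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """Naive bilinear upscale of an HxW luminance image. Stand-in for
    the edge-adaptive upsample a real spatial upscaler would use."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * scale)
    xs = np.linspace(0, w - 1, w * scale)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def sharpen(img: np.ndarray, amount: float = 0.5) -> np.ndarray:
    """Crude unsharp mask, standing in for a contrast-adaptive
    sharpening pass like RCAS."""
    blurred = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
               np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)

low_res = np.random.rand(540, 960)                # fake 960x540 frame
high_res = sharpen(bilinear_upscale(low_res, 2))  # -> 1920x1080
print(high_res.shape)                             # (1080, 1920)
```

Nothing here requires matrix units or any particular GPU generation, which is the point: a spatial algorithm like this maps onto plain shader math anywhere.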
DLSS will always be superior in terms of appearance, but that’s because it can be actively trained on image data from the specific game, so it can be customized from game to game and create a better image.
It was the central premise of DLSS from the start lol.
Tensor cores can be used in other ways, but the entire reason they were part of RTX was to accelerate DLSS, and they told us that.
Definitely, but I’m just saying it wasn’t unknown that it could probably have run on shaders.
In the sense that shaders are capable of replicating the operations, sure. But the reason for tensor cores is the same as for any other dedicated hardware feature: it’s obscenely faster and more efficient to do math you’ll do frequently with dedicated silicon.
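That trade-off is easy to demonstrate at small scale, by analogy. The sketch below times the same matrix multiply on a general-purpose path (plain Python loops, loosely analogous to emulating matrix math on ordinary shader ALUs) against a specialised path (NumPy handing the work to an optimised BLAS library). It is an analogy only, not a Tensor core benchmark, and the exact ratio will vary by machine:

```python
import time
import numpy as np

N = 200
a = np.random.rand(N, N)
b = np.random.rand(N, N)

def matmul_loops(x, y):
    """General-purpose path: plain Python loops doing the same
    multiply-accumulate math one scalar at a time."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += x[i][k] * y[k][j]
            out[i][j] = s
    return out

t0 = time.perf_counter()
matmul_loops(a.tolist(), b.tolist())
t_loops = time.perf_counter() - t0

t0 = time.perf_counter()
a @ b                          # specialised path: optimised BLAS routine
t_blas = time.perf_counter() - t0

# Expect a gap of several orders of magnitude: the same shape of
# argument as general shaders vs dedicated matrix hardware.
print(f"loops: {t_loops:.3f}s, BLAS: {t_blas:.5f}s, "
      f"ratio ~{t_loops / t_blas:.0f}x")
```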
I don’t know what to say. That’s what I said in my OP, just more succinctly.