Investigation into Nvidia GPU workloads reveals that Tensor cores are being hammered, just incredibly briefly.
…
An intrepid Reddit poster, going under the handle Bluedot55, leveraged Nvidia’s Nsight Systems GPU metrics tooling to drill down into the workloads running on various parts of an Nvidia RTX 4090 GPU.
Bluedot55 ran both DLSS and third-party scalers on an Nvidia RTX 4090 and measured Tensor core utilisation. Looking at average Tensor core usage, the figures under DLSS were extremely low, at less than 1%.
Initial investigations suggested that even peak utilisation only registered in the 4-9% range, implying that while the Tensor cores were being used, they probably weren’t essential. However, increasing the polling rate revealed that peak utilisation is in fact in excess of 90%, but only for brief periods measured in microseconds.
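To see how a sub-1% average and a 90%-plus peak can both be true, it helps to work through the arithmetic of the polling window. The short sketch below is purely illustrative, not Bluedot55’s methodology: the frame time, burst length and burst utilisation are assumed numbers, but they show how a window spanning a whole frame averages a microsecond-scale burst down to well under 1%, while a window on the scale of the burst itself reads close to peak.

    # Illustrative sketch: why average and peak Tensor core utilisation diverge.
    # All figures are assumptions, not measurements: a ~6.9 ms frame (~144 fps)
    # containing a single 40 microsecond burst at 95% utilisation (the DLSS
    # pass), with the Tensor cores idle the rest of the time.

    FRAME_US = 6900       # assumed frame time in microseconds
    BURST_US = 40         # assumed length of the DLSS tensor-core burst
    BURST_UTIL = 0.95     # assumed utilisation during the burst

    def mean_utilisation(window_us: float) -> float:
        """Utilisation reported by a counter averaged over a window_us window."""
        busy_us = min(BURST_US, window_us) * BURST_UTIL
        return busy_us / window_us

    print(f"window = whole frame : {mean_utilisation(FRAME_US):.2%}")  # ~0.55%
    print(f"window = 50 us       : {mean_utilisation(50):.2%}")        # 76.00%
    print(f"window = 10 us       : {mean_utilisation(10):.2%}")        # 95.00%

The same logic applies to the profiler’s sampling frequency: the faster the GPU metrics are polled, the shorter the window each sample averages over, and the closer a brief burst comes to showing its true peak.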
This isn’t about the efficiency of one versus the other. FSR is a simple upscaling algorithm that can be implemented on any hardware and run quickly; it’s so lightweight that it runs well on 10-year-old GPUs. DLSS isn’t efficient enough to do that and, as the article states, it hammers the Tensor cores when it runs. DLSS is simply using dedicated hardware, which offloads some of the cost from the rest of the GPU.
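A rough back-of-the-envelope comparison helps show why a hand-written spatial filter can run on any GPU while a neural upscaler benefits from dedicated matrix hardware. The figures below are illustrative assumptions only (a made-up tap count and a made-up convolutional layer, not the real FSR or DLSS implementations), but they give a feel for the gap in raw arithmetic per 4K frame.

    # Back-of-the-envelope FLOP counts per upscaled 4K frame (3840 x 2160).
    # Kernel sizes, channel counts and layer counts are illustrative
    # assumptions, not the actual FSR or DLSS algorithms.

    OUT_PIXELS = 3840 * 2160

    # Simple spatial upscale: a fixed, small number of filter taps per output
    # pixel (assume 12 taps, multiply + add each) plus a cheap sharpening pass.
    spatial_flops = OUT_PIXELS * (12 * 2 + 25)

    # Neural upscale: a single hypothetical 3x3 convolution with 32 input and
    # 32 output channels costs 3*3*32*32*2 FLOPs per pixel; assume ~10 layers.
    conv_layer_flops = OUT_PIXELS * (3 * 3 * 32 * 32 * 2)
    network_flops = conv_layer_flops * 10

    print(f"simple spatial filter : {spatial_flops / 1e9:8.2f} GFLOPs/frame")
    print(f"hypothetical network  : {network_flops / 1e9:8.2f} GFLOPs/frame")
    print(f"ratio                 : {network_flops / spatial_flops:,.0f}x")

On those assumed numbers the network does thousands of times more arithmetic per frame, which is exactly the kind of dense matrix work Tensor cores exist to absorb, and why a shader-only approach has to stay much simpler to keep frame times low.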
DLSS will always be superior in terms of image quality, but that’s because it can be trained on image data from a specific game, so it can be customized from game to game and produce a better image.