- It’s the GPU itself. GPUs work by grouping multiple threads into a single thread group, which NVIDIA calls a warp and AMD/Intel call a wave. Every thread in a given warp/wave has to take the exact same path through the code, so heavily divergent branches, where some threads in a warp/wave take one path and others take another, can kill performance: the hardware ends up executing the code once per path, with the threads that didn’t take that path masked off each time.
- GPUs typically don’t operate on strings. Strings aren’t even a supported data type in most GPU-oriented languages and frameworks, especially graphics ones. If you need to process strings on a GPU, you typically break them up into individual characters and treat each character as a plain integer, which is totally fine so long as you can wrap your head around dealing with strings at such a low level.
For what it’s worth, GPUs are kind of getting SMT now with the advent of dedicated asynchronous queues for compute and memory transfers. Modern GPUs can draw primitives, run compute shaders, and perform DMA transfers from CPU to GPU all at the same time, with the first two even sharing the same hardware. You do need to be careful about where you issue this work, since there can be hardware resource conflicts, but it lets you do something kind of like SMT: putting otherwise idle hardware to use while the GPU is busy with other work.