I am currently trying to understand how GPUs work.
I saw that the Apple M2 Max, for example, has 30 GPU cores. I was surprised, because I had heard that GPUs have hundreds of cores. So I did a bit of research and got my answer:
on Apple silicon, each core is made up of 16 execution units, each of which has 8 distinct compute units (ALUs)
That makes more sense now. 30 cores actually means 30*16*8 = 3840 ALUs.
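Just to make that arithmetic concrete, here is a tiny sketch using the 16-EU / 8-ALU figures quoted above (which I have not verified beyond that search result):

```python
# Rough count for the M2 Max GPU, assuming the quoted hierarchy:
# 30 cores, each with 16 execution units (EUs), each EU with 8 ALUs.
gpu_cores = 30
eus_per_core = 16
alus_per_eu = 8

total_eus = gpu_cores * eus_per_core   # 480 execution units
total_alus = total_eus * alus_per_eu   # 3840 ALUs
print(total_eus, total_alus)           # 480 3840
```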
But why divide them into “cores” like this, then? I heard that GPUs, unlike CPUs, can have all their units working on the SAME task.
Why split into 16 and then into 8 rather than one full grid? I don’t get it.
Does it have practical implications, or is it just marketing?
How a GPU is broken down usually determines how other resources are shared between those ALUs. I’ll use Intel’s Arc Alchemist for this because I have the spec sheet for it.
The A770 is broken down into 32 Xe Cores. This means it has 4096 shading units, 256 TMUs, 128 ROPs, 512 Execution Units, 512 Tensor Cores, 32 RT Cores, and 16MB of L2 cache.
You can also think of this as each Xe Core being made of 128 shading units, 8 TMUs, 4 ROPs, 16 Execution Units, 16 Tensor Cores, 1 RT Core, and 512KB of L2 cache.
That Xe Core is the smallest unit you could break an Alchemist GPU into and still have every part of the larger whole. You can’t literally cut the GPU up like that, but if I had to draw a diagram of one, that is what would be in each Xe Core block.
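If it helps, here is a small sketch of that breakdown: it just takes the A770 totals from the spec sheet above and splits them evenly across the 32 Xe Cores (a simplification, as noted, since the real chip isn’t literally partitioned this way):

```python
# A770 spec-sheet totals, divided evenly across its 32 Xe Cores.
XE_CORES = 32

a770_totals = {
    "shading units": 4096,
    "TMUs": 256,
    "ROPs": 128,
    "Execution Units": 512,
    "Tensor Cores": 512,
    "RT Cores": 32,
    "L2 cache (KB)": 16 * 1024,  # 16MB expressed in KB
}

per_xe_core = {name: total // XE_CORES for name, total in a770_totals.items()}
for name, count in per_xe_core.items():
    print(f"{name}: {count} per Xe Core")
# shading units: 128, TMUs: 8, ROPs: 4, Execution Units: 16,
# Tensor Cores: 16, RT Cores: 1, L2 cache (KB): 512
```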
I’m not going to get into the technical side of how GPU design works, partly because that would take an entire doctoral thesis to write out, and also because I work on the CPU side and those guys are wizards to me.