This is just a nitpicking question. Do Intel chips still have some die space/transistors dedicated to SSE3? If they do, why can’t they implement SSE3 using other, more powerful instructions (like AVX) to save die space?
why can’t they implement SSE3 by other, more powerful instructions (like AVX)
In short, the instruction semantics are slightly different, so they don’t do exactly the same thing. But it’s likely that the execution unit hardware is re-used for those.
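To make the "slightly different semantics" point concrete, here is a toy Python model of the SSE3 HADDPS instruction and its VEX-encoded 128-bit form VHADDPS, with registers modeled as lists of float32 lanes (the function names are hypothetical). Both call the same horizontal-add kernel, standing in for shared execution hardware, but the legacy SSE form leaves the upper 128 bits of the YMM register untouched while the VEX form zeroes them. This sketches architectural behavior only, not how any real core is wired.

```python
# Toy model: a 256-bit YMM register as 8 float32 lanes (lane 0 = bits 31:0).
# Both encodings share one execution "kernel"; they differ only in what
# happens to the upper 128 bits (lanes 4-7) of the destination.

def hadd_kernel(a, b):
    """Shared 128-bit horizontal add: 4 lanes from each source, 4 lanes out."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def haddps_sse3(ymm1, xmm2):
    """Legacy SSE3 HADDPS xmm1, xmm2: writes lanes 0-3, preserves lanes 4-7."""
    low = hadd_kernel(ymm1[:4], xmm2[:4])
    return low + ymm1[4:]

def vhaddps_vex128(src1, src2):
    """VEX.128 VHADDPS xmm1, xmm2, xmm3: writes lanes 0-3, zeroes lanes 4-7."""
    low = hadd_kernel(src1[:4], src2[:4])
    return low + [0.0] * 4

if __name__ == "__main__":
    ymm1 = [1.0, 2.0, 3.0, 4.0, 9.0, 9.0, 9.0, 9.0]
    xmm2 = [10.0, 20.0, 30.0, 40.0]
    print(haddps_sse3(ymm1, xmm2))     # upper lanes keep their 9.0s
    print(vhaddps_vex128(ymm1, xmm2))  # upper lanes become 0.0
```

Because the only difference is the treatment of the upper lanes, a core can route both forms to the same adder and handle the upper-half behavior separately.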
Not really, in most cases. The decoder might need a few more transistors to handle these instructions, but that shouldn’t amount to much, and the very oldest, practically unused ones can be relegated to a slow microcode ROM. On the execution side, SSE uses the same registers as the latest AVX does (the XMM registers are the low 128 bits of the YMM/ZMM registers), and the low-level operations actually performed by the execution units are the same. Keep in mind that the decoder translates each instruction into one or more micro-operations; instructions are not direct execution control data.
However, x86 CPUs do carry some old, no-longer-used features that complicate the design somewhat, along with the instructions tied to those features. But it’s not the instructions themselves that use the die area. Intel’s X86S proposal, for example, would remove the middle privilege rings (1 and 2) and call gates from the CPU, as well as some no-longer-relevant memory access modes.
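A toy sketch of the point that the decoder turns each instruction into one or more micro-operations: a register-register instruction becomes a single uop, while a memory source operand adds a separate load uop. The tuple format, the temporary register name tmp0, and the exact split are hypothetical, for illustration only.

```python
# Toy decoder: splits a two-operand x86-style instruction into micro-ops.
# The (op, dest, src1, src2) tuple format and "tmp0" are hypothetical.

def decode(mnemonic, dst, src):
    """Return the list of uops for `mnemonic dst, src`."""
    uops = []
    if src.startswith("["):                    # memory source operand
        addr = src.strip("[]")
        uops.append(("load", "tmp0", addr))    # separate load uop
        src = "tmp0"
    uops.append((mnemonic, dst, dst, src))     # the ALU/FPU uop itself
    return uops

if __name__ == "__main__":
    print(decode("add", "rax", "rbx"))        # one uop
    print(decode("addps", "xmm1", "[rdi]"))   # load uop + add uop
```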
It’s not the die space that’s the issue; it’s the time to validate the correct operation of those instructions with a pipeline that’s designed for something very different.
No, no CPU has separate FPUs for SSE and AVX; both are compiled to the same set of uOps by microcode.
Recent x86 CPUs go as far as implementing x87 in the 128-bit FPU too.
uOps by microcode
That’s not how it works: only a few overly complex instructions are implemented in microcode, and they are slow. Most instructions go through a hardwired (random-logic) decoder.
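A toy sketch of that split: common instructions decode in the fast hardwired decoders, while long ones are handed to the slower microcode ROM. The per-instruction uop counts and the four-uop cutoff are assumptions for illustration (several Intel cores are commonly described as sending instructions of more than four uops to the microcode sequencer), not measurements of any particular CPU.

```python
# Toy front-end routing: short instructions use the hardwired decoders,
# long ones go to the microcode ROM. All numbers below are hypothetical.

HARDWIRED_MAX_UOPS = 4  # assumed cutoff

# Hypothetical uop counts per instruction.
UOP_COUNT = {
    "add": 1,
    "addps": 1,
    "haddps": 3,
    "cpuid": 100,  # placeholder for "many"
}

def route(mnemonic):
    """Return which front-end path this toy core uses for the instruction."""
    n = UOP_COUNT[mnemonic]
    return "hardwired decoder" if n <= HARDWIRED_MAX_UOPS else "microcode ROM"

if __name__ == "__main__":
    for m in UOP_COUNT:
        print(m, "->", route(m))
```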
In x86 that’s not the case. Only the critical-path x86 instructions are implemented directly in logic lookup tables in the decoder. Some of the less-used ones live in the on-chip microcode (uCode) ROM, a bunch more in PAL code in off-chip ROM, and a few of the rarest ones in the OS’s exception-handler libraries.
A big chunk of the x86 ISA is rarely used, so this tiered implementation has been in use at least since Nehalem, if not earlier.
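The last tier in that list, software handling of instructions the hardware doesn't implement, works like trap-and-emulate: the CPU raises an invalid-opcode exception and a handler performs the operation instead. Below is a toy Python model of that flow. Treating POPCNT as missing from "hardware" is purely hypothetical, as are all the function names; the sketch says nothing about which real instructions, if any, are handled this way.

```python
# Toy trap-and-emulate: the "hardware" implements only a core set of opcodes;
# anything else raises InvalidOpcode, and a software handler emulates it.
# All names and the choice of emulated opcode are hypothetical.

class InvalidOpcode(Exception):
    pass

def hw_execute(op, regs, a, b):
    """Fast path: opcodes the toy hardware implements directly."""
    if op == "add":
        regs[a] = regs[a] + regs[b]
    else:
        raise InvalidOpcode(op)

def os_emulate(op, regs, a, b):
    """Slow path: software handler for opcodes missing from the hardware."""
    if op == "popcnt":
        regs[a] = bin(regs[b]).count("1")
    else:
        raise InvalidOpcode(op)  # truly undefined: propagates to the caller

def run(program, regs):
    for op, a, b in program:
        try:
            hw_execute(op, regs, a, b)
        except InvalidOpcode:
            os_emulate(op, regs, a, b)  # trap, emulate, then continue
    return regs

if __name__ == "__main__":
    prog = [("add", "r0", "r1"), ("popcnt", "r2", "r0")]
    print(run(prog, {"r0": 5, "r1": 2, "r2": 0}))
```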
Modern x86 chips are so large that the space the decoder takes is relatively small.
It would be a different story if you wanted a tiny cheap low power chip. Then you might be better off with ARM or RISC-V.
Because x86 instructions are variable-length and not self-synchronizing, you can see up to 15% of a core’s power budget go to decode when you aren’t running out of the small cache of decoded instructions, at least as of a few generations ago, when I last heard. That isn’t huge, but it does mean x86 architects have to think carefully about how wide to make the decoder; they can’t just size it so it’s never a bottleneck, the way ARM designers can.
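A toy illustration of why a variable-length, non-self-synchronizing encoding makes decode hard to parallelize. In the hypothetical encoding below, the low two bits of an instruction's first byte give its length (1 to 4 bytes); real x86 length decoding is far more involved (prefixes, opcode, ModRM, SIB, displacement, immediate). Starting at the wrong byte yields a plausible but bogus instruction stream, so each instruction's start is only known once the previous one's length is. The second function shows a predecode-style workaround: compute a candidate length at every byte offset independently, then follow the chain from a known start.

```python
# Hypothetical variable-length ISA: length = (first_byte & 0b11) + 1.

def insn_length(first_byte):
    return (first_byte & 0b11) + 1

def boundaries(code, start=0):
    """Serial decode: each start offset depends on the previous length."""
    offsets, pc = [], start
    while pc < len(code):
        offsets.append(pc)
        pc += insn_length(code[pc])
    return offsets

def boundaries_parallel(code):
    """Predecode style: a length at every byte offset (independent work),
    then a cheap walk from offset 0; most per-byte results are discarded."""
    lengths = [insn_length(b) for b in code]
    offsets, pc = [], 0
    while pc < len(code):
        offsets.append(pc)
        pc += lengths[pc]
    return offsets

if __name__ == "__main__":
    code = bytes([0x03, 0x01, 0x00, 0x01, 0x00, 0x00])
    print(boundaries(code))            # true instruction starts
    print(boundaries(code, start=1))   # misaligned start: bogus instructions
    print(boundaries_parallel(code))
```

With this byte string, decoding from offset 0 gives starts [0, 4, 5], while starting at offset 1 gives [1, 3, 5]: the instructions "found" at offsets 1 and 3 do not exist in the real stream.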
Mate, the 90s were a few decades back. ;-)
x86 decoding hasn’t been a limiter since then.
The x86 instructions go through a translation layer that turns them into CPU-specific instructions (microcode). So the CPU doesn’t need any specific hardware to be compatible with these old instructions; it just needs to know how to get the same result with microcode.
Are there performance losses or gains through this translation?
This is incorrect. Very few x86 instructions use microcode, as the microcode engine is quite slow. It’s mainly used for things like cpuid and such.
A lot of the x86 ISA is in microcode and PAL code. Only the most frequent and performance-critical instructions are implemented on-core in modern x86.
x86 is a huge set, so “very few” is a relative term ;-)
Microcode is used very heavily in modern CPUs. It has been since the 90s.
You are confusing microcode and micro-ops.
What is microcode, then?
It’s a way of creating a sequential control circuit based on a piece of memory holding the outputs and next state for each state.
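As a concrete version of that definition, here is a toy microcoded control unit in Python: a ROM indexed by the current state stores the control signals to assert and the next state. The signal names and the three-step "add a register to memory" microprogram are hypothetical; real microcode also branches on status flags, which this sketch leaves out.

```python
# Toy microcoded control unit: ROM[state] -> (control signals, next state).
# Signal names and the microprogram are hypothetical.

DONE = None

MICROCODE_ROM = {
    0: ({"mem_read"}, 1),      # tmp <- mem[addr]
    1: ({"alu_add"}, 2),       # tmp <- tmp + reg
    2: ({"mem_write"}, DONE),  # mem[addr] <- tmp
}

def run_microprogram(rom, mem, addr, reg):
    """Step through the ROM, driving a tiny datapath from the stored signals."""
    state, tmp = 0, 0
    while state is not DONE:
        signals, next_state = rom[state]  # one ROM lookup per step
        if "mem_read" in signals:
            tmp = mem[addr]
        if "alu_add" in signals:
            tmp = tmp + reg
        if "mem_write" in signals:
            mem[addr] = tmp
        state = next_state
    return mem

if __name__ == "__main__":
    print(run_microprogram(MICROCODE_ROM, [10, 20, 30], 1, 5))  # mem[1] += 5
```

Changing the behavior means changing the ROM contents rather than the control logic, which is what makes microcode attractive for rare, complex instructions.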