This is just a nitpicking question. Do Intel chips still have some space/transistors dedicated to SSE3? If they do, why can’t they implement SSE3 using other, more powerful instructions (like AVX) to save die space?

  • YumiYumiYumi@alien.topB
    1 year ago

    why can’t they implement SSE3 by other, more powerful instructions (like AVX)

    In short, the instruction semantics are slightly different, so they don’t do exactly the same thing. But it’s likely that the execution unit hardware is re-used for those.
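
    One well-known example of such a difference: a legacy SSE instruction leaves the upper half of the wider YMM register untouched, while the VEX-encoded 128-bit form zeroes it. Below is a minimal sketch of that idea in Python. The register model (8 integer lanes) and the function names are assumptions made up for illustration, not how any real core is built.

```python
# Toy model: a 256-bit YMM register as 8 lanes; the XMM view is lanes 0..3.
# Lane values are plain ints to keep the example exact; real hardware uses packed floats.

def add_low4(a, b):
    """Shared 'execution unit' (hypothetical): lane-wise add of the low 4 lanes."""
    return [x + y for x, y in zip(a[:4], b[:4])]

def legacy_sse_add(dst, src):
    """Legacy-SSE-style 128-bit add: upper lanes of dst are left unchanged."""
    return add_low4(dst, src) + dst[4:]

def vex128_add(src1, src2):
    """VEX-encoded 128-bit add: upper lanes of the destination are zeroed."""
    return add_low4(src1, src2) + [0, 0, 0, 0]

ymm0 = [1, 2, 3, 4, 50, 60, 70, 80]
ymm1 = [10, 10, 10, 10, 0, 0, 0, 0]

sse_result = legacy_sse_add(ymm0, ymm1)
vex_result = vex128_add(ymm0, ymm1)
print(sse_result)  # [11, 12, 13, 14, 50, 60, 70, 80]
print(vex_result)  # [11, 12, 13, 14, 0, 0, 0, 0]
```

    Both variants call the same `add_low4` helper, which mirrors the point that the arithmetic hardware can be shared even though the architectural result differs.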

  • jaaval@alien.topB
    1 year ago

    Not really in most cases. The decoder might need to spend some more transistors to accommodate the instructions, but that should not be much. And the very oldest, never-used ones can be thrown into some very slow microcode ROM or something. On the execution side, SSE uses the same registers as the latest AVX does, and the low-level compute operations actually done by the execution units are the same. You need to understand that each instruction is actually translated into one or more micro-operations by the decoder; instructions are not direct execution control data.

    However, there are some old, no longer used features in x86 CPUs that do complicate the design somewhat, and there are instructions connected to those features. But it’s really not the instructions themselves that use the die area. Intel’s X86S proposal would remove, for example, the middle privilege level rings and call gates from the CPUs, as well as some no longer relevant memory access modes.
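
    The decode step described in this comment can be sketched roughly as a lookup from instruction to micro-operations. This is only a toy model: the table contents, unit names, and micro-op counts below are invented for illustration and are not taken from any real microarchitecture.

```python
from collections import namedtuple

# A micro-op names the execution unit it needs, the operation, and the data width.
MicroOp = namedtuple("MicroOp", ["unit", "op", "width_bits"])

# Hypothetical fast path: common instructions map to a short, fixed micro-op list.
FAST_PATH = {
    "addps xmm": [MicroOp("vec_fp", "add", 128)],
    "vaddps ymm": [MicroOp("vec_fp", "add", 256)],
}

# Hypothetical slow path: rarely used instructions expand to long sequences
# stored in a microcode ROM (the length here is made up).
MICROCODE_ROM = {
    "cpuid": [MicroOp("int_alu", "ucode_step", 64)] * 20,
}

def decode(instruction):
    """Return (path, micro_ops) for an instruction name."""
    if instruction in FAST_PATH:
        return "fast", FAST_PATH[instruction]
    if instruction in MICROCODE_ROM:
        return "microcode", MICROCODE_ROM[instruction]
    raise ValueError(f"unknown instruction: {instruction}")

def units_used(instruction):
    """Set of execution units an instruction's micro-ops are dispatched to."""
    return {uop.unit for uop in decode(instruction)[1]}

print(units_used("addps xmm"))   # {'vec_fp'}
print(units_used("vaddps ymm"))  # {'vec_fp'}
print(decode("cpuid")[0])        # microcode
```

    In this sketch the SSE and AVX adds differ only in `width_bits` and land on the same unit, so supporting the older instruction costs a table entry rather than a second adder.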

  • scfw0x0f@alien.topB
    1 year ago

    It’s not the die space that’s the issue; it’s the time to validate the correct operation of those instructions with a pipeline that’s designed for something very different.

  • Jannik2099@alien.topB
    1 year ago

    No, no CPU has separate FPUs for SSE & AVX - it’s compiled to the same set of uOps by microcode.

    Recent x86 CPUs go as far as implementing x87 in the 128b FPU too.

    • wintrmt3@alien.topB
      1 year ago

      uOps by microcode

      That’s not how it works. Only a few overly complex instructions are implemented in microcode, and they are slow; most instructions use a random logic decoder.

      • GomaEspumaRegional@alien.topB
        1 year ago

        In x86 that’s not the case; only the critical-path x86 instructions are implemented directly in logic lookup tables in the decoder. Some of the less-used ones are in the uCode ROM on chip, a bunch more are in PAL code in off-chip ROM, and a few of the rarest ones are in the exception manager libraries of the OS.

        A big chunk of the x86 ISA is rarely used so this tiered implementation has been used at least since Nehalem if not before.

  • AutonomousOrganism@alien.topB
    1 year ago

    Modern x86 chips are so large that the space the decoder takes is relatively small.

    It would be a different story if you wanted a tiny cheap low power chip. Then you might be better off with ARM or RISC-V.

  • einmaldrin_alleshin@alien.topB
    1 year ago

    The x86 instructions go through a translation layer that turns them into CPU specific instructions (microcode). So the CPU doesn’t need any specific hardware to be compatible with these old instructions, it just needs to know how to get the same result with microcode.

    • th3typh00n@alien.topB
      1 year ago

      This is incorrect. Very few x86 instructions use microcode, as the microcode engine is quite slow. It’s mainly used for things like cpuid and such.

      • GomaEspumaRegional@alien.topB
        1 year ago

        A lot of x86 ISA is in the micro and PAL codes. Only the most frequent and performance-limiting ones are on-core for modern x86.

        x86 is a huge set, so “very few” is a relative term ;-)

        • wintrmt3@alien.topB
          1 year ago

          Microcode is a way of creating a sequential control circuit based on a piece of memory holding the outputs and next state for each state.
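
          A minimal sketch of that definition, assuming a made-up four-step sequence: each ROM word holds the control outputs for the current state plus the next state, and a tiny sequencer just walks the table. The signal names and ROM layout are hypothetical.

```python
# Each ROM word: (control_outputs, next_state). A next_state of None ends the sequence.
ROM = {
    0: ({"load_a"}, 1),
    1: ({"load_b"}, 2),
    2: ({"alu_add"}, 3),
    3: ({"write_back"}, None),
}

def run_sequencer(rom, start_state=0, max_steps=100):
    """Walk the ROM from start_state, collecting the control outputs of each step."""
    trace = []
    state = start_state
    steps = 0
    while state is not None:
        if steps >= max_steps:
            raise RuntimeError("sequence did not terminate")
        outputs, state = rom[state]  # read outputs and next state from memory
        trace.append(sorted(outputs))
        steps += 1
    return trace

trace = run_sequencer(ROM)
print(trace)  # [['load_a'], ['load_b'], ['alu_add'], ['write_back']]
```

          Changing the behaviour means changing the ROM contents rather than the logic, which is the appeal of the approach; the cost is that every step takes a memory lookup.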
