• WolfLink · 5 days ago

    If that really is the bottleneck, I’d expect them to use FPGAs or customized ASICs instead of a common CPU.

    • j4k3@lemmy.world · 5 days ago

      These options do not scale for transformer workloads. I can’t say that I fully understand the reason, but if you search on YT (and get lucky) you can find someone who was working on this specifically at Altera/Intel and explained why FPGAs do not work for AI. I seem to recall it has to do with power and something else about how the scaling works.

      The AI tensor bottleneck is the bus width between L2 and L1. You can find info on that on the Chips and Cheese blog. That bus can’t be made wider without pulling down the core; processor speeds are just too high and optimized far too much for serial code execution.
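
      As a rough illustration of that bandwidth argument, here is a minimal roofline-style sketch; every hardware number in it is a made-up placeholder, not a measurement of any real core:

      ```python
      # Rough roofline-style check: is a small matmul tile compute-bound or
      # bandwidth-bound? The peak-FLOPS and bandwidth figures are assumptions
      # chosen purely for illustration.

      def matmul_bound(n, peak_flops, bw_bytes_per_s):
          """Classify an n x n x n fp16 matmul, assuming each matrix moves once."""
          flops = 2 * n**3                      # multiply-adds
          traffic = 3 * n**2 * 2                # A, B, C moved once, 2 bytes/element
          intensity = flops / traffic           # FLOPs per byte moved
          ridge = peak_flops / bw_bytes_per_s   # intensity needed to keep the ALUs busy
          return "compute-bound" if intensity >= ridge else "bandwidth-bound"

      # Assumed: 1 TFLOP/s of math fed by a 100 GB/s L2->L1 path.
      for n in (8, 64, 1024):
          print(n, matmul_bound(n, peak_flops=1e12, bw_bytes_per_s=100e9))
      ```

      Under those assumptions the small tiles that actually fit next to the ALUs come out bandwidth-bound, which is the sense in which the L2-to-L1 path caps tensor throughput.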

        • WolfLink · 5 days ago

        For very large AI workloads people use GPU clusters, and a GPU is already very close to a tensor-product-optimized ASIC. There has also been recent work on even more specialized AI chips.

        For applications that need to run on a smaller scale, people can and do run things like neural networks on FPGAs.
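
        For a sense of what running a small neural network on an FPGA typically involves, here is a minimal fixed-point sketch; the bit widths and scale factor are arbitrary assumptions, not taken from any real design:

        ```python
        import numpy as np

        # Small FPGA deployments usually quantize to fixed point so the
        # multiply-accumulates map onto integer DSP slices. The Q1.7 format
        # below is an arbitrary choice for illustration.

        def quantize(x, frac_bits=7, bits=8):
            """Round floats to signed fixed-point integers (Q1.7 here)."""
            scale = 1 << frac_bits
            lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
            return np.clip(np.round(x * scale), lo, hi).astype(np.int32), scale

        def dense_fixed_point(x_q, w_q, scale):
            acc = x_q @ w_q        # wide integer multiply-accumulate
            return acc // scale    # rescale back to the input's fixed-point format

        rng = np.random.default_rng(0)
        x = rng.standard_normal(16) * 0.5
        w = rng.standard_normal((16, 4)) * 0.5
        x_q, s = quantize(x)
        w_q, _ = quantize(w)
        print("fixed-point:", dense_fixed_point(x_q, w_q, s))
        print("float ref:  ", np.round(x @ w * s).astype(int))   # should be close
        ```

        Baking every weight into an integer datapath like that is tractable at small scale, which is where FPGAs get used.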

          • j4k3@lemmy.world · 5 days ago

          Go put money on FPGAs for AI like no one else and make your fortune.

          The GPU is only a hack as well. It is the best hack at the moment.

          Specialized AI hardware will fail too. The CPU itself will be redesigned to satisfy all workloads, because splitting workloads across divergent architectures was already tried in the 286-386 era and failed. Any design that is good enough to handle both workloads will absolutely dominate in data centers, and therefore everywhere else too.

          CPU clock speeds are only marketing nonsense; they have no real value. Slowing things down will make parallelism easier and allow for a wider cache bus width. Improving cache access will also pay dividends, and there are many ways to improve the ALU too.

          We are still 8 years away from the real, fully redesigned hardware that could have been started shortly after the Llama model weights dropped. That was the critical moment on the timeline. All edge hardware has a 10 year lead time.

            • brucethemoose@lemmy.world · 4 days ago

            In AI land, programmability has proven to be king so far.

            Nvidia GPUs are so dominant because everything is prototyped, then deployed, in PyTorch. I think Google TPUs are a good counterexample: a big entity throws tons of money at the problem, and even releases some models optimized for its hardware, yet Flax and the TPUs themselves gain very little traction and are still incompatible with the new architectures that come out every other day, because no one has bothered to port them.
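
            To make the programmability point concrete, here is the kind of experiment that ships every day: a made-up attention tweak (hypothetical, invented here purely for illustration) prototyped in a few lines of PyTorch that runs on a CPU or an Nvidia GPU with no porting work:

            ```python
            import torch

            # Hypothetical "gated" attention variant, written only to show how
            # cheap it is to prototype a new op in PyTorch.

            def toy_gated_attention(q, k, v):
                # Standard scaled dot-product attention with an arbitrary
                # sigmoid gate bolted on top.
                scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
                weights = torch.softmax(scores, dim=-1)
                gate = torch.sigmoid(q.mean(dim=-1, keepdim=True))
                return gate * (weights @ v)

            device = "cuda" if torch.cuda.is_available() else "cpu"
            q, k, v = (torch.randn(2, 8, 64, device=device) for _ in range(3))
            print(toy_gated_attention(q, k, v).shape)   # torch.Size([2, 8, 64])
            ```

            Getting that same ten-line experiment running well on a TPU or an FPGA is where the porting cost shows up.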

            FPGAs take this problem and make it an order of magnitude worse. Very few people know how to port to them and optimize for them… so they don’t.

              • j4k3@lemmy.world · 4 days ago

              I really wish I had saved the reference to the guy from Altera explaining why FPGAs simply will not work for these models. I don’t have a ton of interest in FPGAs in general… It may have been on Lex Fridman.

              An FPGA can ultimately be anything, and they obviously work for smaller stuff, but there is some specific reason why they do not scale to current models. It might have been the way rotations are done in transformers, or something like that. The person was an author on papers I have seen and skimmed on arXiv. Names and details outside of my curiosity do not stick in my abstract, functional way of thinking and remembering.
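
              If the rotations in question are rotary position embeddings (RoPE), and that is only a guess on my part, the core operation is small; here is a minimal sketch:

              ```python
              import numpy as np

              # Rotary position embeddings: each pair of channels in a query/key
              # vector is rotated by an angle that depends on the token position.
              # Minimal sketch, pairing the first half of the vector with the second.

              def rope(x, position, base=10000.0):
                  """Apply a rotary embedding to one vector x of even length."""
                  half = x.shape[-1] // 2
                  freqs = base ** (-np.arange(half) / half)   # per-pair rotation rates
                  theta = position * freqs                    # angles for this position
                  cos, sin = np.cos(theta), np.sin(theta)
                  x1, x2 = x[..., :half], x[..., half:]
                  # 2-D rotation per pair: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
                  return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

              q = np.arange(8, dtype=float)
              print(rope(q, position=0))   # position 0: no rotation, q unchanged
              print(rope(q, position=5))   # later positions: pairs get rotated
              ```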