• Scott
    English
    14
    4 months ago

    I’m just trying to get my hands on some faster hardware. https://groq.com has been able to do some crazy shit, hitting 500 tokens/sec on their LPUs.

    • @[email protected]
      English
      6
      4 months ago

      That is insanely fast! I figured we’d be getting “AI cards” at some point soon.

    • @[email protected]
      English
      1
      4 months ago

      What kind of website is that? It’s super slow and doesn’t work without WebAssembly. Do you really need that for a simple interface?

      • Scott
        English
        2
        4 months ago

        It’s not about their frontend. They are running custom LPUs that can process LLM tokens at 500/sec, which is insanely impressive.

        For reference, with a max size of 2k tokens, my dual Xeon Silver 4114 CPUs take 2-3 minutes.
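To put those two numbers side by side, here is a rough sketch of the arithmetic. It assumes the 2-3 minutes covers generating the full 2k tokens, which the comment does not spell out, so treat the result as a ballpark only. The function name is made up for illustration.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput for a run that produced `tokens` in `seconds`."""
    return tokens / seconds


# Assumption: the dual-Xeon run generates all 2,000 tokens in 2 to 3 minutes.
cpu_fast = tokens_per_second(2000, 2 * 60)  # best case, 2 minutes
cpu_slow = tokens_per_second(2000, 3 * 60)  # worst case, 3 minutes

# Figure quoted in the thread for the LPU setup.
lpu = 500.0

speedup_low = lpu / cpu_fast   # speedup versus the 2-minute run
speedup_high = lpu / cpu_slow  # speedup versus the 3-minute run

print(f"CPU: {cpu_slow:.1f} to {cpu_fast:.1f} tokens/sec")
print(f"LPU speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under that assumption the CPU box manages roughly 11 to 17 tokens/sec, so the quoted 500 tokens/sec would be about 30 to 45 times faster.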

        • @[email protected]
          English
          1
          4 months ago

          Aren’t those the ones that cost $2000 per 250 MB of memory?? Meaning you’d need about 350 of them to load any half-decent model.

          • Scott
            English
            2
            4 months ago

            Not sure how they are doing it, but it was actually $20k, not $2k, for 250 MB of memory on the card. I suspect the models are probably cached in system memory.
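The card-count math in this exchange can be sketched as below. It uses only the figures quoted in the thread (250 MB per card, $2k versus $20k per card, about 350 cards) and assumes the whole model has to sit in on-card memory, which the reply above doubts. The 70B fp16 model is a hypothetical example, and the helper name is made up.

```python
CARD_BYTES = 250 * 10**6  # 250 MB per card, as quoted in the thread


def cards_needed(model_bytes: int, card_bytes: int = CARD_BYTES) -> int:
    """Cards required if every weight must live in on-card memory (ceiling division)."""
    return -(-model_bytes // card_bytes)


# The "about 350 cards" estimate implies roughly this much model data.
implied_model_gb = 350 * CARD_BYTES / 10**9

# Total cost of 350 cards at the two prices mentioned.
cost_at_2k = 350 * 2_000
cost_at_20k = 350 * 20_000

# Hypothetical example: a 70B-parameter model at 2 bytes per weight (fp16).
cards_70b_fp16 = cards_needed(70 * 10**9 * 2)

print(f"350 cards hold about {implied_model_gb} GB")
print(f"Cost: ${cost_at_2k:,} at $2k each, ${cost_at_20k:,} at $20k each")
print(f"70B fp16 model: {cards_70b_fp16} cards")
```

So 350 cards corresponds to about 87.5 GB of weights, and the price correction moves the bill from $700,000 to $7,000,000 for the same setup.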

        • @[email protected]
          English
          1
          4 months ago

          Is that with an fp16 model? Don’t be scared to try even a 4-bit quantization. You’d be surprised at how little is lost and how much quicker it is.
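A quick sketch of why quantization helps on the memory side: weight storage scales with bits per weight. The 7B parameter count is a hypothetical example, and the estimate ignores quantization scales, the KV cache, and runtime overhead, so real files come out somewhat larger. The helper name is made up.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_weight / 8 / 1e9


# Hypothetical 7B-parameter model.
params = 7e9

fp16_gb = weight_gb(params, 16)  # 16 bits per weight
q4_gb = weight_gb(params, 4)     # 4 bits per weight, ignoring scale overhead

print(f"fp16:  {fp16_gb:.1f} GB")
print(f"4-bit: {q4_gb:.1f} GB ({fp16_gb / q4_gb:.0f}x smaller)")
```

On CPU inference the speedup mostly comes from moving fewer bytes through memory per token, so a model that is about 4x smaller tends to run noticeably faster, though not necessarily a full 4x.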