Qwen 2.5 32B is where it’s at now. A 24GB GPU is affordable, and a quantized 32B model fits on it perfectly.
Otherwise, keep an eye out for AMD Strix Halo, which can reportedly allocate up to 96GB to its iGPU, and you can run faster backends like vLLM or ExLlama.
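To put rough numbers on "fits perfectly," here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count × bits per weight ÷ 8 and uses the nominal 32B and 123B sizes mentioned in this thread. The 4.5 bits-per-weight level is an assumed typical quantization, and `weight_gib` is a hypothetical helper for illustration. It ignores KV cache, activations, and runtime overhead, so real usage will be somewhat higher.

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: bytes = params * bits_per_weight / 8; no KV cache,
    activations, or framework overhead are included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal sizes from the thread; 4.5 bpw is an assumed quantization level.
for name, params in [("Qwen 2.5 32B", 32), ("Mistral Large 123B", 123)]:
    for bits in (16, 8, 4.5):
        print(f"{name:>20} @ {bits:>4} bpw: {weight_gib(params, bits):6.1f} GiB")
```

On these rough numbers, a 32B model needs roughly 60 GiB at 16-bit and about 30 GiB at 8-bit, so it only fits on a 24GB card once quantized to around 4 to 5 bits per weight, which still leaves a few GB for context.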
What’s up with Qwen that makes it better than anything else?
It’s just smarter at the same parameter count. Try Qwen QwQ or Qwen2.5-Coder 32B and see for yourself: it stacks up well against much larger models like the 123B Mistral Large, or even GPT-4.
Why? Alibaba trained it well, presumably with better data than OpenAI or anyone else, though the specifics are up for debate. Some suggest that bilingual training on English and Chinese (the two largest text corpora in existence) helps the model significantly compared with mostly-English training. Some say the government simply gave them better data. There are also suggestions that having far fewer GPUs than American AI companies made Chinese labs “thrifty” and gave them much more incentive to innovate rather than brute-force their models (which has diminishing returns).