Absolutely needed: we have to get high efficiency out of this beast … as it gets better, we’ll become too dependent.
“all of this growth is for a new technology that’s still finding its footing, and in many applications—education, medical advice, legal analysis—might be the wrong tool for the job.”
Running local models can’t be worse, power-wise, than playing a video game.
There are low-VRAM video models that approach one frame per second on the kind of mid-range cards that’d have low VRAM. A 30 fps clip lasting 10 seconds is 300 frames, so it would take about five minutes to generate. When was the last time you played a really fancy-looking game for less than five minutes?
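The arithmetic behind that estimate, as a quick sketch (the one-frame-per-second generation rate is the commenter’s rough low-VRAM figure, not a measured number):

```python
# Back-of-the-envelope: time to generate a short video clip locally.
fps = 30           # playback frame rate of the finished clip
duration_s = 10    # clip length in seconds
gen_rate = 1.0     # frames generated per second (rough low-VRAM estimate)

frames = fps * duration_s             # 300 frames to generate
render_time_s = frames / gen_rate     # 300 seconds of generation
print(render_time_s / 60)             # → 5.0 (minutes)
```

So the five-minute figure falls straight out of 300 frames at one frame per second; a card generating two frames per second would halve it.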
Now, creating the models, yeah, that’s still lumbering giants burning money. But that’s mostly thanks to the Jevons paradox: the watt-hours needed per hand-wavy unit of training have gone down, so they do a lot more of it. And the result is that laptop-sized models today beat datacenter-sized models from a year ago.
That’s hardly believable. Do you have any statistics on this? Is this some special edition of a heavy, high-performance gaming laptop with an external GPU attached, compared against a datacenter consisting of two racks filled barely to half?
Pick any benchmark that some hundred-billion-parameter model bragged about in mid-2024, and there’s a four-billion-parameter model today that’s at least competitive.
This website that’s unusable in dark mode shows Gemini 1.5 Pro and Claude 3.5 Sonnet with GPQA scores of 59 and a bit, as of June 2024. The internet says that Sonnet has 175 billion parameters. Nemotron Nano 8B from March of this year has 8 billion and scores 54.1. Phi 4 Mini Reasoning, at 3.8 billion, scores 52.
Scrolling down and picking the HumanEval coding benchmark, Granite 3.3 8B from last month scores a fraction higher than DeepSeek-V2.5 236B from May 2024.
Yesterday Google released(?) Gemma 3n, a 4B model, claiming a Chatbot Arena Elo a hair below Claude 3.7 Sonnet (GPQA 84.8, February 2025).
For all Ed Zitron loves to roll his eyes at ‘these are early days,’ shit is moving. Some reports of diminishing returns are because you can’t score higher than 100%. Improvement is asymptotic. Big-ass models are chipping away at ‘well AI can’t do [blank].’ Tiny open models are following the same curve, about a year later.
It’s mainly due to the orders-of-magnitude advances in bullshitting people about AI’s capabilities.