Absolutely needed: high efficiency for this beast … because as it gets better, we’ll become too dependent on it.
“all of this growth is for a new technology that’s still finding its footing, and in many applications—education, medical advice, legal analysis—might be the wrong tool for the job,”
That’s hardly believable. Do you have any statistics on this? Is this some special edition of a heavy, high-performance gaming laptop with an external GPU attached, plus a datacenter consisting of two racks, each about half full?
Pick any benchmark that some hundred-billion-parameter model bragged about in mid-2024, and there’s a four-billion-parameter model today that’s at least competitive.
This website that’s unusable in dark mode shows Gemini 1.5 Pro and Claude 3.5 Sonnet with GPQA scores of 59 and a bit, as of June 2024. The internet says this Sonnet has 175 billion parameters. Nemotron Nano 8B from March of this year has 8 billion and scores 54.1. Phi 4 Mini Reasoning, at 3.8 billion, scores 52.
Scrolling down and picking the HumanEval coding benchmark, Granite 3.3 8B from last month scores a fraction higher than DeepSeek-V2.5 236B from May 2024.
Yesterday Google released(?) Gemma 3n, a 4B model, claiming a Chatbot Arena Elo a hair below Claude 3.7 Sonnet (GPQA 84.8, February 2025).
For all Ed Zitron loves to roll his eyes at ‘these are early days,’ shit is moving. Some reports of diminishing returns are because you can’t score higher than 100%. Improvement is asymptotic. Big-ass models are chipping away at ‘well AI can’t do [blank].’ Tiny open models are following the same curve, about a year later.
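To make the ‘you can’t score higher than 100%’ point concrete, here’s a toy sketch (my numbers, not from any real leaderboard): even if each model generation keeps halving the error rate, the absolute benchmark gain shrinks as the score approaches 100%, which reads as “diminishing returns” on a chart.

```python
# Hypothetical illustration: constant *relative* progress (error rate
# halves every generation) looks like shrinking *absolute* gains once
# a benchmark starts to saturate near 100%.
score = 60.0  # made-up starting benchmark score, in percent
for gen in range(1, 6):
    error = 100.0 - score          # remaining headroom
    gain = error / 2               # half the remaining error is recovered
    score += gain
    print(f"gen {gen}: score {score:.2f}% (gain {gain:.2f} pts)")
```

The per-generation gain drops from 20 points to just over 1, even though each generation is doing equally well in relative terms.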
It is mainly due to orders-of-magnitude advances in bullshitting people about AI’s capabilities.