OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions (pivot-to-ai.com)
David Gerard@awful.systems to TechTakes@awful.systems · English · 1 month ago
cross-posted to: [email protected]
Naz · English · 1 month ago
You were right.
I make open models from scratch and I’ve tested the corporate benchmark banks and some of the results they’ve gotten are extremely sus.
After the Volkswagen Dieselgate Scandal, I’ve taken metrics not reported by independent auditors with a chonking peanut scooper of salt.
For the curious:
My best results so far have been like 74% on HumanEval with a 405B, zero-shot induction.