OpenAI released its o3 model in December, bragging about the model’s unparalleled ability to do math and science problems. The model’s success on the FrontierMath benchmark — solving 25.2% of…
I fucking knew it!!! I don’t even know why I feel so vindicated for calling out such an obvious fraud tbh. Anyone, besides possibly an HN poster, could have seen it coming.
You were right.
I make open models from scratch, and I’ve tested the corporate benchmark banks. Some of the results they’ve gotten are extremely sus.
After the Volkswagen Dieselgate Scandal, I’ve taken metrics not reported by independent auditors with a chonking peanut scooper of salt.
For the curious:
My best results so far have been like 74% on HumanEval with a 405B model, zero-shot induction.