The Long and Mostly Short of China’s Newest GPT

Renneder · 1 year ago

Kerfuffle · 1 year ago

1.7 trillion parameters is huge, so it doesn’t take a lot to be smaller than that. 33b is really small, though. Just from my own playing around with this stuff, models seem to get decent around the 65-70b parameter mark.