The Long and Mostly Short of China’s Newest GPT (spectrum.ieee.org)
RennederM to BecomeMe · 1 year ago
Kerfuffle · 1 year ago
1.7 trillion parameters is huge, so it doesn’t take a lot to be smaller than that. 33B is really small, though. Just from my own playing around with this stuff, models seem to get decent around the 65-70B parameter mark.