• Kerfuffle
    1 year ago

    1.7 trillion parameters is huge, so it doesn’t take a lot to be smaller than that. 33b is really small, though. Just from my own playing around with this stuff, models seem to get decent around the 65-70b parameter mark.