People are talking about the new Llama 3.3 70B release, which generally performs better than Llama 3.1 70B and approaches the performance of Llama 3.1 405B: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3
However, something to note:
Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.
Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70B Instruct just a better instruction-tuned version of a Llama 3.1 pretrained model?
Comparing the model cards:
3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
The two cards list the same knowledge cutoff, the same amount of training data, and the same training time, which gives me hope that it’s just a better finetune, maybe of Llama 3.1 405B.
On Hugging Face, someone said it’s still the same base model: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10
And I remember watching an interview with Zuckerberg this year in which he said that releasing the models to the public, including base models, is what he wants and is part of their strategy.
Thank you so much, that answers my question exactly: the official response (that person works at Meta) confirms it’s the same base model!
I was concerned mainly because the release notes strangely didn’t mention this anywhere, and I thought it would have been important enough to mention.