llama2.c: Inference Llama 2 in one file of pure C by Andrej Karpathy

github.com

noneabove1182 to LocalLLaMA · English · 2 years ago


Have you ever wanted to inference a baby Llama 2 model in pure C? No? Well, now you can!

With this code you can train the Llama 2 LLM architecture from scratch in PyTorch, save the weights to a raw binary file, and then load that file into one fairly simple 500-line C file (run.c) that inferences the model, in plain fp32 for now. On my cloud Linux devbox, a 6-layer, 6-head model with dim 288 (~15M params) inferences at ~100 tok/s in fp32, and about the same on my M1 MacBook Air. I was somewhat pleasantly surprised that one can run reasonably sized models (a few tens of millions of params) at highly interactive rates with an approach this simple.

https://twitter.com/karpathy/status/1683143097604243456
