Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
Author(s) Tiedong Liu and Bryan Kian Hsiang Low from National University of Singapore
Word Count 6500+
Estimated Read Time Around 15-20 minutes
Source Code A GitHub repo is provided to access their model, dataset, and script for dataset generation: https://github.com/liutiedong/goat
Summary The authors introduce Goat, a fine-tuned LLaMA model that achieves state-of-the-art performance on a range of arithmetic tasks from the BIG-bench benchmark. In particular, the zero-shot Goat-7B model matches or exceeds the accuracy of the few-shot PaLM-540B model.
They show that supervised fine-tuning alone, without any special techniques, enables LLaMA to generate correct answers for large number addition and subtraction. This is attributed to LLaMA’s consistent tokenization of numbers.
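To see what "consistent tokenization" means in practice, here is a minimal sketch using the Hugging Face transformers tokenizer API; the checkpoint path is a placeholder, and the exact token strings depend on the tokenizer version.

```python
# Sketch: inspect how LLaMA's SentencePiece tokenizer splits numbers.
# Assumes the Hugging Face `transformers` package; the model path below is a
# placeholder for a locally available LLaMA checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b")
for text in ["1234", "98765", "1234 + 98765 ="]:
    print(text, "->", tokenizer.tokenize(text))

# LLaMA splits every number into individual digit tokens (e.g. "1234" ->
# ["1", "2", "3", "4"]), which the paper credits for the model's ability to
# learn multi-digit addition and subtraction from supervised fine-tuning alone.
```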
For large-number multiplication and division, they propose a decomposition method based on task learnability. This method breaks otherwise unlearnable tasks down into a series of learnable subtasks, leveraging basic arithmetic principles.
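To make the decomposition concrete, here is a rough sketch of how a large multiplication can be expanded into simpler subtasks in the spirit of the paper's method: split one operand into place values, compute the easier sub-products, then add. The function name and the exact chain-of-thought formatting are illustrative, not the released dataset's format.

```python
# Sketch: expand a multi-digit multiplication into simpler, "learnable"
# subtasks. The formatting of the intermediate steps is illustrative, not the
# released dataset's exact chain of thought.
def decompose_multiplication(a: int, b: int) -> str:
    small, large = sorted((a, b))
    digits = str(small)
    # Split the smaller operand into place values, e.g. 397 -> 300 + 90 + 7.
    parts = [int(d) * 10 ** (len(digits) - i - 1)
             for i, d in enumerate(digits) if d != "0"]
    steps = [f"{a} * {b}"]
    steps.append(f"= {large} * ({' + '.join(str(p) for p in parts)})")
    # Each sub-product is a single digit times a power of ten, a much easier task.
    steps.append("= " + " + ".join(f"{large} * {p}" for p in parts))
    steps.append("= " + " + ".join(str(large * p) for p in parts))
    # Finish with additions, which the paper shows LLaMA handles directly.
    steps.append(f"= {sum(large * p for p in parts)}")
    return "\n".join(steps)

print(decompose_multiplication(397, 4429))
# 397 * 4429
# = 4429 * (300 + 90 + 7)
# = 4429 * 300 + 4429 * 90 + 4429 * 7
# = 1328700 + 398610 + 31003
# = 1758313
```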
Goat-7B was trained using the LoRA technique on a modest 24GB GPU, making it easily reproducible. Limitations around extrapolation and interpretability of the proposed method are also discussed.
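For readers curious what the LoRA setup looks like in code, a minimal sketch using Hugging Face peft is below; the model path and hyperparameters are placeholders rather than the paper's exact configuration.

```python
# Sketch: parameter-efficient LoRA fine-tuning with Hugging Face `peft`.
# The model path and hyperparameters are illustrative placeholders, not the
# paper's exact settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")
lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train,
                                    # which is why a 24GB GPU suffices
```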
The code, dataset, and model are released to facilitate research in instruction tuning and mathematical reasoning in language models.
Applicability Evaluation The research demonstrates how LLaMA’s consistent tokenization facilitates arithmetic tasks and shows that intermediate supervision, through a decomposition method, can help solve complex problems. These findings could be useful for building applications on top of large language models that require mathematical reasoning or multistep computations.
Specifically, the proposed instruction tuning pipeline can potentially be integrated with other instruction-tuned LMs to enhance their arithmetic reasoning for solving math word problems.
However, the fine-tuned model’s limited extrapolation capability and the lack of an optimal decomposition method remain challenges to be addressed for real-world applications.
I’m not an AI researcher or anything, but it seems like a “waste” to use the neural network itself to perform arithmetic. Instead, it would be far more efficient to somehow let the model use a normal calculator when it needs it.
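For what it's worth, the calculator-tool idea in this comment might look roughly like the toy snippet below; the CALC(...) marker and helper name are made up purely for illustration and are not from the paper.

```python
# Toy sketch of the "external calculator" idea: instead of having the LLM do
# the arithmetic, detect a marked expression in its output and evaluate it
# with ordinary code. The CALC(...) convention is invented for illustration.
import re

def answer_with_calculator(model_output: str) -> str:
    match = re.search(r"CALC\(([\d\s+\-*/.]+)\)", model_output)
    if not match:
        return model_output
    expression = match.group(1)
    result = eval(expression, {"__builtins__": {}})  # acceptable only for a toy demo
    return model_output.replace(match.group(0), str(result))

print(answer_with_calculator("The product is CALC(1234 * 5678)."))
# -> The product is 7006652.
```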
We could make an even more power-hungry blockchain!
I’d hope no one uses it for math, though it does speak to some of its reliability in planning.