Recent years have witnessed rapid development of large language models (LLMs). Despite their strong abilities in many language-understanding tasks, the heavy computational burden largely restricts the application of LLMs, especially when they need to be deployed on edge devices. In this paper, we propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. The motivation lies in the imbalanced degrees of freedom of quantization and adaptation, and the solution is to use group-wise operators, which increase the degree of freedom of quantization while decreasing that of adaptation. QA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated into a quantized model without loss of accuracy. We apply QA-LoRA to the LLaMA and LLaMA2 model families and validate its effectiveness on different fine-tuning datasets and downstream scenarios. Code will be made available at this https URL.
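For what it's worth, here's how I read the core trick, as a rough numpy sketch. The group size, the average pooling, and the exact scale/zero-point conventions are my own guesses rather than anything lifted from the paper, so treat it as an illustration: because the LoRA A matrix only ever sees one pooled value per quantization group, the trained adapter can be folded into the per-group zero points, and the merged model is still a plain INT4 model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, L, r, s = 8, 32, 4, 2, 0.5   # L = quantization group size, r = LoRA rank
G = d_in // L                             # number of groups along the input dimension

W = rng.normal(size=(d_out, d_in))

# Group-wise asymmetric INT4: W ~ alpha * q + beta, one (alpha, beta) per output row and group.
Wg = W.reshape(d_out, G, L)
alpha = np.maximum((Wg.max(2) - Wg.min(2)) / 15.0, 1e-8)   # scale
beta = Wg.min(2)                                           # zero-point offset
q = np.clip(np.round((Wg - beta[..., None]) / alpha[..., None]), 0, 15)

def dequant(beta):
    return (alpha[..., None] * q + beta[..., None]).reshape(d_out, d_in)

# The LoRA branch acts on group-averaged inputs, so A has only G columns instead of d_in.
A = rng.normal(size=(r, G))
B = rng.normal(size=(d_out, r))

x = rng.normal(size=d_in)
x_bar = x.reshape(G, L).mean(1)           # average-pool the input within each group

y_finetune = dequant(beta) @ x + s * (B @ (A @ x_bar))

# Merge after fine-tuning: fold the adapter into the per-group zero points; q stays INT4.
beta_merged = beta + s * (B @ A) / L
y_merged = dequant(beta_merged) @ x

assert np.allclose(y_finetune, y_merged)
```

That, to me, is the "imbalanced degrees of freedom" point in the abstract: group-wise quantization adds scale/zero parameters, while the pooled input takes adapter parameters away, so the merge costs nothing in accuracy.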
The abstract is meant to pull in random readers, so it's understandable that the authors lay a bit of foundation for what the paper is about, even if it comes across as rather simple and unnecessarily wordy.
LoRA is still considered the gold standard in efficient fine-tuning, which is why a lot of comparisons are made to it rather than to QLoRA, which is more of a hack. They both have their advantages, but they're pretty distinct approaches.
Another thing worth pointing out: 4-bit quantization isn't just converting every 16-bit weight into 4 bits (at least not in the GPTQ style). A quantization factor (a scale) is saved alongside the integers, so more information can be recovered from the final quantized weights than a naive "multiply everything by 4".
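To make that concrete, here's a toy min-max version of the idea; GPTQ's actual rounding procedure is smarter than this, so treat it purely as an illustration of why the stored scale and zero point matter:

```python
import numpy as np

w = np.array([0.31, -0.12, 0.07, 0.45, -0.50, 0.22])

# Asymmetric 4-bit: store integers in [0, 15] plus a scale and zero point for the group.
scale = (w.max() - w.min()) / 15.0
zero = w.min()
q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.int8)  # this is what the 4 bits hold

w_hat = q * scale + zero          # dequantize: the scale/zero carry the recovered information
print(q)                          # [13  6  9 15  0 11]
print(np.abs(w - w_hat).max())    # ~0.02, far from naive bit-chopping of the weights
```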
QA-LoRA vs QLoRA: I think my distinction is the same as what you said; it's just about the starting and ending state. QLoRA also introduced a lot of other techniques to make it work, though, like double quantization, the NormalFloat (NF4) data type, and paged optimizers.
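For reference, all of those pieces are visible in a stock QLoRA setup through the Hugging Face stack; the model id below is just a placeholder, and this is a sketch of the usual configuration rather than anything specific to the QA-LoRA paper:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat-4 data type
    bnb_4bit_use_double_quant=True,      # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
# The paged optimizers live outside this config, e.g. optim="paged_adamw_32bit" in TrainingArguments.
```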
It's also worth pointing out that not understanding it has nothing to do with intellect; it's just a matter of how much foundational knowledge you have. I don't understand most of the math, but I've read enough of the papers to understand, to some degree, what's going on.
The one thing I can't quite figure out: I know QLoRA stays competitive with a regular LoRA partly because it attaches adapters to more layers of the transformer (all the linear layers, not just the attention projections), but I don't see any specific mention of QA-LoRA following that same recipe, which I'd think is needed to maintain quality.
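For comparison, here's roughly what that difference looks like in PEFT terms. The module names are the LLaMA-style ones, and whether QA-LoRA targets the wider set is exactly what I can't confirm from the paper:

```python
from peft import LoraConfig

# "Classic" LoRA: adapters only on the attention projections.
lora_attention_only = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# QLoRA-paper style: adapters on every linear layer in each block, which is part of
# how a 4-bit base model stays competitive with a full-precision LoRA.
lora_all_linear = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```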
Overall you're right, though; this paper is a bit on the weaker side. That said, if it works then it works, and it's a pretty decent discovery, but the paper alone doesn't guarantee that.
Thanks