Models not loading into RAM

corvus@lemmy.ml · 8 days ago

Models not loading into RAM

gens@programming.dev · edit-2 7 hours ago

The programs usually mmap the file into memory. That means that parts of it are loaded as used and unloaded if there is no memory left. That’s why it does not say it is using memory. Check disc i/o as it is generating the message. For linux that can be seen in htop or iotop, for win idk.

Note that I use lmstudio, that uses llama.cpp to run models. Gpt4all, I think, uses a modified version of same. Doesn’t matter they should all be using mmap to load the file.

PS Depending on the model I also get a couple tokens per sec on the cpu.

Edit: Didn’t see someone already said the same, I’l leave this here anyway.