Everyone is so thrilled with llama.cpp, but I want to do GPU-accelerated text generation and interactive writing. What's the state of the art here? Will KoboldAI now download LLaMA for me?
Hi, I'm happy to see you're willing to give LLaMA a try! What you can do for GPU-accelerated processing depends on your OS and hardware. If you have an Nvidia card, you can use cuBLAS; instructions are here: https://github.com/ggerganov/llama.cpp#cublas . I don't have experience with other cards, but I'll try to help if issues arise!
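For reference, this is roughly what the cuBLAS build from that README looks like on Linux, assuming you already have the CUDA toolkit installed (the model path below is just a placeholder, and you'll want to tune the layer count to your VRAM):

```shell
# Build llama.cpp with cuBLAS support (requires the CUDA toolkit)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1

# Run in interactive mode, offloading 32 layers to the GPU;
# raise or lower -ngl (--n-gpu-layers) to fit your card's memory
./main -m ./models/your-model.ggml.bin -ngl 32 -i
```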
Also, for more ease of use, try text-generation-webui (https://github.com/oobabooga/text-generation-webui). Well, ease of use until you want GPU acceleration, because then you'll need to follow https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration to get it working with LLaMA.
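Sketching what that doc boils down to: you reinstall llama-cpp-python with cuBLAS enabled, then launch the UI with layers offloaded (the layer count here is just an example, not a recommendation):

```shell
# Rebuild llama-cpp-python with cuBLAS, as the webui docs describe
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir

# Launch the web UI with some layers offloaded to the GPU
# (model filename is a placeholder)
python server.py --model your-model.ggml.bin --n-gpu-layers 32
```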
33B and 65B models seem to be the best for storytelling and writing.
There's a bit more setup involved, but I would look into https://github.com/oobabooga/text-generation-webui