Having trouble generating correct output? Try prefixes!

Smorty [she/her]@lemmy.blahaj.zone · 3 months ago

Could you please tell me why you chose kobold.cpp over llama.cpp? I've only ever used llama.cpp, so I'd like to hear from the other side!

I really like the idea of letting an LLM perform tool calls in the middle of the generation.

Like, we instruct the LLM to say what it will do, then put the tool call into <tool></tool> tags. Then we could set </tool> as a stop keyword and insert the results into its message.

I have tried this before, but it tends not to believe what is in its own message. It sees the output of the tool call and goes "Don't believe what I just said, I made that up", even though LLMs are infamous for hallucinating…
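A minimal Python sketch of that loop, under stated assumptions: generate until the </tool> stop keyword, run the requested tool, splice the result into the model's own message, then let the model continue from that prefix. The `generate(prompt, stop)` interface, the `name argument` call syntax, the `<result>` tag, and the scripted `fake_generate` model are all hypothetical. With llama.cpp's server, the stop keyword would roughly correspond to the `stop` field of a `/completion` request, and the sketch assumes the returned text does not include the stop string.

```python
from typing import Callable, Dict, List

# Hypothetical backend interface: returns the text generated after `prompt`,
# stopping *before* the first string in `stop` (the stop string is not included).
Generate = Callable[[str, List[str]], str]

TOOL_OPEN, TOOL_CLOSE = "<tool>", "</tool>"


def run_with_tools(prompt: str, generate: Generate,
                   tools: Dict[str, Callable[[str], str]],
                   max_calls: int = 5) -> str:
    """Build one assistant message, pausing at </tool> to run each tool call."""
    message = ""
    calls = 0
    while True:
        # Continue generation from the prompt plus everything written so far.
        message += generate(prompt + message, [TOOL_CLOSE])
        start = message.rfind(TOOL_OPEN)
        # No unclosed <tool> tag: the model finished on its own.
        if start == -1 or TOOL_CLOSE in message[start:]:
            return message
        # Budget exhausted: return with the last call left unanswered.
        if calls == max_calls:
            return message
        calls += 1
        # Hypothetical call syntax: "<tool>name argument".
        name, _, arg = message[start + len(TOOL_OPEN):].strip().partition(" ")
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        # Close the tag ourselves and splice the result into the model's message.
        message += f"{TOOL_CLOSE}\n<result>{result}</result>\n"


def fake_generate(prompt: str, stop: List[str]) -> str:
    """Scripted stand-in for a real model, for demonstration only."""
    if "<result>" not in prompt:
        return "I will add the numbers.\n<tool>add 2 3"
    return "The sum is 5."


def add(arg: str) -> str:
    """Example tool: sums whitespace-separated integers."""
    return str(sum(int(x) for x in arg.split()))


reply = run_with_tools("User: What is 2 + 3?\nAssistant: ",
                       fake_generate, {"add": add})
print(reply)
```

Note that the tool result ends up inside the assistant's own turn, which is exactly the situation in which the model tended to distrust it.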

hendrik@palaver.p3x.de · 3 months ago

Kobold.cpp uses llama.cpp under the hood. It just adds a few extras: a web server, a user interface, and some backwards compatibility for older model file formats. It's also relatively easy to install. But the project builds upon llama.cpp and uses the same code for inference.

Predefined formats

Translation

Code completion and generation

Using this in Ollama

Be aware!