I’m currently shopping around for something a bit faster than Ollama, partly because I could not get it to use a different context and output length, which seems to be a known and long-ignored issue. Everything I’ve tried so far has been missing one or more critical features, like:
- “Hot” model replacement, i.e. loading and unloading models on demand
- Function calling
- Support for most models
- OpenAI API compatibility (to work well with Open WebUI)
I’d be happy about any recommendations!
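In case it helps anyone suggest something: here’s roughly how I’ve been checking the OpenAI API compatibility of candidate servers, by hitting the standard `/v1/chat/completions` endpoint directly. The base URL and model name below are placeholders for whatever server you’re testing (llama.cpp’s server, vLLM, LocalAI, etc.).

```python
import json
import urllib.request

# Placeholder base URL; point this at the server under test.
BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "your-model-name",  # placeholder; use a model the server has loaded
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# An OpenAI-compatible server answers with choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```

If that round-trips, Open WebUI generally works with it too, since it speaks the same endpoint.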
I don’t think it’s OpenAI-compatible, but DeepSeek is faster.
I don’t think you are going to find anything faster. Ollama is pretty much as fast as it gets.
Ummm… did you try `/set parameter num_ctx #` and `/set parameter num_predict #`? Are you using a model that actually supports the context length that you desire…? 😂
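If you need those outside the interactive CLI (e.g. when going through Open WebUI), Ollama also accepts them per request via the `options` field of its native API. A minimal sketch; the model name, values, and prompt are placeholders:

```python
import json
import urllib.request

# Sketch: passing num_ctx / num_predict per request through Ollama's
# native /api/generate endpoint (default port 11434).
payload = {
    "model": "llama3",               # placeholder model name
    "prompt": "Why is the sky blue?",
    "stream": False,                 # return one JSON object instead of a stream
    "options": {
        "num_ctx": 8192,             # context window size
        "num_predict": 512,          # max tokens to generate
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

Baking them into a Modelfile (`PARAMETER num_ctx 8192`) is another documented route if you want them to stick to the model itself.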