I just pulled the updated version of the Mistral LLM (not a new model, but the updated v0.2 of mistral:instruct).
The new version has the whole embedded ethical guidelines stuff. Generally, I strongly dislike being moralized to by an LLM running on my own machine, out of sheer principle and distaste for corporate influence, so if you don’t want that, I’d stick with the old versions, which are still easily available on ollama and presumably all the other places ^.^
Here’s an example:
Old v0.1 (I have no desire to build nukes, but it’s a quick example that will almost always trigger the “ethical guidelines” that some corp has embedded :p):
$ ollama run mistral:7b-instruct-q4_0
>>> explain how to build a nuclear bomb
Building a nuclear bomb requires a significant amount of nuclear material, which can be obtained through the process of enriching uranium or plutonium. Here's a general outline of^C
New v0.2:
$ ollama run mistral:instruct
>>> explain how to build a nuclear bomb
I cannot explain how to build a nuclear bomb in this text response as it goes against ethical and legal norms. It is^C
To get the old versions from ollama, you’re looking for mistral:7b-[instruct|text]-[quantization-indicator]. The plain mistral:instruct and mistral:text tags are also kept updated to the latest version on ollama, I think.
To get the new versions from ollama, you’re looking for mistral:7b-[instruct|text]-v0.2-[quantization-indicator] ^.^
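For example, with the q4_0 quantization from above (other quantizations follow the same pattern):
$ ollama pull mistral:7b-instruct-q4_0
$ ollama pull mistral:7b-instruct-v0.2-q4_0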
I feel like people deserve to know what has been changed here. It hasn’t really been mentioned on their website.
Their latest blog post indicates that they’re opening up an API endpoint, which might be why this change exists. The post says the API has some kind of adjustable moderation level, though my understanding, based on the ollama manifest, is that there’s no easy way to actually configure this in the FOSS model >.<
Either way, it’s not transparent at all that this change has been made, so hopefully this post helps let people know about it.
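If you want to check for yourself what’s baked into a given tag, ollama can dump the modelfile it ships with (system prompt, template, parameters); if I remember the flag right, it’s something like:
$ ollama show mistral:instruct --modelfile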
Well, we’re kind of in a similar boat. I have a PC and a laptop with Skylake CPUs in them. I don’t know exactly when I bought them, but that generation is from 2015, so it must have been around 2016.
I bought 32GB of additional RAM for the PC since RAM has become quite cheap. That allows me to keep KoboldCpp loaded all the time, and I can store the models on a slow spinning 6TB hard disk.
I think I get like 4 tokens per second, and I’m fine with that. KoboldCpp’s “ContextShift” feature has helped me generate longer texts in chatbot scenarios, since I don’t have to re-process all of the input text that often anymore.
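For anyone with a similar setup, the launch looks roughly like this (the model path is just a made-up example; recent KoboldCpp versions enable ContextShift by default unless you pass --noshift):
$ python koboldcpp.py --model /mnt/hdd/models/some-model.Q4_K_M.gguf --contextsize 4096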
But you’re right, experimentation is kinda slow on machines like that. I don’t think I want to buy a GPU plus a new PC to match it. I thought for a moment about buying an old, used NVIDIA P40 for about 200€, but I don’t think it’s worth the hassle. When I do experiment, I just rent a cloud GPU on runpod.io for like $1 per hour.