Basically, the more VRAM you have, the more context the model can keep in its memory. Otherwise you’d have a bot that only knows how to contextualize the last couple of messages.
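Rough back-of-envelope for why context eats VRAM: the KV cache grows linearly with context length. A sketch, assuming made-up-but-typical 7B-class dimensions (not any specific model’s real numbers):

```python
# Hypothetical LLaMA-7B-style dimensions (assumed, not measured).
n_layers  = 32    # transformer blocks
n_heads   = 32    # attention heads (no grouped-query attention assumed)
head_dim  = 128   # per-head dimension
bytes_per = 2     # fp16 cache
n_ctx     = 4096  # context window

# Both keys and values are cached, hence the factor of 2.
kv_bytes = 2 * n_layers * n_ctx * n_heads * head_dim * bytes_per
print(f"KV cache alone: {kv_bytes / 2**30:.1f} GiB for {n_ctx} tokens")  # ~2.0 GiB
```

So doubling the context roughly doubles that chunk of VRAM, on top of the weights themselves.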
Hmm, if only there was some hardware analogue for long-term memory.
I guess I’m wondering if there’s some way to bake the contextual understanding into the model instead of keeping it all in VRAM. If you’re talking to a person and you refer to something that happened a year ago, you might have to provide a little context and it might take them a minute, but eventually they’ll usually remember. Same with AI: you could say, “hey, remember when we talked about [x]?” and it would recontextualize by bringing that conversation back into VRAM.
Seems like more or less what people do with Stable Diffusion by training custom models, LoRAs, or embeddings. It would just be interesting if it were a more automatic process as part of interacting with the AI: the model would always be updated with information about your preferences instead of having to be told explicitly.
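The “hey, remember when we talked about [x]” part is basically what retrieval setups do today: embed old conversations, store the vectors, and pull the closest ones back into the prompt on demand. A minimal sketch using sentence-transformers (the model name and the sample memories are just placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

# Pretend these are summaries of old chat sessions kept on disk.
memories = [
    "We compared GPUs for running local models and you liked the used 3090.",
    "We talked about training Stable Diffusion LoRAs on your own photos.",
]
memory_vecs = embedder.encode(memories, normalize_embeddings=True)

def recall(query: str, k: int = 1) -> list[str]:
    """Return the k stored conversations most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = memory_vecs @ q  # cosine similarity, since vectors are normalised
    return [memories[i] for i in np.argsort(scores)[::-1][:k]]

# "hey, remember when we talked about [x]?"
recalled = recall("remember when we talked about graphics cards?")
# ...then prepend `recalled` to the prompt, putting it back "in VRAM".
```

That gets you the recall behaviour without retraining; the always-learning-your-preferences part would be closer to continual LoRA fine-tuning, which is much less of a solved problem.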
You’ll want to use a quantised model on your GPU. You could also run on the CPU and offload some of the layers to the GPU with llama.cpp (an option in oobabooga). Llama.cpp models use the GGUF format.
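For what it’s worth, the CPU-plus-offload path is a single parameter if you drive llama.cpp from Python via llama-cpp-python (the model path here is a placeholder):

```python
from llama_cpp import Llama

# n_gpu_layers sets how many transformer layers go to the GPU;
# the rest run on the CPU. Use -1 to offload everything.
llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # tune to however much VRAM you have
    n_ctx=4096,
)

out = llm("Q: What is the GGUF format?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

Oobabooga exposes the same knob as a slider when you load a GGUF model.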
Are there any Open Source girlfriends that we can download and compile?
Hey now, I don’t want anyone looking at my girlfriend’s source code. That’s personal!
it’s okay, dude, we all already did…
Does it make it faster if the GPU has waifu stickers on it?
Define “it”
Because waifu stickers may indeed speed up “it” for some definition of “it”
It’ll do the opposite, I’m afraid. OW! Hot… umm, what’s that awful smell of burning plastic?
What are you trying to say? Do you understand what the problem is?
Mostly it was just a joke, though.
Pretty easy to roll your own with Kobold.cpp and various open model weights found on HuggingFace.
Also, for an interface, I’d recommend KoboldLite for writing or assistant use, and SillyTavern for chat/RP.
Ask Krieger
I second this request. Please.
Living in the future is so goddamn weird.