LocalLLaMA · English · 3 months ago

Unsure how to quantize a model


I was experimenting with oobabooga, trying to run this model, but due to its size it wasn’t going to fit in RAM, so I tried to quantize it using llama.cpp. That worked, but because of the GGUF format it was only running on the CPU. Searching for ways to quantize the model while keeping it in safetensors returned nothing, so is there any way to do that?

I’m sorry if this is a stupid question; I still know almost nothing about this field.
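
For a rough sense of why size is the sticking point here, a back-of-the-envelope sketch may help. It only estimates the memory taken by the weights at a few bit widths. The 7B parameter count is a made-up placeholder, since the post doesn't name the model, and `weight_footprint_gib` is a hypothetical helper, not part of oobabooga or llama.cpp.

```python
def weight_footprint_gib(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count; the post does not say which model was used.
N_PARAMS = 7_000_000_000

if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
        size = weight_footprint_gib(N_PARAMS, bits)
        print(f"{label:>5}: {size:.1f} GiB")
```

Real quantized files come out somewhat larger than this, because they also store per-block scales, and running the model needs extra memory for the context cache and activations on top of the weights.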


Universal Monk · 2 months ago
Thank you for this!


