Proton's biased article on Deepseek

JOMusic@lemmy.ml · 1 month ago

lily33@lemm.ee · 1 month ago

To be fair, most people can’t actually self-host Deepseek, but there already are other providers offering API access to it.

halcyoncmdr@lemmy.world · 1 month ago

There are plenty of step-by-step guides to run Deepseek locally. Hell, someone even had it running on a Raspberry Pi. It seems to be much more efficient than other current alternatives.

That’s about as openly available to self-host as you can get without a one-button installer.

tekato@lemmy.world · 1 month ago

You can run an imitation of the DeepSeek R1 model, but not the actual one unless you literally buy a dozen of whatever NVIDIA’s top GPU is at the moment.

lily33@lemm.ee · 1 month ago

A server-grade CPU with a lot of RAM and memory bandwidth would work reasonably well, and cost “only” ~$10k rather than $100k+…
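
For a rough sanity check on the hardware numbers in this subthread, here is a back-of-the-envelope sketch in Python. It assumes the published figures for the full R1 model (671B total parameters, about 37B active per token, since it is a mixture-of-experts model). The 80 GB per GPU and 400 GB/s memory bandwidth values are illustrative assumptions, not measurements, and the function names are made up for this example. It counts weights only and ignores the KV cache and runtime overhead, so real requirements are higher.

```python
import math

def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

def max_tokens_per_second(active_params_billion: float,
                          bits_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed, assuming every active weight is read
    from memory once per generated token (bandwidth-bound decoding)."""
    return bandwidth_gb_s / weight_memory_gb(active_params_billion, bits_per_param)

R1_TOTAL_B = 671   # total parameters, in billions (published figure)
R1_ACTIVE_B = 37   # parameters active per token, in billions (published figure)
GPU_MEMORY_GB = 80           # assumption: one high-end datacenter GPU
CPU_BANDWIDTH_GB_S = 400     # assumption: multi-channel server memory

full_8bit_gb = weight_memory_gb(R1_TOTAL_B, 8)
full_4bit_gb = weight_memory_gb(R1_TOTAL_B, 4)
gpus_needed = math.ceil(full_8bit_gb / GPU_MEMORY_GB)
cpu_tokens_per_s = max_tokens_per_second(R1_ACTIVE_B, 8, CPU_BANDWIDTH_GB_S)

print(f"8-bit weights: {full_8bit_gb:.1f} GB, 4-bit weights: {full_4bit_gb:.1f} GB")
print(f"80 GB GPUs for 8-bit weights alone: {gpus_needed}")
print(f"Bandwidth-bound CPU decode ceiling: {cpu_tokens_per_s:.1f} tokens/s")
```

Under these assumptions the 8-bit weights alone take about 671 GB, so at least nine 80 GB GPUs before any cache or overhead, while a CPU box with enough RAM is capped at roughly 11 tokens/s by memory bandwidth.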

alcoholicorn@lemmy.ml · 1 month ago

I saw posts about people running it well enough for testing purposes on an NVMe.

Dyf_Tfh@lemmy.sdf.org · edit-2 1 month ago

Those are not DeepSeek R1. They are unrelated models, like Llama 3 from Meta or Qwen from Alibaba, “distilled” by DeepSeek.

Distillation is a common way to make a smaller model smarter by training it on a larger model’s outputs.

Ollama should have never labelled them deepseek:8B/32B. Way too many people misunderstood that.
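
To make the “distilled” part concrete, here is a minimal sketch of the classic logit-matching form of knowledge distillation, in plain Python. This is the generic textbook technique, not DeepSeek's actual pipeline: as far as I understand, the R1 distills were produced by fine-tuning the smaller models on samples generated by R1, which this example does not show. The function names and the toy logits are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Training the student to minimize this pushes its output distribution
    toward the teacher's for the same input.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token logits over a 4-word vocabulary (hypothetical values).
teacher = [4.0, 1.0, 0.5, -2.0]
student_far = [0.0, 3.0, 0.5, 1.0]
student_close = [3.5, 1.2, 0.4, -1.5]

print(distillation_loss(teacher, student_far))
print(distillation_loss(teacher, student_close))
```

The student whose logits sit closer to the teacher's gets the smaller loss, and a student that matches the teacher exactly gets a loss of zero. Either way, the result is still a Llama or Qwen model with adjusted weights, not a smaller copy of R1 itself.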

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

I’m running deepseek-r1:14b-qwen-distill-fp16 locally and I find it produces really good results. Like, yeah, it’s a reduced version of the online one, but it’s still far better than anything else I’ve tried running locally.

morrowind@lemmy.ml · edit-2 1 month ago

Have you compared it with the regular Qwen? That was also very good.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

The main difference is speed and memory usage. Qwen is a full-sized, high-parameter model while qwen-distill is a smaller model created using knowledge distillation to mimic qwen’s outputs. If you have the resources to run qwen fast then I’d just go with that.

morrowind@lemmy.ml · 1 month ago

I think you’re confusing the two. I’m talking about the regular Qwen before it was fine-tuned by DeepSeek, not the regular DeepSeek.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

I haven’t actually used that one, but doesn’t the same point apply here too? The whole point of DeepSeek is in distillation that makes runtime requirements smaller.

morrowind@lemmy.ml · 1 month ago

No, because I was already running regular (non-DeepSeek) Qwen 14B, admittedly a heavily quantized and uncensored version, so I was just curious whether it would be any better.