
I’ve been playing around with the DeepSeek R1 distills, Qwen 14B and 32B specifically.

So far it’s very cool to see models really going after the current CoT meta by mimicking internal thinking monologues. Seeing a model go “but wait…”, “Hold on, let me check again…”, “Aha! So…” kind of makes its eventual conclusions feel more natural.

I don’t like how it can get caught in looping thought processes, and I’m not sure how much all the extra tokens spent really go towards a “better” answer/solution.

What really needs to be ironed out is the reading comprehension, which seems to be lower than average: it misses small details in tricky questions and makes assumptions about what you’re trying to ask. For example, you want a recipe for coconut oil cookies, but it only sees “coconut” and gives you a coconut cookie recipe with regular butter.

It’s exciting to see models operate in a kind of new way.

Thoughts on the new DeepSeek R1 distill models
