- cross-posted to:
- [email protected]
This article says some funny things:
While more advanced features will ultimately require an internet connection
Ok, then?
On-device processes could help eliminate certain controversies found with server-side AI tools. For example, these tools have been known to hallucinate, meaning they make up information confidently.
What? How would on-device processes have any effect on hallucination in LLMs?
Or are you trying to tell us that this article was written by an LLM and that the whole thing is a confidently made up hallucination?
That’s what they all say. But a lot of these so-called AI features require more power than a phone has. Offloading to a server is sometimes a must.
Quantised models can be surprisingly small. And if Apple aren’t targeting LLMs for local use, more specific/tailored models absolutely can run on device.
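As a back-of-envelope check on "surprisingly small": weight storage is roughly parameter count times bits per weight. The sketch below assumes a 7-billion-parameter model (the size of the Mistral 7B mentioned further down the thread) and counts weights only. It ignores activations, the KV cache, and quantization metadata such as per-group scales, so real memory use will be somewhat higher.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in gigabytes (1 GB = 10**9 bytes).

    Ignores activations, KV cache, and per-group quantization scales,
    so actual files and RAM usage will be somewhat larger.
    """
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes


if __name__ == "__main__":
    params = 7_000_000_000  # assumption: a 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_footprint_gb(params, bits):.1f} GB")
```

Under these assumptions this prints 14.0 GB, 7.0 GB and 3.5 GB, which is why 4-bit weights are what make a 7B-class model plausible on a phone or laptop at all.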
That said, given the precedent set by Siri, their next progression of Siri into an LLM will absolutely require a network connection and be executed server-side.
Samsung’s version on One UI 6.1 lets you toggle between running the local models on the phone’s NPU and connecting to their servers.
The local version is slightly slower and produces worse results, but can be used for privacy or without the internet. The remote version is what you’d expect.
The thing is, these AI features are mostly features that already exist in one form or another, just with more emphasis on content generation and AI branding slapped on.
Sure, large models like GPT won’t run on a phone, but smaller models tailored to specific use cases absolutely can. Whether or not they get their implementation right is a different story, though.
So it will very slowly find some results on the web for you.
you’d be surprised how fast a model can be if you narrow the scope, quantize, and target specific hardware, like the AI hardware features they’re announcing.
not a 1:1 comparison, but a quantized Mistral 7B runs at ~35 tokens/sec on my M2. that’s not even as optimized as it could be. it can write simple scripts and handle some decent writing prompts.
they could get really narrow in scope (super simple RAG, limited responses, etc.), quantize down to something as low as 4-bit, and run it on custom accelerated hardware. it doesn’t have to reproduce Shakespeare, but i can imagine a PoC that runs circles around Siri in semantic understanding and generated responses. being able to reach out on Slack to the engineers that built the NPU stack ain’t bad neither.
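To make "quantize down to 4-bit" concrete, here is a minimal sketch of symmetric per-tensor 4-bit quantization in plain Python: each float weight is mapped to an integer in [-8, 7] using one shared scale, then multiplied back out. This is an illustration of the general technique, not what Apple or any particular runtime does; production tools typically use per-group or per-channel scales and pack two 4-bit values per byte. The function names are hypothetical.

```python
from typing import List, Tuple

QMIN, QMAX = -8, 7  # range of a signed 4-bit integer


def quantize_4bit(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 4-bit integer codes."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor: the largest magnitude maps to QMAX.
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the 4-bit range.
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_4bit(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.12, -0.70, 0.33, 0.05, -0.21, 0.69]
    codes, scale = quantize_4bit(w)
    approx = dequantize_4bit(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(w, approx))
    print("codes:", codes)
    print("scale:", round(scale, 4))
    print("max abs error:", round(max_err, 4))
```

Because no weight exceeds the largest magnitude, the per-weight rounding error stays within half a scale step. That bounded loss in exchange for a 4x size cut versus 16-bit weights is the trade-off being described.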
Isn’t Siri server-side now?
Siri was originally in the cloud, but Apple has been trying to handle more Siri requests locally so that requests can be handled faster and without internet access.
Hopefully Apple will bump up the memory and storage because of this; it’s long overdue.