I use it almost daily.
It does produce good code. It does not reliably produce good code. I am a programmer, and it makes my job 10x faster; I usually just have to fix a few bugs in the code it generates. Over time, I learned what it is good at (UI code, converting things, boilerplate) and what it struggles with (anything involving newer tech, algorithmic understanding, etc.).
I often refer to it as my intern: it acts like an academically trained, not particularly competent, but very motivated, fast-typing intern.
But then I am also working in the field. Prompting it correctly is too often dismissed as a skill (I used to dismiss it too). It needs more understanding than people give it credit for.
I think that, like a lot of IT tech, it will gradually go from being a dev tool to an everyday tool.
All the pieces of the puzzle needed to control a computer by voice using only natural language are there. You don’t realize how big it is. Companies haven’t assembled it yet because it is actually harder to monetize it than to code it. I think Apple is probably in the best position for it. Microsoft is going to attempt it and fail like usual, and Google will probably make a half-assed attempt at it. I’ll personally go for the open source version of it.
Yes, thank you, this.
All the criticism of artificial intelligence and deployments of it like ChatGPT right now I see as people not being able to hold something in their hand. This is far more of an abstraction than a new phone. When people can’t grok that immediately, or they play with it for 5 minutes and dismiss it because it gave them a form-fill-looking answer to some para-literate 5-word question, then they’re obviously going to be unimpressed and walk away.
If you spend any amount of time actually trying to figure out what to say to it in order to get it to produce actual information, it’s one of the most compelling new ways to interface with a computer since the MOAD, and I would imagine it will ultimately be the most compelling in the end.
Like, put it this way: I don’t know if this will actually end up producing AGI but, like… This thing is a 3-year-old.
And it’s a 3-year-old that can write basic coding implementations and give you at least high school level comprehension, maybe in some cases much better, of most of the English (and quickly building to other languages) written world.
This is the dumbest it will ever be…
Also, as a side effect, we just solved speech recognition. In a year or two, speaking to machines will be the default interface.
StyleTTS2 for output. Local Llama with a high quant on 4x 4090s. Personal AI assistant running on your local homelab for <30k.
I kinda see it as a home appliance or vehicle level purchase.
I am pretty sure that there are ASICs being put into production as we speak with Whisper embedded. Expect a 4-dollar chip to add voice recognition and a basic LLM to any appliance.
Now you made me interested in learning how to prompt these things. From what I have tried, appending a few more descriptive sentences after the actual prompt usually helps a lot. But once you add too many sentences, the model tends to write way longer replies too. This obviously happens in real life too, so maybe that is just the natural way…
Yeah, having worked in the IT field and knowing a few languages myself, I think that as far as code goes, it can be OK for basically laying out the structure of what you are trying to do. It’s typically the details that it misses, in my experience. In that sense, it definitely can be used similarly to an IDE.
Hey I heard that intern metaphor before somewhere… No Boilerplate?
EDIT: Dumb me, I replied before reading the entire message. What you say is exactly how I feel, there are some real big possibilities here. Currently the closest thing to using a computer with only your voice would be something like Ollama combined with Open WebUI and their calling feature and some tool functions.
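For anyone wondering what "tool functions" means in practice, here is a minimal sketch of the dispatch step only. It assumes the speech has already been transcribed to text and that the model answers with a small JSON object naming a tool and its arguments. Everything in it is hypothetical and made up for illustration: `set_volume`, `open_app`, the JSON shape, and `fake_model`, which is a keyword-matching stand-in for a real local model and not the API of Ollama or Open WebUI.

```python
import json

# Hypothetical tool functions; a real setup would call into the OS instead.
def set_volume(level: int) -> str:
    return f"volume set to {level}"

def open_app(name: str) -> str:
    return f"opening {name}"

# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"set_volume": set_volume, "open_app": open_app}

def fake_model(transcript: str) -> str:
    """Stand-in for the LLM: turns a transcript into a JSON tool call.

    Assumption: a real model would be prompted to reply in this JSON shape.
    """
    text = transcript.lower()
    if "volume" in text:
        digits = "".join(ch for ch in text if ch.isdigit())
        return json.dumps({"tool": "set_volume", "args": {"level": int(digits or 50)}})
    if text.startswith("open "):
        return json.dumps({"tool": "open_app", "args": {"name": text[5:].strip()}})
    return json.dumps({"tool": None, "args": {}})

def handle(transcript: str) -> str:
    """Parse the model's reply and run the matching tool, if any."""
    call = json.loads(fake_model(transcript))
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return "no matching tool"
    return tool(**call["args"])

if __name__ == "__main__":
    for spoken in ["Set the volume to 30", "open firefox", "hello there"]:
        print(spoken, "->", handle(spoken))
```

The point of the registry is that the model only ever picks a name and arguments; the code decides what actually runs, so an unknown or missing tool name falls through harmlessly.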
Install text-generation-webui, check their “whisper stt” option, and you can talk with a computer. As a non-native speaker I prefer reading the English output to listening to it, but they do provide TTS as well.