A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data

assassin_aragorn@lemmy.world · 2 years ago

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data

darth_helmet · 2 years ago

https://www.understandingai.org/p/large-language-models-explained-with I don’t think you’re intending to be purposefully misleading, but I would recommend checking this article out because the pachinko analogy is not accurate, really. There are several layers of considerations that the model makes when analyzing context to derive meaning. How well these models do with analogies is, I think, a compelling case for the model having, if not “knowledge” of something, at least a good enough analogue to knowledge to be useful.

Training a model on the way we use language is also training the model on how we think, or at least how we express our thoughts. There’s still a ton of gaps to work on before it’s an AGI, but LLMs are on to what’s looking more and more like the right path to getting there.

orclev@lemmy.world · 2 years ago

While it glosses over a lot of details it’s not fundamentally wrong in any fashion. A LLM does not in any meaningful fashion “know” anything. Training an LLM is training it on what words are used in relation to each other in different contexts. It’s like training someone to sing a song in a foreign language they don’t know. They can repeat the sounds and may even recognize when certain words often occur in proximity to each other, but that’s a far cry from actually understanding those words.

A LLM is in no way shape or form anything even remotely like a AGI. I wouldn’t even classify a LLM as AI. LLM are machine learning.

The entire point I was trying to make though is that a LLM does not store specific training data, rather what it stores is more like the hashed results of its training data. It’s a one way transform, there is absolutely no way to start at the finished model and drive it backwards to derive its training input. You could probably show from its output that it’s highly likely some specific piece of data was used to train it, but even that isn’t absolutely certain. Nor can you point at any given piece of the model and say what specific part of the training data it corresponds to or vice versa. Because of that it’s impossible to pluck out some specfic piece of data from the model. The only way to remove data from the model is to throw the model away and train a new model from the original training data with the specific data removed from it.