The Math Behind “The Curse of Dimensionality”

Renneder · 7 months ago

ladicius@lemmy.world · 7 months ago

“Counterintuitive” is an understatement. My brain already hurts after reading only the first paragraphs.

oDDmON@lemmy.world · 7 months ago

Bulled my way thru a quarter of the article and am exhausted. People actually understand that word salad?

Jesus_666@lemmy.world · 7 months ago

It helps if you had some of this stuff in college.

The gist of it is that you want to have your data in many dimensions because that allows you to fit your facts in more easily – but multidimensional data starts behaving in some really counterintuitive ways. For example, data points become harder to distinguish for complicated math reasons.

That’s really annoying and people doing AI should be mindful of it or their AI ends up underperforming.

I’m more familiar with the curse of dimensionality with respect to clustering (aka looking at a bunch of data points and trying to find groups). Clustering quickly becomes really expensive as your dimensions go up and that can make your entire approach prohibitively expensive.

Note that the article mentioned doing kNN (k-Nearest-Neighbors, an algorithm that classifies a point by looking at the points closest to it) in 700 dimensions, which kinda sounds like a very good reason why training a major AI model takes obscene amounts of resources.
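To make the “harder to distinguish” point a bit more concrete, here is a minimal sketch (not taken from the article). It assumes points drawn uniformly at random from a unit hypercube and plain Euclidean distance. The function name `relative_contrast` and its parameters are made up for illustration. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance. As the dimension goes up, that gap shrinks, which is roughly why nearest-neighbor style methods struggle.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point.

    Hypothetical helper: points are sampled uniformly from [0, 1]^dim,
    which is an assumption made for this illustration only.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


# Print the contrast for a few dimensions, including the 700 mentioned above.
for dim in (2, 10, 100, 700):
    print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the nearest neighbor is many times closer than the farthest point. In hundreds of dimensions the two distances end up close to each other, so “nearest” carries much less information.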

MoonManKipper@lemmy.world · 7 months ago

It sounds worse than it is - if you think ‘independent variables’ or just elements in a big vector, it’s all a lot easier to get your head around.

AuroraZzz@lemmy.world · 7 months ago

For those who have trouble understanding what is being talked about here, a little background: in AI, data is often mapped into higher dimensions in order to make decisions and pick out important information. This article is saying that high dimensions and complicated machine learning models should only be used for tasks that converge on a specific point, such as feature selection (selecting the most relevant column or feature in a table of data). This is because as dimensions increase, even though the space gets bigger, the distances between the data points in that space become more and more alike, so the points get harder to tell apart. That pushes complicated AI models to converge on a specific point or opinion and overfit their answers to what they were trained to do (as opposed to thinking outside the box and coming up with new ideas). The author cautions that working with very high-dimensional data leads to overfitting and recommends a managed approach of using fewer dimensions to train neural networks/machine learning models.

gimpchrist@lemmy.world · 7 months ago

So… too much brain in an AI leads to repetitive answers? Instead of new answers? And the less brain, the more… I don’t know?