What’s the big deal?
So, you’re an emerging Data Scientist or you’re dabbling in data analytics and you hear the “Curse of Dimensionality” mentioned a lot. We’ll maybe I can help to clear it up!
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.
The thing about the curse of dimensionality is that when the features or dimensions increase, the volume of the space increases as well, the volume of the space increases so fast that the available data become very sparse. This sparsity is problematic for any method that requires statistical significance. To obtain a statistical and reliable result, the amount of data needed to support the result often grows exponentially with the dimensions. Also, organizing and searching data often relies on detecting areas where objects form groups with similarities.
In high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. We want to be able to see the data and what similarities each point has from each other; this is not possible when each point is quite far ways or the space between each point is too big. And that’s what is referred to as the curse of dimensionality.
In machine learning problems that involve learning a “state-of-nature” from a finite number of data samples in high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values.
A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation. In machine learning and insofar as predictive performance is concerned, the curse of dimensionality is used interchangeably with the peaking phenomenon, which is also known as the Hughes phenomenon. This phenomenon states that with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used is increased but beyond a certain dimensionality it starts deteriorating instead of improving steadily.
To mitigate the problems associated with high dimensional data a suite of techniques generally referred to as ‘Dimensionality reduction techniques. Dimensionality reduction techniques are what we call Feature selections.
Feature selection is the process of choosing the most relevant features in our data to give to our model. By limiting the number of features we use (rather than just feeding the model the unmodified data), we can often speed up training and improve accuracy, or both.
Within feature selection, there are three categories, Wrapper, Filter, and Embedded Methods.
Wrapper methods are probably the best approach to feature selection (in terms of accuracy), but they also require the most computational resources. Some of them include:
- Recursive Feature Elimination
- Forward Feature Selection
Filter methods the features are evaluated against a proxy rather than cross-validation accuracy. This is not as accurate because the proxy can at best only be correlated with the cross-validation accuracy.
- Mutual Information
- Pointwise Mutual Information
- Pearson Correlation
Embedded methods are just a catch-all for feature selection in machine learning methods that don’t fall into the previous two buckets. L1 regularization is the prime example of an embedded method. It acts in a manner akin to feature selection because the L1 norm penalty encourages sparsity on the model’s weights. Because most of the weight’s coefficients are driven towards zero, they are effectively “selected out” by the regularizer.
Hopefully, this helped you get an idea of what the curse of dimensionality is and better help your modeling process.