
One counter-example: face recognition using 100k features (http://research.microsoft.com/pubs/192106/HighDimFeature.pdf)


Not really. The article mentions that using linear methods (e.g., LIBLINEAR) is one way to avoid the curse. LIBLINEAR is specifically designed for situations in which you have many features and relatively few training instances. When using a linear classifier it may make sense to simply generate as many features as you can, and then use, e.g., lasso regression to do feature selection. http://www.csie.ntu.edu.tw/~cjlin/liblinear/
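A minimal sketch of that workflow, using scikit-learn's LIBLINEAR-backed solver; the tiny corpus and labels below are made up purely for illustration:

    # L1-penalised logistic regression (LIBLINEAR backend) on a wide, sparse
    # bag-of-words matrix; the L1 penalty zeroes out many weights, so it acts
    # as feature selection at the same time.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    docs = ["cheap pills buy now", "meeting moved to noon",
            "win cash prizes now", "lunch at noon tomorrow"]   # toy corpus
    labels = [1, 0, 1, 0]

    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)  # many sparse features
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X, labels)
    print((clf.coef_ != 0).sum(), "features survived the L1 penalty")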


Not strictly as many features as you can. There are many ways to add huge numbers of highly correlated and redundant features that limit the effectiveness of both the classifier and the feature selection or regularization methods.

A simple example of this comes from natural language processing. Adding dependency or phrase-structure parse features to an n-gram bag-of-words model can increase the dimensionality of your feature space by an order of magnitude and end up harming classification accuracy, even with tightly controlled and elegant feature selection methods.
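A rough synthetic sketch of the effect (simulated data, not an actual NLP corpus): pad an informative feature set with many noisy, correlated copies and compare cross-validated accuracy.

    # Redundant, correlated features can hurt even with L1-based selection.
    # Exact numbers will vary; the point is the setup, not the scores.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.normal(size=(n, p))
    y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

    # Ten noisy copies of every feature: an order-of-magnitude blow-up in dimensionality.
    X_big = np.hstack([X] + [X + rng.normal(scale=2.0, size=X.shape) for _ in range(10)])

    clf = LogisticRegression(penalty="l1", solver="liblinear")
    print(cross_val_score(clf, X, y, cv=5).mean())      # original 50 features
    print(cross_val_score(clf, X_big, y, cv=5).mean())  # 550 features, typically no better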


But the article used a linear model to demonstrate the curse, and the model was overfit with just 3 dimensions. There is clearly something missing: for example, for text data it is not uncommon to have thousands or hundreds of thousands of dimensions, and the algorithms work fine.

I think the missing piece is regularisation. It doesn't have to perform feature selection and actually reduce the number of dimensions, but you're right that using L1 for such data is usually a good idea.
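A sketch of that point on synthetic data, using an L2 penalty so no dimensions are actually dropped: with far more features than examples, the strength of the penalty matters more than the raw dimensionality.

    # Toy comparison of weak vs. strong L2 regularisation when p >> N.
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n, p = 100, 5000                              # far more features than training points
    X = rng.normal(size=(n, p))
    y = (X[:, :5].sum(axis=1) > 0).astype(int)    # only 5 features carry signal

    weak = LinearSVC(C=1e4, max_iter=5000)        # nearly unregularised
    strong = LinearSVC(C=0.1, max_iter=5000)      # heavier penalty on the weights
    print(cross_val_score(weak, X, y, cv=5).mean())
    print(cross_val_score(strong, X, y, cv=5).mean())   # usually generalises better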


The article had very few data points; that's why the overfitting showed up with just 3 dimensions. The deciding factor is how N (the effective number of data points) compares with p (the effective number of features).
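A quick way to see that trade-off (again with simulated data): hold p fixed, grow N, and watch held-out accuracy recover.

    # Fixed number of features p, growing number of examples N.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    p = 50
    for n in (20, 200, 2000):
        X = rng.normal(size=(n, p))
        y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)   # one informative feature
        score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
        print(n, round(score, 2))   # accuracy generally improves as N outgrows p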



