But the article used a linear model to demonstrate the curse, and the model was already overfit with just 3 dimensions. Something is clearly missing from that picture: for text data, for example, it is not uncommon to have thousands or even hundreds of thousands of dimensions, and algorithms still work fine.
I think the missing piece is regularisation. It doesn't have to perform feature selection and actually reduce the number of dimensions, but you're right that using L1 for such data is usually a good idea.
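A minimal sketch of that idea (not from the article; the synthetic data and parameters here are purely illustrative), using scikit-learn's logistic regression with an L1 penalty on "text-like" data with far more features than samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative "text-like" data: many features, few of them informative.
X, y = make_classification(n_samples=500, n_features=10_000,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The L1 penalty shrinks most coefficients to exactly zero,
# which acts as implicit feature selection.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("nonzero coefficients:", np.count_nonzero(clf.coef_))
```

Despite p >> N, the penalised model generalises because only a small fraction of the coefficients survive the fit.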
The article had very few data points; that's why the overfitting showed up with only 3 dimensions. The deciding factor is how N (the effective number of data points) compares with p (the effective number of features).
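A quick way to see the N-vs-p effect (a toy sketch, not the article's setup): fit ordinary least squares to pure noise and compare a case where p is close to N against one where N >> p:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def train_test_r2(n, p):
    # Pure-noise features and targets: any fit to the
    # training data is, by construction, overfitting.
    X, y = rng.normal(size=(n, p)), rng.normal(size=n)
    X_test, y_test = rng.normal(size=(n, p)), rng.normal(size=n)
    model = LinearRegression().fit(X, y)
    return model.score(X, y), model.score(X_test, y_test)

# p close to N: the model nearly interpolates the noise
# (train R^2 close to 1, test R^2 badly negative).
print("N=20,   p=15:", train_test_r2(20, 15))
# N >> p: both train and test R^2 stay near zero, as they should.
print("N=2000, p=15:", train_test_r2(2000, 15))
```

The same p=15 is harmless or catastrophic depending entirely on N, which is the point: dimensionality alone doesn't decide anything.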