Using Principal Component Analysis on Images (thehackerati.com)
189 points by geb on Aug 17, 2015 | 28 comments


PCA typically assumes that the high-dimensional data (images) lie on a low-dimensional linear manifold (the span of the principal components). In this case the data of interest (images of people in dresses) likely lies on a highly non-linear manifold, so non-linear methods would likely produce much more appropriate models. This is part of why deep learning and tensor methods work so well on image classification tasks.

Nonlinear methods would require much more data, unfortunately.


If your goal is to accurately recreate dresses, then yes, a non-linear method would work better. But, I don't think that makes this work uninteresting. For a start, PCA is very simple to implement and train, which gives it hackability that non-linear methods typically don't have. PCA makes a better tool if you just want to play around. Exploring non-realistic output is also interesting for its own sake. I find the "ghost" effect that PCA generates quite nice.


Non-linear methods can produce a smaller (lower-dimensional) representation, but they are harder to use and usually require more data. PCA, on the other hand, is easy, and keeping 50 or more principal components is often not a problem unless you are doing visualization. With the extra representational capacity from those additional dimensions you can still get good reconstructions.
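
For concreteness, here is a minimal sketch of that workflow with scikit-learn (the file names, image count, and image size below are hypothetical placeholders, not the post's actual pipeline):

    # Minimal PCA-on-images sketch; paths and sizes are made up for illustration.
    import numpy as np
    from PIL import Image
    from sklearn.decomposition import PCA

    paths = ["dress_%03d.jpg" % i for i in range(500)]  # hypothetical filenames
    size = (64, 96)  # hypothetical (width, height); every image must share it

    # Flatten each RGB image into a single row vector.
    X = np.array([np.asarray(Image.open(p).convert("RGB").resize(size),
                             dtype=np.float64).ravel() for p in paths])

    pca = PCA(n_components=70)        # e.g. keep 70 components, as in the post
    Z = pca.fit_transform(X)          # low-dimensional codes, one row per image
    X_hat = pca.inverse_transform(Z)  # reconstructions back in pixel space
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained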


I'd be interested to know how the singular values are distributed and how many you need to keep before you can cut off the rest. The reconstruction example she shows looks pretty good after only 70 components (out of presumably far more).

Actually, I'd be even more interested in an NMF decomposition of the data. The weights for the 10-component approximation are [-17541.81, -12749.33, -3766.29, 2005.28, 4193.08, 6832.55, -6704.90, -2135.51, 1112.27, 7627.80], and I wonder whether a purely-additive approach would work better.
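
Checking the singular value spectrum is cheap; a sketch, reusing the X matrix from the snippet above:

    import numpy as np

    Xc = X - X.mean(axis=0)                  # center the data first
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s**2 / (s**2).sum()                # variance explained per component
    k = np.searchsorted(np.cumsum(var), 0.95) + 1
    print("components needed for 95% of the variance:", k)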


> This is part of why deep learning and tensor methods work so well on image classification tasks.

That statement sounds informative. Do you have a good non-specialist's reference?


You might be interested by colah's (Christopher Olah's) blog: https://colah.github.io/

Two articles in particular are good introductions to looking at neural networks in terms of higher-dimensional data lying on lower-dimensional manifolds:

Neural Networks, Manifolds, and Topology: https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Visualizing MNIST: An Exploration of Dimensionality Reduction: https://colah.github.io/posts/2014-10-Visualizing-MNIST/


Thank you for the useful links.


> Using components to recreate image

Well, it seems to me that the recreated image is in the "training dataset"; otherwise the recreation seems too accurate. I would be interested in how it handles recreating new, but similar, images.


Scroll down to where it says:

   It even works for dresses that were not in the training set:   
Yes, that first dress is in the training data, but she did do reconstructions on dresses not used to build the PCA basis. They look decent, but not as good as that first example. And, as she points out, the model can't reproduce patterns that are not in the initial data very well, and can't reproduce accessories that were not in the initial data at all.


Good suggestion! Added a section to the post about this (and gave you credit for the suggestion at the bottom of the article).


Nice! These look plausible, but it's still surprising how accurate they are.

Note that since this is a linear approach, the choice of colorspace can have a large impact on the results. You didn't mention the colorspace; my best guess is that you used sRGB. Maybe you could try it in a linear colorspace too.

Edit: According to the source you use PIL.Image.getdata(), which according to the docs returns "pixel values"; the docs then give an RGB->XYZ conversion example that is only valid for a linear colorspace. That suggests the returned values are already linear RGB, but a lot of software gets these things wrong, so I'm not 100% sure.
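
For anyone who wants to test the linear-colorspace variant, the standard sRGB decoding (IEC 61966-2-1) is easy to apply to the X matrix from the earlier sketch, assuming the stored values are in fact sRGB:

    import numpy as np

    def srgb_to_linear(srgb):
        # srgb: values scaled to [0, 1]; returns linear-light RGB
        srgb = np.asarray(srgb, dtype=np.float64)
        return np.where(srgb <= 0.04045,
                        srgb / 12.92,
                        ((srgb + 0.055) / 1.055) ** 2.4)

    X_linear = srgb_to_linear(X / 255.0) * 255.0  # keep the original 0..255 scale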


Likewise with the "predictions" of the author's likes/dislikes. Testing how the model performs on an independent data set (or at least under cross-validation [1]) would be much more interesting.

[1] https://en.wikipedia.org/wiki/Cross-validation_(statistics)
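
A sketch of what that could look like with scikit-learn, refitting the PCA inside each fold so no test information leaks into the basis (X is the pixel matrix from the earlier sketch; y is a hypothetical 0/1 like-vector):

    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    model = make_pipeline(PCA(n_components=70), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))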


The other thing I wondered about the predictions: she apparently rated all of the dresses, and the top/bottom predictions matched the ratings. Fair enough. But what about the residuals, the misclassified ones - the ones where the logistic regression predicts a high or low score and her rating was actually the opposite? Those might be interesting to look at.


That's there. Search for:

> The misclassifications are interesting too

One problem seems to be that it concluded she'd dislike anything the exact opposite color from her favorite shade of red. A common flaw in linear models.


The blog post seems to be getting modified at this moment. When I first saw it, it didn't have anything about the misclassifications, but that has been added now.


I've only glanced at her code, but it looks like[1] the predictions are from held-out data.

EDIT: All of the data was used in forming the PCA basis, but that isn't (necessarily) an error, depending on the use-case. And the logistic regression model was evaluated on held-out data.

[1]https://github.com/graceavery/Eigenstyle/blob/master/visuals...


This is actually pretty easy: you just walk along the top few basis vectors in the low-dimensional subspace and reproject back into the original space. Sounds complicated, but it's really not.
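
Concretely, something like this against the scikit-learn PCA object from the earlier sketch (the step sizes are arbitrary):

    import numpy as np

    mean = pca.mean_        # the "average dress" in pixel space
    v = pca.components_[0]  # first principal direction (a unit vector)

    # Walk along the component and reproject each point back into pixel space.
    for alpha in np.linspace(-3, 3, 7) * np.sqrt(pca.explained_variance_[0]):
        img = (mean + alpha * v).reshape(size[1], size[0], 3)  # H x W x 3
        img = np.clip(img, 0, 255).astype(np.uint8)            # valid pixel range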


PCA implicitly uses a quadratic form on a vector space. When done the usual way, this is just the identity matrix. But for image data, I think you would get better results using a quadratic form based on a kernel, so that Q_ijkl, where (i,j) and (k,l) are pixel coordinates, is given by Q_ijkl = t((i-k)^2 + (j-l)^2) for some kernel function t.
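
If t is a Gaussian, Q is just a Gaussian-blur operator, and (up to boundary effects) its square root is a narrower Gaussian blur, so PCA under that quadratic form reduces to ordinary PCA on half-blurred images. A speculative sketch, reusing X and size from the earlier snippet:

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.decomposition import PCA

    sigma = 2.0  # kernel width of Q; its square root blurs with sigma / sqrt(2)

    def half_blur(flat):
        img = flat.reshape(size[1], size[0], 3)
        s = sigma / np.sqrt(2)
        # blur spatially, not across color channels
        return gaussian_filter(img, sigma=(s, s, 0)).ravel()

    Xq = np.array([half_blur(x) for x in X])
    pca_q = PCA(n_components=70).fit(Xq)  # PCA under the kernel inner product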


Nice to see this (especially on image data other than just faces). But non-negative matrix factorization typically produces less ghostly images, with an easier interpretation.

See e.g. http://www.quantumblah.org/?p=428 or http://scikit-learn.org/stable/auto_examples/decomposition/p... (comparison of PCA and non-negative matrix factorization).

Of course, the non-negative factorization comes at some cost (mostly: much higher computational complexity), but it may be worth trying.
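
With scikit-learn it is a near drop-in swap for the PCA sketch above, since raw pixel values are already non-negative:

    from sklearn.decomposition import NMF

    nmf = NMF(n_components=70, init="nndsvd", max_iter=500)
    W = nmf.fit_transform(X)  # non-negative weights, one row per image
    H = nmf.components_       # non-negative, purely additive "parts"
    X_hat = W @ H             # reconstruction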


"These aren’t that bad. I do kinda like them, but think they’d be nicer with some minor adjustments (slightly less form-fitting, slightly less loud pattern, slightly brighter color)."

Would it be possible to use the like/dislike system to make/suggest the minor changes to the dresses?
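
A crude version would be to nudge a dress's PCA code along the logistic regression's weight vector, which points toward "like", and reproject. A speculative sketch (pca as before; clf is a hypothetical LogisticRegression fitted on the PCA codes Z):

    import numpy as np

    w = clf.coef_[0]             # direction of increasing "like" score
    z = pca.transform(X[:1])[0]  # PCA code of one dress (arbitrary choice)

    step = 0.5 * np.linalg.norm(z)  # step size is a free parameter
    z_nudged = z + step * w / np.linalg.norm(w)
    suggestion = pca.inverse_transform(z_nudged[None, :])[0]  # back to pixels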


Is this dataset available anywhere? I see the code on github, but I don't see the data anywhere.


Really interesting. Does anyone know more about how the initial eigendresses were computed? Was there image processing to distill each dress into major components, hand-tagging of variables (like short/long, color, etc.), or was the raw pixel data just processed directly?


I'm guessing it's just raw pixel data, and the description of each eigendress's meaning was the author's interpretation.


The only manual classification used was separating dresses into "like" and "dislike" categories. The rest is just pixel data.


You might wanna do a search for "eigenfaces" (as opposed to "eigendresses" :P)

https://en.wikipedia.org/wiki/Eigenface


Reading that, I can't help but wonder how similar a machine-learned eigenface is to an infant-learned one. http://lawcomic.net/guide/?p=3273

I wonder if there is some way to simulate the effect of growing up among people of different races on eyewitness identification and test if it matches what this lawyer is saying at http://lawcomic.net/guide/?p=3282


As I understand it, eigenfaces require preprocessing to orient and align the images. That doesn't seem to have come up here. Maybe Amazon is really consistent about centering models in dress photos?


>> Was there image processing to distill the dress into major components, hand tagging of variables (like short/long, color, etc), or just raw pixel data processed?

It's pretty clear that this is PCA on the raw pixels of the images, so no, there is no hand-tagging or anything else. The eigendresses are just the eigenvectors from PCA projected back into the original image-space basis.



