Using Principal Component Analysis on Images (thehackerati.com)
189 points by geb on Aug 17, 2015 | 28 comments


PCA typically assumes that the high-dimensional data (images) lie on a low-dimensional linear manifold (the span of the principal components). In this case the data of interest (images of people in dresses) likely lies on a highly non-linear manifold, so non-linear methods would likely produce much more appropriate models. This is part of why deep learning and tensor methods work so well on image classification tasks.

Nonlinear methods would require much more data, unfortunately.


If your goal is to accurately recreate dresses, then yes, a non-linear method would work better. But, I don't think that makes this work uninteresting. For a start, PCA is very simple to implement and train, which gives it hackability that non-linear methods typically don't have. PCA makes a better tool if you just want to play around. Exploring non-realistic output is also interesting for its own sake. I find the "ghost" effect that PCA generates quite nice.


Non-linear methods can produce a smaller (lower-dimensional) representation, but they are harder to use and usually require more data. PCA, on the other hand, is easy, and keeping 50 or more principal components is often not a problem unless you are doing visualization. With the extra representational capacity from those additional dimensions you can still get good reconstructions.
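
For concreteness, here is a minimal sketch of that workflow with scikit-learn (the file names, image count, and image size below are hypothetical placeholders, not the post's actual pipeline):

    # Minimal PCA-on-images sketch; paths and sizes are made up for illustration.
    import numpy as np
    from PIL import Image
    from sklearn.decomposition import PCA

    paths = ["dress_%03d.jpg" % i for i in range(500)]  # hypothetical filenames
    size = (64, 96)  # hypothetical (width, height); every image must share it

    # Flatten each RGB image into a single row vector.
    X = np.array([np.asarray(Image.open(p).convert("RGB").resize(size),
                             dtype=np.float64).ravel() for p in paths])

    pca = PCA(n_components=70)        # e.g. keep 70 components, as in the post
    Z = pca.fit_transform(X)          # low-dimensional codes, one row per image
    X_hat = pca.inverse_transform(Z)  # reconstructions back in pixel space
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained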


I'd be interested to know how the singular values are distributed and how many you need to keep before you can cut off the rest. The reconstruction example she shows looks pretty good after only 70 components (out of presumably far more).

Actually, I'd be even more interested in an NMF decomposition of the data. The weights for the 10-component approximation are [-17541.81, -12749.33, -3766.29, 2005.28, 4193.08, 6832.55, -6704.90, -2135.51, 1112.27, 7627.80], and I wonder whether a purely-additive approach would work better.
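
Checking the singular value spectrum is cheap; a sketch, reusing the X matrix from the snippet above:

    import numpy as np

    Xc = X - X.mean(axis=0)                  # center the data first
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s**2 / (s**2).sum()                # variance explained per component
    k = np.searchsorted(np.cumsum(var), 0.95) + 1
    print("components needed for 95% of the variance:", k)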


> This is part of why deep learning and tensor methods work so well on image classification tasks.

That statement sounds informative. Do you have a good non-specialist's reference?


You might be interested by colah's (Christopher Olah's) blog: https://colah.github.io/

Two articles in particular are good introductions to looking at neural networks in terms of higher-dimensional data lying on lower-dimensional manifolds:

Neural Networks, Manifolds, and Topology: https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Visualizing MNIST: An Exploration of Dimensionality Reduction: https://colah.github.io/posts/2014-10-Visualizing-MNIST/


Thank you for the useful links.


> Using components to recreate image

Well, it seems to me that the recreated image is in the "training dataset"; otherwise the recreation seems too accurate. I would be interested in how it handles recreating new, but similar, images.


Scroll down to where it says:

   It even works for dresses that were not in the training set:   
Yes, that first dress is in the training data, but she did do reconstructions on dresses not used to build the PCA basis. They look decent, but not as good as that first example. And, as she points out, the model can't reproduce patterns that are not in the initial data very well, and can't reproduce accessories that were not in the initial data at all.


Good suggestion! Added a section to the post about this (and gave you credit for the suggestion at the bottom of the article).


Nice! These look plausible, but it's still surprising how accurate they are.

Note that since this is a linear approach, the choice of colorspace can have a large impact on the results. You didn't mention the colorspace; my best guess is that you used sRGB. Maybe you could try it in a linear colorspace too.

Edit: According to the source you use PIL.Image.getdata(), which according to the docs returns "pixel values"; the docs then give an RGB->XYZ conversion example that is only valid for a linear colorspace. That suggests the returned values are already linear RGB, but a lot of software gets these things wrong, so I'm not 100% sure.
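
For anyone who wants to test the linear-colorspace variant, the standard sRGB decoding (IEC 61966-2-1) is easy to apply to the X matrix from the earlier sketch, assuming the stored values are in fact sRGB:

    import numpy as np

    def srgb_to_linear(srgb):
        # srgb: values scaled to [0, 1]; returns linear-light RGB
        srgb = np.asarray(srgb, dtype=np.float64)
        return np.where(srgb <= 0.04045,
                        srgb / 12.92,
                        ((srgb + 0.055) / 1.055) ** 2.4)

    X_linear = srgb_to_linear(X / 255.0) * 255.0  # keep the original 0..255 scale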


Likewise with the "predictions" of the author's likes/dislikes. Testing how the model performs on an independent data set (or at least under cross-validation [1]) would be much more interesting.

[1] https://en.wikipedia.org/wiki/Cross-validation_(statistics)
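
A sketch of what that could look like with scikit-learn, refitting the PCA inside each fold so no test information leaks into the basis (X is the pixel matrix from the earlier sketch; y is a hypothetical 0/1 like-vector):

    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    model = make_pipeline(PCA(n_components=70), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))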


The other thing I wondered about the predictions: she apparently rated all of the dresses, and the top/bottom predictions matched the ratings. Fair enough. But what about the residuals, the misclassified ones - the ones where the logistic regression predicts a high or low score and her rating was actually the opposite? Those might be interesting to look at.


That's there. Search for:

> The misclassifications are interesting too

One problem seems to be that it concluded she'd dislike anything the exact opposite color from her favorite shade of red. A common flaw in linear models.


The blog post seems to be getting modified at this moment. When I first saw it, it didn't have anything about the misclassifications, but that has been added now.


I've only glanced at her code, but it looks like[1] the predictions are from held-out data.

EDIT: All of the data was used in forming the PCA basis, but that isn't (necessarily) an error, depending on the use-case. And the logistic regression model was evaluated on held-out data.

[1]https://github.com/graceavery/Eigenstyle/blob/master/visuals...


This is actually pretty easy: you just walk along the top few basis vectors in the low-dimensional subspace and reproject back into the original space. Sounds complicated, but it's really not.
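
Concretely, something like this against the scikit-learn PCA object from the earlier sketch (the step sizes are arbitrary):

    import numpy as np

    mean = pca.mean_        # the "average dress" in pixel space
    v = pca.components_[0]  # first principal direction (a unit vector)

    # Walk along the component and reproject each point back into pixel space.
    for alpha in np.linspace(-3, 3, 7) * np.sqrt(pca.explained_variance_[0]):
        img = (mean + alpha * v).reshape(size[1], size[0], 3)  # H x W x 3
        img = np.clip(img, 0, 255).astype(np.uint8)            # valid pixel range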


PCA implicitly uses a quadratic form on a vector space. When done the usual way, this is just the identity matrix. But for image data, I think you would get better results using a quadratic form based on a kernel, so that Q_ijkl, where (i,j) and (k,l) are pixel coordinates, is given by Q_ijkl = t((i-k)^2 + (j-l)^2) for some kernel function t.
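
If t is a Gaussian, Q is just a Gaussian-blur operator, and (up to boundary effects) its square root is a narrower Gaussian blur, so PCA under that quadratic form reduces to ordinary PCA on half-blurred images. A speculative sketch, reusing X and size from the earlier snippet:

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.decomposition import PCA

    sigma = 2.0  # kernel width of Q; its square root blurs with sigma / sqrt(2)

    def half_blur(flat):
        img = flat.reshape(size[1], size[0], 3)
        s = sigma / np.sqrt(2)
        # blur spatially, not across color channels
        return gaussian_filter(img, sigma=(s, s, 0)).ravel()

    Xq = np.array([half_blur(x) for x in X])
    pca_q = PCA(n_components=70).fit(Xq)  # PCA under the kernel inner product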


Nice to see this (especially on image data other than just faces). But non-negative matrix factorization typically produces less ghostly images, with an easier interpretation.

See e.g. http://www.quantumblah.org/?p=428 or http://scikit-learn.org/stable/auto_examples/decomposition/p... (comparison of PCA and non-negative matrix factorization).

Of course, the non-negative factorization comes at some cost (mostly: much higher computational complexity), but it may be worth trying.
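
With scikit-learn it is a near drop-in swap for the PCA sketch above, since raw pixel values are already non-negative:

    from sklearn.decomposition import NMF

    nmf = NMF(n_components=70, init="nndsvd", max_iter=500)
    W = nmf.fit_transform(X)  # non-negative weights, one row per image
    H = nmf.components_       # non-negative, purely additive "parts"
    X_hat = W @ H             # reconstruction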


"These aren’t that bad. I do kinda like them, but think they’d be nicer with some minor adjustments (slightly less form-fitting, slightly less loud pattern, slightly brighter color)."

Would it be possible to use the like/dislike system to make/suggest the minor changes to the dresses?
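
A crude version would be to nudge a dress's PCA code along the logistic regression's weight vector, which points toward "like", and reproject. A speculative sketch (pca as before; clf is a hypothetical LogisticRegression fitted on the PCA codes Z):

    import numpy as np

    w = clf.coef_[0]             # direction of increasing "like" score
    z = pca.transform(X[:1])[0]  # PCA code of one dress (arbitrary choice)

    step = 0.5 * np.linalg.norm(z)  # step size is a free parameter
    z_nudged = z + step * w / np.linalg.norm(w)
    suggestion = pca.inverse_transform(z_nudged[None, :])[0]  # back to pixels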


Is this dataset available anywhere? I see the code on github, but I don't see the data anywhere.


Really interesting. Does anyone know more about how the initial eigendresses were computed? Was there image processing to distill each dress into major components, hand-tagging of variables (like short/long, color, etc.), or was the raw pixel data just processed directly?


I'm guessing it's just raw pixel data, and the description of each eigendress's meaning was the author's interpretation.


The only manual classification used was separating dresses into "like" and "dislike" categories. The rest is just pixel data.


You might wanna do a search for "eigenfaces" (as opposed to "eigendresses" :P)

https://en.wikipedia.org/wiki/Eigenface


Reading that, I can't help but wonder how similar a machine-learned eigenface is to an infant-learned one. http://lawcomic.net/guide/?p=3273

I wonder if there is some way to simulate the effect of growing up among people of different races on eyewitness identification and test if it matches what this lawyer is saying at http://lawcomic.net/guide/?p=3282


As I understand it, eigenfaces require preprocessing to orient and align the images. That doesn't seem to have come up here. Maybe Amazon is really consistent about centering models in dress photos?


>> Was there image processing to distill the dress into major components, hand tagging of variables (like short/long, color, etc), or just raw pixel data processed?

It's pretty clear that this is PCA on the raw pixels of the images, so no, there is no hand-tagging or anything else. The eigendresses are just the eigenvectors from PCA projected back into the original image-space basis.



