What was the breakthrough?

wookietrader · on Nov 24, 2012

The breakthrough was the insight that while you cannot train a deep neural net at once with backprop, you can train one layer after the other greedily with an unsupervised objective and later fine tune it with standard backprop.

Years later, Swiss researchers (Dan Ciresan et al) found that you can train neural nets with backprop, but you need lots of training time and lots of data. You can only achieve this by making use GPUs, otherwise it would take months.

iskander · on Nov 24, 2012

You can't train fully connected deep models with backprop, or at least not easily or well. An alternative solution to this problem is spatial weight pooling (Yann's convolutional networks) which play well with SGD.

mark_l_watson · on Nov 24, 2012

That is correct. The problem is that the gradients get smaller and smaller as you back propagate back towards the input layer. So learning on the front part of the net is slow. Hinton has a lot of good material about htis in his Coursera lectures.

wookietrader · on Nov 24, 2012

Yes you can.

Check out the publications by Ciresan on MNIST, have a look at Hinton's dropout paper or at the Kaggle competition that used deep nets. Or try it yourself and spend a descent amount of time on hyper parameter tuning. :)

iskander · on Nov 25, 2012

Which of Ciresan's projects are you referring to? Everything I've seen by him uses convolutional layers of some sort.

rm999 · on Nov 24, 2012

The first time I saw a paper on feasible deep networks was at NIPS 2006, specifically this paper: http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf

It's been awhile since I read the paper, but as I recall it involved training an unsupervised model layer-by-layer (training a layer, freezing the weights, then training another layer on top of it).

mark_l_watson · on Nov 24, 2012

http://www.socher.org/index.php/DeepLearningTutorial/DeepLea... is also a good reference. I wrote a short blog post this morning on the same subject http://blog.markwatson.com/2012/11/deep-learning.html

fiatmoney · on Nov 24, 2012

Contrastive Divergence.

The deep learning / RBM tutorial here is quite good and explains the technique.

http://deeplearning.net/tutorial/rbm.html#rbm