
Since I see some misunderstanding about deep learning, let me explain the fundamental idea: It's about reusing intermediate work.

The intuition: suppose I asked you to write a complicated computer program, and told you that you could use routines and subroutines, but no sub-subroutines or deeper levels of abstraction. In this restricted setting you could still write any program, but you would have to copy a lot of code. With arbitrary levels of abstraction, you can reuse code much more elegantly, and your program ends up more compact.
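To make that concrete, here is a toy sketch of my own (nothing deep-learning-specific, just the reuse idea): computing x^16 by repeated squaring, where each level reuses the result of the level below, versus a flat version with no intermediate names.

    # Deep version: each level reuses the previous level's result.
    def pow16_deep(x):
        s1 = x * x        # x^2   (level 1)
        s2 = s1 * s1      # x^4   (level 2)
        s3 = s2 * s2      # x^8   (level 3)
        return s3 * s3    # x^16  (level 4): 4 multiplies in total

    # Shallow version: no intermediate results allowed, so the work is
    # written out in full.
    def pow16_shallow(x):
        return x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x   # 15 multiplies

    assert pow16_deep(3) == pow16_shallow(3) == 3**16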

Here is a more formal description: a complicated non-linear function can be described as a circuit. If you restrict the depth of the circuit, you can in principle still represent any function, but you may need a really wide (exponentially wide) circuit, and all those free parameters invite overfitting (Occam's razor). With a deep circuit, by comparison, you can represent the same functions compactly.
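The textbook witness for that gap is the parity function, which also comes up later in this thread. A quick sketch of my own, in Python, of the size difference:

    from itertools import product
    from functools import reduce
    from operator import xor

    n = 10

    # Depth-2 (DNF) representation of n-bit parity: one AND term per
    # odd-weight input pattern, i.e. 2^(n-1) terms -- exponential in n.
    dnf_terms = [bits for bits in product([0, 1], repeat=n)
                 if sum(bits) % 2 == 1]
    print(len(dnf_terms), "AND terms at depth 2")   # 512 = 2^(n-1)

    # Deep representation: a balanced tree of 2-input XORs needs only
    # n - 1 gates, at depth about log2(n).
    def parity_deep(bits):
        return reduce(xor, bits)

    assert all(parity_deep(t) == 1 for t in dnf_terms)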

Standard SVMs and random forests can be shown, mathematically, to have a limited number of layers (circuit depth). A kernel SVM, for instance, is essentially a depth-two architecture: kernel evaluations against the support vectors, followed by a weighted sum.
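A toy sketch of my own (random stand-in numbers, assuming an RBF kernel) showing that two-level structure:

    import numpy as np

    def rbf(x, s, gamma=0.5):
        return np.exp(-gamma * np.sum((x - s) ** 2))

    support_vectors = np.random.rand(5, 3)   # stand-ins for learned SVs
    alphas = np.random.randn(5)              # stand-ins for learned weights
    b = 0.1

    def svm_decision(x):
        level1 = [rbf(x, s) for s in support_vectors]  # level 1: kernel units
        return np.dot(alphas, level1) + b              # level 2: weighted sum

    print(svm_decision(np.random.rand(3)))

However complex the target function, the evaluation never gets deeper than those two levels.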

It turns out that expressing deep models using neural networks is quite convenient.

I gave an introduction to deep learning in 2009 that describes these intuitions: http://vimeo.com/7977427



> If you restrict the depth of the circuit, you can in principle represent any function, but you need a really wide (exponentially wide) circuit.

Are you sure it's exponential?

If you look at binary functions (i.e. boolean circuits), any such function can be represented by a single-layer function whose size is linear in the number of gates of the original function (I think it's 3 or 4 variables per gate) by converting to conjunctive normal form.

Of course it's not obvious that a similar scaling exists for non-binary functions, but I'd be a bit surprised if increasing depth led to an exponential gain in representational efficiency.


I am not sure in the sense that, if I were dropped on a desert island, I could derive a watertight proof of this result from scratch.

I am confident, though, based upon my reading of secondary sources written by people that I trust.

From one of Bengio's works (http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf): "More interestingly, there are functions computable with a polynomial-size logic gates circuit of depth k that require exponential size when restricted to depth k − 1 (Hastad, 1986)."


I think my argument was mistaken. The CNF form I was thinking of involves adding auxiliary variables, so it doesn't actually allow you to compute the function in one step.
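For the record, here is roughly what that conversion (the Tseitin transformation, if I have the name right) does on a single XOR gate: the fresh variable for the gate's output is exactly the extra-variable problem. A small self-checking sketch of my own, with DIMACS-style signed literals:

    from itertools import product

    # Clauses for y <-> (x1 XOR x2), with inputs 1, 2 and auxiliary 3 (= y).
    clauses = [
        (-1, -2, -3),   # x1 and x2      -> not y
        ( 1,  2, -3),   # not x1, not x2 -> not y
        ( 1, -2,  3),   # not x1, x2     -> y
        (-1,  2,  3),   # x1, not x2     -> y
    ]

    def sat(assign, clause):
        return any((lit > 0) == assign[abs(lit)] for lit in clause)

    for x1, x2 in product([False, True], repeat=2):
        # The only consistent value of the auxiliary variable is the XOR
        # itself, so the CNF defines the function rather than computing it.
        models = [y for y in (False, True)
                  if all(sat({1: x1, 2: x2, 3: y}, c) for c in clauses)]
        assert models == [x1 ^ x2]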


Computing a sum modulo 2 (cumulative XOR) of n boolean inputs requires an exponential number of gates at any fixed depth if you only have OR, NOT and AND gates to work with. (That is exactly the Hastad result quoted above.)


Reusing intermediate work? I don't think this is a good intuition. Using several levels of abstraction is more like it.

For example, in face recognition, the first level could be raw pixels. The second level, edges and corners: http://www.cs.nyu.edu/~yann/research/deep/images/ff1.gif The third, parts of the face: http://people.cs.umass.edu/~elm/images/face_feature.jpg
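Schematically, that stacking looks something like this (a toy sketch of my own; the kernels here are hand-picked or random stand-ins for what a real network would learn):

    import numpy as np

    def conv2d(img, kernel):
        # Naive 'valid' 2-D convolution, for illustration only.
        kh, kw = kernel.shape
        h, w = img.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
        return out

    img = np.random.rand(32, 32)            # level 1: raw pixels

    edge_kernel = np.array([[1., 0., -1.],  # level 2: an edge-like detector
                            [2., 0., -2.],
                            [1., 0., -1.]])
    edges = np.maximum(conv2d(img, edge_kernel), 0)      # ReLU nonlinearity

    part_kernel = np.random.rand(5, 5)      # level 3: stand-in for a learned
    parts = np.maximum(conv2d(edges, part_kernel), 0)    # part-of-face detector

    print(img.shape, edges.shape, parts.shape)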



