I haven't finished reading TFA yet, but so far it's really good, because it sounds like Chomsky is actually getting to the point.... which he sometimes does, and sometimes absolutely doesn't (or so it often seems to me).
If you're interested in this area, which you might call the philosophy of artificial mind, or philosophy of cognitive science, I strongly suggest reading the link to Norvig's article (linked in TFA, but here it is again: http://norvig.com/chomsky.html ). In particular, I'd suggest reading it before reading the actual interview with Chomsky (maybe ideally reading it after the prefatory part of TFA).
My own inclination is that on the face of it, I find Norvig's approach less satisfying, as Chomsky appears to, but upon much consideration my current belief is that Chomsky's approach is too mystical and too just-so, and Norvig's approach at least has the merit of bearing fruit... fruit that one day might be concentrated into a concise and elegant theory.
You seem to agree with Norvig that massive data analysis of language will lead to a scientific understanding, which would be a first in science.
Chomsky doesn't. If anything, Chomsky is grounded in reality, while Norvig and the AI researchers are grounded in the hope that this way of mapping out a system will produce meaningful understanding of it.
This reminds me a bit of what scientists in other fields refer to as "empirical equations", which are equations fit from data without any particular theoretical backing or reason to believe that their components are a good model of reality. They're useful in that they may predict observations well, especially over a specific range of observables, but they don't necessarily give us an understanding of what's going on. An example is the historical Prony equation for hydraulic pressure loss (http://en.wikipedia.org/wiki/Prony_equation), which doesn't actually correctly model what happens in fluid flow, but does happen to empirically produce fairly good results over a range of values, partly through the use of two magic numbers fit from data.
Another example might be the winning entry in the Netflix competition. If your goal is to predict film preferences (which is Netflix's goal, of course), it appears to give pretty good estimates. But I don't think even its authors would claim that it's a scientific model of how humans form preferences.
In both cases the underlying problem is that there are fairly general functional forms, such as a few terms of a Taylor series, or a forest of decision trees, that can empirically model almost anything to a certain degree of accuracy, given enough data, even if the underlying process looks nothing like them. Therefore they can give accurate predictions that work in practice, without being accurate models of what's happening in the underlying system. Chomsky appears to be of the opinion that statistical NLP systems are more of that variety, so may be good engineering solutions without being good scientific models.
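The point about general functional forms can be made concrete. Below is a minimal sketch (the exponent 1.8, the fitting range, and the variable names are made-up illustrative choices, not from any real hydraulics data): a Prony-style two-parameter form a·v + b·v² is least-squares fitted to data generated by a different underlying law, and it predicts well inside the fitted range while extrapolating poorly.

```python
# Sketch of an "empirical equation": fit h = a*v + b*v**2 (two magic numbers,
# Prony-style) to data actually generated by a different law, h = v**1.8.
# The fitted form predicts well in range while modeling nothing real.

# "Observations" from the true (hidden) process over v in [1, 5].
vs = [1 + 0.1 * i for i in range(41)]
hs = [v**1.8 for v in vs]

# Least-squares fit of a, b via the 2x2 normal equations.
s11 = sum(v**2 for v in vs)
s12 = sum(v**3 for v in vs)
s22 = sum(v**4 for v in vs)
t1 = sum(v * h for v, h in zip(vs, hs))
t2 = sum(v * v * h for v, h in zip(vs, hs))
det = s11 * s22 - s12 * s12
a = (t1 * s22 - t2 * s12) / det
b = (s11 * t2 - s12 * t1) / det

def predict(v):
    return a * v + b * v * v

# Worst relative error inside the fitted range...
err_in = max(abs(predict(v) - v**1.8) / v**1.8 for v in vs)
# ...versus the relative error when extrapolating far outside it.
err_out = abs(predict(50) - 50**1.8) / 50**1.8
```

The two fitted "magic numbers" give usable predictions over the observed range, yet say nothing true about the v**1.8 process generating the data, which shows up as soon as you leave that range.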
Exactly. Well said, and clearly articulates the point here.
What I don't get is that what Chomsky is saying is altogether standard, and yet people are insulting him over what is altogether a very simple idea that you expressed plainly.
Yes, NLP systems are going to have many engineering uses, and Chomsky agrees. Are they going to help in the true scientific understanding of the systems?
It's unlikely. It's likely to be "good engineering solutions without being good scientific models" as you elegantly put it.
> Yes, NLP systems are going to have many engineering uses, and Chomsky agrees. Are they going to help in the true scientific understanding of the systems?
So, what we have is (1) the engineering / statistical modelling / machine learning approach, and (2) the deep theoretical "Chomsky approach".
Chomsky despises, maybe rightly so, the engineering approach because it only provides tools that work, approximately, in practice but don't provide any deep "scientific" understanding.
The deep theoretical approach has a vision of a comprehensive theory that really provides understanding. Once we manage to find the deep fundamental theory, practical applications will be child's play.
But here's the catch: has the Chomsky approach taken us any closer to that deep theoretical progress? Why has all the practical progress come from the engineers? What if the deep theoretical thinkers are completely lost in their theories, like the alchemists in the dark medieval times, and, because they despise the data-driven approach, also refuse to let empirical observations guide them in the right direction?
I don't think looking down on the engineers and their modest practical success is any kind of merit, if all you have to show for yourself is dreaming of a deep theory while making no measurable progress toward it.
> Why has all the practical progress come from the engineers?
I see what you are saying, and it is actually a good point. Where are the robots built on Chomsky's theory? A very valid question. I don't know the answer to it; Chomsky doesn't either. But I think what you mean by practical progress isn't what he means by progress. That is his point.
You have to see where he is coming from. He is an academic; his ultimate goal is to understand how things work. Training a set of neurons with input data and ending up with perhaps millions of activation weights does not help that goal, even if the resulting machine can play chess, make coffee, and drive you to work. I think that is his take on it.
I say we need both. There is no reason not to strive for both, and no reason to turn radical, start burning books, and claim that one approach should completely replace the other. I hope we one day find (or find that we can't find) a good explanatory model for meaning, language, learning, personality, or consciousness, but in the meantime I enjoy playing chess with my computer, and I hope pretty soon I'll have my car drive me to work by itself.
Your point is also good. "Shallow engineering" will not give us deep theoretical understanding, and we also need deep theoretical understanding. But in our need for deep theory, we should not accept just any theory. The theory should be testable, and it should eventually yield some practical applications. Theoretical thinking can get quite lost if it isn't guided by at least some connection to empirical data.
Early L. Ron Hubbard presented a theory (Dianetics) of the causes of mental illness. It's a theory all right, just not a very good one, and not very testable. We would still benefit from a better theoretical understanding of human mental health and illness, but we should not accept just anybody's theory merely because he is a deep thinker and has one.
Why do people keep saying that probabilistic models do not provide understanding? We flew to Mars on a few simple principles applied to massive amounts of data. Comets aren't following sophisticated orbital plans, just F = ma.
Maybe (probably) all the sophistication of natural language is an emergent property of the piling up of lots of little similarly shaped details, like atoms. Sure, high-level rules are nice approximations that satisfy our human craving for patterns, but that doesn't mean those patterns are how the brain really works; quite the opposite, in fact.
High level models are illustrations, useful for game programmers and artists to efficiently create simulations and plausible imaginary creations. Low level models are how things actually work.
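The comet remark can be made literal in a few lines of code. This is a minimal sketch in simplified units (GM = 1, a circular starting orbit, semi-implicit Euler steps; all constants are illustrative, not mission data): the program integrates nothing but a = F/m under Newtonian gravity, and a closed orbit emerges with no high-level "orbital plan" anywhere in it.

```python
import math

# Integrate F = ma for a small body orbiting a central mass.
# Simplified units: GM = 1, so a circular orbit of radius 1 has speed 1.
GM = 1.0
dt = 0.001

x, y = 1.0, 0.0      # start at radius 1
vx, vy = 0.0, 1.0    # circular-orbit speed

for _ in range(int(2 * math.pi / dt)):   # roughly one orbital period
    r = math.hypot(x, y)
    ax = -GM * x / r**3                  # a = -GM * r_vec / |r|^3
    ay = -GM * y / r**3
    vx += ax * dt                        # semi-implicit Euler: velocity first,
    vy += ay * dt
    x += vx * dt                         # then position with the new velocity
    y += vy * dt
```

After one period the body has swept out the orbit and returned close to its starting point; the "plan" is entirely emergent from the low-level force law.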
I think I agree with you. Chomsky is astoundingly intelligent and (I think) I understand what he is getting at (i.e. develop a theory of how something happens and use it to find evidence that supports or refutes that theory, rather than using statistical methods to develop a predictive model that doesn't teach you anything about the underlying system), but you can't help but feel a sense of "yeah, but how?"
You can hardly blame the AI researchers for sticking with methods that have been very successful (at least practically speaking) after they had their funding cut out from under them in the 90's for the perceived "failures of AI".
I do like the idea of developing a theory (e.g. vision is processed via algorithm in the brain represented by X) and then attempting to find evidence of that. It can help to avoid the reductionist idea that you need to model everything down to the cellular level in order to understand the brain. It's like trying to disassemble code in memory and then read the algorithm rather than examining every 1 and 0 in memory and trying to make heads or tails of them.
I am not well acquainted with Noam Chomsky and his work, but the dichotomy you refer to I'd like to call "design vs evolution".
If the extreme "design" part of the spectrum is what Chomsky talks about: "develop a theory of how something happens and use it to find evidence that supports or refutes that theory", then the extreme end of the "evolution" part of the spectrum must be this:
Choose a set of rules and use genetic programming techniques to rearrange those rules. The evidence you require will be acquired and utilised by the algorithm itself. The only crux is the choice of the rule set and the implementation of the technique. IMO, the more complicated the rules and the more they interact with each other, the better.
...only then can you get results like this: "...Five individual logic cells were functionally disconnected from the rest– with no pathways that would allow them to influence the output– yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones."
See also the story of the MIT magic switch that toggled between Magic and More Magic, required to make a certain machine function, with no wires behind the switch.
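The "evolution" end of the spectrum can be sketched as a toy genetic algorithm. Everything here is an illustrative assumption rather than anything from the evolved-hardware experiment quoted above: the bitstring encoding, the trivial count-the-ones fitness function, truncation selection, one-point crossover, and the mutation rate are all made up for the sketch. The point is only that the algorithm acquires and exploits the "evidence" (fitness) by itself, with no designed theory of the solution.

```python
import random

# Toy genetic algorithm: evolve bitstrings toward a fitness function.
random.seed(0)
N, POP, GENS, MUT = 32, 40, 60, 0.02   # bits, population, generations, mutation rate

def fitness(bits):
    return sum(bits)                   # toy objective: maximize the number of 1-bits

pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]                       # truncation selection (elitist)
    children = []
    while len(children) < POP - len(survivors):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, N)                 # one-point crossover
        child = a[:cut] + b[cut:]
        child = [bit ^ (random.random() < MUT) for bit in child]  # per-bit mutation
        children.append(child)
    pop = survivors + children

best = max(pop, key=fitness)
```

After a few dozen generations the best individual sits at or near the optimum, yet nothing in the program "understands" the problem; like the evolved chip, the solution is just whatever survived selection.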