
Norvig has strong arguments, but this is in bad faith:

"Chomsky has a philosophy based on the idea that we should focus on the deep whys and that mere explanations of reality don't matter. In this, Chomsky is in complete agreement with O'Reilly." (O'Really stands for mythmaking, religion or philosophy)

Chomsky is no mystic - he is an old-fashioned scientist looking for a parsimonious theory. Maybe there is no simple theory of language with great explanatory power, but there should be some people looking for one.



> Norvig has strong arguments

What was one of those arguments? I didn't see any.


"We all saw the limitations of the old tools, and the benefits of the new"

Probabilistic models work incredibly well, much better than transformational-generative grammars.


> Probabilistic models work incredibly well, much better than transformational-generative grammars.

You've missed Chomsky's point, even though it is repeated in the article: probabilistic models can be useful tools, but they tell you nothing about the human language faculty (i.e. they are not science).


This kind of top-down approach misses the real hero - it's not the model, it's the data. 500GB of text can transform a randomly initialised neural net into a chatting, problem-solving AI. And language turns babies into modern, functional adults. It doesn't matter much how the model is implemented; they all learn it, more or less. The real hero is the text. Let's talk about that more.

It would have been interesting if Chomsky's approach could have predicted at what corpus size we see the emergence of AI that passes the Turing test - or even predicted that there is an emergent process there at all.


I'm not well-informed on the subject, but I seem to remember that Chomsky's point was exactly on the data: his hypothesis about the human language faculty being innate (a "universal grammar", or "linguistic endowment" as he's been calling it more recently) was about the so-called "poverty of the stimulus". Meaning that human infants learn human languages while being exposed to pitiably insufficient amounts of data.

Again, to my recollection, he based this on Mark E. Gold's result about language identification in the limit, which, simplifying, is that any language more complex than a finite language (in the Chomsky hierarchy of languages) is learnable only from an infinite number of positive examples, and languages more complex than regular languages also need an infinite number of negative examples. And those are labelled examples - labelled by an oracle. Since human language is usually considered to be at least context-free, and since infants are not exposed to infinite numbers of examples of their maternal languages, there must be some other element that allows them to learn such a language, and Chomsky called that a "universal grammar" etc.
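To give a flavour of Gold's framework (my own toy illustration, not something from the thread): for the class of finite languages, the naive learner whose conjecture is simply the set of strings seen so far identifies any finite target language in the limit from positive examples alone - once every string has appeared, the guess never changes again.

```python
# Toy sketch of Gold-style identification in the limit for FINITE languages.
# The learner's conjecture is just the set of strings seen so far; for any
# finite target, the conjecture stabilises on the target after finitely
# many positive examples. (Gold's negative results concern richer classes.)
def learner(examples_so_far):
    return set(examples_so_far)

target = {"a", "ab", "abb"}                  # a finite target language
stream = ["a", "ab", "a", "abb", "ab", "a"]  # positive examples only

guesses = [learner(stream[:i + 1]) for i in range(len(stream))]
print(guesses[-1] == target)  # conjecture has converged on the target
```

The point of the theorem is that no such trick works for superfinite classes (e.g. anything containing all finite languages plus one infinite one) from positive data alone.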

Still from memory, Chomsky's proposition also took into account data showing that human parents do not give negative examples of language to their children; they only correct by giving positive examples. E.g. a parent would correct a child's grammar by saying something along the lines of "we don't say 'eated', we say 'ate'": they would label a grammar rule learned by the child as incorrect (the rule that produced 'eated'), but they wouldn't give further negative examples of that or other rules producing similarly wrong instances - only a positive example of the correct rule. That's my interpretation anyway.

Again all this is from memory, and probably half-digested. Wikipedia has an article on Gold's famous result:

https://en.wikipedia.org/wiki/Language_identification_in_the...

Incidentally, Gold's result, derived in the field of Inductive Inference (a sort of precursor to modern machine learning), caused a revolution in machine learning itself. That strongly negative result led Leslie Valiant to develop his PAC-learning setting, which loosens the strict exactness requirements of Gold's identification in the limit, and so justified modern machine learning's focus on approximate, and efficient, learning. But that's another story.


A child needs about 100KB of "text" a day to learn a language. If anything, the data requirements of LLMs are proof positive that they can't bear any relation to the human language faculty.
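Taking the 100KB/day figure at face value, and the 500GB corpus mentioned upthread (both rough assumptions), the gap is about three orders of magnitude:

```python
# Back-of-envelope scale comparison using the thread's own figures
# (100KB/day and 500GB are assumptions, not measurements).
kb_per_day = 100
years = 10
child_bytes = kb_per_day * 1_000 * 365 * years   # ~0.37 GB by age 10
llm_bytes = 500 * 1_000_000_000                  # 500 GB of training text

print(f"child: {child_bytes / 1e9:.2f} GB; "
      f"LLM/child ratio: {llm_bytes // child_bytes}x")
```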


I mean, we could be far overfeeding our data models too. Of course, data models don't take years of real time to train either.


> I mean we could be far overfeeding our data models too.

As far as I know, the evidence we have suggests the opposite: improvements are still mostly coming from more parameters and more data.

> data models don't take years in real time to train either.

Given the incommensurate architectures, a fairer comparison of learning cost might be in watt-hours: a human brain uses about 500Wh a day, while GPT-3 was suspected to take about 1GWh to train.
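Using those rough figures (both are assumptions, not measurements), the energy budget works out to roughly two million brain-days:

```python
# Back-of-envelope energy comparison using the thread's figures
# (~20 W brain * 24 h ~ 500 Wh/day; ~1 GWh is a suspected, unofficial
# estimate for GPT-3 training).
brain_wh_per_day = 500
gpt3_training_wh = 1e9

brain_days = gpt3_training_wh / brain_wh_per_day
print(f"{brain_days:,.0f} brain-days, ~{round(brain_days / 365):,} brain-years")
```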


No, again it's missing the point: None of this explains how the human language faculty works.


>> Chomsky is no mysticist - he is an old fashioned scientist looking for a parsimonious theory.

When did that become old fashioned?


(mostly a joke)

In the good old days a scientist was a hard thinker who published perhaps one paper a year, and gathering data was hard. (How many years did Einstein spend thinking about general relativity before publishing?)

Nowadays a scientist scrapes data from the web and hacks p-values.

In the future LLMs will feed each other their generated papers.



