Evidence of the context-freeness of Natural Language (1985) [pdf] (eecs.harvard.edu)
37 points by Tomte on April 23, 2017 | 8 comments


A well-known example of a natural language that is not context-free is the African language Bambara, spoken in Mali.

In Bambara the way you say "whichever <something>" is "<something> o <something>". For example, the Bambara word for dog is "wulu". So to say "whichever dog" you would say "wulu o wulu".

Another construction in Bambara is "<noun> + <verb> + la", which means someone who does <verb> to <noun>. The Bambara word for "search" is "nyini", so "someone who searches for dogs" would be "wulunyinila". The word for "watch" is "filè", so someone who watches dogs would be "wulufilèla". There can be more than one <verb> in this construct. For example, "wulunyinifilèla" would be "someone who watches dog searchers".

This can be carried on indefinitely: "wulunyininyinifilènyinila" would be someone who searches for watchers of searchers for dog searchers (if I counted right...).

This can be combined with the "whichever" construct. For example, "wulunyininyinifilènyinila o wulunyininyinifilènyinila" would be whichever searcher for watchers of searchers for dog searchers.
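
To make the two constructions concrete, here is a minimal Python sketch. It only follows the simplified description given in this comment (noun, then one or more verbs, then "la"), not the full morphology in Culy's paper, and the function names are made up for illustration. The point is the last function: a "whichever" phrase is only well-formed if both halves are exactly the same string, however long they get.

```python
NYINI = "nyini"  # "search", as given in the comment above
FILE = "filè"    # "watch", as given in the comment above


def agentive(noun, verbs):
    """Build <noun> + <verb>... + la (hypothetical helper, simplified rule)."""
    if not verbs:
        raise ValueError("need at least one verb")
    return noun + "".join(verbs) + "la"


def whichever(phrase):
    """Build "<something> o <something>"."""
    return f"{phrase} o {phrase}"


def is_whichever(text):
    """Check the copy requirement: both sides of " o " must be identical.

    Assumes the phrase itself contains no spaces, which holds for the
    single-word examples used here.
    """
    left, sep, right = text.partition(" o ")
    return bool(sep) and left != "" and left == right


word = agentive("wulu", [NYINI, NYINI, FILE, NYINI])
print(word)             # wulunyininyinifilènyinila
print(whichever(word))
```

Checking `left == right` is trivial in code, but it is a copy-language condition over an unbounded set of words, and that is the property the argument against context-freeness relies on.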

This was first discussed in this paper [1], which is cited by the submitted paper.

[1] http://babel.ucsc.edu/~hank/mrg.readings/Culy1985_Bambara.pd...


>> "wulunyininyinifilènyinila o wulunyininyinifilènyinila" would be whichever searcher for watchers of searchers for dog searchers.

This is very unwieldy, and the translation is also hard to parse, even for the person who generated it (you weren't sure if you counted right :). I'd wonder how weird the Bambara string sounds to native speakers.

That's not to argue against the context-sensitivity of natural language but it's worth pointing out that even the context-freeness of natural language sometimes presupposes the ability to keep an unlimited number of tokens in perfect symphony despite the fact that most people would probably not have the attention span to keep track of the grammaticality of a phrase of more than a few dozen tokens uninterrupted by punctuation signs if it was spoken rather than written down to make it easier to cross-reference, as indeed the former part of this sentence here suggests (I myself had to check what I was writing a couple of times). On the other hand, even such overly-long phrases don't sound ungrammatical (unless they are), they just sound a bit silly, so semantically "bad" somehow, rather than syntactically.

The submitted paper is well aware of this and touches upon it in several places, e.g. where they state in their claim (4) that "An arbitrary number of Vs can occur in a subordinate clause of this type" and, in parentheses, add the disclaimer "subject, of course, to performance restrictions".

The paper in fact closes with an observation that it may well be that the practicalities of parsing context-sensitive languages essentially make them possible to learn and parse as if they were context-free (or, I'd say, even regular, or finite). Although learning a context-free language itself is not exactly easy (it's impossible to learn from positive examples alone, basically).
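
A rough illustration of the "or finite" point, as a sketch only: if performance limits cap the number of verbs at some small bound, the set of "<w> o <w>" strings for a given noun is finite and can simply be enumerated. The bound of 3 and the two-verb vocabulary are arbitrary assumptions taken from the Bambara example upthread, and `bounded_language` is a made-up name.

```python
from itertools import product

VERBS = ["nyini", "filè"]  # the two verbs from the example upthread


def bounded_language(noun, max_verbs):
    """All "<w> o <w>" strings whose word uses between 1 and max_verbs verbs."""
    phrases = set()
    for n in range(1, max_verbs + 1):
        for combo in product(VERBS, repeat=n):
            word = noun + "".join(combo) + "la"
            phrases.add(f"{word} o {word}")
    return phrases


lang = bounded_language("wulu", 3)
print(len(lang))  # 2 + 4 + 8 = 14
print("wulunyinila o wulunyinila" in lang)
```

With the bound in place, membership is a lookup in a finite set, which is within reach of a finite automaton. The set still grows exponentially with the bound, so this says nothing about how practical such a description would be.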


Title should be "Evidence Against the Context-freeness of Natural Language".


Thanks, I've been looking all over for this paper. I knew the result, but couldn't find the reference.

Now, could we please correct the title? It's evidence against etc. [cref yorwba, https://news.ycombinator.com/item?id=14177944]


A recent result in the same vein is that Romanian is also weakly non-context-free: http://nslonge.scripts.mit.edu/site/wp-content/uploads/2015/...

Personally, I find Shieber's refutation in section 4.4, "Clauses are Bounded in Size", unsatisfactory. I have yet to see cross-serial clauses with more than two constituents outside such papers.


What happened to Stuart Schrieber after he was a student at Harvard?



[chuckle] To quote from that link:

My name

It's spelled S-H-I-E-B-E-R, and pronounced shee-ber. No "c". Not "ei". Not shy-ber. Not "Schreiber", the name of my near-namesake at Harvard.

It is frequently misspelled, because it is close to being consistent with German orthography, the "ie" being pronounced as a long "e", though the initial consonant would then be spelled "sch". Nonetheless, there is no "c" in the name. Why? Because it is not a German name.

The name is an American invention, applied to my ancestors upon immigration to the US from the city Szczebrzeszyn, currently a part of Poland, and mapped below. During the immigration process, the surname was shortened from the city name, probably from its Yiddish pronunciation shebreshin. Had my ancestors not left Szczebrzeszyn, my existence would have been unlikely.

A grave sin indeed, committed by the OP :)



