Google (and by extension Jeff Dean) has made its greatest contribution by successfully popularizing niche computer science with decent implementations.
I think a point that is often lost is that Google invents very little computer science; they take obscure computer science that in most cases has already been implemented somewhere and produce a very good, high-visibility implementation of it. They may apply it to a somewhat different problem domain, but the solution already existed for the taking. Core high-scale computer science domains like HPC traditionally don't publish shiny, geek-friendly know-how, but there are very deep existing pools of expertise out there.
Google has excelled at branding concepts like MapReduce, Spanner, etc. that already existed in real systems for many years. Google has made people aware of these technologies, but it is a stretch to say they "invented" them in any material sense beyond publicizing their own implementations. I think they receive a little too much credit in many cases for invention when their primary contribution has been popularization.
What's odd to me is the focus on specific individuals. For one thing, why not Dean/Ghemawat facts, at least? I always hear they participate on an equal basis; or does Dean carry Ghemawat like Sherlock carries Watson?
Furthermore, when I actually read about such universally acclaimed geniuses, the hype often contrasts sharply with the moderate, logical introspection of the geniuses themselves. (Not always; Nietzsche, as I vaguely recall, is an exception; but most often in my experience.) Quite often the hype even patronizingly dismisses the person's own claims as mere humility (as with Hume's coverage of Newton in his history of England). And it often beats up on people who quite reasonably had issues with the person (as I hear with Socrates' spouse Xanthippe).
When one hears breathless accounts of someone's superpowers, a logical first reaction is, "Cool, how can I attain that?" I wish it were otherwise, but the hype is always misleading on that score.
As far as I can tell, it's storytelling. The hype fits a certain dramatic mold, like profit-seeking movies, which naturally can't portray too pedestrian a reality. In Dean's case, one of my suspicions is that US "nerd" audiences find it easier to vicariously fantasize themselves as a "Jeff Dean" than as a "Sanjay Ghemawat". Which, if true, would say something about the lack of imagination and the homogeneity of such fantasizers.
(And maybe these people serve as mainstream characters everyone's supposed to be in awe of, but it just wouldn't fit within current society if everyone could follow their interests. Who'll serve you at the store and mop office floors?)
Anyway, sorry for the long, rambling speculation about silly topics; clearly I'm procrastinating.
Largely because it started as an April Fools' joke (by Kenton Varda of Protobuf fame, IIRC, who had been working on protobufs under the mentorship of Dean & Ghemawat at the time), and the joke would fall flat if it were Dean/Ghemawat facts. It just doesn't have the same ring to it.
Then it was leaked to the press, and the press wants heroes, because that's who ordinary readers can identify with. There are plenty of publications (usually in scholarly journals) where Dean & Ghemawat get equal billing. They aren't Wired, and they don't usually make the top of Hacker News (although sometimes they do).
The work Jeff Dean seems to be doing is _excellent software engineering_. 'Software engineering' here I would define as 'the application of computer science', which is not a far-fetched definition, although many dismiss it when the topic is raised.
He does that very well, applying computer science from an engineering perspective and creating very good, performant, useful products. There's a lot of credit due for doing that well.
"I think they receive a little too much credit in many cases for invention when their primary contribution has been popularization."
That's being a little pedantic. Applying X to Y is definitely inventive. That's largely how every domain works; we stand on the shoulders of giants, etc.
The weakness of this contribution is often cited as the reason many types of software patents should be invalid. Putting a shiny wrapper on an old idea in an obvious but slightly different context is something upon which many software hackers would normally cast a jaundiced eye.
My point was largely that, e.g., MapReduce had literally been used in the HPC world for quite some time, in countless variations. Something similar occurred with Spanner, which was a minor variant of a concept that existed in systems decades old. I worked on systems that used both of these "inventions" in the 1990s; no one thought they were novel then either. When I think of "invention" in computer science, I assume some hard problem in computer science was solved. That is not the case for most of what Google is credited with in the realm of computer science.
The absence of meaningful novelty in Google's systems strikes me as a good basis for being skeptical of claims of invention. Relative obscurity of computer science does not confer invention claims on the first to popularize it.
People frequently attribute more computer science invention to me than is warranted (I do a lot of high-end algorithm work). I always try to make it clear that much of what they see as "invention" is their first exposure to obscure computer science that I happen to be familiar with and know how to exploit. Much of what is attributed to Google is similar in character; people conflate the source of their introduction with the inventor of the concept.
Can you give a citation for the MapReduce and Spanner predecessors?
I've seen similar papers that predate MapReduce (I'm curious if you will find the ones I have). But I've also noticed that most people who think MapReduce is old hat have never used it and/or misunderstand what it actually is.
And if it were already so well known, there wouldn't be hundreds of CS researchers continuing to publish both theoretical and systems papers about it.
What paper should they be citing instead? It's hard to believe that 7995 sets of authors would ignore prior research, especially since some of those authors are a decade or two older than the MapReduce authors.
As a general point, a lot of computer science is poorly documented or difficult to find in the literature but well known among the practitioners who use it. A great example is the recent flurry of activity (including patent activity) around "hyperdimensional hashing" and similar concepts. The irony is that the concepts are so old that some of the original patents have expired, but the work is sufficiently forgotten that academics are attempting to re-patent those ideas. I am familiar with it because those algorithms are de rigueur for some types of massively parallel systems (I've been designing parallel systems for ages). No one publishes papers on it, though, because it is not new to those people.

A large part of the problem is that much of the original literature is lost in paper form and cannot be found electronically. I've been burned by this personally, having done work that I later discovered had been done in the 1980s by a respected academic but which is not cited in any modern literature; from the standpoint of modern literature, the work is completely forgotten. But it had been done, and a rare few people are still familiar with it.
So more specifically (I got back from vacation a couple hours ago, forgive the terseness):
Spanner is an old, old idea in the literature, and a number of bespoke systems were built on the concept. Ancient transaction processing literature mentions it, and I've worked with a few crusty mainframe environments that used variants of it. It was considered a stupid idea for decades because the hardware required for a reasonable implementation was outrageously expensive for most applications; atomic clocks and satellite-delivered time codes don't cost what they used to. I used to have a transaction processing book from the 1980s that described a couple of specific implementations quite close to Spanner (written by a couple of guys with Indian surnames; I wish I remembered the title). The idea had been written off so long ago for inexpensive commodity systems that apparently it took Google to rediscover it, but it never fell out of use.
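(For anyone who hasn't read the Spanner paper, the mechanism at issue is commit-wait over a clock with bounded uncertainty: the clock reports an interval guaranteed to contain true time, and a transaction waits out that uncertainty before exposing its commit timestamp. Below is a minimal single-machine sketch of the idea; the BoundedClock class and the ~7 ms bound are illustrative assumptions, not Spanner's actual API:)

    import time

    class BoundedClock:
        """Clock that exposes an uncertainty interval: true time is
        guaranteed to lie somewhere in [earliest, latest]."""
        def __init__(self, epsilon_s=0.007):  # assumed ~7 ms bound, for illustration
            self.epsilon = epsilon_s

        def now(self):
            t = time.time()
            return (t - self.epsilon, t + self.epsilon)  # (earliest, latest)

    def commit(clock, apply_writes):
        """Commit-wait: choose a timestamp at the top of the current
        interval, then wait until that timestamp is guaranteed to be in
        the past everywhere before acknowledging the commit."""
        _, commit_ts = clock.now()         # ts >= any possible current time
        apply_writes(commit_ts)            # stamp the writes with that timestamp
        while clock.now()[0] < commit_ts:  # wait out the uncertainty window
            time.sleep(0.001)
        return commit_ts                   # now safe to release locks / ack client

The hardware-cost argument above is entirely about keeping that epsilon small: with cheap, poorly synchronized clocks the commit-wait is ruinous, while with atomic/GPS-disciplined clocks it's a few milliseconds.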
Similarly, I think a lot of people in the HPC world were surprised to discover that MapReduce was patentable. Not every problem in HPC looks like computational fluid dynamics (CFD), which is often the stereotype. If you are actually an academic, then you should know better than anyone that the number of cites is misleading. The MapReduce model (though not under that name) was, in every respect, a standard design pattern for some non-CFD problems in HPC going back to at least the 1990s. The specific novelty claimed by MapReduce has never been clear to me. A lot of the (terrible) compilers for obscure HPC platforms back in the day transformed code into a canonical MapReduce model. Maybe I am missing what makes Google's version unique?
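(For concreteness, the design pattern under discussion boils down to something like this toy single-process sketch, which is an illustration rather than Google's implementation; what the real systems add is the distributed shuffle, scheduling, and fault tolerance:)

    from collections import defaultdict
    from itertools import chain

    def map_reduce(records, mapper, reducer):
        """Toy MapReduce: map each record to (key, value) pairs,
        group values by key (the 'shuffle'), then reduce each group."""
        groups = defaultdict(list)
        for key, value in chain.from_iterable(mapper(r) for r in records):
            groups[key].append(value)
        return {k: reducer(k, vs) for k, vs in groups.items()}

    # The classic word-count example:
    docs = ["the quick brown fox", "the lazy dog"]
    counts = map_reduce(
        docs,
        mapper=lambda doc: [(w, 1) for w in doc.split()],
        reducer=lambda word, ones: sum(ones),
    )
    # counts == {"the": 2, "quick": 1, "brown": 1, "fox": 1, "lazy": 1, "dog": 1}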
If Spanner is a "variant" then it introduced something new, an improvement. You sure are jumping through hoops trying to diminish things. Most of computer science was invented in the '50s, '60s, and '70s, when the low-hanging fruit was easy to get; now a lot of it is incremental gains. Just as the astounding pace of advances in mathematics in the 1700s/1800s/1900s can't be bested in the 2000s, progress is slower once the easy stuff has been mapped out.
Besides which, there's a difference between inventing, say, Newtonian physics and building an interplanetary ship for human space flight. It is in the application of much of computer science to extremely large systems where most of the work is being done, and frankly, experience gathered building the world's largest database and most trafficked site gives you insight into new ways to engineer things that you don't necessarily figure out from basic application of CS principles. I could probably design a basic bridge or house given what I learned of civil engineering, but chances are I could never design the Burj Dubai with that information.
My point is, the way Google applied things like MapReduce was based on their experience solving web-services problems. HPC typically looks at problems of simulation: QED/QCD, nuclear explosions, climate, finance, oil geology, et al. The difference in focus leads to different patterns of application, and there is new value created in the papers published by both sides. Google's datacenter configurations don't look anything like, say, BlueGene.
And when you look at what Google is doing with machine learning, few organizations have the resources and data to pull it off. Sure, lots of people have run the algorithms before on small sets, but few actually have daily data from a billion people flowing into them.
I don't think you've got the right understanding of why people are against sw patents. The argument is that applying an idea to a slightly different domain already provides enough value that it doesn't need to be encouraged by patent protection. I don't hear many people saying it has little value. The number of companies supporting themselves completely on the proceeds of selling key-value store software and services trivially shows that there is a lot of economic value in non-innovative implementations of ideas.
I think you misunderstand what I'm saying. I have lots of issues with Google, but dismissing what they've done because it is insufficiently new is just silly.
The commentary was terse (hiking in the mountains, limited IP access) but accurate. I have designed interesting algorithms and software on large-scale database and HPC systems for much of my adult life. As a consequence, I am much more familiar with these domains than most casual programmers or academics in these fields.
The important point was that ideas like MapReduce and Spanner are literally rehashes of very old ideas that never fell out of use in some domains. But they are so old that no one who uses them would dream of publishing a paper, because they have been in continuous use for ages. In most cases, a deep enough dive into the dead-tree literature will turn up examples in real implementations. I learned most of this from graybeards working on cool stuff when I was young, not from the literature.
I am not exactly young, but the ideas behind both MapReduce and Spanner predate my time in computer science. BigTable, on the other hand, was a novel composition of existing but obscure bits of computer science, so I give Google full credit for that.
Google has shown an exceptional ability to exploit ignored or forgotten computer science. But to the extent they exploit it, there is often little invention beyond that of the people they borrow the ideas from. In that respect, I think the credit for invention is somewhat misplaced.
Exactly. They're like Madonna, or more recently Lady Gaga, they create nothing, but they take themes from the underground and bring them to a mainstream audience (and in doing so become far richer than any of the creators).
The work that Jeff Dean and colleagues at Google are doing is pretty mind-blowing [1]. Sometimes I feel that you need Google's scale of problems to be forced to push technology that way.
The joke at my company is that every time Google publishes a paper about one of their technologies, they already have a next-gen version of it running internally.
That's not a joke - it's true... The joke part is this: there are two types of systems at Google, the ones that are deprecated and the ones that are not yet ready for production.
That's not a joke either. Inside of Google you can ask anyone whether a particular API is the version that's deprecated or not ready yet, and they will always know which one it is.
<Troy McClure>Hi, I'm Kenton Varda. You may remember me as the creator of Cap'n Proto and the LAN-party-optimized house.
Back in 2007, I created "Jeff Dean Facts" as a Google-internal April Fool's joke. I wasn't funny enough to write any of the jokes myself, but I created the app that let people submit "facts", and was blown away by the results.
If you like primary sources, check out my write-up from when I first talked publicly about it in January 2012:
I never found out if Yegge was talking about Jeff Dean here, or maybe Urs Hölzle:
At first it's entirely non-obvious who's responsible for Google's culture of engineering discipline: the design docs, audited code reviews, early design reviews, readability reviews, resisting introduction of new languages, unit testing and code coverage, profiling and performance testing, etc. You know. The whole gamut of processes and tools that quality engineering organizations use to ensure that code is open, readable, documented, and generally non-shoddy work.
But if you keep an eye on the emails that go out to Google's engineering staff, over time a pattern emerges: there's one superheroic dude who's keeping us all in line.
And why couldn't it be that way? Those people exist when companies start, you probably met a couple of them.
I don't know who Yegge meant, but my read on that passage (as a Googler) is that he was talking about Craig Silverstein. Several of those items are definitely attributable to Craig.
Agreed. Craig never exerted much technical influence (beyond the early years, I assume), but was always a strong cultural force in the areas of engineering quality, code reviews, readability, consistency, etc.
What! I thought Chuck Norris was the Chuck Norris of the Internet. Did Chuck Norris Facts exist before they were on the Internet? I think not. At most, Chuck Norris is the Jeff Dean of the Internet.
The original Chuck Norris jokes were sarcastic. No one thought he was much of a bad-ass, but 90% of the people who stumble across Internet memes are too stupid to understand them.
That being said, Chuck Norris is a hell of a nice guy and I wish him the best, even if he isn't a bad-ass.
What has he done that you don't like? I'm not going to read through a dozen or so articles and try to guess which one you have a problem with. Going from the titles I see a lot of stuff about cancer charities, and a few articles about religion in public schools.
If it's the latter, he lost that battle, just like all the other Christians.
I'm just going off of his personality. He makes a lot of USO appearances, and my best friend actually got to hang out with him and his family a few times as a teenager when Chuck rented out the bed and breakfast he worked at.
He seems pretty down to earth and nice, that doesn't mean that he doesn't hold ideas that I disagree with.
We've reached a point in history where homophobia is at roughly the same point as racism was in the 50s and 60s. Plenty of people still espouse those hateful views...but, it's long past time for society at large to stop accepting that kind of thing as merely a "difference of opinion". Hatred of black folks is not viewed as a "difference of opinion"...it is viewed as what it is: Hatred and bigotry.
If Dean has a superhuman power, then, it’s not the ability to do things perfectly in an instant. It’s the power to prioritize and optimize and deal in orders of magnitude. Put another way, it’s the power to recognize an opportunity to do something pretty well in far less time than it would take to do it perfectly.
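(To make "deal in orders of magnitude" concrete, here is a back-of-envelope sketch; the latency figures are the rough, commonly cited ballpark numbers, not anything from the article:)

    # Rough, commonly cited ballpark latencies (order of magnitude only)
    MEM_READ_1MB  = 250e-6   # ~250 us to read 1 MB sequentially from RAM
    DISK_READ_1MB = 20e-3    # ~20 ms to read 1 MB sequentially from disk

    def full_scan_seconds(megabytes, from_disk):
        """Estimate the time for one full scan of a dataset."""
        return megabytes * (DISK_READ_1MB if from_disk else MEM_READ_1MB)

    # Is it worth holding a 10 GB dataset in RAM for repeated scans?
    print(full_scan_seconds(10_000, from_disk=True))   # ~200 s
    print(full_scan_seconds(10_000, from_disk=False))  # ~2.5 s

Two orders of magnitude from one line of arithmetic; that is the kind of estimate being described.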
"But if his fake accomplishments are hard to understand without a real computer-science background..."
I've got no formal education, and I still understood most of the puns.
There is a very big difference between having no formal education and having a background in something. Especially in computer science where so many people are self taught.