Maybe I didn't make myself clear enough.

There was a time when a software engineer didn't get basic stuff. A time when languages like C were developed without an associative data structure baked in, for example.

It wasn't because associative data structures are secret tech that requires great insight to uncover. It was because the field was new and people hadn't cottoned on to how basic and important having access to hash-maps is. Times moved on. Now basically all modern languages have hash-maps as a built-in data type.
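To make it concrete, here is roughly what that "secret tech" amounts to once the idea is in hand: a workable chained hash-map in a few dozen lines of C. This is a minimal sketch; every name in it (HashMap, ht_put, ht_get, NBUCKETS) is made up for illustration, not from any library.

    /* Minimal chained hash-map: string keys, int values.
       strdup is POSIX; everything else is standard C99. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NBUCKETS 64

    typedef struct Entry {
        char *key;
        int value;
        struct Entry *next;            /* collision chain */
    } Entry;

    typedef struct {
        Entry *buckets[NBUCKETS];
    } HashMap;

    /* djb2-style string hash, reduced to a bucket index */
    static unsigned long ht_hash(const char *s) {
        unsigned long h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % NBUCKETS;
    }

    static void ht_put(HashMap *m, const char *key, int value) {
        unsigned long i = ht_hash(key);
        for (Entry *e = m->buckets[i]; e; e = e->next)
            if (strcmp(e->key, key) == 0) { e->value = value; return; }
        Entry *e = malloc(sizeof *e);  /* new key: prepend to chain */
        e->key = strdup(key);
        e->value = value;
        e->next = m->buckets[i];
        m->buckets[i] = e;
    }

    static int ht_get(const HashMap *m, const char *key, int *out) {
        for (Entry *e = m->buckets[ht_hash(key)]; e; e = e->next)
            if (strcmp(e->key, key) == 0) { *out = e->value; return 1; }
        return 0;                      /* not found */
    }

    int main(void) {
        HashMap m = {0};
        ht_put(&m, "apples", 3);
        ht_put(&m, "pears", 7);
        int v;
        if (ht_get(&m, "pears", &v))
            printf("pears -> %d\n", v);
        return 0;
    }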

'AI' is in that early phase where the engineering world is still getting excited over stuff that will be basic practice eventually. BERT and GPT-2 are signs of how much compute Google's and others' researchers have access to, not signs that the architectures are fundamentally complicated or somehow hard to work out if you live in China. AlphaGo, for instance, was breathtaking as a standalone project, but not hard to implement.



I spent at least five years trying to use statistical ML and MLPs to do NLP on social media comments, starting around 2003. Nothing like a transformer (or even an RNN) occurred to me.

I have a belief that someone in the USSR worked out a way of doing fluid dynamics that has enabled the Russians to develop hypersonics and supercavitation. This is probably rather straightforward - in the style of Navier-Stokes - if you know the principles. No one in the West (or China) knows those principles, so Western torpedoes and reentry vehicles are rather poor vs Russian ones. Once you grasp how something works, the fact that it's rather easy to apply, compared to the process of getting the insight, shouldn't detract from the value of the insight.


The wiki page on RNNs says the early groundwork was done in 1986 and the LSTM was a 1997 innovation. If they didn't occur to you in 2003, that doesn't imply much; they are not surprising concepts.

The surprise was that in the mid-2000s GPUs suddenly became so powerful that LSTMs could be used to achieve interesting results. The story here isn't the models; it is the computers running the models.
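For a sense of scale, the LSTM cell itself is just four gated equations. Here is a single-unit forward step in C as a toy sketch; the weights and every name in it (LSTMParams, lstm_step, and so on) are arbitrary illustrations, not from any real model.

    /* Toy single-unit LSTM forward step (the 1997 cell reduced to
       scalars). A real model uses vectors and learned weights. */
    #include <stdio.h>
    #include <math.h>

    static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

    typedef struct { double c, h; } LSTMState;

    typedef struct {
        /* one input weight, one recurrent weight, one bias per gate */
        double wf, uf, bf;   /* forget gate */
        double wi, ui, bi;   /* input gate  */
        double wo, uo, bo;   /* output gate */
        double wg, ug, bg;   /* candidate   */
    } LSTMParams;

    static LSTMState lstm_step(LSTMParams p, LSTMState s, double x) {
        double f = sigmoid(p.wf * x + p.uf * s.h + p.bf); /* what to keep   */
        double i = sigmoid(p.wi * x + p.ui * s.h + p.bi); /* what to write  */
        double o = sigmoid(p.wo * x + p.uo * s.h + p.bo); /* what to expose */
        double g = tanh(p.wg * x + p.ug * s.h + p.bg);    /* new candidate  */
        LSTMState next;
        next.c = f * s.c + i * g;   /* cell state: gated memory   */
        next.h = o * tanh(next.c);  /* hidden state: gated output */
        return next;
    }

    int main(void) {
        LSTMParams p = { .wf = 0.5, .uf = 0.1, .bf = 1.0,
                         .wi = 0.6, .ui = 0.2, .bi = 0.0,
                         .wo = 0.7, .uo = 0.3, .bo = 0.0,
                         .wg = 0.8, .ug = 0.4, .bg = 0.0 };
        LSTMState s = { 0.0, 0.0 };
        double xs[] = { 1.0, 0.5, -0.3 };
        for (int t = 0; t < 3; t++) {   /* unroll over a tiny sequence */
            s = lstm_step(p, s, xs[t]);
            printf("t=%d  c=%.4f  h=%.4f\n", t, s.c, s.h);
        }
        return 0;
    }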


> There was a time when a software engineer didn't get basic stuff. A time when languages like C were developed without an associative data structure baked in, for example. It wasn't because associative data structures are secret tech that requires great insight to uncover. It was because the field was new and people hadn't cottoned on to how basic and important having access to hash-maps is. Times moved on. Now basically all modern languages have hash-maps as a built-in data type.

That is a really weird historical fantasy. If you pull out your copy of volume 3 of Knuth and look at chapter 6, it is obvious that associative data structures were some of the first ones to be developed in the field.

The reason why hash tables became so popular is the explosion in main memory size starting in the late 1990s. The trade-offs between the possible associative data structures became less important for a lot of applications, especially when you consider how much needed to be done on secondary storage and specifically on tapes in the 1960s through the 1980s.
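Concretely, the memory-frugal alternative that made sense when RAM was scarce is a sorted array plus binary search: no per-entry pointers, no empty buckets, O(log n) lookups instead of a hash table's amortized O(1). A minimal sketch using only the C standard library's qsort and bsearch:

    /* Sorted array + binary search: associative lookup with zero
       pointer overhead, trading O(1) hashing for O(log n) search. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *key; int value; } Pair;

    static int cmp_pair(const void *a, const void *b) {
        return strcmp(((const Pair *)a)->key, ((const Pair *)b)->key);
    }

    int main(void) {
        Pair table[] = { {"pears", 7}, {"apples", 3}, {"plums", 5} };
        size_t n = sizeof table / sizeof table[0];

        qsort(table, n, sizeof table[0], cmp_pair);   /* sort once */

        Pair needle = { "pears", 0 };                 /* then every lookup  */
        Pair *hit = bsearch(&needle, table, n,        /* is a binary search */
                            sizeof table[0], cmp_pair);
        if (hit)
            printf("%s -> %d\n", hit->key, hit->value);
        return 0;
    }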



