I remember attending one of the "big data" conferences back in 2015, when it was the buzzword of the day.
The talks all concentrated on topics like ingesting and writing data as quickly as possible, sharding data to speed up ingestion, and centralizing IoT data from around the whole world.
Back then I had questions that were shrugged off as extremely naïve (or so it seemed to me at the time), as if asking them marked me as not part of the "in" crowd. The questions were:
1) Doesn't optimizing heavily for key-value access mean you need to anticipate, predict, and implement ALL of the future access patterns? What if you need to change your queries a year in? The most concrete answer I got was that of course a good architect needs to know and design for all possible ways the data will be queried! I was amazed at either the level of prowess of said architects (such predictive powers that I could never dream of attaining!) or, as the cynic in me put it, the level of self-delusion.
2) How can it be faster if you keep shoving intermediate processing elements into your pipeline? Surely you can't just mindlessly keep piling queues upon queues and come out ahead. That question was never answered. The processing speeds of high-throughput pipelines may be impressive, but if some stupid awk over a CSV file can do the job just as quickly on commodity hardware, something _must_ be wrong.
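To make the awk remark concrete, here is the kind of one-liner I had in mind: a per-key aggregation streamed straight over a CSV file, no cluster and no queues in sight. The file name and columns are made up for illustration.

```shell
# A throwaway "pipeline": sum the readings per sensor from a CSV.
# awk streams the file once, holding only one counter per key in memory.
cat > readings.csv <<'EOF'
sensor,value
a,1
b,2
a,3
EOF

awk -F, 'NR > 1 { sum[$1] += $2 } END { for (s in sum) print s "," sum[s] }' readings.csv | sort
```

On a single commodity machine this pattern routinely chews through gigabytes per minute, which is exactly why it makes an uncomfortable baseline for a multi-stage ingestion pipeline.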