It tells the history of computing by following J.C.R. Licklider. As one of the directors of ARPA, he was responsible for funding research labs to work on computer research. He had a major impact on which projects got funded, and in turn which systems are still being used 60 years later. I honestly love this book so much. If you love computers and history, it's a must-read.
I just started using ksqlDB at work to build a data pipeline from Postgres to Elasticsearch. This will be replacing our error-prone batch processes. So far it has been a really nice experience, though I don't have it running in production yet.
I'm a little concerned about how we'll update the schema of the streams in production. At some point we'll need to drop a stream and recreate it with a different schema, and I'm afraid we'll miss some incoming data during the downtime. Confluent just released a migration tool, but I'm pretty sure it only handles adding/dropping columns, not changing a column's data type.
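For what it's worth, a minimal sketch of the drop-and-recreate dance (stream, topic, and column names here are hypothetical). One thing that may soften the downtime worry: DROP STREAM without DELETE TOPIC leaves the underlying Kafka topic and its data intact, so the recreated stream can re-read from the earliest offset:

```sql
-- Assumes a 'users' topic; names and types are illustrative.
DROP STREAM users_stream;          -- the 'users' topic is NOT deleted

-- Recreate with the changed column type (say age was VARCHAR before).
CREATE STREAM users_stream (id VARCHAR KEY, age INT)
  WITH (KAFKA_TOPIC = 'users', VALUE_FORMAT = 'JSON');

-- Downstream queries can then reprocess from the beginning:
SET 'auto.offset.reset' = 'earliest';
```

Records that arrive while the stream is dropped still land in the topic; they just aren't processed until a query reads them back.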
There's a whole class of interesting problems around query evolution, and it varies greatly depending on the environment you're interested in (see mjdrogalis' docs on updating a running query). Generally, the strategy ksqlDB takes at the moment is to validate which upgrades can be done in place and which can't. For the former, ksqlDB just does it; for the latter, we're designing a mechanism to deploy topologies side by side and then atomically cut over once the new topology has caught up to the old one.
There's an in-progress blog post that describes exactly this class of problems - keep an eye out for it!
I'm just going to throw out another problem since you guys are here :)
The current batch process joins some tables before sending data to Elasticsearch, which means the Debezium connector doesn't write all the data I need into Kafka. I was thinking I could create a materialized table in ksqlDB with infinite retention for the other Postgres tables I need to join on. Then, when I stream in an update for the data I want in Elasticsearch, I can join against those tables.
The issue is that a stream-table join is only triggered when the stream side changes. This means that when the data in the tables changes, we won't see those updates in Elasticsearch.
I guess my only option is to join everything in our app and then produce the full message to the Elasticsearch sink topic?
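To make the triggering semantics concrete, here's roughly what that join would look like (the stream, table, and column names are hypothetical). A stream-table join emits a row only when the stream side receives a record, so a later change to the table side isn't pushed downstream on its own:

```sql
-- Hypothetical orders stream enriched against a customers table.
-- Output rows are produced only when a new 'orders' record arrives;
-- an update to 'customers' alone does not re-emit enriched rows.
CREATE STREAM enriched_orders AS
  SELECT o.order_id, o.amount, c.name
  FROM orders o
  LEFT JOIN customers c ON o.customer_id = c.id
  EMIT CHANGES;
```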
I have been using CDK at work and it is fantastic. It feels like React for infrastructure. I used to write raw CloudFormation at my last internship, and being able to write TypeScript instead feels like moving from assembly language to C.
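To give a flavor of the difference, here's a minimal sketch of a CDK v2 stack in TypeScript (stack and resource names are made up for illustration):

```typescript
// Hypothetical minimal stack: one versioned S3 bucket plus an output.
import { App, Stack, CfnOutput } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';

const app = new App();
const stack = new Stack(app, 'DemoStack');

// One line of TypeScript replaces a dozen lines of raw
// CloudFormation YAML for the same bucket resource.
const bucket = new Bucket(stack, 'UploadsBucket', { versioned: true });

new CfnOutput(stack, 'BucketName', { value: bucket.bucketName });
```

Running `cdk synth` expands this into the full CloudFormation template you'd otherwise write by hand.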
Agreed. I only found out about The Dream Machine from Hacker News (I think it was Alan Kay who recommended it), and it was absolutely the best book ever written on the topic.
Rogan and Fridman both ask very shallow questions. I want to see him get grilled by someone familiar with the subject matter. Anything less just yields lowest-common-denominator questions.
I find him lacking intellectual curiosity, although I've only had limited exposure to him. Do you have something you'd recommend for a "deeper look" into him or his work?
At my school there is a 2-unit class that you must take alongside the intro DS&A course, which teaches bash, vim, git, testing, etc. It was definitely the class that helped me the most in my internship, and it also made me a Vim user.
In my experience, most CS majors at UCSD that I've met complain about the course. On the other hand, DSC doesn't have an equivalent, which I find disappointing but understandable. Maybe it'll get one in the future.