
I just started using ksqlDB at work to build a data pipeline from Postgres to Elasticsearch, replacing our error-prone batch processes. So far it has been a really nice experience; however, I don't have it running in production yet.

I'm a little concerned about how we will update the schema of the streams in production. At some point we will need to drop a stream and recreate it with a different schema, and I'm afraid we will miss some incoming data during the downtime. Confluent just released a migration tool, but I'm pretty sure it only works for adding/dropping columns, not for changing the datatype of a column.



Hey, ksqlDB contributor from Confluent here. In addition to the migration tooling that you mentioned (https://www.confluent.io/blog/easily-manage-database-migrati...), ksqlDB also supports in-place schema evolution (https://docs.ksqldb.io/en/latest/how-to-guides/update-a-runn...). There are some constraints on what kinds of evolution are supported, but it's something we're constantly chipping away at.
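For the cases the in-place path covers, the upgrade is a `CREATE OR REPLACE` over the running query rather than a drop-and-recreate, so the stream keeps consuming while the change is applied. A minimal hypothetical sketch (the stream and column names are made up, and which changes are allowed is subject to the constraints in the linked docs):

```sql
-- Hypothetical: add a 'status' column to an already-running
-- persistent query without dropping the stream.
CREATE OR REPLACE STREAM enriched_orders AS
  SELECT order_id,
         amount,
         status        -- newly added column
  FROM orders
  EMIT CHANGES;
```

Changes that can't be expressed this way (e.g. incompatible type changes) are the ones that would fall into the side-by-side deployment mechanism described below.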


There's a whole class of interesting problems related to query evolution, and it varies greatly depending on the "environment" you're interested in (see mjdrogalis' docs on updating a running query). Generally, the strategy ksqlDB takes at the moment is to validate which upgrades can be done in-place and which cannot. For the former, ksqlDB "just does it"; for the latter, we are designing a mechanism to deploy topologies side by side and then atomically cut over once the new topology has caught up to the old one.

There's an in-progress blog post that describes exactly this class of problems - keep an eye out for it!


Nice, I'll check it out!

I'm just going to throw out another problem since you guys are here :)

The current batch process joins some tables before sending data to Elasticsearch. This means the Debezium connector doesn't write all the data I need into Kafka. I was thinking I could create a materialized table in ksqlDB with infinite retention for the other Postgres tables I need to join on. Then, when I stream in an update for the data I want in Elasticsearch, I can join against these tables.

The issue is that a stream-table join is only triggered when the stream side changes. This means that when the data in the tables changes, we will not see those updates in Elasticsearch.
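To make the problem concrete, here is a hypothetical sketch of that stream-table join (all names invented): output rows are emitted only when a record arrives on the stream side, so a later change to the table side alone produces nothing downstream.

```sql
-- Hypothetical stream-table join: 'orders_stream' is the Debezium-fed
-- stream, 'customers_table' the materialized lookup table.
-- An update to customers_table by itself emits no output row.
CREATE STREAM orders_enriched AS
  SELECT o.order_id,
         o.total,
         c.email
  FROM orders_stream o
  JOIN customers_table c ON o.customer_id = c.customer_id
  EMIT CHANGES;
```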

I guess my only option is to do the join in our app and then produce the full message to the Elasticsearch sink topic?


Is it an option to model it as a table/table join? A t/t join triggers when there's a change on either side of the expression.
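As a hypothetical sketch of that suggestion (same invented names as above, both sides now modeled as tables): an update to either table re-evaluates the join and emits an updated row to the downstream topic. Note that joining on a column other than the table's primary key is a foreign-key join, which may require a sufficiently recent ksqlDB version.

```sql
-- Hypothetical table/table join: a change to either orders_table or
-- customers_table produces an updated row in orders_enriched, which
-- can feed the Elasticsearch sink topic.
CREATE TABLE orders_enriched AS
  SELECT o.order_id,
         o.total,
         c.email
  FROM orders_table o
  JOIN customers_table c ON o.customer_id = c.customer_id
  EMIT CHANGES;
```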

If that doesn't work, feel free to swing by our community Slack room and we can get into the weeds. :)


Oh cool, I need to think about this a bit more. I'll see you in the Slack channel :)



