> People use the big data frameworks as glorified distributed-job management tools
Do you have any tools you like for job management without all the distributed-systems baggage?
I've heard folks advocate for Make for this kind of thing; perhaps that, or some other orchestration tool that deals with job dependency graphs, would be the Unix way? (Having a nice way to visualize failed steps would of course be a plus; a common use case is "re-run the intermediate pipeline, and everything downstream".)
There's a bunch, at various levels of abstraction and with slightly different primary use cases: Luigi, Dask, Airflow, Celery, Dagster, Prefect, Metaflow, Snakemake, Nextflow, etc.
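For a sense of how small the core idea is, here is a rough sketch of the "re-run one step and everything downstream" use case in plain Python, using only the standard library's `graphlib` (Python 3.9+). The step names and the `deps` mapping are made up for illustration, and this is not how any of the tools listed above implement it; they add scheduling, retries, caching, and UIs on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
deps = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"clean"},
}


def downstream(deps, start):
    """Return `start` plus every step that transitively depends on it."""
    affected = {start}
    changed = True
    while changed:
        changed = False
        for step, parents in deps.items():
            # A step is affected if any of its dependencies is affected.
            if step not in affected and parents & affected:
                affected.add(step)
                changed = True
    return affected


def rerun_order(deps, start):
    """List the affected steps in an order that respects dependencies."""
    affected = downstream(deps, start)
    # static_order() yields each step after all of its dependencies.
    full_order = TopologicalSorter(deps).static_order()
    return [step for step in full_order if step in affected]


if __name__ == "__main__":
    for step in rerun_order(deps, "clean"):
        print("would re-run:", step)
```

Here `rerun_order(deps, "clean")` skips `extract` and returns `clean` followed by `features`, `train`, and `report`, with `features` always ahead of `train`. Make gets a similar effect from file timestamps rather than an explicit "start here" argument.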