I do not quite get this. How does this enable someone to run Ray or Metaflow on a typical batch-scheduled HPC system (Slurm or similar)? Inter-node communication is done via the Lustre file system, right?
Metaflow integrates with AWS Batch, which many folks use for serious HPC. Inter-node scheduling happens through the multi-node parallel jobs supported by AWS Batch; networking goes over EFA, etc.
I think it said that data access is via Lustre, and communication is via NVIDIA's NCCL (over Mellanox interconnects), which seems to be an MPI-style collectives library specific to NVIDIA GPUs; it would seem to be doing RDMA from GPU to GPU over the fabric, so far as I can tell...
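Right — NCCL implements collectives like all-reduce, typically over a ring (or tree) of GPUs, and the actual byte-moving happens over NVLink/RDMA. The communication pattern itself is simple; here's a toy pure-Python sketch of a ring all-reduce, where plain lists stand in for per-GPU buffers (no GPUs or NCCL involved, just the pattern):

```python
def ring_allreduce(ranks):
    """Toy ring all-reduce: `ranks` is a list of equal-length lists,
    one buffer per "GPU". Afterwards every buffer holds the
    elementwise sum. Assumes buffer length is divisible by len(ranks)."""
    n = len(ranks)
    chunk = len(ranks[0]) // n  # each buffer is split into n chunks

    # Phase 1, reduce-scatter: in each of n-1 steps, rank r sends one
    # chunk to its right neighbor, which adds it into its own copy.
    # Afterwards rank r holds the fully summed chunk (r + 1) % n.
    for s in range(n - 1):
        for r in range(n):
            dst = (r + 1) % n
            c = (r - s) % n
            for i in range(c * chunk, (c + 1) * chunk):
                ranks[dst][i] += ranks[r][i]

    # Phase 2, all-gather: in each of n-1 steps, rank r forwards a
    # completed chunk to its right neighbor, which overwrites its copy.
    for s in range(n - 1):
        for r in range(n):
            dst = (r + 1) % n
            c = (r + 1 - s) % n
            ranks[dst][c * chunk:(c + 1) * chunk] = \
                ranks[r][c * chunk:(c + 1) * chunk]

buffers = [[1, 2, 3, 4, 5, 6],
           [10, 20, 30, 40, 50, 60],
           [100, 200, 300, 400, 500, 600]]
ring_allreduce(buffers)
# every buffer is now [111, 222, 333, 444, 555, 666]
```

Each rank only ever talks to its neighbor and moves 1/n of the data per step, which is why the pattern maps well onto point-to-point RDMA links.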