
0 points · milancurcic · 5y ago · 0 comments

Yes, they run in the cloud; see e.g. https://cloudrun.co (disclaimer: my side business). Others have been doing it for a few years now as well. On dedicated, shared-memory nodes, performance is no different from HPC. It can even be better, because cloud instances tend to have later-generation CPUs, whereas large HPC systems are typically updated only every ~5 years. But for distributed-memory parallel runs (multi-node), latency increases considerably on commodity clouds, which kills parallel scaling for models. Fortunately, major providers (AWS, GCP, Azure) have recently started offering low-latency interconnects for some of their VMs, so this problem will soon go away as well.


1 comment · 1 top-level

gnufx · 5y ago

Basically, yes, though you may lose some performance from the lack of direct access to the hardware. It's also typically expensive. Do AWS and GCP actually have RDMA fabrics now? The AWS "low latency" offering from a year or so ago had latency similar to what I once got with 1GbE.
