The models (currently) fit in 24 GB of VRAM when run sequentially with small enough batch sizes, so a local server with consumer-grade GPUs wouldn't be impossible.
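
Roughly what "sequentially" means in practice: load one model, run it, free its VRAM, then load the next, so only one model's weights plus activations have to fit at a time. A minimal PyTorch sketch of that pattern, assuming generic pickled checkpoints (the file names, batch size, and input shapes here are hypothetical placeholders):

```python
# Sketch: run a pipeline of models one at a time on a single 24 GB GPU.
import gc
import torch

def run_stage(checkpoint_path: str, inputs: torch.Tensor) -> torch.Tensor:
    """Load one model, run it, and free its VRAM before the next stage."""
    # Hypothetical full-model checkpoint; weights_only=False is needed
    # on newer PyTorch versions to unpickle a whole nn.Module.
    model = torch.load(checkpoint_path, map_location="cuda", weights_only=False)
    model.eval()
    with torch.no_grad():
        # Small batches keep peak activation memory under the VRAM budget.
        outputs = torch.cat([
            model(batch.cuda()).cpu()
            for batch in inputs.split(4)  # batch size chosen to fit; tune per model
        ])
    del model
    gc.collect()
    torch.cuda.empty_cache()  # release cached blocks so the next model fits
    return outputs

# Only one stage occupies the GPU at any moment.
x = torch.randn(32, 512)
for ckpt in ["stage1.pt", "stage2.pt"]:  # hypothetical checkpoint files
    x = run_stage(ckpt, x)
```

The trade-off is reload latency between stages, but that's the kind of cost a single-user local server can usually absorb.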