Thank you, but this doesn't really answer OPs or my question. Is NVLink required if you want to run an LLM model which exceeds the memory of a single GPU? What are the benchmark comparisons with and without it?
I've heard that NVLink helps with training, but not so much with inferencing.