> I've been trying to get `--device cuda` to work on my Windows machine and it's saying that torch wasn't compiled with CUDA.
I struggled with the same thing. Here's what worked for me:
First, uninstall PyTorch with pip; "pip uninstall torch" or similar should do it.
Find the CUDA version you have installed[1]. Go to the PyTorch get started page[2], use their guide/wizard to generate the pip string, and run that. I had to change pip3 to pip FWIW, and with CUDA 11.6 installed I ended up with "pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116".
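For what it's worth, the "cu116" at the end of that URL is just the CUDA version with the dot dropped. A rough sketch of how the string is put together, assuming the same naming holds for your version (the helper name is made up, and PyTorch only publishes wheels for certain CUDA versions, so the wizard is still the authority):

```python
def pip_command_for_cuda(cuda_version: str) -> str:
    """Hypothetical helper: build the pip install string for a CUDA version like '11.6'."""
    # Keep only major and minor, e.g. "11.6" -> ("11", "6")
    major, minor = cuda_version.strip().split(".")[:2]
    # Assumption: the wheel index is named "cu" + major + minor, as in cu116
    tag = f"cu{major}{minor}"
    return (
        "pip install torch torchvision torchaudio "
        f"--extra-index-url https://download.pytorch.org/whl/{tag}"
    )


print(pip_command_for_cuda("11.6"))
```

If the wizard gives you something different for your setup, go with the wizard.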
After that I could use --device cuda, and the difference was immense. On my 2080 Ti it went from roughly an hour for a minute-long clip with the large model to 10-20 seconds.
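If you want to confirm that the installed torch can actually see the GPU before kicking off a long run, a quick check along these lines should tell you. It's only a sketch: the function name is mine, and it assumes the GPU you care about is device 0.

```python
def cuda_status() -> str:
    """Report whether the installed torch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    # torch.version.cuda is None on CPU-only builds
    built_for = torch.version.cuda
    if torch.cuda.is_available():
        # Assumption: the GPU of interest is device 0
        name = torch.cuda.get_device_name(0)
        return f"CUDA OK: {name} (torch built for CUDA {built_for})"
    return f"No usable CUDA (torch {torch.__version__}, built for CUDA {built_for})"


print(cuda_status())
```

If it reports "built for CUDA None", you still have the CPU-only wheel and the reinstall didn't take.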
[1]: https://stackoverflow.com/a/55717476
[2]: https://pytorch.org/get-started/locally/