The problem is that GPUs can't transparently page data in from disk the way a CPU process can, so reading files, decompressing, or swapping on the GPU only happens if you build the streaming pipeline yourself, and that hand-rolled path is a lot slower.
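To make "write it yourself" concrete, here is a minimal sketch of the kind of double-buffered streaming loop you end up writing: one thread reads chunks from disk into a small bounded queue while the consumer (standing in for the GPU upload) drains it. This is a hypothetical illustration in plain Python with no GPU involved; `stream_chunks` and its parameters are names I made up, not any real library's API.

```python
import queue
import threading

def stream_chunks(path, chunk_size=1 << 20):
    """Minimal double-buffered file reader (hypothetical sketch).

    A background thread fills a bounded queue so the reader stays
    roughly one chunk ahead of the consumer, overlapping disk I/O
    with whatever the consumer does (e.g. a GPU copy)."""
    q = queue.Queue(maxsize=2)  # bounded: limits read-ahead memory

    def reader():
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                q.put(chunk)
        q.put(None)  # sentinel: end of file

    threading.Thread(target=reader, daemon=True).start()
    while (chunk := q.get()) is not None:
        yield chunk
```

In a real pipeline the consumer side would be a pinned-memory host-to-device copy, and you would tune the queue depth and chunk size; the point is just that all of this bookkeeping is on you, not the OS.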
Also, ML model weights (probably) can't be compressed much further, at least losslessly, because training has already squeezed out the redundancy; learning and compression are the same thing!
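You can see the intuition with a quick experiment: high-entropy bytes (a stand-in for trained weights, which I'm assuming look statistically noise-like) gain essentially nothing from a lossless compressor, while highly structured bytes shrink dramatically. A small sketch using only the standard library:

```python
import os
import zlib

SIZE = 4_000_000

# Stand-in for trained weights: high-entropy bytes. (Assumption:
# real weight tensors are close to incompressible noise.)
noise = os.urandom(SIZE)
noise_ratio = len(zlib.compress(noise, 9)) / SIZE
print(f"zlib ratio on noise-like data: {noise_ratio:.3f}")  # ~1.0, no gain

# Structured, redundant data compresses dramatically by contrast.
zeros = bytes(SIZE)
zeros_ratio = len(zlib.compress(zeros, 9)) / SIZE
print(f"zlib ratio on all-zero data:  {zeros_ratio:.6f}")
```

Note this only says *lossless* compression is near-useless on noise-like data; lossy schemes like quantization are a different story, because they deliberately throw information away.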