I've scoured the web page for RAM requirements for the various models but can't find anything. Would this be able to run, say, the 30B OpenAssistant LLaMA or the raw 65B LLaMA model on a consumer GPU (say, a 3060 with 12 GB of VRAM)?
I'm not trying to take anything away from the project, but I feel the README etc. is very light on actual technical details; there's no way to find this out without reading through the code or actually testing it.