Of course we can run any model if we quantize it enough, but I think the OP was talking about the unquantized version.
You can do it via `-ot ".ffn_.*_exps.=CPU"`, which keeps the MoE expert tensors in CPU memory.
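
If you want to see what that pattern actually selects, here's a rough Python sketch. It assumes the flag is llama.cpp's `-ot` / `--override-tensor`, that the part before `=` is used as a regex search against tensor names, and that the names follow the usual GGUF MoE naming. The tensor names and helper functions below are made up for illustration, not read from a real model file.

```python
import re

# Hypothetical tensor names in the style of a GGUF MoE model (illustrative only).
EXAMPLE_NAMES = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_inp.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.0.ffn_up_shexp.weight",
]


def split_override(arg):
    """Split 'PATTERN=TARGET' on the last '=' into (pattern, target)."""
    pattern, _, target = arg.rpartition("=")
    return pattern, target


def tensors_sent_to(arg, names):
    """Return the target and the names the pattern matches (search, not full match)."""
    pattern, target = split_override(arg)
    rx = re.compile(pattern)
    return target, [name for name in names if rx.search(name)]


target, matched = tensors_sent_to(".ffn_.*_exps.=CPU", EXAMPLE_NAMES)
print(target)
for name in matched:
    print(name)
```

Under those assumptions only the three `ffn_*_exps` tensors match, so attention, the router (`ffn_gate_inp`) and shared-expert weights are left wherever the other offload settings put them.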