I'm currently using reflection:70b_q4, which does a very good job in my opinion. It generates the response at 5.5 tokens/s, which is just about my reading speed.
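As a rough sanity check on that claim (assuming ~0.75 words per token, a commonly cited average for English tokenizers, which is not from the comment itself):

```python
# Back-of-envelope: does 5.5 tokens/s really match reading speed?
TOKENS_PER_SEC = 5.5
WORDS_PER_TOKEN = 0.75  # assumption: rough English average, varies by tokenizer

words_per_min = TOKENS_PER_SEC * WORDS_PER_TOKEN * 60
print(f"{words_per_min:.0f} words/min")  # ~248 words/min
```

That lands right around the ~200-300 words/min range typically quoted for adult reading speed, so the claim checks out.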
edit: I usually don't run larger quantizations (q6) because of the speed. I'd guess a 405B model would just be awfully slow.