Many people find Mistral 7B to be excellent, roughly GPT-3.5 level.
Mistral 7B normally requires around 20 GB of VRAM at fp16, but with llama.cpp and quantization you can even run it on a phone (albeit at reduced quality).
Quantizations at q4_K_M or above seem to produce responses nearly as good as the unquantized model, and q4_K_M only needs ~7 GB of VRAM.
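A quick back-of-envelope sketch of where those VRAM numbers come from (the parameter count and the ~4.85 effective bits per weight for q4_K_M are approximate assumptions; the rest of the footprint is activations, KV cache, and runtime overhead):

```python
# Rough VRAM estimate for Mistral 7B weights (all figures approximate)
params = 7.24e9                        # assumed Mistral 7B parameter count

fp16_gb = params * 2 / 1e9             # fp16 = 2 bytes per weight
q4_bpw = 4.85                          # assumed effective bits/weight for q4_K_M
q4_gb = params * q4_bpw / 8 / 1e9      # quantized weight size in GB

print(f"fp16 weights:   ~{fp16_gb:.1f} GB")  # plus KV cache/overhead -> ~20 GB in practice
print(f"q4_K_M weights: ~{q4_gb:.1f} GB")    # plus KV cache/overhead -> ~7 GB in practice
```

So the weights alone drop from ~14.5 GB to ~4.4 GB, and the quoted totals follow once you add inference overhead.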
See the table here:
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGU...
Using ollama you can get up and running even a bit faster than with llama.cpp directly (ollama uses llama.cpp under the hood).
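For the impatient, a minimal sketch of both routes (assumes ollama is installed and that "mistral" is its tag for Mistral 7B; the GGUF filename in the llama.cpp invocation is an assumed example matching TheBloke's naming):

```shell
# ollama pulls the quantized model on first run
ollama run mistral "Explain quantization in one sentence."

# roughly equivalent with llama.cpp directly, assuming you downloaded a GGUF yourself
./llama-cli -m mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "Explain quantization in one sentence."
```

Both end up running the same llama.cpp inference code; ollama just automates the download and model management.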