On the human ratings, three different 7B LLMs (Two different Openchat models and a Mistral fine tune) beat a version of GPT-3.5.
(The top 9 chatbots are GPT and Claude versions. Tenth place is a 70B model. While it's great that there's so much interest in 7B models, and it's incredible that people are pushing them so far, I selfishly wish more effort would go into 13B models... since those are the biggest that my macbook can run.)