There are some 7B-parameter models that look competitive with GPT-4 on benchmarks, but only because they were trained on the benchmark data. Presumably Google would know better than to train on the benchmark data, but you never know. The benchmarks also fail to capture things such as Bard refusing to tell you how to kill a process on Linux because it considers the request unethical.