It's way too hard to build evidence for something like that. Virtually all research into PL effectiveness ends up not accounting for the absurd number of variables.
One metric might be how long it takes a developer to build a project from scratch, given all the esoteric build setups in C/C++ land. Another might be lines of code per CVE in a given language. Another still might be time to fix CVEs per language.
I'm thinking Rust would come out on top of those, but it would be interesting to measure!
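For what it's worth, the two CVE metrics above are simple to compute once you have the data; collecting comparable data is the hard part. A rough sketch in Python, where `ProjectStats`, its fields, and both helper functions are names made up for illustration, and the caller supplies the real numbers:

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class ProjectStats:
    """Hypothetical per-project record; not taken from any real dataset."""
    language: str
    lines_of_code: int
    cve_fix_days: list[int]  # days from disclosure to fix, one entry per CVE


def lines_per_cve(project: ProjectStats) -> float | None:
    """Lines of code per reported CVE; None if the project has no CVEs."""
    if not project.cve_fix_days:
        return None
    return project.lines_of_code / len(project.cve_fix_days)


def median_fix_days(project: ProjectStats) -> float | None:
    """Median days to fix a CVE; None if the project has no CVEs."""
    if not project.cve_fix_days:
        return None
    return median(project.cve_fix_days)
```

Neither number controls for project age, team size, or how hard anyone has looked for bugs, which is the "absurd number of variables" problem all over again.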
It's unlikely that this evidence exists, and if the 'clone everything to get past the borrow checker' attitude is commonplace then we get stuck on what 'high quality' means.