Whenever teachers realize they're being evaluated on some metric, they almost immediately begin gaming it, and at that point it ceases to be a good metric. I had teachers who gave a lot of "group tests", walked out of the classroom during tests, etc.
Standardized testing aims to correct this, but I think many teachers now consider it a joke. Even when exams are administered by independent proctors, teachers can still game the metric by "teaching to the test" instead of imparting meaningful understanding of the material.
I think this is a subject that will continue to defy rigorous objective scientific analysis.