Now, you focus in on everyone who got a C and you find that everyone who got a C estimated themselves as a B student. From this you conclude that low performers overestimate their ability.
Then you look at the A students and find that they all also thought they were B students. You conclude that high performers underestimate their ability.
But this is just a statistical artifact! It's called regression to the mean, and this study does not account for it. If you isolate the low performers out of a larger group, you will almost always find that they expected to do better (and, given their true ability, they were right to expect it). You are just doing statistics wrong!
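The artifact is easy to reproduce. Here is a toy simulation (all numbers invented): every student has the same true ability and predicts it perfectly, the test just adds noise, and yet slicing on the noisy score makes the bottom slice look "overconfident" and the top slice look "underconfident".

```python
# Toy regression-to-the-mean demo: identical true ability, perfectly
# calibrated predictions, noisy test scores. Parameters are made up.
import random

random.seed(0)

TRUE_ABILITY = 80  # everyone is "a B student"
NOISE = 10         # sd of the test's measurement noise (invented)

# Each student predicts their true ability; the test adds noise.
students = []
for _ in range(10_000):
    predicted = TRUE_ABILITY
    actual = TRUE_ABILITY + random.gauss(0, NOISE)
    students.append((predicted, actual))

students.sort(key=lambda s: s[1])          # sort by observed test score
bottom = students[: len(students) // 4]    # "low performers"
top = students[-(len(students) // 4):]     # "high performers"

def mean_gap(group):
    """Average (predicted - actual) for a group."""
    return sum(p - a for p, a in group) / len(group)

print(f"bottom quartile: predicted - actual = {mean_gap(bottom):+.1f}")
print(f"top quartile:    predicted - actual = {mean_gap(top):+.1f}")
# The bottom quartile appears to overestimate and the top quartile to
# underestimate, even though every single prediction was perfectly
# calibrated to true ability. Selection on a noisy score did all of it.
```

No psychology is built into the model at all; the apparent over- and under-estimation comes entirely from conditioning on the noisy outcome.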
The study says "for low performers, the less calibrated their self-estimates were the more confident they were in their accuracy". By "calibrated" the authors mean that the actual and predicted scores matched. In other words, the C and D students were very confident that they got As and Bs.
The authors go on to explain:
"In other words, [for low performers] the higher the discrepancy between estimated score and actual scores, the greater participants’ confidence that their estimated scores were close to their actual scores... As expected, high performers showed the opposite pattern. High levels of miscalibration predicted a decreased in SOJ [second-order judgment]..."
Suppose everyone in the class was a B student and knew it. After taking the class, most got Bs, but a few got As and a few got Cs and Ds.
Focusing exclusively on the D students (low performers), we find that they all expected to get a B. For these low performing students, the more miscalibrated they were the more confident they were. This makes sense because they expected to get a B and didn't expect to get a C or D.
Now let's take a look at A students. It makes sense that the more miscalibrated they are, the less confident they are because they all expected to get a B.
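The sign flip described above can fall out of pure selection plus noise. Below is a toy model (every parameter invented) where all students believe they are B students, predictions wobble slightly, the test wobbles a lot, and confidence simply tracks the prediction; splitting on the noisy test score then yields a positive miscalibration-confidence correlation for "low performers" and a negative one for "high performers", with no real over- or under-confidence built in.

```python
# Toy model of the miscalibration/confidence sign flip. All numbers are
# assumptions for illustration, not the study's actual parameters.
import random

random.seed(1)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

students = []
for _ in range(20_000):
    predicted = 80 + random.gauss(0, 3)          # small wobble around "B"
    actual = 80 + random.gauss(0, 12)            # the test is much noisier
    confidence = predicted + random.gauss(0, 1)  # SOJ simply tracks prediction
    students.append((predicted, actual, confidence))

students.sort(key=lambda s: s[1])          # sort by observed score
low = students[: len(students) // 4]
high = students[-(len(students) // 4):]

def miscal(group):
    return [abs(p - a) for p, a, _ in group]

def conf(group):
    return [c for _, _, c in group]

print("low performers:  corr(miscalibration, confidence) =",
      round(corr(miscal(low), conf(low)), 2))    # positive
print("high performers: corr(miscalibration, confidence) =",
      round(corr(miscal(high), conf(high)), 2))  # negative
```

For the low group, a higher prediction means both more miscalibration (score was low) and more confidence; for the high group, a higher prediction means less miscalibration and more confidence. The opposite correlations are baked into the definitions once you select on the noisy score.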
"Common factor: participants’ knowledge and skills about the task performed."
I understand the corporate use case: justifying the impact of low performers and quantifying the potential results.
Still, this kind of research feels tautological. It'd be surprising if anyone actually wondered if adding more low performers helped anything.
Even in tasks that require no skill, adding a person who isn't performing means they won't perform well.
Most research in Dunning-Kruger-related experiments makes a glaring assumption: that results on a test are evenly distributed enough to divide into quartiles of equal size, and that the resulting population groups are evenly distributed within a margin of error.
That is fine for some experiments, but what happens in the real world when those assumptions no longer hold? For example, what happens when there is a large sample size and 80% of the tested population fails the evaluation criteria? The resulting quartiles are three different levels of failure and one segment of acceptable performance. There is no way to account for the negative correlation demonstrated by high performers, and the performance differences between the three failing quartiles are largely irrelevant.
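The quartile problem is easy to picture with a quick simulation. Here is a sketch using an invented skewed population where roughly 80% fall below a hypothetical pass mark; the quartile split then behaves exactly as described, with three all-but-entirely failing quartiles and one mostly passing one.

```python
# Hypothetical population: mean 60, sd 13, pass mark 70 (all invented),
# chosen so roughly 80% of the sample fails the evaluation.
import random

random.seed(2)

PASS_MARK = 70
scores = sorted(min(100, max(0, random.gauss(60, 13))) for _ in range(10_000))

q = len(scores) // 4
quartiles = [scores[i * q : (i + 1) * q] for i in range(4)]

for i, grp in enumerate(quartiles, 1):
    fail_rate = sum(s < PASS_MARK for s in grp) / len(grp)
    print(f"Q{i}: mean score {sum(grp) / len(grp):.0f}, "
          f"fail rate {fail_rate:.0%}")
# Q1-Q3 are three shades of failure; only Q4 contains the passers, and
# even it is diluted with students who failed.
```

Comparing "low performers" (Q1) against "high performers" (Q4) here mostly compares degrees of failure, which is the point of the objection above.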
Fortunately, software leadership is already aware of this problem and has happily solved it by redefining the tasks required to do the work and making heavy use of external abstractions. In other words, simply rewrite the given Dunning-Kruger evaluation criteria until enough people pass. The problem is that this entirely ignores the conclusions of Dunning-Kruger: if almost everybody can now pass the test, then suddenly the population is majority overconfident.
What makes you so sure? In general, most security certifications HR gets excited about aren't worth the paper they are printed on.
Process people by their very nature are an unsustainable part of a poisoned business model.
The other misconception is that a group of persistent, well-funded, knuckle-dragging troglodytes is somehow less likely to discover something Einstein overlooked.
Not only evenly distributed: isn't their very first underlying assumption, so fundamental that they never even mention it, that the tests are more accurate than the self-evaluations? Sure, over time and across a population they probably are, but that's not (as I understood it) what they measured here.
Haven't we all been there sometimes -- took a test on something we actually know pretty well, but got questions on the one sub-area we know less about (or just had a bad day), so we got a worse test result than what actually reflects our knowledge? Or the other way, took a test on something we don't know as well as we should, but lucked out with the questions hitting exactly what little we know (or got in some lucky guesses), so the test result is better than we actually deserve? I sure have.
That's another source of uncertainty, and directly relevant to what they're trying to investigate, so it feels like a big minus that they just totally ignore it.
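The luck-of-the-questions effect is quantifiable. As a rough sketch (numbers invented): suppose a student genuinely knows 75% of the material and the test samples 20 questions at random. The measured score then scatters well around the truth, so a perfectly honest self-estimate can look "miscalibrated" against any single sitting.

```python
# Sketch of test sampling noise: fixed true knowledge, 20 random
# questions per sitting. TRUE_KNOWLEDGE and N_QUESTIONS are invented.
import random

random.seed(3)

TRUE_KNOWLEDGE = 0.75
N_QUESTIONS = 20

scores = []
for _ in range(10_000):
    # Each question is answered correctly iff it lands in the student's
    # known 75% of the material.
    correct = sum(random.random() < TRUE_KNOWLEDGE for _ in range(N_QUESTIONS))
    scores.append(correct / N_QUESTIONS)

mean = sum(scores) / len(scores)
spread = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5

print(f"true knowledge: {TRUE_KNOWLEDGE:.0%}")
print(f"measured score: mean {mean:.0%}, sd {spread:.0%}, "
      f"range {min(scores):.0%}-{max(scores):.0%}")
# A self-estimate of 75% is exactly right, yet a single 20-question test
# routinely lands 10+ points away from it in either direction.
```

With a standard deviation near ten points, treating one test sitting as ground truth against the self-estimate attributes this pure sampling noise to the student's "miscalibration".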
I also think these self-assessment vs actual performance studies don’t control for post-assessment cognitive stress. Stress almost always impairs judgment, and I wonder if asking for a self-assessment on the day of the exam and sometime after the exam would show a difference. If stress is a factor for self-assessment, then both high and low performers will score themselves more accurately given more time after a test.
Looking at the study design of this paper, I am not sure how the authors themselves would assess its strength for the kind of broad claim they're making... And we've already seen many studies on this type of claim, so I am confused why the authors didn't ask the "next step" type of question I mentioned above.