Also wanted to point out that in general there is no issue with looking at y - x ~ x, this is called the residual plot, and is specifically used to compare an estimate of some value vs. the value itself.
That being said, the author seems very confident in their conclusion, and from the comments seems to have read a lot of related analyses, so I might be missing something. ¯\_(ツ)_/¯
DK doesn't mean no correlation, it means inverse correlation. It's the correct analysis at the bottom that shows what no correlation actually looks like (at least no correlation in trend; there is heteroskedasticity).
> a world in which people are very bad at estimating their own skill, therefore, statistically, people with lower skills tend to overestimate their skills, and experts tend to underestimate it.
Be careful here, the conclusion you drew doesn't actually follow.
> y - x ~ x, this is called the residual plot
You're giving x and y meaning that they don't have. In the article these are uncorrelated random variables - the plot of y-x ~ x will always look that way. That's however not the case if you're plotting y_hat - y ~ y_hat for a y_hat taken out of a model. That won't be a random variable in your setup.
Edit: note on heteroskedasticity
No, it means that people's self-assessment, their prediction of what their test scores will be, is uncorrelated (or more precisely weakly correlated--that's what the original D-K data showed) with their actual test scores. Which is not what we would expect: we would expect that their predictions of their test scores would be strongly (or at least more strongly) correlated with their actual test scores. The question the D-K effect raises is why that is not the case, and it's a valid question--one which this article does not even attempt to answer.
Both are wrong. DK effect describes a weak positive correlation, but weaker than we intuitively expect. The top quartile still estimates their ability better than the bottom - positive correlation. But this is still below their actual ability. The bottom quartile correctly predicts they score worse than the top, but they overestimate their own ability ie they underestimate how much better the top quartile actually performs.
Not sure I follow, inverse correlation between what? The analysis at the bottom (assuming you mean fig. 11) is too dense to show if there's a correlation or not (between skill and self-assessment bias), and looking at the relevant figure from the paper itself gives me the impression that there is a correlation.
>> a world in which people are very bad at estimating their own skill, therefore, statistically, people with lower skills tend to overestimate their skills, and experts tend to underestimate it.
> Be careful here, the conclusion you drew doesn't actually follow.
Can you elaborate? If X and Y are two independent random variables, X representing skill and Y representing self-assessment of skill, Y - X and X will be negatively correlated - this is exactly what the first part of the article is about, although from my perspective it's the author who's drawing the wrong conclusion.
>> y - x ~ x, this is called the residual plot
> You're giving x and y meaning that they don't have. In the article these are uncorrelated random variables - the plot of y-x ~ x will always look that way. That's however not the case if you're plotting y_hat - y ~ y_hat for a y_hat taken out of a model. That won't be a random variable in your setup.
Not following again. Other than calling x y_hat, and having y_hat be your own estimate vs. x be the subjects' estimate, what is the distinction? What do you mean by "the plot of y-x ~ x will always look that way" - what way? The shape of the plot will necessarily depend on the relationship between x and y.
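For what it's worth, here is a minimal sketch of how the shape of y - x ~ x depends on the relationship between x and y. The uniform 0-100 scores, the noise level of 5, and the helper name `pearson` are all assumptions picked for illustration, not anything taken from the article.

```python
import random

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

x = [rng.uniform(0, 100) for _ in range(n)]        # "actual" score
y_indep = [rng.uniform(0, 100) for _ in range(n)]  # guess unrelated to x
y_close = [xi + rng.gauss(0, 5) for xi in x]       # guess tracks x closely

r_indep = pearson([yi - xi for xi, yi in zip(x, y_indep)], x)
r_close = pearson([yi - xi for xi, yi in zip(x, y_close)], x)

print(f"independent y: corr(y - x, x) = {r_indep:.2f}")
print(f"y = x + noise: corr(y - x, x) = {r_close:.2f}")
```

When x and y are independent with equal variance, the correlation between y - x and x is -1/sqrt(2), roughly -0.71, in theory. When y is x plus independent noise, it should sit near zero.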
How does that not follow? It's just regression to the mean.
Kidding. Well, half-kidding, I did kind of find the tone a bit biting and dismissive, especially towards one of the commenters that were pointing out exactly what you did.
It’s an interesting question to ask whether the uniformly random data “really” exhibits DK or not, and whether that’s interesting. A world where people have 0 ability to assess their own skill and resort to making uniformly random guesses at it is kind of interesting, and of course in such a world more skilled people would end up on average underestimating themselves and vice versa.
But I think the author’s right that obviously nothing psychological is happening here. There’s the psychological effect of no one being able to assess themselves, but the fact that unskilled people overestimate themselves in this world has nothing to do with the fact that they are unskilled.
If the results from DK were similar to the random data results, I'd agree. But the DK results do show some correlation between skill and self-assessment ability.
DK effect is not that low skill people are overconfident and high skill people are underconfident. It is specifically that low skill people are more overconfident than high skill people are underconfident. i.e. if someone's estimated skill is true_skill+bias+noise, then bias_lowskill > -bias_highskill.
This is very clear in the original DK paper, they specifically focus on the supposed metacognitive deficiencies of low-skill people.
The article argues that the graphs supposedly demonstrating this fact, can also be generated from a model that does not have this difference, i.e. where bias_lowskill == bias_highskill.
EDIT: My characterization of the article is not correct, see here[1] for a visualization of the point I'm trying to make.
[1] http://emilkirkegaard.dk/understanding_statistics/?app=Dunni...
But as I understand it, it doesn't: In the graph generated using random data, the lines intersect in the middle (bias_lowskill == bias_highskill), whereas in DK's paper they intersect in the upper right (so bias_lowskill != bias_highskill).
I'll admit that part of it also comes from personal experience, at work and elsewhere. I've met some catastrophically incompetent people who were completely oblivious to their own incompetence, and it has very often felt like the more incompetent they were, the more likely they were to try to do stuff that was waaaay out of their comfort zone, stuff that would make even experienced, competent people tread carefully.
But even ignoring personal experiences, I'm not convinced by the arguments either. I understand what they are saying, but I don't see how this disproves the DK effect.
Even if everyone is equally bad at estimating their own skill, so that their estimate is essentially a completely random variable, then we would expect the self-assessment score average to be around 50. If I understand it correctly, this is essentially what figure 9 is demonstrating.
But that figure still says that worse performers are then likely to overestimate their own ability, just as much as it says that better performers are bad at it.
If we look at the original DK figure and contrast it with figure 9 with random data, then I think one way of interpreting the differences is that, yes, worse performers are indeed bad at self-assessment, but they're just kind of bad at it as if their self-assessment is a completely random variable. It then seems to keep being essentially random but as people's skills improve, the distance between their score and their self-assessment becomes a bit tighter.. so in conclusion: most people are pretty bad at self-assessment, but skilled people are a bit less so.
The end result is still that people in the bottom quartiles are going to over-estimate their own ability.
I don't know, maybe this is way out in the weeds. Please school me.
To correctly correlate the two variables (assessment and actual score) you need to correlate the actual data, not measures of its characteristics (the average being one of them).
The actual data is shown. Even by eye it’s possible to see no strong (or significant) correlation exists.
If people are all pretty-much crap at estimating their own skill, then you'd expect all estimates to be roughly their actual skill, plus-or-minus some random error-margin with some kind of probabilistic distribution (Gaussian?).
If that were the case, then high-skilled people would be more likely to underestimate their skill (because their actual skill is greater than the mean). That (I think) is an example of "reversion to the mean".
If that reasoning is right, then the DK claim may be true, but it says nothing about the comparative estimating propensities of high-skilled and low-skilled people. It just says that low-skilled people tend to overestimate their skill, and vice-versa. But that's exactly what you'd expect, amirite?
> most people are pretty bad at self-assessment, but skilled people are a bit less so
This is pretty much the conclusion of the article, except that it isn't the tautologic DK or figure 9 that shows it, but Nuhfer's figure 11.
This is like comparing x - y to x, and if you do this, you will get a correlation no matter what.
This is a statistical claim, supported by the DK graph (but not the random data thought experiment from the article).
> It has nothing to do with unskilled people thinking that they know everything
This reads to me as a claim about the psychological reason for the statistical pattern, which I don't think is either supported or contradicted by the data, in either the article or the original paper.
This article seemed very unconvincing -- and this part noted above, early on in the article set the tone that I felt like the author didn't know what they were doing. And even after reading it all, I felt like the standard lay use of DK remained valid.
This just felt like the type of thing I would have thought about as an undergrad, started to write it, and then realized it didn't make sense halfway through it. Or maybe I just missed something as well...
They are, but honestly all that can be concluded safely IMHO is that the original D-K graph doesn't support the existence of the widely discussed "effect" that their conclusion describes. Therefore unless there is more evidence from some other subsequent study there may not be any evidence for it at all, and if that's the case then there's potentially no proof it exists.
However, even if you prove that their data is not evidence, that doesn't actually say anything about whether the effect exists or not, just that the D-K paper isn't evidence of such an effect.
I don't think that's enough for the author to conclude that "DK is autocorrelation". A more careful conclusion would be that "the DK data do not support DK's conclusion"... but of course that's much less likely to attract click throughs.
That's not really the intended interpretation of "null" in "null hypothesis". "Null" does not mean "contrary to the effect you're testing". "Null" means "do not assume dependencies anywhere" and so your description is backwards.
As I've written elsewhere this all comes down to the prior, based on my life experience the prior "people are generally capable of self-assessing their performance" is much more likely than "people have absolutely no ability to self-assess their performance."
To state it differently, when you assume no dependencies anywhere, you've already jumped to a conclusion that is more far-fetched to me than the results of DK. Do you really think people have zero ability to self-assess their own performance? In all domains and all contexts, as this is your "null hypothesis"?
Dunning Kruger effect: lower performers overestimate their ability, and higher performers underestimate their ability.
How did they find that? They asked participants to take a test and then had them do a self-assessment. Both were standardized from 0-100. They rated a participant's self-assessment accuracy by "self-assessment minus test score."
What's wrong with that method? You can't arrogantly self-assess as though you got a 130, and you can't humbly say that you got -50. Because of the standardization, you're bound by 0 and 100. This method makes it almost impossible for higher performers to overestimate their ability and for lower performers to underestimate.
What they actually found was that higher performers tend to be better at self-assessment. Lower performers are less accurate, but in both directions (not just overconfident).
https://digitalcommons.usf.edu/numeracy/vol9/iss1/art4/ https://digitalcommons.usf.edu/numeracy/vol10/iss1/art4/
The point is, that the original DK paper is bullshit. At least, this plot is. And people tend to miss it, until they start to carefully read the labels and think about the caveats. In fact, as presented here it looks like it shouldn't even be accepted as a valid study, this is outright deceptive, maliciously so. If there is assumed to be a correlation between x & y, how about we start by plotting x against y then? I know, it may be messy. It almost certainly will be. Because of that, I personally won't even be offended (but some people might) by you removing the outliers and producing the unnaturally clean version of the plot in the end to highlight the main idea. Then some statistical tests to make the results quantified. But here we see nothing, it really is just comparing x to x.
IMO, this is pretty much the invariant of most of the problems of academic research in the last God-knows-how-many decades (maybe always was, I don't know). Computer science papers without the code. Data science papers without the data. Yeah-yeah, I've heard hundreds of excuses why researchers do it like that. But it's pointless, such "research" shouldn't be accepted by anybody. Either you make your findings actually public by providing everything to replicate every single step of your study (which is supposed to be the point), or you just don't publish anything and keep the research proprietary (I mean, obviously it's never black and white, there always will be concerns about test-subject anonymity, etc. — but it's ridiculous to discuss that when the accepted standard even in "proper" sciences are 20 pages of dense text which might never even get to the point of the study, i.e., actually showing the data to any extent.)
Neither do I. Basically what the article actually shows is that these two statements are equivalent:
(1) People with low test scores tend to overpredict their test scores, while people with high test scores tend to underpredict their test scores.
(2) People's predictions of their test scores are uncorrelated (or more precisely very weakly correlated [1]) with their actual test scores.
This is not a statement that the D-K effect is wrong. It's just restating what the D-K effect is in different words. All the talk about "autocorrelation" is just another way of saying that, if people's predictions of their test scores are only weakly correlated with their test scores, then people with low test scores will have to overpredict their test scores (because there's virtually no room to underpredict them--there's a minimum possible test score and their actual score is already close to it), and people with high test scores will have to underpredict them (because there's virtually no room to overpredict them--there's a maximum possible test score and their actual score is already close to it). But the real question is: why are x and y so weakly correlated? Why are people's predictions of their test scores so weakly correlated with their actual test scores? That is not what one would intuitively expect. That is the question the D-K effect raises, and the author not only doesn't answer it, he doesn't even see it.
Also, this statement in the description of the Nuhfer research doesn't make sense:
"What’s important here is that people’s ‘skill’ is measured independently from their test performance and self assessment."
Um, the test performance is the people's "skill". And in the original D-K research, it was "measured independently" from the people's self-assessment (their prediction of their test performance).
[1] Notice that in the "uncorrelated data" graph, Figure 10, the red line is basically horizontal. That's what you get when x and y are uncorrelated. But in the original D-K graph, Figure 2, the thick black line is not horizontal--it slopes upward. That's what you get when x and y are weakly correlated. If the author had put in a weak correlation between x and y in his own experiment, he would have gotten a graph that looked like Figure 2. But of course that still would do nothing to explain why x and y are so weakly correlated, which is the actual question.
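A rough sketch of the footnote's point, under a made-up model where a prediction is 50 plus `slope` times the distance of the score from 50, plus uniform noise. The model, the slope of 0.25, the noise width, and the function names are assumptions for illustration. The upward offset visible in the original D-K line is not modelled.

```python
import random

def quartile_means(pairs):
    """Sort (x, y) pairs by x, split into 4 equal groups, return mean y per group."""
    ordered = sorted(pairs)
    size = len(ordered) // 4
    return [
        sum(y for _, y in ordered[i * size:(i + 1) * size]) / size
        for i in range(4)
    ]

def simulate(slope, n=4000, seed=1):
    """Hypothetical model: prediction = 50 + slope * (score - 50) + uniform noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        score = rng.uniform(0, 100)
        prediction = 50 + slope * (score - 50) + rng.uniform(-25, 25)
        pairs.append((score, prediction))
    return quartile_means(pairs)

flat = simulate(slope=0.0)     # x and y uncorrelated
tilted = simulate(slope=0.25)  # x and y weakly correlated
print("uncorrelated:     ", [round(m, 1) for m in flat])
print("weakly correlated:", [round(m, 1) for m in tilted])
```

With a slope of zero the four quartile averages should all sit near 50, and with a small positive slope they should rise from left to right while staying much flatter than the x = y line.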
The line for actual ability is basically x=y. If you scored 10%, you're in the bottom quartile. If you scored 100%, you're in the top quartile. That line isn't really data, just something for comparison. The perceived ability line is the one that utilizes the data. It seems to show that once you average out what everyone rated themselves, it ends up kind of in the middle between ~55-70%. So the people who scored 10% assumed, on average, they would score around 55%. The people who scored 100% assumed, on average, that they would score about 75%. That makes the average expected score much higher than the actual score on the low end and somewhat lower than the actual score on the high end.
I'd interpret this as the bottom quartile thinks they're average and the top quartile thinks they're a bit above average. So basically everyone thinks they're average-ish, but the people who did worst on the test were the most wrong about that. But then again, I can't remember what the questions on the test were even about and just the single graph isn't terribly useful to argue over because it's missing all of the context of the paper.
Now that I've sat and interpreted the graph using my own set of notions about what the numbers mean and what the graph actually shows, I feel like this ought to be used as one of those life lessons about how quotes and diagrams outside of their context within a paper are the epitome of the phrase "lies, damn lies, and statistics." Statistics aren't always lies, but they're incredibly easy to bend to your own biases and assumptions.
But the article presents the results from just such a new experiment. In this one they used university education level (sophomore through to professor) and measured skill self-assessment level within those groups. Higher education level (a good proxy for skill on the test used, which was about science literacy) was found to be associated with more accurate skill self-assessment but the bias of lower-skilled people overestimating their skills was not observed.
That's just one study of course, but it sounds like a much better designed one than the original and does constitute actual evidence that the Dunning-Kruger effect doesn't exist.
I don't understand this part. "Completely uncorrelated data" is usually taken to represent the null hypothesis, but that's not the case here. In the DK paper, the implicit null hypothesis is "people of all skill levels are good at estimating their performance". In this case the "completely uncorrelated data" matches an alternative hypothesis, "people's skills have nothing to do with their ability to estimate their performance in tasks testing that skill". This hypothesis doesn't outright contradict the DK proposed hypothesis (and is certainly not the DK null hypothesis), so getting similar results is unsurprising to me, and I'm not sure that we learn from it anything about the DK results.
As for the other study cited, the figure shown in the article doesn't give a lot of information on density, and looking at the paper itself, figure 4 does actually seem to show that self-assessment gradually shifts left with increasing level of education.
(Edited for accuracy).
I had the same impulse that the analysis did not disprove DK, but after sitting with it for overlong I agree with the analysis.
I think there are two competing DK effect definitions that are being conflated, one descriptive and one explanatory:
1. DK shows that less skilled people overestimate their ability, and highly skilled people underestimate it
2. DK shows that people's estimation of their ability is causally determined by their actual ability
I believe you are claiming, correctly, that the article does not disprove the first definition, which describes the observation, but I think the article is trying to disprove the second definition, which explains why it occurs.
In other words: Yes, there is an observable Dunning-Kruger effect in the sense that we're bad at self evaluation. Is that effect attributable to one's actual level of competence? The evidence for that appears to be a statistical artifact, and further experiments seem to disprove that conjecture.
I'm not a statistician or a psychologist.
If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all. Some might take issue with the adjusted experiment as using seniority related categories like "sophomore" and "junior" as skill levels has its own issues. To show the DK effect is real you need to come up with a better adjusted experiment that avoids the autocorrelation while still generating data that generates the effect. It's unclear if that's possible.
No DK doesn't say no bluster, no proclamations or no artificial assertions of expertise. It doesn't even say that the overestimates are just as prevalent among experts as laypeople. All it says is as near as we can tell the effect size of the overestimation is the statistical autocorrelation and our best efforts to produce the same effect without relying on the autocorrelation have failed.
I think there are a lot of ways to accept the anecdotes you mentioned occur that need much weaker assertions than DK as a psychological phenomenon and would hesitate to jump to DK based on that information.
This all gets really murky quickly in practice because of what "low" and "high" competence means, and what constitutes the actual scope of expertise with reference to a particular scenario.
>The V-tail design gained a reputation as the "forked-tail doctor killer",[16] due to crashes by overconfident wealthy amateur pilots,[17] fatal accidents, and inflight breakups.[18] "Doctor killer" has sometimes been used to describe the conventional-tailed version, as well.
It is, though. This article says that if people are bad at estimating their skill (towards randomness / the midpoint), then bad people will overestimate, and good people will underestimate.
The psychological part is that people indeed will assume they're closer to the mean than they actually are. DK effect would not be seen if people correctly estimated their skill, nor if experts overestimated, nor if the incompetent underestimated.
>If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all.
This is even in the article! Yet some people are making claims against this despite references to the contrary.
For the Dunning-Kruger effect to have psychological significance, you must quantify this, and show that they overestimate their abilities more resp. less than expected.
Regardless of how you set your expectation/null hypothesis, the absence of the effect would mean that the lowest scoring quartile would on average estimate their abilities to lie at 50% or below. It is found however that people in the lowest scoring quartile position themselves in the third or fourth quartile.
I'm not saying that this is necessarily deep or unexpected, just that the article only shows that the qualitative statement is true regardless of any psychological factors, and not that the Dunning-Kruger effect doesn't exist
The point of DK is that when you don’t know shit, any non-degenerate self assessment will result in overestimating your ability. In short, “there are more natural numbers above smaller natural numbers than bigger ones”. This doesn’t have to do with psychology, and it’s expected that it appears when evaluating random data. That’s a good thing! It means DK exists even when us pesky humans aren’t involved at all, not that DK doesn’t exist at all.
I think there's both a component of numbers and psychology here. If the dispersion in perceived score caused by inaccuracy is wide enough to touch the bounds, it will force a trend towards the mean. This effect is possibly exacerbated by a tendency of perception to stray from "extremes", so subjects with a score near the edges will trend to the mean more strongly as they are unlikely to rate themselves the very best or very worst.
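A small sketch of the purely numerical part, assuming perceived score is actual score plus unbiased Gaussian noise, with and without clamping to the 0-100 scale. The noise level of 30, the band edges, and the function name are arbitrary choices for illustration.

```python
import random

def mean_error_by_band(noise_sd, clip, n=30_000, seed=2):
    """Mean of (perceived - actual) for low, middle and high actual scores."""
    rng = random.Random(seed)
    bands = {"low": [], "mid": [], "high": []}
    for _ in range(n):
        actual = rng.uniform(0, 100)
        perceived = actual + rng.gauss(0, noise_sd)  # unbiased noise
        if clip:
            perceived = min(100.0, max(0.0, perceived))  # scale is bounded
        if actual < 10:
            bands["low"].append(perceived - actual)
        elif 45 <= actual <= 55:
            bands["mid"].append(perceived - actual)
        elif actual > 90:
            bands["high"].append(perceived - actual)
    return {name: sum(errs) / len(errs) for name, errs in bands.items()}

unbounded = mean_error_by_band(noise_sd=30, clip=False)
bounded = mean_error_by_band(noise_sd=30, clip=True)
print("no bounds:  ", {k: round(v, 1) for k, v in unbounded.items()})
print("0-100 scale:", {k: round(v, 1) for k, v in bounded.items()})
```

Without bounds the mean error should be close to zero in every band. With the bounds it should turn positive for low scorers and negative for high scorers, even though nothing psychological was put into the model.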
In a simplified experiment where we give people a 3 question quiz, those who got 2 questions right have one overestimation option, 3, and two underestimation options, 0 and 1. So it's very easy to adjust for autocorrelation by checking if a large group of 2-scorers underestimate more than twice as often as they overestimate. Then we see how their tendencies compare against 1-scorers and how they deviate from naturally overestimating more than twice as often as underestimating.
I haven't reviewed these types of papers, but if nobody made even that basic adjustment in their analysis, how many others have been missed in experiments like this?
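To make the baseline in the 3-question example concrete, here is a sketch that counts the options, assuming a guess is uniform over the possible scores 0..3 (that uniformity is my assumption, as is the function name `guess_baseline`).

```python
from fractions import Fraction

QUESTIONS = 3  # quiz length from the example above

def guess_baseline(score, questions=QUESTIONS):
    """Chance of under- and overestimating if the guess is uniform on 0..questions."""
    outcomes = questions + 1
    p_under = Fraction(score, outcomes)             # guesses below the score
    p_over = Fraction(questions - score, outcomes)  # guesses above the score
    return p_under, p_over

for score in range(QUESTIONS + 1):
    p_under, p_over = guess_baseline(score)
    print(f"score {score}: P(under) = {p_under}, P(over) = {p_over}")
```

Observed under/over rates for each score group could then be compared against these fractions rather than against an even split.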
The statistical phenomenon exists, surely - but I think it will be very confusing for everyone to re-use the same name.
I get what you are trying to say, but this isn’t true… every natural number has the same amount of numbers greater and smaller… an infinite number.
(Now rate your confidence in making that assertion.)
In the world of biology, you're observing the world around you. Same for physics, chemistry, et al. This means that you can set up proper controls to obscure your own presence from any potential results (e.g., isolate everything in another room, use cameras to avoid being near animals, etc.)
Psychology has the same nightmare as quantum physics: pre-existing thoughts and beliefs literally define what results you end up with.
I'm convinced that psych is a victim of the "new" way of doing science: treating the Scientific Method™ as a self-evident concept instead of regarding science as a vastly certain domain of metaphysics.
https://samzdat.com/2018/05/19/science-under-high-modernism/
It would be kind of ironic if psychologists were more susceptible to the "Pop-Baconian" simplification of science ?
And damning if what I've heard about psychology going through multiple paradigms during the 20th century alone is true.
But then, indeed, also understandable, due to the "softness" and "paradigmlessness" of the subject matter, as Kuhn had pointed out the latter back then?
It's still sad how we now have detailed theories and histories of science, but in practice scientists show no interest in trying to learn from them, nor the mistakes of their predecessors?
But then maybe that would have too high of a cost. (Consider all those successful projects where the founders later say: "we didn't know what we were getting into / that this was considered impossible".)
Bonus : Wiseman & Schlitz’s attempts to do an adversarial super-controlled parapsychological experiment : ( IV. )
https://slatestarcodex.com/2014/04/28/the-control-group-is-o...
So, implicit in the standards for publishing new research in psychology is "We think there is much greater than a 5% chance that our entire field is wrong" which is not a great place to start from.
Given that fact, it logically follows that people who score low on ability tests will more often than not have overestimated their ability (and the same on the other end of the spectrum).
You can frame this effect as autocorrelation if you wish or just as a logical consequence. But that's missing the point.
The point is: why on earth are humans so bad at estimating their own competence level as to make it practically indistinguishable from random guesses?
Or is it?
The trick lies in the fact that when asked to judge your competence you're given a range (e.g. 0-10) and both competent people and incompetent people have access to the whole range when taking a self-assessment. I.e. if less competent people were on average more aware of their incompetence they may be less likely to rate themselves 5 or 6, but yet the data shows that no matter what competence level you have on average you self-assess more or less the same.
This seems to imply that your incompetence indeed doesn't allow you to truly appreciate the full range of skills that are required to reach a higher level of competence.
In other words, the DK effect itself is the cause of the random distribution of the skill self-assessment (which in turn is the cause of the overestimation secondary effect)
But a comment above quoting the more recent paper presents a contradictory conclusion: that humans can self-assess with some accuracy. So now I’m confused again.
Suppose you make 1000 people take a test. Suppose all 1000 of these people are utterly incapable of evaluating themselves, so they just estimate their grade as a uniform random variable between 0-100, with an average of 50.
You plot the grades of each of the 4 quartiles and it shows a linear increase as expected. Let's say the bottom quartile had an average of 20, and the top had 80. But the average of estimated grades for each quartile is 50. Therefore, people who didn't do well ended up overestimating their score, while people who did well underestimated it.
In reality, nobody had any clue how to estimate their own success. Yet we see the Dunning-Kruger effect in the plot.
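The thought experiment is easy to run. This is only a sketch: it assumes the actual grades are also uniform on 0-100, which the comment does not specify, and the seed is arbitrary.

```python
import random

rng = random.Random(3)  # fixed seed for repeatability
N = 1000

# Each person: (actual grade, self-estimated grade), both uniform on 0-100
# and drawn independently, i.e. nobody has any insight into their own result.
people = sorted((rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(N))

size = N // 4
actual_means, estimate_means = [], []
for q in range(4):
    group = people[q * size:(q + 1) * size]  # one quartile by actual grade
    actual_means.append(sum(a for a, _ in group) / size)
    estimate_means.append(sum(e for _, e in group) / size)

for q in range(4):
    print(f"quartile {q + 1}: actual {actual_means[q]:5.1f}, "
          f"estimated {estimate_means[q]:5.1f}")
```

Under these assumptions the actual quartile averages land near 12.5 and 87.5 rather than the illustrative 20 and 80, and the estimated averages hover around 50 in every quartile, so the bottom quartile "overestimates" and the top one "underestimates".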
> In reality, nobody had any clue how to estimate their own success.
Wouldn't that mean unskilled people tend to overestimate their skill, and experts tend to underestimate it? Why is there a contradiction with DK's conclusions?
I think it's because the original paper speculates far beyond it:
> The authors suggest that this overestimation occurs, in part, because people who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it.
The argument about autocorrelation says this "dual burden" doesn't need to be there to observe the effect.
https://www.avaresearch.com/files/UnskilledAndUnawareOfIt.pd...
IMO that explains why Dunning Kruger seems intuitively correct even if the conclusion they drew isn't actually correct.
How is this helpful? You won't know whether someone is "overestimating" their ability until you learn both their estimated and actual performance, at which point you don't need to guess whether they're "likely" to have poor actual performance.
I think you're misreading the point of my comment.
Another way to show this would have been to keep the auto correlation plot, but compare it to the same plot with statistical noise. With infinite random data, the expected value for self-assessment would be 50% score, regardless of actual score - a flat line through the chart. It would then be significant to find a non-flat line, as DK did.
It’s not inconceivable that with a smaller sample, you’d get some biasing, where lesser skilled people would overestimate, and higher skilled people underestimate.
The follow up studies seem to suggest there’s not really a bias like that, but that there is a “honing” of the general ability to estimate your own outcome, which makes sense.
> Although there is no hint of a Dunning-Kruger effect, Figure 11 does show an interesting pattern. Moving from left to right, the spread in self-assessment error tends to decrease with more education. In other words, professors are generally better at assessing their ability than are freshmen. That makes sense. Notice, though, that this increasing accuracy is different than the Dunning-Kruger effect, which is about systemic bias in the average assessment. No such bias exists in Nuhfer’s data.
Critiques cite the work being critiqued (yes, the referenced critiques in TFA cite the Dunning-Kruger study). Also, a 23 year-old paper will inevitably get cited more than 6 year-old papers. But yeah...the inertia in Science is real. That conservatism's a feature, not a bug.
Psychology's probably the discipline with the shortest "half-life of knowledge". https://en.wikipedia.org/wiki/Half-life_of_knowledge
> An engineering degree went from having a half life of 35 years in ca. 1930 to about 10 years in 1960. A Delphi Poll showed that the half life of psychology as measured in 2016 ranged from 3.3 to 19 years depending on the specialty, with an average of a little over 7 years.
This is very interesting and makes me wonder what it is for tech careers, e.g. web devs, data scientists etc.
Kidding aside, it seems you could estimate it by asking, what portion of the knowledge I use did I learn 20 years ago? Then 10, 5, 1. For me it seems to be somewhere around ten years.
When you generate a random "actual" score near the top, the random "perceived" score has a higher chance of being below the "actual", because the numerical range below is larger than the one above, and vice versa. E.g. a "test subject" with an actual score of 80% has a (uniform random) 20% chance of overestimating their ability and an 80% chance of underestimating it. For an actual score of 20%, they have an 80% chance of overestimating.
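A quick check of those numbers, assuming the perceived score is a uniform random draw on [0, 1] that is independent of the actual score (the helper name is made up for illustration):

```python
import random

def overestimate_rate(actual, trials=100_000, seed=1):
    """Fraction of uniform random self-assessments that land above `actual`."""
    rng = random.Random(seed)
    hits = sum(rng.random() > actual for _ in range(trials))
    return hits / trials

for actual in (0.2, 0.5, 0.8):
    # Analytically the chance of overestimating is 1 - actual.
    print(actual, round(overestimate_rate(actual), 3))
```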
The least competent person cannot underestimate their relative competency. Any not exactly accurate estimate they do is an overestimate.
Correspondingly, the most competent person cannot overestimate their relative competency.
This leads to the perception of bias where there is none, except a trivial tautological one.
In the bottom row I added a Dunning-Kruger effect, at a test score of 0.7 the self assessment is perfect, below and above that the self assessment is off by 0.5 times the distance of the test score from 0.7. Otherwise the bottom charts are the same, no random variation on the left, ±0.1 in the middle and ±0.2 on the right. You can see that the edge effect is less important as the data points are steered away from the corners.
I will admit that the original Dunning-Kruger chart may or may not show a real effect; it really depends on how they aggregated the data and how noisy self-assessments are. But if you have a raw data set like the one I generated, you could easily determine if there is an effect. If one could find such a data set, I would like to have a look.
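For anyone who wants to play with this, here is a minimal sketch of how such a data set could be generated. It is my reading of the description above, not the original code: scores are uniform on [0, 1], the noise is uniform, and the self-assessment is clamped to [0, 1], which is what produces the edge effect.

```python
import random
import statistics

def simulate(noise, dk_strength=0.0, pivot=0.7, n=20000, seed=2):
    """Return (test_score, self_assessment) pairs on a 0..1 scale."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.random()
        # A built-in DK effect pulls the estimate toward `pivot`;
        # dk_strength=0 means no built-in effect at all.
        estimate = score + dk_strength * (pivot - score)
        estimate += rng.uniform(-noise, noise)
        estimate = min(1.0, max(0.0, estimate))  # clamp: source of the edge effect
        rows.append((score, estimate))
    return rows

def mean_error_by_quartile(rows):
    """Mean (self_assessment - score) per test-score quartile."""
    rows = sorted(rows)
    size = len(rows) // 4
    return [
        statistics.fmean(e - s for s, e in rows[i * size:(i + 1) * size])
        for i in range(4)
    ]

for noise in (0.0, 0.1, 0.2):
    print(noise,
          [round(x, 3) for x in mean_error_by_quartile(simulate(noise))],
          [round(x, 3) for x in mean_error_by_quartile(simulate(noise, 0.5))])
```

With raw pairs like these the edge effect and a real bias are easy to tell apart: clamping only moves the outer quartiles slightly, while a built-in effect shifts them by much more.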
This article is absolutely dripping with condescension throughout and is really pushing a "gotcha" that doesn't exist. It then argues basic statistics, generates a DK-looking graph from random data, and then claims the phenomenon doesn't exist. When in fact, as other people have commented, when people are bad at estimating their own ability (i.e. random), the DK effect still exists; it falls out of statistics.
Sigh, the author misunderstood the very definition of the DK effect:
> "The Dunning–Kruger effect is the cognitive bias whereby people with low ability at a task overestimate their ability. Some researchers also include in their definition the opposite effect for high performers: their tendency to underestimate their skills."
In all the examples, this holds, even if the assessment ability is totally random. Even if every quartile gives themself an average score, like the random data generated here. The author seems to think that it should be even more lopsided or something to demonstrate the effect. (I mean, honestly, what are they expecting, a line above 50th percentile? A line with negative slope? What?)
If there were no DK effect, the two lines would be the same.
Instead, if we go back and look at the original data, we see that the two lines are indeed not the same: the average for the bottom quartile is over 50%, and there is some small increase in perceived ability associated with actual ability (and not the opposite).
The sin here isn't some autocorrelation gotcha, but rather, DK should have put error bars on the graph. If it was totally random, the error bars would be all over the place.
He also points out that the problem is that there's nothing below zero and nothing above 100. You can't have people who estimate beyond that. He uses another study and it turns out, the less knowledgeable you are about a skill, the worse you are at estimating your ability at all. In both directions.
If the lines were the absolute difference between perceived ability and actual ability, for no effect, the lines still shouldn't be the same. They should converge towards those who are knowledgeable. If anything, the difference line should be nearly a horizontal line. Because there should be greater variance in estimations at the lower end.
I fiddled with number of test questions, amounts of variation in question difficulty, various coefficients, etc.
In none of my experiments did I add a bias on the skill axis.
My conclusion is that the "slope < 1" part of the DK effect (from their original graph) is very easy to reproduce as an artifact of the methodology. I could reproduce the rough slope of the DK quartiles graph with a variety of reasonable assumptions. (One simple intuition is that there is noise in the system but people are forced to estimate their percentiles between 0 and 100, meaning that it's impossible for the actual lowest-skill person to underestimate their skill. There are probably other effects too.)
However, I didn't find an easy way using my simulation to reproduce the "intercept is high" part of the DK effect to the extent present in the DK graphs, i.e. where the lowest quartile's average self-estimated percentile is >55%. (*)
However, it strikes me that without a very careful explanation to the test subjects of exactly how their peer group was selected, it's easy to imagine everyone being wrong in the same direction.
(*) EDIT: I found a way to raise the intercept quite a lot simply by modeling that people with lower skill have higher variance (but no bias!) in their own skill estimation. This model is supported by another paper the article references.
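To illustrate the idea in the EDIT, here is a minimal sketch. It is not my original simulation; the linear noise schedule, the Gaussian noise and the clamping to [0, 100] are simplifying assumptions chosen for brevity.

```python
import random
import statistics

def bottom_and_top_means(sigma_low_skill, sigma_high_skill, n=20000, seed=3):
    """Mean self-estimate for the bottom and top skill quartiles."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skill = rng.uniform(0, 100)
        # Noise spread changes linearly with skill; the noise has mean
        # zero, so there is no bias before clamping.
        sigma = sigma_low_skill + (sigma_high_skill - sigma_low_skill) * skill / 100
        estimate = min(100.0, max(0.0, rng.gauss(skill, sigma)))
        rows.append((skill, estimate))
    rows.sort()
    size = n // 4
    bottom = statistics.fmean(e for _, e in rows[:size])
    top = statistics.fmean(e for _, e in rows[-size:])
    return bottom, top

print(bottom_and_top_means(5, 5))    # same small noise for everyone
print(bottom_and_top_means(60, 5))   # low skill = much noisier estimates
```

In the second case the bottom quartile's average estimate rises well above its true average of about 12.5, even though nobody is biased before the clamp.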
However, if the under-performers consistently over-estimate more than the over-performers under-estimate there is still some merit to the effect, isn't there?
That is, the interesting number is the difference between integral of y-x on lower half vs the integral of y-x on the upper half. Does that make sense to anyone else?
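A sketch of that comparison, using the mean of y - x over each half as a stand-in for the integral (the function name and the pure-noise test data are assumptions for illustration):

```python
import random
import statistics

def half_errors(rows):
    """rows: (actual, perceived) pairs. Returns the mean of
    (perceived - actual) for the lower and upper halves by actual score."""
    rows = sorted(rows)
    mid = len(rows) // 2
    lower = statistics.fmean(p - a for a, p in rows[:mid])
    upper = statistics.fmean(p - a for a, p in rows[mid:])
    return lower, upper

# Pure-noise baseline: independent uniform percentiles.
rng = random.Random(4)
noise_rows = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20000)]
lower, upper = half_errors(noise_rows)
asymmetry = lower + upper  # near zero when over- and underestimation balance
print(round(lower, 1), round(upper, 1), round(asymmetry, 1))
```

On random data the two halves cancel, so a clearly positive asymmetry in real data would be the part that noise alone doesn't explain.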
I confess that I've never paid that much attention to the classic D-K graph, and that taking a close look at it, it is most assuredly crap. Now I want to know what the plots of the actual scores for those quartiles look like rather than %ile, or after-the-fact ranking. Yeah, it sure looks like people mostly figure they're in the 55-75 %ile ranking, if that's what that actually is, and that where in that spread they think they are correlates with their actual ranking.
Let's go down a Bayesian rabbit hole. Let's assume, as does the article, that people's self estimations are completely random rubbish: the worst people have nowhere to go but up, the best nowhere but down. Yup, completely agree.
Now let me ask a question: is self-estimation of any use in determining actual ability? The answer in this case is no: knowing one does not inform our ability to know the other in a Bayesian sense, they are not correlated.
D-K sounds valuable as a cautionary tale concerning excessive exuberance and a tendency not to learn well from experience, but aside from child-proof caps and Mr. Yuk stickers where we really want to apply the lesson is at the high-performing end of the scale and here we get into trouble immediately.
It is tempting to say "high-performers have nowhere to go but down" as though maybe we should reject those self-reporting the best performance. The classic chart hints at high performers underestimating their true performance, but it's a crappy chart; maybe they want it to be true.
But in the specific case where there is utterly no correlation and true performance is as evenly distributed as self-assessment, if we chop off the "top X self-reporting" we will chop off just as many poor performers as high performers. Yes, I hear you, and I agree, random is an edge case; I just don't believe that affects its prevalence.
Maybe it is true; alright dust off those priors and have at it.
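The "chop off the top X self-reporting" case is easy to simulate. A minimal sketch, assuming actual and self-reported performance are independent uniform draws (names and the 20% cutoff are arbitrary):

```python
import random

def high_performer_share_in_top(top_fraction=0.2, n=20000, seed=5):
    """Share of above-median actual performers among the top self-reporters."""
    rng = random.Random(seed)
    # Each person is (actual, self_report), drawn independently.
    people = [(rng.random(), rng.random()) for _ in range(n)]
    people.sort(key=lambda p: p[1], reverse=True)  # highest self-report first
    cut = people[:int(n * top_fraction)]
    return sum(actual > 0.5 for actual, _ in cut) / len(cut)

share = high_performer_share_in_top()
print(round(share, 3))  # about half, so the cut removes both groups equally
```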
And also, I think there is actually a tiny bit of DK going on.
And then, as you say, it gets amplified by the pseudo-literati.
It would require them to be _even worse than random_ for them to be worse at estimating their abilities, rather than simply being judged for being bad at the task. It is only human attribution bias that leads us to assume that people should already know whether they are good or bad at a task without needing to be told.
The study assumed that the results on the task are non-random, performance is objective, and that people should reasonably have been expected to have updated their uniform Bayesian priors before the study began.
If any of those are not true, we would still see the same correlation, but it wouldn't mean anything except that people shared a reasonable prior about their likely performance on the task.
People will nevertheless attribute "accurate" estimates to some kind of skill or ability, when the only thing that happened is that you lucked into scoring an average score. You could ask people how well they would do at predicting a coin flip and after the fact it would look like whoever guessed wrong over-estimated their "ability" and a person who guessed right under-estimated theirs, even though they were both exactly accurate.
This comment section clearly demonstrates the attribution bias that makes this myth appealing, though. And this blog post demonstrates how difficult it is to effectively explain the implications of Bayesian reasoning without using the concept.
They then judged people whose beliefs about grammar varied from the one book's beliefs about grammar as having over-estimated their performance. They took people out of one context, asked them how they would behave in a novel context, and everyone made an educated guess. The people who guessed correctly were judged to accurately know their own abilities, when actually they may just have gotten lucky.
Thus what Dunning-Kruger's paper actually says is that if you want people to know how you would like them to perform a task, you can't assume they will read your mind: you have to provide them with actual feedback on their performance.
> [I]f you carefully craft random data so that it does not contain a Dunning-Kruger effect, you will still find the effect. The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology[1].
> [1]: The Dunning-Kruger effect tells us nothing about the people it purports to measure. But it does tell us about the psychology of social scientists, who apparently struggle with statistics.
It seems to me that despite rudely criticizing a broad swath of academics for their lack of statistical prowess, the author here is himself guilty of a cardinal statistical sin: accepting the null hypothesis.
The fact that data resemble a random simulation in which no effect exists does not disprove the existence of such an effect. In traditional statistical language, we might say such an effect is not statistically significant, but that is different from saying that the effect is absolutely and completely the result of a statistical artifact.
The nuance of statistics is never-ending.
"There are white lies, damned lies and statistics"
Funny that all the major ML marvels are also built on statistical foundations - a tool used as much as abused.
“… Our data show that peoples' self-assessments of competence, in general, reflect a genuine competence that they can demonstrate. That finding contradicts the current consensus about the nature of self-assessment. Our results further confirm that experts are more proficient in self-assessing their abilities than novices and that women, in general, self-assess more accurately than men. The validity of interpretations of data depends strongly upon how carefully the researchers consider the numeracy that underlies graphical presentations and conclusions. Our results indicate that carefully measured self-assessments provide valid, measurable and valuable information about proficiency. …”
https://www.researchgate.net/publication/312107583_How_Rando...
I’m surprised this wasn’t flagged as something pretty silly.
On the other hand, regression to the mean rather than autocorrelation does explain how you could get a spurious Dunning-Kruger effect. Say that 100 people all have some true skill level, and all undergo an assessment. Each person's score will be equal to their true skill level plus some random noise based on how they were performing that day or how the assessment's questions matched their knowledge. There will be a statistical effect where the people who did the worst on the test tend to be people with the most negative idiosyncratic noise term. Even if they have perfect self-knowledge about their true skill, they will tend to overestimate their score on this specific assessment.
Regression to the mean has broad relevance, and explains things like why we tend to be disappointed by the sequel to a great novel.
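Here is a small simulation of that setup. The normal distributions and their spreads are arbitrary assumptions; the only thing that matters is that the score is true skill plus independent noise and that everyone predicts exactly their true skill.

```python
import random
import statistics

def quartile_gaps(n=20000, seed=6):
    """Mean (prediction - score) for the bottom and top score quartiles."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skill = rng.gauss(50, 10)          # true skill
        score = skill + rng.gauss(0, 10)   # test-day noise
        prediction = skill                 # perfect self-knowledge
        rows.append((score, prediction))
    rows.sort()  # order by observed score
    size = n // 4
    bottom = statistics.fmean(p - s for s, p in rows[:size])
    top = statistics.fmean(p - s for s, p in rows[-size:])
    return bottom, top

bottom_gap, top_gap = quartile_gaps()
print(round(bottom_gap, 1), round(top_gap, 1))
```

The bottom quartile "overestimates" and the top quartile "underestimates", even though no one in the simulation is wrong about their own skill.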
I've never had impostor syndrome though. To have impostor syndrome, you have to be given opportunities which are significantly above what you deserve.
I did get a few opportunities in my early career which were slightly above my capabilities but not enough to make me feel like an impostor. In the past few years, all opportunities I've been given have been below my capabilities. I know based on feedback from colleagues and others.
For example, when I apply for jobs, employers often ask me "You've worked on all these amazing, challenging projects, why do you want to work on our boring project?" It's difficult to explain to them that I just need the money... They must think that with a resume like mine I should be in very high demand or a millionaire who doesn't need to work.
I've worked for a successful e-learning startup, launched successful open source projects, worked for a YC-backed company, worked on a successful blockchain project. My resume looks excellent but it doesn't translate to opportunities for some reason.
It is unnecessary to walk the reader through autocorrelation in order to achieve a poorer understanding of that simple result.
Close. It's the cognitive bias where unskilled people greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.
including from David Dunning himself https://thepsychologist.bps.org.uk/volume-35/april-2022/dunn...
How does estimating my skill level influence skill growth, social relationships and decision making?
I think there are a bunch of useful angles to this. When there are risk/responsibility opportunities, then I need to be courageous. When it’s about learning and interacting collaboratively, then I need to be humble.
Is my reasoning flawed in some way?
Not a single word in that blogpost changes anything about that.
The final study discussed is convincing, as far as I can tell. By using academic rank (Freshman, Sophomore, ...) they can plot the difference between score and predicted score against rank without autocorrelation. It's just that academic rank seems a possibly unreliable metric and an unnecessary complication. Why not just use the data about test scores and predictions of scores which already exists, with a proper statistical interpretation?
If everyone responded that they are 50% skilled (or per this article, that it's randomly distributed), then
1. We see the same graph, and
2. Bad people overestimate, and good people underestimate
This article merely describes Dunning Kruger. Accidentally proves it mathematically, but thinks that it debunks it.