Then I thought it was a paper claiming that a bug in the seaborn plotting library in Python was responsible for the decline in disruptiveness in science, which would be absurd!
Finally I understood: this is a paper debunking another meta-paper that claimed disruptiveness in science had declined. This new arXiv paper shows that a bug in the seaborn plotting library is responsible for the mistake in the analysis that led to that widely publicized conclusion about declining disruptiveness in science. Oh boy, so many levels...
ETA: For those who don’t click through, the paper title is “Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science.” The first few sentences of the abstract are:
“Park et al. [1] reported a decline in the disruptiveness of scientific and technological knowledge over time. Their main finding is based on the computation of CD indices, a measure of disruption in citation networks [2], across almost 45 million papers and 3.9 million patents. Due to a factual plotting mistake, database entries with zero references were omitted in the CD index distributions, hiding a large number of outliers with a maximum CD index of one, while keeping them in the analysis [1].”
It's arXiv, not a press release. :)
> floating point errors could cause the largest datapoint(s) to be silently dropped
However, the paper does not contain the string “float”, instead saying only:
> A bug in the seaborn 0.11.2 plotting software [3], used by Park et al. [1], silently drops the largest data points in the histograms.
So at the very least, the paper is silent on a key aspect of the bug.
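The paper only says the largest data points were "silently dropped" and never mentions floating point, so the mechanism below is purely a hypothetical sketch (in plain numpy, not seaborn) of how floating-point arithmetic on bin edges can make a histogram silently exclude the maximum value. The data values here are made up; the only real detail echoed from the abstract is that the outliers sit at exactly 1.0:

```python
import numpy as np

# Hypothetical data: a few ordinary values plus outliers at exactly 1.0,
# echoing the CD-index outliers described in the abstract.
data = np.array([0.2, 0.5, 1.0, 1.0, 1.0])

# Build bin edges by repeated floating-point addition of 0.1.
# Ten additions of 0.1 accumulate rounding error, so the last
# edge lands just below 1.0 instead of exactly at it.
edges = [0.0]
for _ in range(10):
    edges.append(edges[-1] + 0.1)
print(edges[-1])  # 0.9999999999999999, not 1.0

# np.histogram ignores values outside the outermost edges, so the
# three points at exactly 1.0 vanish from the histogram without any
# warning -- the counts no longer sum to the sample size.
counts, _ = np.histogram(data, bins=edges)
print(len(data), counts.sum())  # 5 2
```

Whatever the actual defect in seaborn 0.11.2 was, this kind of off-by-epsilon edge computation is one plausible way a plotting pipeline can drop the maximum while leaving everything else intact.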
https://www.tiktok.com/t/ZT8oG7ym6/
This is one of my favorite TikToks of all time, and you’ll see why. It goes into detail about how charts killed the Challenger crew. But the storytelling is second to none.
The bug in seaborn simply meant that the histograms that could have alerted them to a problem in their analysis didn't.
And I hope the original authors tell Nature to retract their paper. Unfortunately, it's already highly influential.
I'm on mobile and can't read the rest of the paper, but the impact could be massive.
There are (at the time of posting this comment) no comments raising any substantive issue with the arXiv submission itself (which of course still has to go through the peer review process, and hopefully the original authors will respond to or rebut this new article), so I'm curious why it's been flagged. It's not dead, so I cannot vouch for it.
If folks in the HN community who have flagged it have done so because there are serious issues with what the paper is asserting, please comment or critique instead of just flagging it. If it's because of the ambiguity in the title, I hope @dang and the moderators editorialize; there are some valuable comments in this thread that helped me understand what the issue is and what the bug is!
Seaborn is a wrapper around matplotlib. It's popular because it removes a lot of the boilerplate from matplotlib and is pandas-aware.
For example, you call the pairplot function with a dataframe and you just get a matrix of correlation plots and histograms. Versus matplotlib, where half the documentation and search results use the imperative interface with global state, the other half use the OOP one, and there are all the extra subplots shenanigans you have to decipher to get something that looks good.
It's convenience, really. The people who use seaborn don't want to dive into matplotlib because the interface is kind of a mess, with multiple incompatible ways to do things. Seaborn also documents what its arguments mean instead of hiding most of them in **kwargs soup. You get plots in one minute with seaborn that would otherwise take ten minutes to write in matplotlib.
There should be a real incentive/compensation for reviewing properly and real consequences if a paper gets retracted for reasons that should have been caught in review.
In this case it's fortunate that it did get found out in the end.
* is the treatment of existing work semi-thorough (even experts don’t know everything) and fair?
* are the claims novel w.r.t. the existing work? If not, provide a reference to someone who has already done it.
* can you understand the experiments?
* do the experiments and their results lead to the conclusions claimed as novel?
* does the writing inhibit understanding of the technical content?
No peer review I have ever seen or done would catch anything but the most egregious bug of this nature.
I have definitely done that with benchmarks / profiles.
It’s probably even easier when the incentives encourage “the find”.
2nd. One of the ways we discover problems with data is by plotting. When the plot library has a bug that hides a problem, well shit.
3rd. They did check their own findings multiple ways. Mistakes happen. The biggest critics of scientific mistakes are often those who have never done science themselves. It's easy, and it's a cheap play.
Like others, I was expecting a wildly different article...
So I thought the article would be about some ocean-faring insect or microbe that somehow affected scientists' mental acuity.
...nor does it have anything to do with tech companies hoarding cash by the trillions of dollars overseas instead of spending it on R&D, and even what R&D they do produce internally they have no incentive to publish or productize, because virtually no new business will be more profitable than the monopoly business they already have...
Edit: Not mentioned in the abstract but it is in the main paper. Editorialised title.
> A bug in the seaborn 0.11.2 plotting software [3], used by Park et al. [1], silently drops the largest data points in the histograms.