1. Use a non-institutional email address, or have a hospital affiliation, 2. Have no international co-authors.
And they acknowledge 86% sensitivity and 44% specificity. It's a coin-toss which biases massively against research from outside the US and Western Europe.
This "paper" is bigoted nonsense.
If you know the true prevalence of a disease in a population, and the sensitivity and specificity of your test, you can predict how many positive measurements you obtain. Vice versa, from the (flawed raw) measurement, given sensitivity and specificity, you can estimate the true prevalence.
Furthermore, they’re explicitly saying that “red flagging” by their simple indicator doesn’t mean that the paper is fake, but that it merits higher scrutiny.
ETA: I mean, it could still all be bullshit (by virtue of some bias or so), but you’ll need to argue a bit harder to establish that.
ETA2: Actually, not sure that’s what they’ve done. They might have just reported the raw (very bad) measurement (that they call “potential red flagged fake paper”), without doing the obvious next step outlined above, and without applying any confidence intervals. So, it might actually be a pretty crap paper (though possibly technically correct) coupled with some mediocre reporting layered on top. Isn’t basic statistics taught anymore?
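For anyone who wants the "obvious next step" spelled out, here is a minimal sketch of the correction described above, usually called the Rogan-Gladen estimator. The function names and the numbers are hypothetical and not taken from the paper; it assumes sensitivity and specificity are known exactly, which the validation data may not justify.

```python
def expected_flagged_fraction(prevalence, sensitivity, specificity):
    """Fraction of items the test flags, given the true prevalence."""
    return prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)


def estimate_prevalence(flagged_fraction, sensitivity, specificity):
    """Invert the relation above to recover prevalence from the raw flagged fraction."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test is no better than chance; prevalence is not identifiable")
    estimate = (flagged_fraction + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers for illustration only.
flagged = expected_flagged_fraction(0.10, sensitivity=0.90, specificity=0.80)
recovered = estimate_prevalence(flagged, sensitivity=0.90, specificity=0.80)
```

With these made-up inputs the test flags 27% of items even though only 10% are truly positive, and the correction recovers the 10%. Note that the closer sensitivity + specificity gets to 1, the more the estimate blows up any error in the inputs.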
I think this paper by Peter J Diggle [0], gives a solid methodology. Instead of treating sensitivity and specificity as fixed values using sample estimates, you can model them as each having a beta distribution. In this case these beta distributions can be found using a Bayesian treatment of Bernoulli trials.
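A rough sketch of that idea, not necessarily the exact procedure in the linked paper: put a uniform Beta(1, 1) prior on sensitivity and specificity, update each with validation counts, and push posterior draws through the prevalence correction. The counts and flagged fraction below are invented, and the sketch ignores sampling error in the flagged fraction itself.

```python
import random


def posterior_draws(successes, failures, n_draws, rng):
    """Draws from the Beta posterior of a Bernoulli rate under a uniform Beta(1, 1) prior."""
    return [rng.betavariate(1 + successes, 1 + failures) for _ in range(n_draws)]


def prevalence_interval(flagged_fraction, tp, fn, tn, fp, n_draws=20000, seed=0):
    """Propagate uncertainty in sensitivity and specificity into the prevalence estimate."""
    rng = random.Random(seed)
    sens = posterior_draws(tp, fn, n_draws, rng)  # sensitivity from known positives
    spec = posterior_draws(tn, fp, n_draws, rng)  # specificity from known negatives
    estimates = []
    for se, sp in zip(sens, spec):
        denom = se + sp - 1.0
        if denom <= 0:
            continue  # skip draws where the test is no better than chance
        p = (flagged_fraction + sp - 1.0) / denom
        estimates.append(min(1.0, max(0.0, p)))
    estimates.sort()
    lo = estimates[int(0.025 * len(estimates))]
    mid = estimates[len(estimates) // 2]
    hi = estimates[int(0.975 * len(estimates)) - 1]
    return lo, mid, hi


# Hypothetical validation counts and flagged fraction, for illustration only.
low, median, high = prevalence_interval(0.30, tp=90, fn=10, tn=80, fp=20)
```

The point is the width of the interval: with only a hundred or so validated cases per class, the plausible range for the true prevalence is wide even when the point estimate looks precise.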
Then they and science should change their sensationalist headline. It's ironic that a paper about fakeness of something uses a borderline misleading title.
A completely random test given equal populations results in 50% accuracy and 50% specificity. Things don’t look nearly as good if only 1% of the actual population has the condition.
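To put numbers on that, here is a small sketch using Bayes' rule to get the share of flagged items that really have the condition (the positive predictive value). The function name and the 90%/90% example are mine, chosen only for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(has condition | flagged), by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# A coin-flip test: 50% sensitivity, 50% specificity.
balanced = positive_predictive_value(0.50, 0.5, 0.5)  # equal populations
rare = positive_predictive_value(0.01, 0.5, 0.5)      # 1% prevalence

# Even a much better (hypothetical) test struggles at 1% prevalence.
decent = positive_predictive_value(0.01, 0.9, 0.9)
```

With equal populations half of the coin-flip flags are correct; at 1% prevalence only 1% are, and even the 90%/90% test is right on roughly one flag in twelve.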
They better have a flawless methodology, because any tiny problem is enough to ruin their analysis. And well, just flagging almost any paper not from the EU or US as fraud doesn't usually come together with a flawless methodology.
Paper mills are a $3-4 billion industry that is growing rapidly. That money isn't coming from nowhere. There are a lot of fake papers, and the fake paper industry is growing steadily.
So then the question becomes "where are those fake papers being published, and by whom."
You can converge on answers to those questions in a lot of ways. The fake paper detection method is suggested as one tool to help journals tackle fraud.
If you don't think the conditions are valid, well, ok. But why not? How would you improve on the validation methodology? Obviously having more known fakes would be nice.
Saying the article is "bigoted nonsense" doesn't make a lot of sense without more information (to be fair, I might be lacking crucial context). Are the authors known bigots with history of pushing bigotry? What I read seemed to be a sincere attempt to improve scientific publication practices by identifying the scope and scale of the fraud problem, while also developing means to address it. That doesn't strike me as bigoted nonsense.
That said, the headline of the article is pretty click-baity, and shame on Science's editors for that.
I haven't looked at the details here, but if you make a prediction model and that model is robust enough to explain something with great accuracy using 2 or 3 variables, it's not going to be "biased", it's just going to be robust and right more often than not using only these few variables (as long as the training data was broad enough).
Kinda similar to those researchers years back who proved how easy it was to go into certain social science journals as long as you copied their ideology.
For the social science journals bit, are you thinking of the "grievance studies affair": https://en.wikipedia.org/wiki/Grievance_studies_affair ?
Ironically, this study has generated a lot of "fake news" about the field of social science. Its conclusions were widely spread, mainly by people with ideological reasons for doing so. When we look at the study itself, it's clear the conclusions are quite different from what the rumors say. For example, the same researchers tried such hoaxes before the ones they mention in their study, except that those hoaxes failed to be published, and they "forgot" to mention it. They did not have any control group, either of "correct articles" or of "articles defending the opposite ideology" (so how can we conclude that these bad articles were published because of ideology if we don't know how many articles are published without being critically reviewed?). They also count as valid a lot of journals that are pay-to-publish and not seriously used in the field. One of the authors, ironically, ended up supporting platforms publishing conspiracy theories (and he was even banned from Twitter). Not that the study should be judged on that, but it's a funny anecdote: the author who, according to some, had the courage to defend real science against bad woke ideology ends up demonstrating that he never cared about real science and is driven by ideology, not science.
Not by the definition of "fake" used in the article, as the data wouldn't be plagiarized or fabricated. It'd just be shitty data.
Typically, it doesn't affect people working in that specific area. They develop a sixth sense for detecting bullshit papers. It comes with experience but depends on several factors, including the authors' reputation, their institution (for the first screening), what journal/conference the paper was published in, the authors' other work, and sometimes things as simple as how much effort was put into the figures, polishing the text, etc. Some of these things are LLM-proof, some of them are not. For example, a senior professor I was talking to, who's been getting like 50-100 emails a week from non-English-speaking countries (primarily India, China, Pakistan, and Bangladesh), mentioned that the quality of the text in the emails went up significantly almost overnight after ChatGPT was opened to the public. It'll be interesting to see how things change in the next few months/years.
Some professor put it in a nice way: the current system motivates us to think of research in terms of LPUs, least publishable units. No matter how established your lab is, you’d try to publish as soon as possible, leading to a lot of papers with not a lot of contribution. If tenure committees and all the other systems that gauge academics required people to present, say, only their top 3 or 5 seminal papers, then people would try to put their best work out there without the constant pressure of always publishing. Win-win for everyone. Unfortunately, the ones with the power to make these changes are the ones gaining the most from the current system, so it’s unlikely to happen.
Like look how many times that 2006 Nature paper on amyloid beta in Alzheimer's was cited, turns out some of the images were completely fabricated.
It's very ironic that this study, which was all about "bad science", has since created a totally fanciful rumor about the real situation.
Uh huh.
I didn't realize until today that all my papers are fake because I give contact information that won't go stale in 3 years, instead of my work email.
> To identify indicators able to red-flagged fake publications (RFPs), we sent questionnaires to authors. Based on author responses, three indicators were identified: “author's private email”, “international co-author” and “hospital affiliation”.
> For Studies 1 to 6 we identified two easy-to-detect indicators, where a publication was labelled as RFP: if an author used a private email and had no international partner.
> Then we combined the two best indicators (“author's private email” and “hospital affiliation”) to form a classification (tallying) rule: “If both indicators are present, classify as a potential fake, otherwise not” (the “AND” rule) (Katsikopoulos et al., 2020).
Fun bonus there with the 2020 book citation for the concept of an AND gate in a classifier.
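For what it's worth, the quoted "AND" rule really is this small. Below is a sketch of the rule plus how sensitivity and specificity would be computed against a labelled sample. The sample is entirely made up and the function names are mine, not the paper's.

```python
def red_flag(private_email, hospital_affiliation):
    """The quoted "AND" rule: flag only when both indicators are present."""
    return private_email and hospital_affiliation


def sensitivity_specificity(papers):
    """papers: iterable of (private_email, hospital_affiliation, is_fake) tuples."""
    tp = fn = tn = fp = 0
    for private_email, hospital_affiliation, is_fake in papers:
        flagged = red_flag(private_email, hospital_affiliation)
        if is_fake and flagged:
            tp += 1
        elif is_fake:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labelled sample: three known fakes, four known genuine papers.
sample = [
    (True, True, True),
    (True, True, True),
    (True, False, True),
    (True, True, False),
    (False, True, False),
    (True, False, False),
    (False, False, False),
]
sens, spec = sensitivity_specificity(sample)
```

On this toy sample the rule catches two of three fakes and wrongly flags one of four genuine papers; everything then depends on how representative the labelled sample is.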
I would allow just one valid paper with that inability.
The meat of my complaint remains even when they're intersecting with other rules. We should not be incentivizing people to use emails that predictably go dead in O(years). It is quite a common annoyance to read a paper, want to contact the author, and not be able to because the email they listed is dead, requiring searching for where they currently work and trying to find their email at that new place, with mixed results.
Yes, a private email is predictive of a paper being fake, in the literal sense that P(fake|privateemail) > P(fake|institutionemail). I get weird looks at work for using my permanent email address because of it. And probably if we select on that as a way to discard papers, it will initially appear to work and then start to look like it's working even better because anyone trying to give permanent contact info will be forced to switch to be published/cited/taken-seriously. But that's a bad outcome. Also, if you systematize this rule, paper mills will just start using emails that appear institutional, because this is a simple rule to defeat.
Is treating "the scientific literature" as a single thing perhaps a habit worth giving up?
As convenient as it would be to be able to just blindly trust something because of where it is published, that model hasn't shown itself to be especially robust in other cases (e.g. the news media).
Elsewhere, this is a red flag:
> I trust it because of which aggregator aggregated it
Should we really make an exception for science? I think that academia is a bit biased towards optimism about publisher-based root-of-trust models because scientific publishing is a relatively unweaponized space. Sure, shenanigans happen, but not at the same scale as elsewhere. The fakers are just trying to get another published paper, they're for the most part not trying to mislead. It's only fake news with a lowercase-f.
Sure, let's try to create a medium we can trust, but let's not get our hopes too high about it. That's energy better spent augmenting the ability of a reader or researcher to decide whether to trust a paper based on its content or based on it having been endorsed or authored by somebody that they explicitly (or transitively) trust.
But tempering our expectations while working to meaningfully improve on conditions? Aces, all for it.
If peer review is the product then the trust should be peer to peer. It feels like we're treating the publishers themselves as an authority, which I dislike.
The publishers ostensibly occupy a role of stewardship, and I suspect the model must have made sense at one point. I admit it's hard to see them as much more than rent extractors these days.
The nature of trust relationships seems to trend towards aggregation and centralization. Do you have any thoughts on how a web of trust can sustain itself, or is that perhaps not a concern if a centralization appears to reflect a network consensus?
I think filtering out bad faith efforts is too challenging because the pool of people capable of doing so is so limited, hell it might take longer to review and reject such a thing than to make it.
Mendel, father of genetics, failed to become an accredited teacher. His work on genetics would likely get no recognition in this environment where credentialism is king.
Some guy who knows enough about genetics that he created his own home pill to deliver genes into his gut to fix his lactose intolerance is being ignored by the world. Someone recently told me on HN that his video sounds like a scam video of a sort that is common (probably in a redacted comment).
I have a genetic disorder, which fails to pass the credentialism test. For that and other reasons, I didn't bother to say anything like "Sorry you don't know enough about genetics to follow it."
The individual wanted to know where the "studies" and "papers" were. And they likely don't exist and will never exist because there's no profit in it for someone else to try to build on his work.
I don't know how we fix this, but the world has changed and it's valuing the facade of scientific work more than actual scientific work and it makes me want to scream.
To be honest, I know nothing other than your description and it 100% sounds like either a scam or there are some variables that are not being controlled for. I’m a little shocked that you seem to have fallen for it, unless there is just a lot more to the story…
I don't know why you would be "shocked" that I "fell for it." Most of the world thinks I'm a nutter who imagines I'm getting well from my genetic disorder and dismisses my progress as "placebo effect" -- which would give me a mind more powerful than Darth Vader -- or just deluded bullshit.
So either I understand genetics and medical stuff better than average, or I'm absolutely the kind of fool who falls for bullshit scams on the internet.
Somewhat related anecdote: I’m reminded of a good friend who is preeminent in their field. No one would know them outside of their area of expertise, but anyone within that area of expertise (or who has learned that area of expertise from their college textbooks) knows their name. I got dinner with them over the holidays last year and they lamented that, I’m guessing based on name recognition, they receive a steady stream of communications (letters, email, etc) from laypeople who always think they have done something amazing previously thought impossible, or that they have a new insight that everyone else has missed. My friend no longer spends time going through these because in every single one of the hundreds of comms they’ve read, there’s invariably some confounding factor or something basic the writer missed that invalidates everything. I am not an academic, but my impression is that while laypeople like you and me can brute force things and have amazing insights, mostly we’re just wrong for some reason that a trained scientist or academic would have spotted immediately.
So there is some human review involved. Which is presumably how they got to the headline figures of 34% of neuroscience papers and 24% of medicine papers are fake.
Still, flagging 44% of genuine papers as fake doesn't sound very useful. The process only about halves your workload compared to just checking all the papers. In any large-scale rollout they would have to set a way higher threshold, and hope they still catch a useful number of fraudulent papers when using a threshold that detects 10% or 1% of genuine papers as fake.
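A quick sketch of the workload arithmetic behind that. It takes the 86% sensitivity and 44% false-alarm figures quoted in this thread and assumes, purely for illustration, that 20% of papers are fake. How much sensitivity would drop at a stricter threshold is unknown, so the second call is an optimistic upper bound that keeps it fixed.

```python
def flagged_fraction(prevalence, sensitivity, false_alarm_rate):
    """Share of all papers that land in the manual-review pile."""
    return prevalence * sensitivity + (1.0 - prevalence) * false_alarm_rate


ASSUMED_PREVALENCE = 0.20  # hypothetical share of fake papers

# As reported: 86% of fakes flagged, 44% of genuine papers flagged.
current = flagged_fraction(ASSUMED_PREVALENCE, 0.86, 0.44)

# Stricter threshold with a 10% false-alarm rate, optimistically keeping sensitivity.
stricter = flagged_fraction(ASSUMED_PREVALENCE, 0.86, 0.10)
```

Under these assumptions a bit over half of all papers still need a human look at the current threshold, and about a quarter at the stricter one, before accounting for the fakes the stricter rule would miss.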
They are also tagging all independent, non-affiliated researchers as fake. Do they know how many young people are doing science in the universities as temporary collabo-slaves without the right to a nice personal mail? Their detector would tag Einstein and Erdos as fake scientists, for Pete's sake! They just have tunnel vision about how real research works.
Note: I paid for the reprints and the postage, often expensive foreign rates.
https://www.thedailybeast.com/how-this-doctor-wrote-dozens-o...
The article you link doesn’t say if the papers are any good. It does suggest that they were in smaller niche journals so I suspect not.
On the upside, I can see the potential though for literature review type papers.
Can someone explain why the affiliation with a hospital is used as a key indicator?
Sounds great, who wouldn't want to use this? So I implemented it and found that their increase was due entirely to applying a log transform to the input variables. The resulting clusters were tighter, but it had zero predictive capability.
Very disappointing but in my experience, this is not uncommon.
Adding new 'detection' tools doesn't solve the original problem. They might reduce the second-order problem, but they do not touch the source problem. These are band-aids trying to stop a flood of bad science.
You'd need to use some obfuscated correspondence email to complete the loop.
Even the article makes it clear that this is just a wide net for an automatic first pass. Of course, it is biased towards countries with lax standards.
People can do science on local problems without being babysat by a foreigner that most of the time will just appear and sign.
It's not like paper authors get any kind of royalties. Some journals even make you pay to publish.
So why are they doing that? Maybe that's what we need to attack
Strange world
- Realistically the only people who can determine if a work is sound or not are other researchers in that same field.
- Peer review is a weak signal: reviewers are good at recognizing bad papers but not good at recognizing good papers (read this carefully).
- Most papers aren't highly influential. This means that we don't rely heavily on the results of most works (we rely on them weakly or purely for citations).
- The more influential a work is the more likely it is to be reproduced and scrutinized.
- Benchmarks are benchmarks, nothing more. Benchmarks are weak signals at best and shouldn't be used to make strong conclusions. Be that a p-value, FID, or even likelihood.
So we have to keep this in mind for a lot of reasons. One is how we discuss science with the public. Headlines like this often make people grow wary of science. While scrutiny is good, we have a good history of being successful. All processes are noisy, but the cream is more likely to rise to the top, and the surface is less noisy. It also tells us who we should be listening to when taking advice and summaries of works. If you believe the news has failed us, then look to the sources.
I see many who only get their science from news sources that claim scientists are corrupt. I found this odd, especially considering I've worked at national labs and I can tell you that no one there is doing it for the money. You'd have to be a fucking idiot to do science for money. It doesn't pay well, you never get real time off, there is a high barrier to entry, and you are under high amounts of pressure. We're on a forum with Silicon Valley wages: the average physicist wage is 100k, which is what you'd make with a BS in CS, except that you need an advanced degree to work at a lab. Let's try to compare like with like by looking at LLNL. As a PhD physicist you'll make between $150k and $200k/yr. You'll make the same as a PhD computer scientist. Yeah, this seems good, but we need to consider that if you drove 45 minutes west then that would be your base salary and you'd be making the same again in other compensation. You can easily verify this and there are plenty of people you can ask for personal experience (I've seen people jump ship often). This doesn't prove that they aren't corrupt, but it provides strong evidence that if these people were motivated by monetary compensation (or even prestige) then there are far better opportunities for them.
Another important aspect, which I think is critical to forums like this, is to be careful how you engage as a non-domain expert. Opinions are fine and no one should prevent you from having them. But the confidence in your opinion should be proportional to your qualifications. If you're an expert in one domain I'm sure you're frustrated by how many people discuss your domain as if they knew so much and yet get so much wrong, and by how wrong answers float to the top of forums (HN and Reddit) while the gems are hidden. This usually comes down to a lack of nuanced understanding. Simple answers are almost never correct. Murray Gell-Mann amnesia doesn't just apply to reading the news. Discussions can be had without teaching. Scientific discussions aren't done through debate. Determine your goals, and ask yourself if the way you are discussing allows you to change your opinion or not. Make sure you're on the same page as others, using the same assumptions (this is a key failure point). I'd argue for going in with care. If you don't, you're just adding to the noise.