1. They trained and tested on a balanced dataset, which is very unlike the data distribution this algorithm would see “in the wild”. Under real-world prescreening conditions the data would likely be extremely unbalanced toward the negative class, and would also be subject to drift over time.
2. They seem to have identified positive subjects through a questionnaire, not via clinical chemistry diagnostics, so (a) it is unclear whether their training labels are correct, and (b) they may have completely missed the asymptomatic population.
3. As mentioned in another comment, ca. 5000 patients and 250K samples is not a lot, considering the size and diversity of the population(s) where this would be deployed.
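To put point 1 in numbers: plugging the sensitivity (98.5%) and specificity (94.2%) from the paper's abstract into Bayes' rule shows how much the meaning of a positive result depends on prevalence. A minimal sketch; the prevalence values other than 0.5 are hypothetical, and it assumes the reported rates carry over unchanged to real-world data, which is exactly what is in question.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value, P(infected | positive), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value, P(not infected | negative)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.985, 0.942  # rates reported in the paper's abstract
for prevalence in (0.5, 0.05, 0.01):  # 0.5 mimics a balanced test set; others hypothetical
    print(f"prevalence={prevalence:<5} PPV={ppv(SENS, SPEC, prevalence):.3f} "
          f"NPV={npv(SENS, SPEC, prevalence):.4f}")
```

At a hypothetical 1% prevalence the PPV comes out at roughly 0.15, i.e. about 6 in 7 positives would be false alarms, even though the NPV stays above 0.999.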
Disclaimer: I gave the article the brief high level scan treatment so I could be wrong about any or all of these. Please correct me if I am mistaken.
It's a short 9-page paper, worth a read.
1. Given that the real-world distribution of positive/negative COVID cases is hugely imbalanced, having a balanced dataset would seem to be a form of random undersampling of the majority class. Undersampling potentially discards useful data from the majority class, unless we can somehow determine that the discarded data adds no new information. In this case the majority class is not homogeneous, which the paper itself points out: "there are cultural and age differences in coughs, future work could focus on tailoring the model to different age groups and regions of the world".
2. In the abstract, the claim is:
"When validated with subjects diagnosed using an official test, the model achieves COVID-19 sensitivity of 98.5% with a specificity of 94.2% (AUC: 0.97). For asymptomatic subjects it achieves sensitivity of 100% with a specificity of 83.2%." [Reminder: sensitivity = True Positive Rate = TP/P, specificity = True Negative Rate = TN/N]
If you look at Table 1, the breakdown is 59% self-reported, 28% doctor's assessment, 13% official test.
3. 5320 patient data points is something (the train/test split is 4256/1064, so the model was built on 4256 data points). It would depend on the assumptions, but at first glance (based on sample size calculators) it doesn't seem underpowered. That said, this assumes a homogeneous population. The dataset is likely, if unintentionally, undersampling certain populations systematically due to lack of reach.
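For reference, random undersampling of the majority class, as described in point 1, looks roughly like this. A minimal sketch, assuming each sample is a (features, label) pair with 1 = positive and 0 = negative; the function name `undersample` is hypothetical, and nothing here says this is how the authors actually built their balanced set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(samples: Sequence[Tuple[T, int]], seed: int = 0) -> List[Tuple[T, int]]:
    """Randomly drop majority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))  # everything not sampled is discarded
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 50 positive and 950 negative recordings.
data = [(f"cough_{i}", 1 if i < 50 else 0) for i in range(1000)]
balanced = undersample(data)
print(len(balanced), sum(label for _, label in balanced))  # prints "100 50"
```

The 900 negative recordings that get dropped are simply gone, which is the information loss the comment is worried about.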
What I worry about with the undersampling are the “difficult” cases, such as other types of respiratory conditions and infections. How many COPD, rhinitis, chronic bronchitis, etc. patients were there in the training data? It is precisely these patients the algorithm needs to perform well on, as they are higher risk and/or likely to be most prevalent among the people who seek out this app.
I think the other big question is what advantages / disadvantages does this have compared to a questionnaire administered to someone who is experiencing symptoms of an upper respiratory infection?
That being said, this study is a significant academic achievement. The authors should be very proud of what they have done. There are real challenges to doing something like this that impose hard limitations and they did as well as anyone could without infinite resources.
So subjects were aware of their (presumed) covid status when they coughed?
I agree that you would not want to use only the class priors for prediction. However, I do not think it is clear that you would want to throw that information out. Also, I'm not sure I agree with the statement that a neural network has “no memory” of the prior class distribution. That is a strong claim to make about something as opaque as a neural net model.
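One standard way to keep the prior information instead of throwing it out is a prior-shift (label-shift) correction: train on the balanced set, then rescale the model's output probability by the ratio of the deployment prior to the training prior. A minimal sketch, assuming the model's output is a calibrated probability and that only the class priors change between training and deployment (p(x | y) stays the same); `adjust_for_prior` is a hypothetical helper, not anything from the paper.

```python
def adjust_for_prior(p_pos: float, train_prior: float, deploy_prior: float) -> float:
    """Re-weight P(positive | x) from the training prior to a deployment prior.

    Bayes: P_deploy(y | x) is proportional to P_train(y | x) * pi_deploy(y) / pi_train(y).
    Assumes p_pos is calibrated and only the class priors differ (label shift).
    """
    w_pos = p_pos * deploy_prior / train_prior
    w_neg = (1.0 - p_pos) * (1.0 - deploy_prior) / (1.0 - train_prior)
    return w_pos / (w_pos + w_neg)


# Balanced training set (prior 0.5), hypothetical 1% prevalence in deployment.
print(round(adjust_for_prior(0.9, train_prior=0.5, deploy_prior=0.01), 4))
```

With these hypothetical numbers an output of 0.9 becomes about 0.083, so the prior is not discarded; it is applied after training instead of being baked into the training data.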
Maybe feasible for a VC funded company with several million dollars and >20 FTEs. Less so for an academic lab with a few grad students and postdocs being paid with pocket lint.
But as usual, the fault is in the summary, not the research.
For example, it's possible to have early stage breast cancer or colon cancer but have no symptoms (yet). Which is why they do screenings to catch these early.
The final and most difficult step would be effective quarantine of infected individuals, some of whom are likely to try and go to work anyway etc.
But even if you assume nothing more than voluntary self-quarantine, I would expect this to drive the effective reproduction number (R) below 1 very quickly, as the vast majority of infected people would stay home and thus cease to spread the disease.
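A back-of-the-envelope version of that expectation. Strictly, the quantity that has to drop below 1 is the effective reproduction number, not R0 itself. This toy model assumes a caught-and-isolated case transmits nothing while every missed or non-compliant case transmits at the full R0; R0 = 2.5 and the compliance figures are hypothetical, and 0.985 is the paper's reported sensitivity.

```python
def effective_r(r0: float, sensitivity: float, compliance: float, coverage: float = 1.0) -> float:
    """Toy effective reproduction number under screening plus self-quarantine.

    Assumes isolated cases transmit nothing and all other cases transmit at r0.
    Ignores transmission before testing, test frequency, and everything else.
    """
    fraction_removed = coverage * sensitivity * compliance
    return r0 * (1.0 - fraction_removed)


for compliance in (0.5, 0.7):  # hypothetical share of flagged people who stay home
    print(compliance, round(effective_r(2.5, 0.985, compliance), 3))
```

Under these assumptions R drops below 1 only when coverage × sensitivity × compliance exceeds 1 − 1/R0 (0.6 for R0 = 2.5): 70% compliance gives about 0.78, while 50% leaves it above 1, at about 1.27.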
Finally, if all of the above were to come true, I think this could go down in history as the first truly life-changing AI discovery, and potentially one of the biggest watershed moments in recent history.
Obviously we’re not there yet, but I am very optimistic and excited after reading the story.
The next week you will have the same problem ...
The next week, people will start to ignore the test.
And this is assuming the students only take one test per day. If they also get tested on the bus, in the cafeteria, and at the supermarket, the number of people without a false positive will be much lower.
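The compounding is easy to quantify if every test result is independent (IID). A sketch with numbers consistent with the thread, 30 healthy students and a 20% false positive rate (6 false positives on day one), where any single positive sends a student home; `expected_never_flagged` is a hypothetical helper.

```python
def expected_never_flagged(n_students: int, p_fa: float, tests_per_day: int, days: int) -> float:
    """Expected number of healthy students who never get a false positive.

    Assumes IID test results and that one positive is enough to be sent home.
    """
    return n_students * (1.0 - p_fa) ** (tests_per_day * days)


for tests_per_day in (1, 3):
    n = expected_never_flagged(30, 0.2, tests_per_day, days=5)
    print(f"{tests_per_day} test(s)/day: {n:.1f} students never flagged")
```

One test a day for a five-day week leaves about 30 × 0.8⁵ ≈ 9.8 students never flagged; three tests a day leaves about 1.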
Your example assumes there's no hierarchy of available tests, and that this test is the only test there is.
What would really happen is those 6 false positives would be referred for a more accurate test. They might miss a day of school but not a week.
At the same time, your more accurate testing pipeline can now speed up thanks to Little's law. There's dramatically less pressure on the system and less backlog, so you have a second order effect that the more expensive slower tests also become cheaper and quicker.
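Little's law (L = λW) ties the backlog L to the arrival rate λ and the time W each sample spends in the system; to see how much W shrinks when fewer samples arrive, you need a queue model. A minimal sketch using an M/M/1 queue, which is a swapped-in assumption (Poisson arrivals, exponential service, one server), with a hypothetical capacity of 100 samples/day.

```python
from typing import Tuple


def mm1_metrics(arrival_rate: float, service_rate: float) -> Tuple[float, float]:
    """Mean number in system (L) and mean time in system (W) for an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be below service_rate")
    w = 1.0 / (service_rate - arrival_rate)  # M/M/1 mean time in system
    l = arrival_rate * w  # Little's law: L = lambda * W
    return l, w


for arrivals in (95.0, 20.0):  # before and after pre-screening (hypothetical)
    backlog, days_in_system = mm1_metrics(arrivals, 100.0)
    print(f"{arrivals:.0f}/day -> backlog {backlog:.2f}, time in system {days_in_system:.4f} days")
```

Cutting arrivals from 95/day to 20/day drops the mean time in system from 0.2 days to 0.0125 days, and the backlog from 19 samples to 0.25. Real lab pipelines are batched and multi-server, so treat this as the direction of the effect, not its size.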
But even if we gloss over all that, and we're only concerned about false positive rate, then this is still much better than no school at all, as in hard lockdown, which has a 100% false positive rate.
Finally, there's the lives saved because of earlier rapid detection and isolation, with corresponding relief for the health care system, leading to increased quality of care and resources available for more severe cases... and so on and so on.
A bloom filter can do wonders for a system, and if this test works it should do the same.
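The Bloom-filter analogy in numbers: the cheap test runs on everyone, and only its positives go on to the slow, accurate test. A minimal sketch using the paper's reported sensitivity and specificity and a hypothetical population of 10,000 at 1% prevalence. One caveat the analogy hides: a Bloom filter never gives false negatives, whereas this screen can, so missed cases are counted separately.

```python
from dataclasses import dataclass


@dataclass
class CascadeResult:
    confirmatory_tests: float  # expected number of slow/expensive tests needed
    missed_cases: float  # expected infected people screened out by stage 1


def two_stage_screen(population: int, prevalence: float,
                     sensitivity: float, specificity: float) -> CascadeResult:
    """Cheap screen on everyone; only stage-1 positives get the confirmatory test."""
    infected = population * prevalence
    healthy = population - infected
    flagged = infected * sensitivity + healthy * (1.0 - specificity)
    missed = infected * (1.0 - sensitivity)  # unlike a Bloom filter, misses are possible
    return CascadeResult(confirmatory_tests=flagged, missed_cases=missed)


result = two_stage_screen(10_000, 0.01, sensitivity=0.985, specificity=0.942)
print(f"confirmatory tests: {result.confirmatory_tests:.1f}, missed: {result.missed_cases:.1f}")
```

That comes to about 673 confirmatory tests instead of 10,000, at the cost of about 1.5 expected missed infections.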
If it is the former (IID), and let's say P_D = 1.0 and P_FA = 0.2, it is an extremely easy problem to solve: just have each student take 3 tests each day and count a student as positive only if all 3 tests come back positive, which reduces the overall P_FA from 0.2 to 0.2³ = 0.008. Or 4 tests for 0.0016.
If it is the latter, you will only lose 6 students for the whole week; you'll have 24 students left on Friday, not 10.
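The arithmetic behind the IID case: requiring every one of n tests to be positive multiplies the per-test rates together. A sketch of that calculation; `combined_rates` is a hypothetical name.

```python
from typing import Tuple


def combined_rates(p_d: float, p_fa: float, n_tests: int) -> Tuple[float, float]:
    """Detection and false-alarm probability when positive means 'all n tests positive'.

    Assumes the n results are independent (IID) given the person's true status.
    """
    return p_d ** n_tests, p_fa ** n_tests


for n in (3, 4):
    print(n, combined_rates(1.0, 0.2, n))
```

This reproduces 0.2³ = 0.008 and 0.2⁴ = 0.0016. The catch is that the detection probability is raised to the same power: with the paper's reported 98.5% sensitivity in place of the idealized P_D = 1.0, three tests give 0.985³ ≈ 0.956.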
If false positives are per-person, then your scenario won't happen. It'll be the same 6 kids for whom the test never works right.
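A small simulation contrasting the two cases, where each student gets a personal daily false-positive probability. Both setups below average a 20% false positive rate on any single day; the class size and rates are hypothetical, chosen to match the thread's example, and a student who is sent home stays home for the rest of the week.

```python
import random
from typing import List


def students_left(fp_probs: List[float], days: int, seed: int = 0) -> int:
    """Simulate daily screening of healthy students; return how many are never sent home.

    fp_probs[i] is student i's chance of a false positive on any given day.
    """
    rng = random.Random(seed)
    in_class = list(fp_probs)
    for _ in range(days):
        # random() is in [0, 1): p = 1.0 always flags, p = 0.0 never does
        in_class = [p for p in in_class if rng.random() >= p]
    return len(in_class)


iid = [0.2] * 30  # every student has the same independent 20% daily rate
per_person = [1.0] * 6 + [0.0] * 24  # the same 6 students always test positive
print(students_left(iid, days=5), students_left(per_person, days=5))
```

The per-person setup always ends the week with 24 students; the IID setup averages about 9.8 across seeds.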
So with all this in mind, you'll have to come up with appropriate norms around the results. You could call it "okay" vs "suspect" instead of negative and positive. Maybe there's a lowered-risk version of activities for people who are "suspect" that day. Maybe they don't go to the gym that day, maybe they sit in the isolated booth in the classroom, whatever. But then, they need to take a standard test that night to return to school the next day. Or, as someone else mentioned, a rapid test at the nurse's office.
However, reading the article, it seems like the false negative rate is really low. It sounds like this could be an incredibly effective screening tool.
But if this study serves as a PoC to back up that real-world observation, then this is quite a promising approach!
It’s false negatives they really need to worry about. Then you have people going around superspreading while telling everyone it’s fine because they tested negative.
It depends on whether there's a causal reason for detecting the same person as positive.
You are not against public health are you citizen? You aren’t hiding any covid patients are you?
† AKA telescreens
If this works well, isn't it kind of anti-social to refuse to cough into a phone?
Keep in mind this isn't the only (or even the first) application of this sort of always-on ubiquitous surveillance.
Seriously, this is incredible, and if it is verified to work it could be a game changer. What we really need to do is all get tested at once, on the same day at the same time. Then those who are positive need to isolate for 2-3 weeks until they are negative again. That would completely reset us back to nearly zero. Then do this again 2-3 times and we could shove this demon back into the bottle.
But we could all cough into our phones every day.
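A rough sketch of why repeating the mass-testing round matters, using the paper's reported 98.5% sensitivity and a hypothetical 10,000 initial infections. It deliberately ignores new transmission between rounds and assumes everyone is tested and every detected case isolates perfectly, so it is a best case, not a forecast.

```python
def undetected_after_rounds(initial_infected: float, sensitivity: float, rounds: int) -> float:
    """Expected infections still undetected after `rounds` rounds of testing everyone.

    Best-case toy model: no new infections between rounds, full participation,
    and perfect isolation of every detected case.
    """
    return initial_infected * (1.0 - sensitivity) ** rounds


for k in range(1, 4):
    print(k, round(undetected_after_rounds(10_000, 0.985, k), 3))
```

Under these assumptions about 150 cases slip through the first round, about 2 after the second, and well under 1 after the third; in reality the cases missed in each round keep spreading in the meantime.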
What you describe is what South Korea has done a couple of times now with more traditional testing, it certainly works if you actually do it.
This was not a blinded clinical trial. The subjects all knew whether they had COVID-19 or not, and given how strong psychological effects can be, what's detectable in their cough might be their knowledge that they're sick. The researchers even acknowledge in the paper that "sentiment" is a big part of how a forced cough sounds.
What's worrying is also how little of the data was from a diagnostic test (over half of "positive" samples were "self-diagnosed" COVID-19, whatever that means).
I don't think FDA or any other regulatory body would accept such an app as a screening tool without a proper trial being done.
If it works, that would be the most practical and coolest application of ML I've seen - but it still feels like something from the "too good to be true" category at the moment.
That said, I like that they're thinking outside the box on this one. A free digital test with a low false negative rate would be a game changer.
Researchers with really strong results will for sure publish in medical journals, which typically implies a massive push in prestige (impresses the administration and funding providers).
https://trends.google.com/trends/explore?q=loss%20of%20taste...
While this model is indeed extremely useful and interesting work, this seemingly casual quote gives new meaning to how unsanitary our phones really are/can be.
Of course there's the issue of water getting into the charging port, but I've found that the air blown out of a laptop compiling a Node.js project works brilliantly here.
Looks incredible. Hopefully the weights and code are released quickly.
Anyway, happy that someone else is doing it.
Does anyone have a link to a sound file that is a good example of a "Covid-19 cough"? I would appreciate it!
I'm rather skeptical, but the potential upsides are so huge that it's worth rapid additional investigation.