fisherjeff 4y ago

They mention a “very conservative false positive rate”. With a threshold of 30 matches, doesn’t an overall rate of 1/trillion imply that they used 1 / (1e12 ^ (1/30)) = ~40% as the per-photo false positive rate? If so, that does seem extremely conservative to me!

tehnub 4y ago

A 40% false collision probability would give an overall false flag probability of 1/trillion only if you had exactly 30 photos in your library, so that all 30 had to be false collisions. The calculation gets a little more complicated if you have more, because you have to account for every way of getting 30 or more false collisions among N photos, for N > 30. I wrote out the calculation in a comment from when this was being discussed a few months back: https://news.ycombinator.com/item?id=28174822.
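To make the "exactly 30 photos" point concrete, here is a rough Python sketch. The function name `flag_probability` and the example library size of 100 are my own choices for illustration, not anything from the paper. It computes the chance of hitting the 30-match threshold as a plain binomial tail, assuming each photo collides independently with the same probability.

```python
import math

THRESHOLD = 30   # matches needed before an account is flagged
TARGET = 1e-12   # the 1/trillion overall false flag probability

# Per-photo rate that yields TARGET when ALL of exactly 30 photos must collide.
per_photo_rate = TARGET ** (1 / THRESHOLD)


def flag_probability(n_photos, p, threshold=THRESHOLD):
    """P(at least `threshold` false collisions among n_photos), binomial model.

    Assumes independent photos with identical collision probability p.
    Uses exact integer binomial coefficients, so keep n_photos small.
    """
    return sum(
        math.comb(n_photos, k) * p**k * (1 - p) ** (n_photos - k)
        for k in range(threshold, n_photos + 1)
    )


if __name__ == "__main__":
    print(f"per-photo rate for N = 30: {per_photo_rate:.3f}")
    print(f"flag probability, N = 30:  {flag_probability(30, per_photo_rate):.3e}")
    print(f"flag probability, N = 100: {flag_probability(100, per_photo_rate):.3f}")
```

With only 30 photos the tail collapses to the single term p^30, which is where the ~40% figure comes from. With a larger library at that same rate, the flag probability is nowhere near 1/trillion.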

On page 10 of the paper I linked though, they state that they assume a false collision probability of 1/million, which is more conservative than the 3 in 100 million false collisions they saw in their tests. The way they chose 30 as the threshold is based on the safeguarding assumption that everyone's photo library is larger than the actual largest library. This is safeguarding because the more photos you have, the more likely you are to have collisions. Copying from my previous comment, we can compute their photo library size assumption by solving for N in this equation: 1/trillion = 1 - sum_{k=0}^{29} (N choose k) p^k (1 - p)^(N - k), where p is 1/million (the probability of a false collision) and the sum is the probability of fewer than 30 false collisions among N photos.
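One way to solve that equation for N numerically is sketched below in Python. The function names, the bisection bounds, and the 200-term cutoff are my own assumptions, not something from the paper. Rather than computing 1 minus the sum (which loses all precision at the 1e-12 level), it sums the upper tail k >= 30 directly, working in log space so the huge binomial coefficients don't overflow.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of (n choose k) * p^k * (1 - p)^(n - k), via lgamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def false_flag_probability(n, p=1e-6, threshold=30, max_terms=200):
    """P(at least `threshold` false collisions among n photos).

    Sums the upper tail directly. Truncating after `max_terms` terms is an
    assumption that only holds while n * p stays well below
    threshold + max_terms, which is true for the search range used below.
    """
    if n < threshold:
        return 0.0
    upper = min(n, threshold + max_terms)
    return sum(
        math.exp(log_binom_pmf(n, k, p)) for k in range(threshold, upper + 1)
    )


def largest_library_size(target=1e-12, p=1e-6, threshold=30):
    """Largest N whose false flag probability is still <= target (bisection)."""
    lo, hi = threshold, 10**8  # hi is an assumed upper bound for the search
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if false_flag_probability(mid, p, threshold) <= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    n = largest_library_size()
    print(f"assumed library size N: {n:,}")
    print(f"false flag probability at N: {false_flag_probability(n):.3e}")
```

The bisection relies on the flag probability increasing with N, which is the same "more photos, more collisions" observation as above.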
