Or you could just use Guava's Bloom ;)
As for probabilistic testing of fp rate... The problem is that every once in a while a test will fail.
Disclaimer: I wrote a cuckoo filter library.
If you want to test fp rate check my cuckoo filter test at sanityOverFillFilter() in github.com/MGunlogson/CuckooFilter4J/blob/master/src/test/java/com/github/mgunlogson/cuckoofilter4j/TestCuckooFilter.java
The unit test basically does fuzzing, filling the filter repeatedly in the hopes that one day any errors will surface. The error bounds are pretty large but small enough to detect any egregious failures. Importantly, my filter defaults to a random seed. Guava DOES NOT, so any tests using the same items will be deterministic. Guava's tests use this property to verify some filters that have been manually determined to be correct.
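A rough sketch of that kind of test, using a toy Bloom filter so it runs standalone. This is not Guava's or CuckooFilter4J's code: the class names, the sizes (96,000 bits, 7 hashes, 10,000 items) and the 3% bound are all assumptions picked for illustration. The point is only that the seed is printed, so a rare failure can be replayed with the same seed.

```java
import java.util.BitSet;
import java.util.Random;

final class SeededFpRateCheck {

    /** Toy Bloom filter for illustration only (hypothetical, not a library class). */
    static final class TinyBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        TinyBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void put(long item) {
            for (int i = 0; i < numHashes; i++) bits.set(index(item, i));
        }

        boolean mightContain(long item) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(item, i))) return false;
            }
            return true;
        }

        // Double hashing: the i-th bit index is derived from two mixed 64-bit values.
        private int index(long item, int i) {
            long h1 = mix(item);
            long h2 = mix(h1 + 0x9E3779B97F4A7C15L);
            return (int) Math.floorMod(h1 + i * h2, (long) numBits);
        }

        // SplitMix64-style finalizer, used here as a cheap bit mixer.
        private static long mix(long z) {
            z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
            z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
            return z ^ (z >>> 31);
        }
    }

    /** Fills a filter with random items and returns the observed false positive rate. */
    static double measureFpRate(long seed) {
        Random rng = new Random(seed);
        TinyBloom filter = new TinyBloom(96_000, 7);
        // Inserted items are even and probes are odd, so every hit is a false positive.
        for (int i = 0; i < 10_000; i++) filter.put(rng.nextLong() << 1);
        int probes = 100_000;
        int falsePositives = 0;
        for (int i = 0; i < probes; i++) {
            if (filter.mightContain((rng.nextLong() << 1) | 1L)) falsePositives++;
        }
        return (double) falsePositives / probes;
    }

    public static void main(String[] args) {
        // Random seed by default, but always logged so a failing run can be repeated.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        double rate = measureFpRate(seed);
        System.out.println("seed=" + seed + " fpRate=" + rate);
        // Loose bound: catches egregious breakage without flaking on normal variance.
        if (rate > 0.03) throw new AssertionError("fp rate " + rate + " with seed " + seed);
    }
}
```

With these sizes the usual approximation gives roughly a 1% fp rate, so the 3% bound leaves plenty of slack. Passing the logged seed as the first argument reruns the exact same items.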
I want to add statistics for mine to hopefully be able to monitor them at least by some estimate.
If you fail to do this you get unreproducible ghost tests that you can never investigate if they happen to fail rarely.
I'm not sure of an efficient general method for generating statistics. I know empirically testing the structures is the easy way. For Bloom Filters which evict old data, you can calculate the probability of FN by calculating the probability that a given element is a duplicate but reported as distinct.
In the vast majority of situations where false negatives are okay you're much better off just caching a hash of each object traditionally.
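For what "caching a hash of each object" could look like, here is a small sketch: a bounded LRU map of 64-bit hashes. The class name, the FNV-1a hash and the capacity are assumptions for illustration, not anything from the libraries mentioned in this thread. Eviction is what produces false negatives, and a 64-bit hash collision could in principle produce a false positive.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RecentHashCache {
    private final Map<Long, Boolean> seen;

    RecentHashCache(final int capacity) {
        // Access-ordered LinkedHashMap that drops the least recently used hash when full.
        this.seen = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true if the item's hash was already cached; records it either way. */
    boolean checkAndAdd(String item) {
        return seen.put(fnv1a64(item), Boolean.TRUE) != null;
    }

    // 64-bit FNV-1a over the UTF-8 bytes; any decent 64-bit hash would do here.
    static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

Each entry costs far more memory than a few filter bits per item, so this only makes sense when the window of recent items that matters is small.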
Traditionally, probabilistic set membership tests were accomplished by using Bloom Filters. However, Bloom Filters only work well on finite sets because as Bloom Filters age (fill up), the false positive rate approaches 1. This issue was addressed by Stable Bloom Filters, which free up space for inserts; however, this introduces false negatives.
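To put numbers on "approaches 1": the standard approximation for a Bloom filter with m bits, k hash functions and n inserted items is (1 - e^(-kn/m))^k. The sketch below just evaluates that formula; the class and method names are made up for illustration, and the formula assumes ideal, independent hash functions.

```java
final class BloomFillRate {

    /** Approximate false positive rate after nInserted items: (1 - e^(-kn/m))^k. */
    static double expectedFpRate(long mBits, int kHashes, long nInserted) {
        double exponent = -((double) kHashes * nInserted) / mBits;
        return Math.pow(1.0 - Math.exp(exponent), kHashes);
    }

    public static void main(String[] args) {
        long mBits = 96_000;
        int kHashes = 7;
        // Keep inserting into a fixed-size filter and watch the rate climb.
        for (long n = 10_000; n <= 1_280_000; n *= 2) {
            System.out.printf("n=%d fp=%.4f%n", n, expectedFpRate(mBits, kHashes, n));
        }
    }
}
```

With these example sizes the rate starts near 1% at 10,000 items and is effectively 1 once the filter has seen many times its design capacity, which is why an unbounded stream needs some eviction scheme.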
This library contributes three implementations of de-duplication algorithms centered around Bloom Filter variants whose design and replacement strategy allow them to reach stability faster than a Stable Bloom Filter while reducing the false negative rate (FNR) by several orders of magnitude.