epistasis · 2mo ago

If that's the point they are making, let's see their false positive rate that it produces on the entire codebase.

They measured false negatives on a handful of cases, but that is not enough to support the system you suggest. And based on my experience with $$$-focused eval products that you can buy right now, e.g. Greptile, the false positive rate will be so high that full-codebase scans done this way won't be useful.
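
To make the base-rate concern concrete, here is a rough sketch with made-up numbers. None of these figures come from the article or from any vendor, and `scan_precision` and its parameters are hypothetical names chosen for illustration. It only shows how a modest per-unit false positive rate can swamp real findings when almost all of the scanned code is clean.

```python
def scan_precision(n_units, prevalence, recall, false_positive_rate):
    """Estimate alert counts and precision for a full-codebase scan.

    n_units: number of scanned units (e.g. functions), hypothetical
    prevalence: fraction of units that contain a real bug
    recall: fraction of real bugs the scanner flags (1 - false negative rate)
    false_positive_rate: fraction of clean units the scanner flags anyway
    """
    vulnerable = n_units * prevalence
    clean = n_units - vulnerable
    true_alerts = vulnerable * recall
    false_alerts = clean * false_positive_rate
    total_alerts = true_alerts + false_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


# Assumed values for illustration only: 100,000 functions, 1 in 10,000
# has a real bug, the scanner catches 90% of them and wrongly flags 5%
# of clean functions.
tp, fp, precision = scan_precision(100_000, 0.0001, 0.9, 0.05)
print(f"real findings: {tp:.1f}, false alarms: {fp:.1f}, precision: {precision:.4f}")
```

With those assumed inputs, about 9 real findings sit among roughly 5,000 false alarms, which is why a low false negative rate on a handful of cases says little about whether a whole-codebase scan is usable.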

zelphirkalt · 2mo ago

How do we know the false positive rate for this "Mythos" thingamabob? Since they didn't release it and we cannot reproduce the results, are we simply to take their word for it? And what if the author of the featured article simply made a claim about it? Do we take their word too? To me these AI tech companies are no more trustworthy than a random blog author, maybe even less so, given all the shady stuff they are pulling, and especially since they have not released it. Show it, or it didn't happen.

epistasis (OP) · 2mo ago

That they were able to use it for security scanning inherently puts the false positive rate at a usable level.

Maybe they spent more on labor to comb through reports than on the hardware costs of discovery, but if so I think we'd be hearing from third parties about how useless the millions in Mythos credits they got turned out to be.
