Mozilla says 271 vulnerabilities found by Mythos and "almost no false positives" (opens in new tab)

(arstechnica.com)

84 pointsepistasis2d ago43 comments

43 comments

Again, and this is important:

A bug is a bug. A “potential vulnerability” is a bug. A vulnerability is verifiable as having security implications with a proof of concept or other substantial evidence.

Words matter. Bugs matter. It’s important to fix large amounts of bugs, just as it always has been, and has been done. Let that be impressive on its own, because it IS impressive.

Mythos didn’t write 271 PoC for vulnerabilities and demonstrate code path reachability with security implications. Mythos found 271 valid bugs. Let that be enough.

epistasisOP2d ago

I was a bit confused by your definitions, but here's how Mozilla broke out [1] the 271, um, things:

> As additional context, we apply security severity ratings from critical to low to indicate the urgency of a bug:

> * sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior, like browsing to a web page. We make no technical difference between these, but sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> * sec-moderate is assigned to vulnerabilities that would otherwise be rated sec-high but require unusual and complex steps from the victim.

> * sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

> Of the 271 bugs we announced for Firefox 150: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.

Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit. And on their definitional page, they classify even sec-low as "vulnerabilities" [2].

Words are tools, that get their utility from collective meaning. I'd be interested where you recieved your semantics from and if they match up or disagree with Mozilla.

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

[2] https://wiki.mozilla.org/Security_Severity_Ratings/Client

IainIreland2d ago

I work at Mozilla; I fixed a bunch of these bugs.

In general, I would say that our use of "vulnerability" lines up with what jerrythegerbil calls "potential vulnerability". (In cases with a POC, we would likely use the word "exploit".) Our goal is to keep Firefox secure. Once it's clear that a particular bug might be exploitable, it's usually not worth a lot of engineering effort to investigate further; we just fix it. We spend a little while eyeballing things for the purpose of sorting into sec-high, sec-moderate, etc, and to help triage incoming bugs, but if there's any real question, we assume the worst and move on.

So were all 271 bugs exploitable? Absolutely not. But they were all security bugs according to the normal standards that we've been applying for years.

(Partial exception: there were some bugs that might normally have been opened up, but were kept hidden because Mythos wasn't public information yet. But those bugs would have been marked sec-other, and not included in the count.)

So if you think we're guilty of inflating the number of "real" vulnerabilities found by Mythos, bear in mind that we've also been consistently inflating the baseline. The spike in the Firefox Security Fixes by Month graph is very, very real: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

paulvnickerson2d ago

What types of vulnerabilities was it finding? Cross site scripting, privilege escalation, etc? Mostly memory corruption or any Javascript logic bugs?

2 more replies

JeremyNT1d ago

You may not be able to comment, but do you feel like Mythos is accomplishing anything that couldn't have already been done with Opus and the right prompting?

I've assumed I could send an agent using a publicly available model bug hunting in a codebase like this and get tons of results, assuming I wanted to burn the tokens, so it's really unclear to me whether the Mythos hype is justified or if it's just an easy button (and subsidized tokens?) to do what is already possible.

1 more reply

epistasisOP2d ago

I'm not a security dev or researcher or anything, but as an outsider my understanding matches how Mozilla uses the terms. Though words used by specialists and the general public can offer differ...

cookiengineer2d ago

Can you elaborate why those bugs weren't found by e.g. fuzzing in the past?

I'm genuinely curious what "types" of implementation mistakes these were, like whether e.g. it was library usage bugs, state management bugs, control flow bugs etc.

Would love to see a writeup about these findings, maybe Mythos hinted us towards that better fuzzing tools are needed?

2 more replies

nirui2d ago

How about this: a "vulnerability" is a "vulnerability", but after it was identified and verified to cause problem, that's when it should be called a "bug", because it could make the software do unwanted things.

3 more replies

throw0101c2d ago

Presumably there are (implicit?) "sec-none" things, like [a] from the recently released 150.0.2 [b] which makes absolutely zero mention about "Security Impact" or "Severity" in the bug report, unlike [c], which is listed in the Mozilla weblog post [2].

Security things are mentioned in the Release Notes [b] pointing to a completely different document [d].

Perhaps sometimes a bug is 'just' a bug, and not a vulnerability.

[a] https://bugzilla.mozilla.org/show_bug.cgi?id=2034980 ; "Can't highlight image scans in Firefox 150+"

[b] https://www.firefox.com/en-CA/firefox/150.0.2/releasenotes/

[c] https://bugzilla.mozilla.org/show_bug.cgi?id=2024918

[d] https://www.mozilla.org/en-US/security/advisories/mfsa2026-4...

Gregaros2d ago

> Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit.

That’s not evident in what you pastedat all.

What you pasted says

> sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior […] We make no technical difference between these […] sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

From this one infers that the "180 were sec-high" bugs found are actually exploitsble but known to have been found in the wild, and are NOT mere annoying bugs.

The difference between 180 and 270 does nothing to deflate the signicance, or lack there of, of the implication re: Mythos.

epistasisOP2d ago

Yes, it is not in what I pasted, as I said, "even though they say right below". If you don't believe me then click on either of the links.

mozdeco2d ago

Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

For us this is substantial enough evidence to consider it a security vulnerability at that point, unless shown otherwise and it has always been this way (also for fuzzing bugs).

ZrArm2d ago

> Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?

> For us this is substantial enough evidence to consider it a security vulnerability at that point

Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.

[1] https://www.mozilla.org/en-US/security/advisories/mfsa2026-3...

mozdeco2d ago

> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?

This is just the standard sentence we've been using for years. It has nothing to do with Mythos and for Mythos, almost all bugs show evidence of memory corruption (we do have a handful of bugs in JS IPC / JS Actors, one is in the blog post).

> Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.

Yes but if we have a choice between writing exploits and scanning more source, potentially finding more bugs, then of course we prioritize the latter.

sfink2d ago

I'm guessing a bit, but for example: out of bounds reads are not memory corruption. Assertion failures in debug builds are also usually not memory corruption, and I'd guess that many of these bugs were found through assertions. (Some parts of Firefox like the SpiderMonkey JS engine make heavy use of assertions, and that's the biggest signal used for defect validation. An assertion firing is almost always treated as a real and serious problem. Though with our harness, Opus and Mythos try to come up with an exploit PoC anyway.)

2 more replies

jerrythegerbil2d ago

Is that number of crashing bugs with PoC available/written down anywhere?

mozdeco1d ago

As described in our blog posts, our harness/pipeline only looks for crashes so all of the bugs resulting from that do have PoCs. There is a smaller number of bugs found by manual auditing that didn't have PoCs but I'd say easily more than 90% of all of the bugs we are talking about had a PoC.

jeffparsons2d ago

It may be worth noting that Claude can and will (if it believes you own the code, at least) produce PoC exploits for exploitable bugs that it finds.

My only source for this is personal experience, and no, I can't share any evidence of it.

freedomben2d ago

Are you certified for high risk cyber uses? If so then you're correct. If not, then it does not match my experience

cvwright2d ago

The word “exploit” may be doing a lot of work here. In my experience Opus 4.6 is perfectly happy to provide test cases that trigger ASAN, even without the super secret squirrel security access.

But if you ask it to get you a shell it’ll probably tell you to get lost.

jeffparsons7h ago

I don't have any special certification or arrangement with Anthropic; this is vanilla Opus 4.x via Claude Code.

staticassertion1d ago

I'm not and I'm doing it right now.

browningstreet2d ago

This isn’t true anywhere people have to make decisions about what to work on first.

dataflow2d ago

> Mythos didn’t write 271 PoC for vulnerabilities

I think the word you're looking for is exploit?

input_sh2d ago

Original source: https://news.ycombinator.com/item?id=48051079

It's better because it actually lists a sample of Bugzilla reports that were made public. This topic was discussed previously (36 comments two weeks ago: https://news.ycombinator.com/item?id=47885042), but the part about bug reports being made public is brand new.

gnabgib2d ago

16 day old story

Wired: Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox (41 points, 18 comments) https://news.ycombinator.com/item?id=47853649

Ars: Mozilla: Anthropic's Mythos found 271 security vulnerabilities in Firefox 150 (33 points, 8 comments)https://news.ycombinator.com/item?id=47855384

delichon2d ago

In the latest Mission Impossible, saving the world depends on recovering the original software of an escaped superhuman AGI from a sunken Russian submarine. Luther writes a "poison pill" that given the original source will instantly one-shot the AI. We were left to wonder how this magical code could have been written, but now we know. Luthor just wrote a Mythos prompt that handed it the source code and asked for an immutable critical exploit.

tialaramex2d ago

They've only linked a few tickets, so of course maybe when we see all 271 actual distinct things the insight won't apply but all those I examined ended up as some C++ code with a nasty bug in it.

Firefox is written in several languages, only about 25% of it is in C++ but every single one of these issues seems to touch the C++.

mccr82d ago

A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free. For subtler issues will we have to more specific validators or will the LLM become better at coming up with other dangerous conditions it will verify? We'll see.

tialaramex2d ago

> A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free

Sure, but, surely AddressSanitizer would also detect the same problem in the C or Rust which together also make up about 25% of Firefox so... ?

jeroenhd2d ago

It's possible Mythos is a lot better at finding vulnerabilities in C++ code than it is for other languages. After all, these models are also based on pre-existing security analysis.

From what I can tell, a lot of these bugs were hardly C++-specific, they just happened in C++ code. Even the most secure Rust can't magically catch things like TOCTOU issues.

tialaramex2d ago

> Even the most secure Rust can't magically catch things like TOCTOU issues

I suppose it depends what the word "magically" means. A ToCToU race is because you imagined things wouldn't change but they did and in Rust you actually do write fewer patterns with this mistake because of the Mutable xor Aliased rule. If we have at least one immutable reference to a Goose then Rust isn't OK with anybody mutating the Goose, your safe Rust can't do that and unsafe Rust mustn't do that. So the ToCToU race caused by "Oops I forgot somebody else might change the Goose" is less likely because you were made to wrestle with this problem during design - the safe Rust where you just forgot about this doesn't compile.

IshKebab2d ago

It's because they verified the bugs using AddressSanitizer so by construction it was only ever going to find C++ bugs.

tialaramex2d ago

But there is AddressSanitizer for Rust and for C too right? As I understand it AddressSanitizer consumes LLVM IR, so from its point of view some C, C++ or Rust is all the same, and presumably also if you are a famous Russian streamer and you hand wrote LLVM IR instead of using a real programming language that too?

IshKebab2d ago

Yes I was including C in "C++". I dunno how much C Firefox uses.

And I presume you can run AddressSanitizer with Rust but given Rust is memory safe by default, it's only going to find issues in `unsafe` code which is a tiny tiny fraction of most code. Google had a blog post a few months ago where they managed to put some actual numbers on this, because they almost shipped one Rust memory safety bug.

1 more reply

crummy2d ago

Curious if people think LLMs will lead to more secure or less secure software in five years.

jillesvangurp2d ago

It will probably wipe out a few categories of issues, which is probably a good thing. And those things that still are still insecure can also be translated to some other language.

Translating things to Rust manually was already a thing before LLMs came into the picture. Now with LLMs that's only going to get easier and faster. The long term value is going to come from getting on top of the mountain of technical debt in the form of existing C/C++ code bases that is responsible for the vast majority of memory exploits, buffer overflows, and other issues that despite decades of attention still are being found across major code bases on a regular basis.

Mozilla finding these issues comes on the back of a quarter century of some very competent engineers trying to do the right thing and using all the tools at their disposal to prevent these issues from happening. I have a lot of respect for that team and the contributions it has made over the years to improve tools, testing/verification practices, etc. The issue is not their effort or competence.

The job of taking an existing system that is well covered in test, well documented/specified, etc. and producing a new one that can function as a drop in replacement is now something that can be considered. A few years ago that would have translated into absolutely massive project cost and risk. Now it's something you can kick off on a Friday afternoon. Worst case it doesn't work, best case you end up with a much better implementation.

It's still early days. There are still a lot of quality issues with LLM generated code. But the success/fail rate will probably improve over time.

int32_642d ago

Both. The skilled will use them to find problems, the unskilled will use them to slopcode insecure software the skilled will have to fix.

mc33012d ago

Kinda like home-improvement stores, power tools, easily available hardware and youtube tutorials led to both incredibly amazing and durable furniture, as well as janky, ugly and even dangerous furniture.

More tools for more people equals more stuff being made on a wider range.

FeepingCreature2d ago

More secure software, but in the same way that the population is net healthier after a plague.

data-ottawa2d ago

I’m just happy we’re talking about security.

That will make software safer alone.

stavros2d ago

That depends on which side has more money.

vga12d ago

More secure, at least in the cases where the tools are properly applied.

But it also represents more easily available opportunities for blackhats to abuse against the projects where these tools were not being applied.

UltraSane2d ago

In 5 years attackers have an advantage but in the long run I think more secure if developers use LLMs on software to find and fix all of the worse remotely exploitable bugs before release. LLMs are going to force devs to be much more security conscious.

canucker20161d ago

I think it'll be a war of who has the better LLMs-as-security-scanner.

Ideally, you'd do a comprehensive all-source-code scan, (and the LLM-scanner finds everything during those scans), and fix all the reported defects.

Afterwards, any dev that commits code will run the LLM-scanner on the modified code (and affected areas) and fix any reported defects.

So the black-hat hacker would be shut out unless they get access to an LLM-scanner with better analysis than what the target project is using.

Major LLM-scanners could give priority access for new versions of LLM-scanners to major projects to find any defects in the current source code before any other party could use the reported defects against the project or their users.

So black-hat hackers would be left with developing their own LLM-scanner better/more efficient than existing major LLM-scanners.

Given enough incentive, they might develop such a tool. Look at the market for zero-day vulnerabilities for smartphones, esp iPhones.

bawolff2d ago

One of the biggest issues in security historically imo is vendors who think, well nobody will ever find this bug so we can deprioritize fixing it. LLMs will prevent vendors lying to themselves which will lead to more secure software.

2ndorderthought2d ago

Less secure because of all the ways attacks can scale out and hackers can contribute vulnerabilities to active projects.

deferredgrant2d ago

A vuln finder is useful only if it respects the humans on the other end. Every bogus report taxes the same scarce attention needed for the real bugs.

lschueller2d ago

Let's see, how this will improve the daily soc work. I still don't see, what's the big difference between Mythos and Opus, security wise. I'm confident, that this kind of vul detection is a long-term improvement. But does specifically Mythos makes such a big difference to "normal" models? I would love to see, what's the actual difference.

mccr82d ago

Quantifying the abilities of an LLM is a hard research problem, so I'm not sure if I can describe it in any great way, but Mythos did seem to be fairly clever about putting together things from different domains to find problems.

For instance, in one of the included bugs (2022034) it figured out that a floating point value being sent over IPC could be modified by an attacker in such a way that it would be interpreted by the JS engine as an arbitrary pointer, due to the way the JS engine uses a clever representation of values called NaN-boxing. This is not beyond the realm of a human researcher to find, but it did nicely combine different domains of security.

As the person responsible for accidentally introducing that security problem (and then fixing it after the Mythos report), while I am aware of NaN-boxing (despite not being a JS engine expert), I was focused more on the other more complex parts of this IPC deserialization code so I hadn't really thought about the potential problems in this context. It is just a floating point value, what could go wrong?

lschueller2d ago

Okay, so far it makes sense to me. But is the deal with JS and floating point values, which isn't soemthing super special super rare stuff, only detected and identfied by Mythos while Opus wouldn't get to this point?

IainIreland2d ago

There doesn't have to be a huge qualitative discontinuity between Opus and Mythos. It's just that Mythos has reached a threshold where it's finally smart enough that putting it in a loop and asking it to find bugs is suddenly really effective. Especially at the beginning, Mozilla wasn't doing anything particularly clever with prompts. Mythos is just smart enough that the hit rate on obvious prompts is high enough to matter. (Maybe you can get similar performance out of Opus 4.6 with really smart prompts, but AFAICT nobody had managed it until Mythos.)

JoshTriplett2d ago

Among other things, Mythos seems better at "let me find, weaponize, and stack vulnerabilities until I get end-to-end from untrusted content to root", rather than just finding one thing in a specific identified area.

Havoc2d ago

Results similar to mythos have been duplicated by weaker models.

Think it's more a care of mythos raising widespread awareness that tireless LLMs can be weaponized to dig through code and find that one tiny flaw nobody spotted

sfink2d ago

The report I saw kind of seemed to be pointing at a flaw and asking "do you see it?" which is not the same thing. I felt a pretty large difference between Opus 4.6's results and Mythos's, so I would be surprised if even weaker models did anywhere near as well. I'd like to see these results, if they are using a decent methodology.

Of course, even the reports with flawed methodology could be suggesting that a great harness + weak model might achieve a similar level of results as a mediocre harness + strong model. But I'd want to see solid evidence for that.

empath751d ago

There is a phase transition where LLMs match or exceed humans' ability to do something, and from that point on, even if the difference between its previous version is small, it will go from something people use rarely, to something that people use all the time.

There was a time when the entire transportation infrastructure in the US was built around horses. Even after cars were invented, the cars weren't obviously better than horses for most people, especially because there wasn't any infrastructure to support them, but the infrastructure and the cars kept improving to the point where it was better for some people at some things, then suddenly it was better at most things, and then people stopped using horses, and we re-organized our entire transportation network around cars.

But there was never a revolutionary technological change. The technology of cars in the 1930s was the same fundamental technology as the cars in the 1890s. Just at some point it became "good enough" and that was it.

I think when people say that AI is a bubble, they are assuming that anything economically useful that LLMs cannot perform today is _qualitatively_ different from what LLMs can do right now, and that LLMs cannot do it even in theory, without some major technological innovation. But I have a suspicion that there are a large number of valuable things, that once LLMs advance just a little bit more, and harnesses and infra around them is improved a little bit more will just be completely taken over by LLMs.

MetaverseClub2d ago

I'm curious about how did Mozilla do bug finding before Mythos? Did they use any non-AI bug finding tools?

mccr82d ago

The usual sorts of fuzzing and static analyses, using AddressSanitizer and ThreadSanitizer. Also, with a bug bounty program to try to encourage external researchers to report issues. (I work on Firefox security; also I fixed 2 of the bugs linked in the blog post.)

canucker20162d ago

Coverity (similar to lint) scans various open source software products for vulnerabilities.

see https://www.blackduck.com/static-analysis-tools-sast/coverit...

and for Firefox-related alleged defects, see https://scan.coverity.com/projects/firefox

You have to create an account to view the actual reported defects.

There are just over 5000 reported defects still outstanding. I don't know how many overlap with the reported 271 Mythos-reported defects.

rockdoe2d ago

How many of those are false positives though? Probably just over 5000?

You get bug bounties if you report the kind of bugs Mythos identified. There's a reason no-one collected bounties from the "5000 defects" Coverity identified.

The Mythos reports have several examples of chaining a whole bunch of logic in different parts of the program together to exploit something very subtle. The Coverity reports aren't anything like that. These tools aren't remotely in the same league or even universe.

IainIreland2d ago

Yeah, fuzzing, sanitizers, and bug bounties were our main pre-AI tools for finding bugs.

MetaverseClub2d ago

it's just sad that Coverity represents the best working C++ static analysis tool.

2 more replies

mccr82d ago

Firefox developers do fix issues found by Coverity. I haven't looked at the results in over a decade, but the last time I did there were a few code patterns we used in a lot of places which Coverity didn't like (but were actually okay the way we were doing them) which resulted in a colossal number of false positives.

mmooss2d ago

> “That’s the key thing that has unlocked our ability to operate at the scale we’ve been operating at now,” he said. “It gives the engineer a crank they can pull that says: ‘Yep, this has the problem,’ and then you can iterate on the code and know clearly when you’ve fixed it and eventually land the test case in the tree such that you don’t regress it.”

I don't understand much of this paragraph:

* "a crank they can pull that says: ‘Yep, this has the problem,’": as in, ring an alarm? Does the LLM ring th alarm?

* "you can iterate on the code and know clearly when you’ve fixed it": Isn't that true of most bugs, assuming you do the normal thing and generate a test case? And I thought the LLM output test cases itself: "It will craft test cases. We have our existing fuzzing systems and tools to be able to run those tests" And are they claiming the LLM facilitates iterating?

* "and eventually land the test case in the tree": Don't you create the test case before the fix? And just a few words earlier they seemed to be working on the fix, not the test case. And see the prior point about test cases.

* "such that you don’t regress it.”: How is the LLM helping here?

Maybe I'm missing some fundamental unwritten assumption?

mccr82d ago

Mostly I think this just means that having a test case makes it easier to fix and verify. You can't actually take for granted having a test case when fixing a security bug. Sometimes you only have a crash stack or maybe a vague and hypothetical static analysis result.

> eventually land the test case

This is just a reference to the fact that we don't land test cases for security bugs immediately in the public repository, to make it harder for attackers. You are right that the LLM only helps with creating the initial test case. Things like running the test case in automation is part of the standard development process.

mmooss2d ago

Thank you; that makes sense.

rem10992d ago

I don't find that number very high. In a project of the size of Firefox, a new version of a compiler with stricter warnings or a draconian interpretation of the C standard can easily find 200 new bugs.

New tools find new bugs, but the oligarchy newspapers report on Mythos and not on clang-22.0.

sfink2d ago

The raw number of things found by Claude (Opus or Mythos) was much higher and would be more comparable to a new clang warning. I vaguely remember seeing a number early on in this process that was in the mid-thousands. The 271 is a small, validated subset of that. None of the 271 were deemed false positives iiuc. Most instances of a new clang warning will be false positives. (Same as most of the raw problems reported by the AI.)

It is still unclear and open for speculation as to what percentage of all security bugs in Firefox today are being found by the AIs (as opposed to not being found at all). It might be that AI is very good at certain types of problems, even if we can't put our finger on what those types are, and that after the initial wave of bug reports the AI findings will slow to a trickle even while many many other bugs remain in the codebase. Or it might be that AI really does detect most instances of some class of problems and all those bugs will now be gone forever, never to return as long as Mozilla keeps paying the token monster. This is closely related to the oft-asked question "are we better or worse off after both attackers and defenders have access to this new capability?"

ChrisArchitect2d ago

[dupe] Discussion on source: https://news.ycombinator.com/item?id=48051079

j / k navigate · click thread line to collapse

43 comments

jerrythegerbil2d ago

Again, and this is important:

A bug is a bug. A “potential vulnerability” is a bug. A vulnerability is verifiable as having security implications with a proof of concept or other substantial evidence.

Words matter. Bugs matter. It’s important to fix large amounts of bugs, just as it always has been, and has been done. Let that be impressive on its own, because it IS impressive.

Mythos didn’t write 271 PoC for vulnerabilities and demonstrate code path reachability with security implications. Mythos found 271 valid bugs. Let that be enough.

epistasisOP2d ago

I was a bit confused by your definitions, but here's how Mozilla broke out [1] the 271, um, things:

> As additional context, we apply security severity ratings from critical to low to indicate the urgency of a bug:

> * sec-moderate is assigned to vulnerabilities that would otherwise be rated sec-high but require unusual and complex steps from the victim.

> * sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

> Of the 271 bugs we announced for Firefox 150: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.

Words are tools, that get their utility from collective meaning. I'd be interested where you recieved your semantics from and if they match up or disagree with Mozilla.

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

[2] https://wiki.mozilla.org/Security_Severity_Ratings/Client

IainIreland2d ago

I work at Mozilla; I fixed a bunch of these bugs.

So were all 271 bugs exploitable? Absolutely not. But they were all security bugs according to the normal standards that we've been applying for years.

paulvnickerson2d ago

What types of vulnerabilities was it finding? Cross site scripting, privilege escalation, etc? Mostly memory corruption or any Javascript logic bugs?

2 more replies

JeremyNT1d ago

You may not be able to comment, but do you feel like Mythos is accomplishing anything that couldn't have already been done with Opus and the right prompting?

1 more reply

epistasisOP2d ago

I'm not a security dev or researcher or anything, but as an outsider my understanding matches how Mozilla uses the terms. Though words used by specialists and the general public can offer differ...

cookiengineer2d ago

Can you elaborate why those bugs weren't found by e.g. fuzzing in the past?

I'm genuinely curious what "types" of implementation mistakes these were, like whether e.g. it was library usage bugs, state management bugs, control flow bugs etc.

Would love to see a writeup about these findings, maybe Mythos hinted us towards that better fuzzing tools are needed?

2 more replies

nirui2d ago

3 more replies

throw0101c2d ago

Security things are mentioned in the Release Notes [b] pointing to a completely different document [d].

Perhaps sometimes a bug is 'just' a bug, and not a vulnerability.

[a] https://bugzilla.mozilla.org/show_bug.cgi?id=2034980 ; "Can't highlight image scans in Firefox 150+"

[b] https://www.firefox.com/en-CA/firefox/150.0.2/releasenotes/

[c] https://bugzilla.mozilla.org/show_bug.cgi?id=2024918

[d] https://www.mozilla.org/en-US/security/advisories/mfsa2026-4...

Gregaros2d ago

> Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit.

That’s not evident in what you pastedat all.

What you pasted says

> sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

From this one infers that the "180 were sec-high" bugs found are actually exploitsble but known to have been found in the wild, and are NOT mere annoying bugs.

The difference between 180 and 270 does nothing to deflate the signicance, or lack there of, of the implication re: Mythos.

epistasisOP2d ago

Yes, it is not in what I pasted, as I said, "even though they say right below". If you don't believe me then click on either of the links.

mozdeco2d ago

Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

For us this is substantial enough evidence to consider it a security vulnerability at that point, unless shown otherwise and it has always been this way (also for fuzzing bugs).

ZrArm2d ago

> Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

> For us this is substantial enough evidence to consider it a security vulnerability at that point

Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.

[1] https://www.mozilla.org/en-US/security/advisories/mfsa2026-3...

mozdeco2d ago

> Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.

Yes but if we have a choice between writing exploits and scanning more source, potentially finding more bugs, then of course we prioritize the latter.

sfink2d ago

2 more replies

jerrythegerbil2d ago

Is that number of crashing bugs with PoC available/written down anywhere?

mozdeco1d ago

jeffparsons2d ago

It may be worth noting that Claude can and will (if it believes you own the code, at least) produce PoC exploits for exploitable bugs that it finds.

My only source for this is personal experience, and no, I can't share any evidence of it.

freedomben2d ago

Are you certified for high risk cyber uses? If so then you're correct. If not, then it does not match my experience

cvwright2d ago

The word “exploit” may be doing a lot of work here. In my experience Opus 4.6 is perfectly happy to provide test cases that trigger ASAN, even without the super secret squirrel security access.

But if you ask it to get you a shell it’ll probably tell you to get lost.

jeffparsons7h ago

I don't have any special certification or arrangement with Anthropic; this is vanilla Opus 4.x via Claude Code.

staticassertion1d ago

I'm not and I'm doing it right now.

browningstreet2d ago

This isn’t true anywhere people have to make decisions about what to work on first.

dataflow2d ago

> Mythos didn’t write 271 PoC for vulnerabilities

I think the word you're looking for is exploit?

input_sh2d ago

Original source: https://news.ycombinator.com/item?id=48051079

gnabgib2d ago

16 day old story

Wired: Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox (41 points, 18 comments) https://news.ycombinator.com/item?id=47853649

Ars: Mozilla: Anthropic's Mythos found 271 security vulnerabilities in Firefox 150 (33 points, 8 comments)https://news.ycombinator.com/item?id=47855384

delichon2d ago

tialaramex2d ago

They've only linked a few tickets, so of course maybe when we see all 271 actual distinct things the insight won't apply but all those I examined ended up as some C++ code with a nasty bug in it.

Firefox is written in several languages, only about 25% of it is in C++ but every single one of these issues seems to touch the C++.

mccr82d ago

tialaramex2d ago

> A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free

Sure, but, surely AddressSanitizer would also detect the same problem in the C or Rust which together also make up about 25% of Firefox so... ?

jeroenhd2d ago

It's possible Mythos is a lot better at finding vulnerabilities in C++ code than it is for other languages. After all, these models are also based on pre-existing security analysis.

From what I can tell, a lot of these bugs were hardly C++-specific, they just happened in C++ code. Even the most secure Rust can't magically catch things like TOCTOU issues.

tialaramex2d ago

> Even the most secure Rust can't magically catch things like TOCTOU issues

IshKebab2d ago

It's because they verified the bugs using AddressSanitizer so by construction it was only ever going to find C++ bugs.

tialaramex2d ago

IshKebab2d ago

Yes I was including C in "C++". I dunno how much C Firefox uses.

1 more reply

crummy2d ago

Curious if people think LLMs will lead to more secure or less secure software in five years.

jillesvangurp2d ago

It will probably wipe out a few categories of issues, which is probably a good thing. And those things that still are still insecure can also be translated to some other language.

It's still early days. There are still a lot of quality issues with LLM generated code. But the success/fail rate will probably improve over time.

int32_642d ago

Both. The skilled will use them to find problems, the unskilled will use them to slopcode insecure software the skilled will have to fix.

mc33012d ago

More tools for more people equals more stuff being made on a wider range.

FeepingCreature2d ago

More secure software, but in the same way that the population is net healthier after a plague.

data-ottawa2d ago

I’m just happy we’re talking about security.

That will make software safer alone.

stavros2d ago

That depends on which side has more money.

vga12d ago

More secure, at least in the cases where the tools are properly applied.

But it also represents more easily available opportunities for blackhats to abuse against the projects where these tools were not being applied.

UltraSane2d ago

canucker20161d ago

I think it'll be a war of who has the better LLMs-as-security-scanner.

Ideally, you'd do a comprehensive all-source-code scan, (and the LLM-scanner finds everything during those scans), and fix all the reported defects.

Afterwards, any dev that commits code will run the LLM-scanner on the modified code (and affected areas) and fix any reported defects.

So the black-hat hacker would be shut out unless they get access to an LLM-scanner with better analysis than what the target project is using.

So black-hat hackers would be left with developing their own LLM-scanner better/more efficient than existing major LLM-scanners.

Given enough incentive, they might develop such a tool. Look at the market for zero-day vulnerabilities for smartphones, esp iPhones.

bawolff2d ago

2ndorderthought2d ago

Less secure because of all the ways attacks can scale out and hackers can contribute vulnerabilities to active projects.

deferredgrant2d ago

A vuln finder is useful only if it respects the humans on the other end. Every bogus report taxes the same scarce attention needed for the real bugs.

lschueller2d ago

mccr82d ago

lschueller2d ago

IainIreland2d ago

JoshTriplett2d ago

Havoc2d ago

Results similar to mythos have been duplicated by weaker models.

Think it's more a care of mythos raising widespread awareness that tireless LLMs can be weaponized to dig through code and find that one tiny flaw nobody spotted

sfink2d ago

empath751d ago

MetaverseClub2d ago

I'm curious about how did Mozilla do bug finding before Mythos? Did they use any non-AI bug finding tools?

mccr82d ago

canucker20162d ago

Coverity (similar to lint) scans various open source software products for vulnerabilities.

see https://www.blackduck.com/static-analysis-tools-sast/coverit...

and for Firefox-related alleged defects, see https://scan.coverity.com/projects/firefox

You have to create an account to view the actual reported defects.

There are just over 5000 reported defects still outstanding. I don't know how many overlap with the reported 271 Mythos-reported defects.

rockdoe2d ago

How many of those are false positives though? Probably just over 5000?

You get bug bounties if you report the kind of bugs Mythos identified. There's a reason no-one collected bounties from the "5000 defects" Coverity identified.

IainIreland2d ago

Yeah, fuzzing, sanitizers, and bug bounties were our main pre-AI tools for finding bugs.

MetaverseClub2d ago

it's just sad that Coverity represents the best working C++ static analysis tool.

2 more replies

mccr82d ago

mmooss2d ago

I don't understand much of this paragraph:

* "a crank they can pull that says: ‘Yep, this has the problem,’": as in, ring an alarm? Does the LLM ring th alarm?

* "such that you don’t regress it.”: How is the LLM helping here?

Maybe I'm missing some fundamental unwritten assumption?

mccr82d ago

> eventually land the test case

mmooss2d ago

Thank you; that makes sense.

rem10992d ago

New tools find new bugs, but the oligarchy newspapers report on Mythos and not on clang-22.0.

sfink2d ago

ChrisArchitect2d ago

[dupe] Discussion on source: https://news.ycombinator.com/item?id=48051079

j / k navigate · click thread line to collapse