In practice, the model might then look through that particular file, identify potential problems, and work upwards through the codebase to check whether any of them could actually be hit.
“Hmm, here we assume that the input has been validated. Is there any way that might not be the case?”
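The “work upwards” step is essentially a reachability question: starting from a suspicious function, is there a chain of callers that leads back to a real entry point? The sketch below illustrates only that idea, assuming the codebase has already been reduced to a simple call graph. The graph, the function names (`handle_request`, `parse_header`, `legacy_parse`, and so on) and the helper `path_from_entry` are all hypothetical. This is not a description of how Mythos or any other model actually performs the analysis.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALL_GRAPH = {
    "main": ["handle_request", "load_config"],
    "handle_request": ["validate_input", "parse_header"],
    "load_config": ["parse_header"],
    "validate_input": [],
    "parse_header": [],
    "legacy_parse": [],  # defined but never called from anywhere
}


def callers_of(graph):
    """Invert the call graph so we can walk upwards from a callee to its callers."""
    reverse = {name: [] for name in graph}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    return reverse


def path_from_entry(graph, target, entry_points):
    """Breadth-first walk upwards from `target` until an entry point is reached.

    Returns one call chain from an entry point down to `target`, or None
    if `target` cannot be reached from any entry point.
    """
    reverse = callers_of(graph)
    queue = deque([[target]])
    seen = {target}
    while queue:
        chain = queue.popleft()
        head = chain[0]
        if head in entry_points:
            return chain
        for caller in reverse.get(head, []):
            if caller not in seen:
                seen.add(caller)
                queue.append([caller] + chain)
    return None


print(path_from_entry(CALL_GRAPH, "parse_header", {"main"}))
print(path_from_entry(CALL_GRAPH, "legacy_parse", {"main"}))
```

Here `parse_header` is reachable from `main` (one chain is `main` → `handle_request` → `parse_header`), so a flaw in it may be worth following up, while `legacy_parse` has no callers and returns `None`. Note that `parse_header` is also reached via `load_config`, which never calls `validate_input`; spotting that kind of second path is exactly what the quoted question is probing for, and a real analysis would need far more than a bare call graph to answer it.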
This is not unique to Mythos; you can already do this with publicly available models. Mythos does appear to be significantly more capable, though, so it would likely get better results.
The research discussed here gave the models only a known buggy function, skipping the whole process required to find that bug in the first place.