I'll concede the valid point you made:
> I've used them to successfully debug small issues occurring in my codebase.
Great! The pattern recognition machine successfully identified a pattern.
But how do you know it won't still flag the repaired pattern after you've added a guard to prevent the behaviour (i.e. an invalid/out-of-bounds memory access guarded by a hard assert on a sized object before the function is even entered)?
And what about bad patterns that aren't in the training data because humans have a hard time identifying them reliably in the first place?
The point I'm making is that it's autocomplete: if your case is well covered, it will show up whether you have guards or not (so: noise), and it will completely miss anything humans haven't identified before.
It works, absolutely, but there's no reliability, and that's inherent in the design.
For security auditing specifically, an unreliable tool isn't just unhelpful: it's actively dangerous, because false confidence is worse than understood ignorance.