undefined | Better HN

0 pointsdr_dshiv5mo ago0 comments

Prediction: Vibe coding systems will be better at security in 2 years than 90% of devs.

0 comments

input_sh5mo ago

Prediction: it won't.

You can't fit every security consideration into the context window.

dr_dshivOP5mo ago

90% of human devs are not aware of every security consideration.

input_sh5mo ago

90% of human devs can fit more than 3-5 files into their short-term memory.

They also know not to, say, temporarily disable auth to be able to look at the changes they've made on a page hidden behind auth, which is what I observed Gemini 3 Pro doing just yesterday.

dr_dshivOP5mo ago

Ok, and that’s your prediction for 2 years from now? It’d be quite remarkable if humans had a bigger short term memory than LLMs in 2 years. Or that the kind of dumb security mistakes LLMs make today don’t trigger major, rapid improvements.

1 more reply

eric-burel5mo ago

You may want to read about agentic AI, you can for instance call an LLM multiple times with different security consideration everytime.

input_sh5mo ago

There's about a dozen workarounds around context limits, agents being one of them, MCP servers being another one, AGENTS.md being the third one, but none of them actually solve the issue of a context window being so small that it's useless for anything even remotely complex.

Let's imagine a codebase that can fit onto a revolutionary piece of technology known as a floppy drive. As we all know, a floppy drive can store <2 megabytes of storage. But a 100k tokens is only about 400 kilobytes. So, to process the whole codebase that can fit onto a floppy drive, you need 5 agents plus the sixth "parent process" that those 5 agents will report to.

Those five agents can report "no security issues found" in their own little chunk of the codebase to the parent process, and that parent process will still be none the wiser about how those different chunks interact with each other.

eric-burel5mo ago

You can have an agent that focuses on studying the interactions. What you're saying is that an AI cannot find every security issue but neither do humans otherwise we wouldn't have security breaches in the first place. You are describing a relatively basic agentic setup mostly using your AI-assisted text editor but a commercial security bot is a much more complex beast hopefully. You replace context by memory and synthesis for instance, the same way our brain works.

joshribakoff5mo ago

In one instance it could not even describe why a test is bad unit test (asserting true is equal to true), which doesn’t even require context or multi file reasoning.

Its almost as if it has additional problems beyond the context limits :)

1 more reply

joshribakoff5mo ago

You may want to try using it, anecdotes often differ from theories, especially when they are being sold to you for profit. It takes maybe a few days to see a pattern of ignoring simple instructions even when context is clean. Or one prompt fixes one issue and causes new issues, rinse and repeat. It requires human guidance in practice.

ethbr15mo ago

Strongman: LLMs aren't a tool, they're fuzzy automation.

And what keeps security problems from making it into prod in the real world?

Code review, testing, static and dynamic code scanning, and fuzzing.

Why aren't these things done?

Because there isn't enough people-time and expertise.

So in order for LLMs to improve security, they need to be able to improve our ability to do one of: code review, testing, static and dynamic code scanning, and fuzzing.

It seems very unlikely those forms of automation won't be improved in the near future by even the dumbest form of LLMs.

And if you offered CISOs a "pay to scan" service that actually worked cross-language and -platform (in contrast to most "only supported languages" scanners), they'd jump at it.

1 more reply

windexh8er5mo ago

And that buys you what, exactly? Your point is 100% correct and why LLMs are no where near able to manage / build complete simple systems and surely not complex ones.

Why? Context. LLMs, today, go off the rails fairly easily. As I've mentioned in prior comments I've been working a lot with different models and agentic coding systems. When a code base starts to approach 5k lines (building the entire codebase with an agent) things start to get very rough. First of all, the agent cannot wrap it's context (it has no brain) around the code in a complete way. Even when everything is very well documented as part of the build and outlined so the LLM has indicators of where to pull in code - it almost always cannot keep schemas, requirements, or patterns in line. I've had instances where APIs that were being developed were to follow a specific schema, should require specific tests and should abide by specific constraints for integration. Almost always, in that relatively small codebase, the agentic system gets something wrong - but because of sycophancy - it gleefully informs me all the work is done and everything is A-OK! The kicker here is that when you show it why / where it's wrong you're continuously in a loop of burning tokens trying to put that train back on the track. LLMs can't be efficient with new(ish) code bases because they're always having to go lookup new documentation and burning through more context beyond what it's targeting to build / update / refactor / etc.

So, sure. You can "call an LLM multiple times". But this is hugely missing the point with how these systems work. Because when you actually start to use them you'll find these issues almost immediately.

joshribakoff5mo ago

To add onto this, it is a characteristic of their design to statistically pick things that would be bad choices, because humans do too. It’s not more reliable than just taking a random person off the street of SF and giving them instructions on what to copy paste without any context. They might also change unrelated things or get sidetracked when they encounter friction. My point is that when you try to compensate by prompting repeatedly, you are just adding more chances for entropy to leak in — so I am agreeing with you.

1 more reply

aduwah5mo ago

This will age badly

dr_dshivOP5mo ago

That’s why we make concrete measurable predictions.

MangoToupe5mo ago

Agreed, but "vibe coding will be better at security" is not one of them. Better by which metric, against which threat model, with which stakes? What security even means for greenfield projects is inherently different than for hardened systems. Vibe coding is sufficient for security today because it's not used for anything that matters.

Cthulhu_5mo ago

It'll play a role in both securing and security research I'm sure, but I'm not confident it'll be better.

But also, you'd need to have some metrics - how good are developers at security already? What if the bar is on the floor and LLM code generators are already better?

wizzledonker5mo ago

Only if they work in a fundamentally different manner. We can't solve that problem the way we are building LLMs now.

j / k navigate · click thread line to collapse

0 comments

input_sh5mo ago

Prediction: it won't.

You can't fit every security consideration into the context window.

dr_dshivOP5mo ago

90% of human devs are not aware of every security consideration.

input_sh5mo ago

90% of human devs can fit more than 3-5 files into their short-term memory.

They also know not to, say, temporarily disable auth to be able to look at the changes they've made on a page hidden behind auth, which is what I observed Gemini 3 Pro doing just yesterday.

dr_dshivOP5mo ago

1 more reply

eric-burel5mo ago

You may want to read about agentic AI, you can for instance call an LLM multiple times with different security consideration everytime.

input_sh5mo ago

eric-burel5mo ago

joshribakoff5mo ago

In one instance it could not even describe why a test is bad unit test (asserting true is equal to true), which doesn’t even require context or multi file reasoning.

Its almost as if it has additional problems beyond the context limits :)

1 more reply

joshribakoff5mo ago

ethbr15mo ago

Strongman: LLMs aren't a tool, they're fuzzy automation.

And what keeps security problems from making it into prod in the real world?

Code review, testing, static and dynamic code scanning, and fuzzing.

Why aren't these things done?

Because there isn't enough people-time and expertise.

So in order for LLMs to improve security, they need to be able to improve our ability to do one of: code review, testing, static and dynamic code scanning, and fuzzing.

It seems very unlikely those forms of automation won't be improved in the near future by even the dumbest form of LLMs.

And if you offered CISOs a "pay to scan" service that actually worked cross-language and -platform (in contrast to most "only supported languages" scanners), they'd jump at it.

1 more reply

windexh8er5mo ago

And that buys you what, exactly? Your point is 100% correct and why LLMs are no where near able to manage / build complete simple systems and surely not complex ones.

joshribakoff5mo ago

1 more reply

aduwah5mo ago

This will age badly

dr_dshivOP5mo ago

That’s why we make concrete measurable predictions.

MangoToupe5mo ago

Cthulhu_5mo ago

It'll play a role in both securing and security research I'm sure, but I'm not confident it'll be better.

But also, you'd need to have some metrics - how good are developers at security already? What if the bar is on the floor and LLM code generators are already better?

wizzledonker5mo ago

Only if they work in a fundamentally different manner. We can't solve that problem the way we are building LLMs now.

j / k navigate · click thread line to collapse