1: https://github.com/protectai/ai-exploits/tree/main/nmap-nse
I suppose I’m idly curious about the answer to your question too, but paying too much attention to the specific targets feels like it’s missing the point and purpose of the collection.
Having said that, the Achilles heel of ai is data. The lower the quality of the data, the more powerful the attack.
I imagine if someone wanted to mess about with it on a serious scale they’d go for the jugular - the data. Write content and create hundreds or thousands of code repositories with subtle issues and, bang, you’ve compromised thousands of unsuspecting folks relying on ai to create code, or any other kind of content.
These tools can serve as the first opening - but a sizable one - when looking to attack an enterprise more broadly.
Suppose someone magically creates thousands of repositories that all demonstrate a specific way of handling c pointers but allow buffer overflows, or sql queries with subtle string-injection holes.
One way to defend is to have an ai agent assess each data source before it goes into training.
But even so it’s extremely difficult to catch convoluted attacks (i.e. when the exploit only triggers upon meeting certain criteria).
Until then i’d consider any code written by an ai and unsupervised by a competent person as potentially tainted.
alright i looked you up, congrats on your fundraising. is there like an OWASP top 10 vuln list for MLSecOps? does it differ between traditional ML apps and LLM apps?
[1] https://github.com/protectai/ai-exploits?tab=readme-ov-file#...
pratical --> practical