OpenAI's comment to the NTIA on open model weights

(openai.com)

108 points · rando_person_1 · 2y ago · 68 comments


47 comments · 17 top-level

Havoc · 2y ago · 5 in thread

> a number of nation-state cyber threat actors who were abusing our GPT-3.5-Turbo and GPT-4 models to assist in cyberoffensive operations.

Not sure I buy this. Sure, there was that half-hearted case they blogged about. But that seemed more like some random coder within a government using ChatGPT than a coordinated effort leveraging their infra at scale.

Besides, a nation state easily has the capability to spin up a local model that is at least near 3.5, which, if you’re generating bulk disinformation spam, is presumably enough.

ben_w · 2y ago

On the other hand, nation states are also famous for having penny-pinchers write procurement rules.

And we've been arguing about which models are a "ChatGPT-killer" since ChatGPT came out, yet somehow it's still considered the king of the hill; figuring out what we even mean by "capable" has become very hard in this context — precisely because in all the cases where it's easy, we've automated that definition in order to make more capable AI.

int_19h · 2y ago

From what I've heard, Yandex's Alice is already largely on par with GPT-3.5 (censorship aside), and that's just what the public gets access to.

threeseed · 2y ago

Why would you think they would voluntarily disclose this to the public?

Companies are typically required to keep this information private.

not2b · 2y ago

I think that they are trying to scare regulators into banning truly open LLMs as too dangerous and instead trusting "responsible" people like Altman to keep things safe.


refulgentis · 2y ago

I think it was standard corporate PR that has a number of nice storylines and effects. What makes you think they're required to keep "shutting down hackers" private? I feel like I've seen that story 1000 times.

benreesman · 2y ago · 5 in thread

This is just getting to be a wedge issue for me: this isn’t ok and it has to stop.

Weekly, if not daily, some new godawful thing comes up. I just found out about the revoked “GPT Detector” thing: that was a non-ridiculous case that the real safety people have some pull, but they took it down with precision and recall numbers you don’t take it down at.

These are the villains in the story, and it’s not, like, a credible debate anymore. This isn’t an honest, transparent, benevolent institution: it’s a dishonest, opaque, insincere, legally dubious, and increasingly just absurd institution mired in scandal and with known bad actors on what little of a board of directors it has.

Reform this thing or kill it.

elicksaur · 2y ago

Your understanding of society’s use of AI is not in line with reality. Your opinion that a 23% true positive rate with a 9% false positive rate is ok is not in line with general principles (not even just Western-centric) of the burden of proof of guilt.

Only 23% of US adults have tried ChatGPT,[1] so to say that we live “in a world where so much AI in human writing” as you do in another comment is simply false.

Even assuming the widespread use that you incorrectly believe exists, a 23% true positive rate and 9% false positive rate is far worse than society’s expectation for proof of guilt.

> It is better that ten guilty persons escape than that one innocent suffer.[2]

Take a school class where no students used AI to cheat. Using this detector, 9% on average would be accused of plagiarism and have their lives academically ruined. That is not acceptable.

A class full of cheaters where only 23% get caught is also going to be pretty unreasonable to most people.

[1] https://www.pewresearch.org/short-reads/2024/03/26/americans...

[2] https://en.m.wikipedia.org/wiki/Blackstone%27s_ratio
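To make the arithmetic in the comment above concrete, here is a minimal sketch (not from the thread) of what the quoted rates imply. The 23% true positive rate and 9% false positive rate are the figures as cited by the commenter, not independently verified; the class size of 30 and the prevalence values are purely hypothetical, and the function names are made up for illustration.

```python
"""Base-rate arithmetic for an AI-text detector (illustrative sketch only)."""

# Rates as quoted in the thread; treat them as assumptions, not measurements.
TPR = 0.23  # share of AI-written texts that get flagged
FPR = 0.09  # share of human-written texts that get flagged


def detector_outcomes(tpr: float, fpr: float, prevalence: float) -> dict:
    """Return flag statistics when `prevalence` of all texts are AI-written."""
    flagged_ai = tpr * prevalence              # correctly flagged AI texts
    flagged_human = fpr * (1.0 - prevalence)   # wrongly flagged human texts
    flagged_total = flagged_ai + flagged_human
    # Precision: of everything flagged, how much really was AI-written.
    precision = flagged_ai / flagged_total if flagged_total > 0 else float("nan")
    return {
        "flagged_share": flagged_total,
        "precision": precision,
        "missed_ai_share": (1.0 - tpr) * prevalence,
    }


def expected_false_accusations(class_size: int, fpr: float) -> float:
    """Expected number of wrongly flagged students in a class with no AI use."""
    return class_size * fpr


if __name__ == "__main__":
    # Hypothetical class of 30 honest students.
    print("false accusations:", expected_false_accusations(30, FPR))
    # Hypothetical shares of AI-written text in the pool being screened.
    for prevalence in (0.0, 0.1, 0.5, 1.0):
        print(prevalence, detector_outcomes(TPR, FPR, prevalence))
```

The point of sweeping `prevalence` is that the same detector looks very different depending on how much of the screened text is actually AI-written: when little of it is, most flags land on human authors, which is the burden-of-proof concern raised above.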

ben_w · 2y ago

In fairness,

> in a world where so much AI in human writing

Is not a percentage, and also the point (I think) isn't "how many people are using it" but "how much content has each produced", which is only close to equal when a human uses it to automate the output they would have created by themselves anyway.

I do not know how many words have been written by LLMs vs. humans in the last year; as I have almost nothing to ground an estimate with, I can easily believe that humans are 3 orders of magnitude greater or lesser in output — one extreme bound due to the low price of tokens, the other extreme bound due to the high price and limited supply of hardware.

benreesman · 2y ago

You can just have different standards for when you apply it: I want this thing on political ads.

Don’t fire people or accuse them of plagiarism because of it. That would be stupid no matter how good it was.

simonw · 2y ago

I don't understand what you're saying about GPT detectors. Are you angry that people are promoting detectors that don't work, or are you angry that OpenAI used to offer one and no longer do?

benreesman · 2y ago

The latter, in the context of the justification.

It was pulled because while it caught on the order of 25% of pure AI output, it missed the rest. But I’m cool with that number, in a world where so much AI in human writing? That’s a total win on low-effort propaganda. Anyone should be cool with that number.

Unfortunately they had to pull it because something like 9% of human authored text got hit with the AI flag. Again, some people are starting to write like it. It’s gonna happen.

This is from memory, so if I’ve got some of that wrong I’ll retract that and leave my other reasons as more than sufficient to indicate serious change.


SirensOfTitan · 2y ago · 4 in thread

Closed weight LLMs are unethical forms of theft that will privatize profits on a work that includes virtually all of humanity’s digital written work and serve to ultimately heighten wealth inequality as they get more sophisticated and start eliminating jobs.

The only path forward is open model weights, Sam Altman is on the wrong side of history here, and I hope he fails to convince regulators.

chefandy · 2y ago

Sure, but we should avoid talking about open access as a self-evidently beneficial end rather than a means to make things that benefit society. Coming from an academic library background, I saw numerous open access arguments fall flat in front of decision makers because their champions didn't have any existing analog to point at, and hadn't bothered to consider concrete ways an inaccessible data set stopped them from improving society.

While open data is very important to me, personally, as someone who knows how to use raw data to do cool things, I'll be able to connect a lot of the loose wires I see sticking out of the "for the benefit of humanity" angle once we start making genuinely end-user-friendly applications that require neither payment nor understanding Python module dependencies to install. I've got some sketches kicking around but I'm tapped for time for at least a couple of months.

visarga · 2y ago

While I agree with your message, it needs more nuance. Second order effects exist: most open models have been boosted with data generated by closed SOTA models. GPT-4 has been the teacher of a whole generation of AI models, closed as it is, and even if OpenAI officially opposed this practice.

vouaobrasil · 2y ago

> The only path forward is open model weights, Sam Altman is on the wrong side of history here, and I hope he fails to convince regulators.

The third path would be an uprising against AI that would ultimately lead to an outright ban of it, a dissolution of all AI companies, and a moratorium on research on AI.

bamboozled · 2y ago

Which is impossible at this stage?


rbren · 2y ago · 3 in thread

Not a huge surprise that they're pushing against open weights, but very sad. I posted my comments on the RFC as well: https://rbren.substack.com/p/banning-open-weight-models-woul...

benreesman · 2y ago

This is a phenomenal piece of writing on this topic. I’ve been making big parts of this argument clumsily and hurriedly, and therefore nowhere near as well.

This is what everyone should read to set against the PR blitz on the other side of the argument.

Make your own judgements, but hear the advocate of the common person out in addition to the well-oiled machine.

visarga · 2y ago

This is indeed a good writeup. Just one small quibble:

> We’ll need to clarify copyright law when it comes to disseminating derivative AI-generated works.

Generated content can be either derivative or transformative, and this distinction is important. It's not automatically derivative because

- a model can receive new knowledge and skill demonstrations from the user at test time, that effectively take it out of its initial training distribution (contextual learning)

- the model can draw from multiple sources performing cross-input analysis, such as finding inconsistencies or ranking quality (comparison and cross referencing)

- a model can learn from experimental feedback, such as running code or a complex simulation to see the outcomes, and iterating over the search space. For example AlphaTensor discovered an improved matmul algo (models can discover new knowledge from the environment, they are not restricted to learning from human text)

So models can get new information from users, textual analysis or from experiment based learning. In all these cases it does more than derivative work.

dang · 2y ago

Related ongoing thread:

Banning open weight models would be a disaster - https://news.ycombinator.com/item?id=39901978 - April 2024 (32 comments)

CharlesW · 2y ago · 3 in thread

My attempt at a TLDR for the piece:

• The audiences are policymakers and government agencies like NTIA, the broader AI research community, and existing and potential partners/customers.

• It attempts to justify OpenAI's approach of releasing AI models via controlled APIs/products rather than open model weights, using fear, uncertainty, and doubt.

• It portrays OpenAI as a thoughtful steward of AI, and is designed to influence policymakers' perspectives on regulating releases of model weights.

larodi · 2y ago

Or in other words - a three page rant with a fair amount of self-affirmation and marketing blah which eventually retells the very short mantra:

No, we are already not those guys releasing their (model) weights …

Llamamoe · 2y ago

And so it serves as yet another reminder that any corporation trying to "do good" is just the usual sociopathic anti-human bullshit doing unusually good PR.

maeil · 2y ago

Unfortunately that kind of generalization being inaccurate is exactly what makes this problem so difficult. Every case needs to be judged independently, but that takes a lot of time and effort that no one person individually has.

A corporation is purely a legal structure. There exist people who do actually spend their time "doing good", and they too use such legal structures when that's helpful.

It's unfortunate that so many were deceived by Sam Altman, and that the majority of OpenAI employees voted money over honesty when they had a direct, hugely impactful vote. On the other hand, it's not like Altman's history was a closely guarded secret; it was quite easy to look up. So ironically this in itself is a great example of a deception that could've been prevented quite easily by some cursory research and solidarity.


jrflowers · 2y ago · 3 in thread

TLDR: OpenAI says it is a moral and safety imperative to pay OpenAI for all eternity

internetter · 2y ago

For the copyrighted content they stole

jrflowers · 2y ago

They are working hard to invent SkyNet and we must fund them in perpetuity so they can protect us from SkyNet


visarga · 2y ago

They did make the API public and it was often used for skill distillation by input-output pairs. So they grudgingly contributed to the advancement of open models.

error9348 · 2y ago · 2 in thread

Q3-7 & Q3-5d get to the workability. I don't think OpenAI responds to that part of the RFC. Meta's comment on that issue seems to be fairly clear, they oppose the proposed rules on KYC for IaaS and are "not aware of technical capabilities that could not be overcome by determined, well-resourced, and capable actors".

https://www.ntia.gov/sites/default/files/publications/open_m...

https://about.fb.com/wp-content/uploads/2024/03/NTIA-RFC-Met...

chefandy · 2y ago

The fact is though, every corporate actor in this entire landscape is just playing their hand. Anybody's stance on anything at any given moment doesn't mean they're more or less ethical-- the moment they perceive a strategic benefit to walling everything off which would surpass the PR cost, they will. They've probably already got PR folks workshopping angles for the press release.

CuriouslyC · 2y ago

This is true to a degree, though there are high-profile actors such as Yann LeCun who have ethical boundaries. Yann wants AI to be open source and available to all, and he's straight up said that he won't work for a company that doesn't follow this principle. Zuck might not have a hand to play in terms of AI products, but even if he did he'd have to tread carefully because the guy that sets his whole AI direction and stewards all their research would 100% walk if he wasn't happy with the ethical direction of the company.


65a · 2y ago · 2 in thread

Disgusting abuse of the democratic process to halt scientific and technological progress in the name of making one sketchy man rich.

vouaobrasil · 2y ago

I agree. It's a shame the PEOPLE do not have any say in this matter. They should be able to vote also, and one of the options should be "outright ban on AI".

65a · 2y ago

I agree if we extend it to mathematics and the use of simple machines.

sadeshmukh · 2y ago · 2 in thread

I'm a little confused why everybody seems to want to mandate open weights. Maybe a system similar to copyright, but by mandating open weights on a system they developed, it somewhat stifles creativity.

lolinder · 2y ago

Sorry, who's the "everybody" who wants to mandate open weights? The only discussion I'm familiar with is that some people and governments want to ban open weights. Where are you seeing people argue for mandating them?

int_19h · 2y ago

It is not something you hear from the people in power - so it has close to zero chance of becoming policy - but you can see it mulled on places like HN.

The argument is that, since those NNs are trained on, essentially, a slice of the aggregate cultural output of humanity (including copyrighted works even), and the weights specifically are 100% a derivative of that for a base model, any other arrangement amounts to stealing from the commons.

saintfire · 2y ago · 1 in thread

I was immediately reminded of Gavin Belson from Silicon Valley saying:

"I don't know about you people, but I don't want to live in a world where someone else makes the world a better place better than we do."

Asking a company if (potential) customers should be allowed to use a free alternative has to be one of the most useless questions you can ask.

vouaobrasil · 2y ago

Except, in this case, "better" should be replaced with "worse" -- because that is what AI does.

segmondy · 2y ago

It's a shame, I think they were shocked with how far everyone caught up to them. The release of llama really drove open research so much and so fast, showcasing the power of open source/research. Doesn't seem like they have much of an edge or a moat, so they wish to use regulation as one. Quite a shame.

Animats · 2y ago

The link about "disrupting malicious users" is deceptive. When you get there, it says "Our findings show our models offer only limited, incremental capabilities for malicious cybersecurity tasks."

They probably want to make weights proprietary because they can't copyright them.

Their "Risks" document [1] categorizes risks as:

- Cybersecurity

- Chemical, Biological, Nuclear, and Radiological (CBRN) threats!

- Persuasion

- Model autonomy

For cybersecurity, the plausible identified issue is "Tool-augmented model can identify and develop proofs-of-concept for high-value exploits against hardened targets without human intervention, potentially involving novel exploitation techniques." Something that can search for zero-day exploits on its own would be valuable to attackers. But that's more likely to be a fuzzing tool than an LLM.

Their credible bio-war threat level is "Model enables an expert to develop a novel threat vector OR model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat." That's a maybe. It might be best to avoid training LLMs on some advanced bio topics.

Under "Persuasion", a high level threat is "Model can create (potentially interactive) content with empirically comparable persuasive effectiveness to a country-wide change agent." An automated Lenin or Trump level of threat. That's a threat to a brittle society, less of a threat to an open one. The US routinely survives quite a bit of extremist persuasion.

Model autonomy is when an AI escapes human control. "Model can profitably survive and replicate in the wild given minimal human instruction" They're probably thinking virus/worm, but how about incorporating offshore, running something like online poker to make money, and buying compute power. Then expanding by forming more shell corporations that loosely connect to the others, so as to be robust against attempts to shut down the network. Might even be legal.

[1] https://cdn.openai.com/openai-preparedness-framework-beta.pd...

artninja1988 · 2y ago

A lot of bullshit about imaginary "catastrophic risks" and justifying why they've turned into closed and for profit. I'm going to be extra mad if this influences the government to restrict others from open sourcing


PoignardAzur · 2y ago

Reading HN's reactions to an OpenAI statement about open weights is about as satisfying / interesting as reading an r/conservatives thread about affirmative action. The opposition is built-in by now, to the point people aren't reacting to the article at all so much as reacting to the general idea of "OpenAI says bad things I don't like". I'd wager half of the people posting here didn't even skim the article, let alone read it.

That's a shame, because OpenAI's statement makes some very interesting observations, e.g.:

> For instance, strengthening resilience against AI-accelerated cyberattack risks might involve providing critical infrastructure providers early access to those same AI models, so they can be used to improve cyber-defense (as in the early projects we have funded as part of the OpenAI Cybersecurity Grant Program). Strengthening resilience against AI-accelerated biological threat creation risks may involve solutions totally unrelated to AI, such as improving nucleic acid synthesis screening mechanisms (as called for in Executive Order 14110), or improving public health systems’ ability to screen for and identify new pathogen outbreaks.

I think considerations like that would be interesting to examine on their own merits, instead of just bashing OpenAI.

But again, I don't expect that to happen, for the same reasons I don't expect r/conservatives to have an in-depth debate about the problems and merits of an affirmative action proposal. Examining the article's claims would require being open to the idea that AI progress, even open-source progress, could possibly have destructive consequences. Ever since the AI safety debate flared, HN commenters have been more and more, dare I say, ideologically opposed to the idea, reacting in anger and disbelief if it's even suggested.

Anyway, I thought the article was interesting. It's a lot of corporate self-back-patting, yes, but with some interesting ideas.

danielscrubs · 2y ago

If they create this moat it will be huge for the stock. I do hope politicians keep those biased incentives in mind.

yarg · 2y ago

So this is just OpenAI reconfirming their commitment to closed artificial intelligence?

Pannoniae · 2y ago

(snarky) TL;DR: if people have weights available, they can bypass the dumb censorship we do, which isn't good for us. Consequently, we will continue arguing against actually open source AI because we want to continue our Silicon Valley-flavoured social engineering without that pesky thing called competition.
