How magnanimous! They are only thinking of others, you see. They are rejecting their safety pledge for you.
> “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”
Oops, they said the quiet part out loud: it’s all about money. “I mean, if all of our competitors are kicking puppies in the face, it doesn’t make sense for us not to do it too. Maybe we’ll also kick kittens while we’re at it.”
For all of you who thought Anthropic were “the good guys”, I hope this serves as a wake-up call that they were always all the same. None of them care about you; they only care about winning.
But lucky for the AI companies, most of them are based in a place that only has a government on paper, and everyone forgot where that paper is.
I mean, yes, that is actually how the world works. That is why we need safety, environmental, and other anti-fraud regulations. Without them, competition ensures that every successful company will defraud, hurt, and harm. Those who won't will be taken over by those who do.
To be fair, this is true in nearly all industries and for nearly all companies. Almost everyone is chasing money and monopoly. Not that that makes it right; just pointing out it isn’t unique to, or even interesting about, the AI companies.
It sounds like they are in a cutthroat market and realised they couldn't afford to stand on that principle. And that it wouldn't matter if they did; it would just ensure they'd be handicapped in a field where no one else followed suit.
Better hills to die on.
Was anyone fooled by this?
I mean, I know this is HN and there is a demographic here that gets all misty eyed about the benevolence of corporations.
It takes a special kind of naivety to believe in those claims.
> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
Their core argument is that if we have guardrails that others don't, they would be left behind in controlling the technology, and they are the "responsible ones." I honestly can't comprehend the timeline we are living in. Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons.
AI is powerful and AI is perilous. Those two aren't mutually exclusive; both follow directly from the same premise.
If AI tech goes very well, it can be the greatest invention of all human history. If AI tech goes very poorly, it can be the end of human history.
If they were unrelated, Anthropic wouldn’t be doing this this week, because obviously everyone will conflate the two.
With the latest competing models they are now realizing they are an "also-ran" provider.
Sobering up fast with an ice bucket of 5.3-codex, Copilot, and OpenCode dumped on their head.
N.B. the time travel aspect also required suspension of disbelief, but somehow that was easier :-)
They're not really, it's always been a form of PR to both hype their research and make sure it's locked away to be monetized.
Curing all cancers would increase population growth by more than 10% (9.7-10m cancer-related deaths per year vs. the current 70-80m growth rate), and it would raise the average age of the population, since curing cancer would increase general life expectancy and a majority of the lives just saved would be older people.
We'd even see a jobs and resources shock (though likely dissimilar in scale) as billions in funding shift away from oncologists, oncology departments, oncology wards, etc. Billions of dollars, millions of hospital beds, and countless specialized professionals all suddenly reassigned, just as with AI.
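The arithmetic behind that figure, for what it's worth, checks out; a quick sanity check using the midpoints of the ranges cited above (ballpark assumptions, not authoritative statistics):

```python
# Back-of-the-envelope: how much would ending cancer deaths raise net population growth?
cancer_deaths_per_year = 9.85e6  # midpoint of the 9.7-10 million cited above
net_population_growth = 75e6     # midpoint of the 70-80 million cited above

print(f"{cancer_deaths_per_year / net_population_growth:.1%}")  # ~13.1%, i.e. "more than 10%"
```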
Honestly the cancer/nuclear/tech comparison is rather apt. All either are or could be disruptive and either are or could be a net negative to society while posing the possibility of the greatest revolution we've seen in generations.
Maybe some of the more naive engineers think that. At this point any big tech businesses or SV startup saying they're in it to usher in some piece of the Star Trek utopia deserves to be smacked in the face for insulting the rest of us like that. The argument is always "well the economic incentive structure forces us to do this bad thing, and if we don't we're screwed!" Oh, so ideals so shallow you aren't willing to risk a tiny fraction of your billions to meet them. Cool.
Every AI company/product in particular is the smarmiest version of this. "We told all the blue collar workers to go white collar for decades, and now we're coming for all the white collar jobs! Not ours though, ours will be fine, just yours. That's progress, what are you going to do? You'll have to renegotiate the entire civilizational social contract. No we aren't going to help. No we aren't going to sacrifice an ounce of profit. This is a you problem, but we're being so nice by warning you! Why do you want to stand in the way of progress? What are you a Luddite? We're just saying we're going to take away your ability to pay your mortgage/rent, deny any kids you have a future, and there's nothing you can do about it, why are you anti-progress?"
Cynicism aside, I use LLMs to the marginal degree that they actually help me be more productive at work. But at best this is Web 3.0. The broader "AI vision" really needs to die
The reason Claude became popular is because it made shit up less often than other models, and was better at saying "I can't answer that question." The guardrails are quality control.
I would rather have more reliable models than more powerful models that screw up all the time.
It is entirely reasonable to refuse to provide tools for breaking the law through mass surveillance of civilians, and to insist the tool not be used to kill a human automatically, without a human in the loop. The demands to drop those safeguards are unreasonable demands by an unreasonable regime.
Riiiiiight.
This sounds like a lie. But if they are telling the truth, it's terrible timing nonetheless.
And they alone are responsible enough to govern it.
But frankly, I feel like the founders of Anthropic and others are victims of the same hallucination.
LLMs are amazing tools. They play back & generate what we prompt them to play back, and more.
Anybody who mistakes this for SkyNet (an independent consciousness with instant, permanent learning, adaptation, and self-awareness) is just huffing the fumes, and just as delusional as Lemoine was 4 years ago.
Every one of us should spend some time writing an agentic tool and managing context and the agentic conversation loop. These things are still primitive as hell. I still have to "compact my context" every N tokens, and "thinking" is just repeating the same conversational chain over and over and jamming words in.
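For anyone who hasn't tried it, a minimal sketch of what that loop looks like (the `client` API, the tool-call shape, and the compaction threshold are all hypothetical stand-ins; the point is how manual the bookkeeping still is):

```python
# Minimal agentic loop with naive context compaction. Hypothetical sketch:
# `client` stands in for whatever model API you use.
MAX_CONTEXT_TOKENS = 100_000  # arbitrary compaction threshold

def count_tokens(messages):
    # Crude proxy: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def agent_loop(client, user_goal, tools):
    messages = [{"role": "user", "content": user_goal}]
    while True:
        if count_tokens(messages) > MAX_CONTEXT_TOKENS:
            # "Compact my context": replace the history with a model-written summary.
            summary = client.complete(messages + [
                {"role": "user", "content": "Summarize all progress so far."}
            ])
            messages = [{"role": "user", "content": f"Progress summary: {summary.text}"}]
        reply = client.complete(messages, tools=tools)
        messages.append({"role": "assistant", "content": reply.text})
        if not reply.tool_calls:
            return reply.text  # no tool calls left: the model considers itself done
        for call in reply.tool_calls:
            result = tools[call.name](**call.args)  # run the tool, feed the result back
            messages.append({"role": "tool", "content": str(result)})
```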
Turns out this is useful stuff. In some domains.
It ain't SkyNet.
I don't know if Anthropic is truly high on their own supply, or just taking us all for fools so that they can pilfer investor money and push regulatory capture.
There's also a bad trait among engineers, deeply reinforced by survivor bias, to assume that every technological trend follows Moore's law and exponential growth. But that applie[s|d] to transistors, not everything.
I see no evidence that LLMs + exponential growth in parameters + context windows = SkyNet or any other kind of independent consciousness.
Reminds me of:
https://en.wikipedia.org/wiki/Paradox_of_tolerance
which has the same kind of shitty conclusion.
Anthropic only talks about safety but has never released anything open source.
All this said, I'm surprised China actually delivered so many open-source alternatives, which are decent.
Why did the Western companies (which are supposed to be the good guys) not release anything open source to help humanity? And why always claim they don't release because of safety, then give the unlimited AI to the military? Just bullshit.
Let's all be honest and just say you only care about the money, and you take from whoever pays.
They are businesses after all, so their goal is to make money. But please don't claim you want to save the world or help humans. You just want to get rich at others' expense. Which is totally fair: you make a good product and you sell it.
We must build a moat to save humanity from AI.
Please regulate our open-source competitors for safety.
Actually, safety doesn't scale well for our Q3 revenue targets.
‘While there’s value in safety, we value the Pentagon’s dollars more’
That said, I'm not thrilled about this. I joined Anthropic with the impression that the responsible scaling policy was a binding pre-commitment for exactly this scenario: they wouldn't set aside building adequate safeguards for training and deployment, regardless of the pressures.
This pledge was one of many signals that Anthropic was the "least likely to do something horrible" of the big labs, and that's why I joined. Over time, the signal of those values has weakened; they've sacrificed a lot to get and keep a seat at the table.
Decisions that pit principles against their position at the frontier seem like they'll become even more common. I hope they're willing to risk losing their seat at the table to be guided by values.
That's about as naive as it gets.
If they have any values left at all (which I hope they do), then them not being at the table, while labs with no values left are there, is much worse than them being there with a chance to exert at least some influence with what's left.
That said, of course, money > all else.
Pledges are generally non-binding (you can pledge to do no evil and still do it), but fulfill an important function as a signal: actively removing your public pledge to do "no evil" when you could have acted as you wished anyway, switches the market you're marketing to. That's the most worrying part IMO.
The moral failing is all of ours to share.
Write essays about AI safety in the application.
An entire interview round dedicated to pretending that you truly only care about AI safety and not the money.
Every employee you talk to forced to pretend that the company is all about philanthropy, effective altruism and saving the world.
In reality it was a mid-level manager interviewing a mid-level engineer (me), both putting on a performance while knowing full well that we'd do what the bosses told us to do.
And that is exactly what is happening now. The mission has been scrubbed, and the thousands of "ethical" engineers you hired are all silent now that real money is on the line.
The kind of principles you talk about can only be upheld one level up the food chain. By govts.
Which is why legislatures, supreme courts, central banks, and power-grid regulators deciding the operating voltage and frequency inevitably emerge in history. Because corporations structurally can't do what those bodies do without violating their prime directive of profit maximization.
Anybody involved should also be prohibited from starting a private company using their IP and catering to the same domain for 5-10 years after they leave.
Non-profits where the CEO makes millions or billions are a joke.
And if e.g. your mission is to build an open browser, being paid by a for-profit to change its behavior (e.g. make theirs the default search engine) should be prohibited too.
Could you describe the model that you think might work well?
If regular corporations are sued for not acting in the interests of shareholders, that would suggest that one could file a suit for this sort of corporate behavior.
I'm not even a lawyer (I don't even play one on TV) and public benefit corporations seem to be fairly new, so maybe this doesn't have any precedent in case law, but if you couldn't sue them for that sort of thing, then there's effectively no difference between public benefit corporations and regular corporations.
“At this point”? It was always the case, it’s just harder to hide it the more time passes. Anyone can claim anything they want about themselves, it’s only after you’ve had a chance to see them in the situations which test their words that you can confirm if they are what they said.
The article yes, but we cannot be sure about its topic. We definitely cannot claim that they are unrelated. We don't know. It's possible that the two things have nothing to do with each other. It's also possible that they wanted to prevent worse requests and this was a preventive measure.
I don't know enough to evaluate this or other decisions. I'm just glad someone is trying to care, because the default in today's world is to aggressively reject the larger picture in favor of more more more. I don't know how effective Anthropic's attempts to maintain some level of responsibility can be, but they've at least convinced me that they're trying. In the same way that OpenAI, for example, have largely convinced me that they're not. (Neither of those evaluations is absolute; OpenAI could be much worse than it is.)
> The meeting between Hegseth and Amodei was confirmed by a defense official who was not authorized to comment publicly and spoke on condition of anonymity.
https://fortune.com/2026/02/24/hegseth-to-meet-with-anthropi...
"In general, fascist governments exercised control over private property but they did not nationalize it. Scholars also noted that big business developed an increasingly close partnership with the Italian Fascist and German Nazi governments after they took power. Business leaders supported the government's political and military goals. In exchange, the government pursued economic policies that maximized the profits of its business allies.[8]"
Then they ignored the researchers warning about what it could do, and I said nothing. It sounded like science fiction.
Then they gave it control of things that matter, power grids, hospitals, weapons, and I said nothing. It seemed to be working fine.
Then something went wrong, and no one knew how to stop it, no one had planned for it, and no one was left who had listened to the warnings.
> Then they ignored the researchers warning about what it could do, and I...
...tried it and became an eager early adopter and evangelist. It sounded like something from a dystopian science fiction novel I enjoyed.
> Then [I] gave it control of things that matter, power grids, hospitals, weapons, and...
...my startup was doing well, and I was happy. We should be profitable next quarter.
> Then something went wrong, and no one knew how to stop it, no one had planned for it...
...and I was guilty as fuck.
FTFY, to fit the HN crowd.
This is the problem with every AI safety scenario like this. It has a level of detachment from reality that is frankly stark.
If linemen stop showing up to work for a week, the power goes out. The US has shown that people with "high-powered" rifles can shut down the grid.
We are far, far away from the sort of world where turning AI off is a problem. There isn't going to be a HAL- or Terminator-style situation when the world is still "I, Pencil".
A lot of what safety amounts to is politics (national, not internal; e.g., is Taiwan a country?). And a lot more of it is cultural.
This is how all of these companies work. They’ll follow some ethical code or register as a PBC until that undermines profits.
These companies are clearly aiming at cheapening the value of white collar labor. Ask yourself: will they steward us into that era ethically? Or will they race to transfer wealth from American workers to their respective shareholders?
It's like a snake eating its own tail.
It’s important to remember that a company’s primary purpose is profit, especially when it’s accountable to shareholders. That isn’t inherently bad, but the occasional moral posturing used to serve that goal can be irritating.
Focusing on Dario, his exact quote IIRC was "50% of all white collar jobs in 5 years" which is still a ways off, but to check his track record, his prediction on coding was only off by a month or so. If you revisit what he actually said, he didn't really say AI will replace 90% of all coders, as people widely report, he said it will be able to write 90% of all code.
And these days it's pretty accurate. 90% of all code, the "dark matter" of coding, is stuff like boilerplate, internal LoB CRUD apps, and typical data-wrangling algorithms that Claude and Codex can one-shot all day long.
Actually replacing all those jobs however will take time. Not just to figure out adoption (e.g. AI coding workflows are very different from normal coding workflows and we're just figuring those out now), but to get the requisite compute. All AI capacity is already heavily constrained, and replacing that many jobs will require compute that won't exist for years and he, as someone scrounging for compute capacity, knows that very well.
But that just puts an upper limit on how long we have to figure out what to do with all those white collar professionals. We need to be thinking about it now.
Council on Foreign Relations, 11 months ago: "In 12 months, we may be in a world where AI is essentially writing all of the code."
Axios interview, 8 months ago: "[...] AI could soon eliminate 50% of entry-level office jobs."
The Adolescence of Technology (essay), 1 month ago: "If the exponential continues—which is not certain, but now has a decade-long track record supporting it—then it cannot possibly be more than a few years before AI is better than humans at essentially everything."
Don't worry, I know exactly why. $
General population: How will AI get to the point where it destroys humanity?
Yudkowsky: [insert some complicated argument about instrumented convergence and deception]
The government: because we told you to.
Again, not saying that AI is useless or anything. Just that we're more likely to cause our own downfall with weaker AI than with some abstract super AGI. The bar for mass destruction and oppression is lower than the bar for what we typically think of as intelligence for the benefit of humanity (with the right systems in place, current AI systems are more than enough to get the job done, hence why the Pentagon wants it so bad...)
If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil), and that might mean releasing models that are safer and more steerable than others (even if, unfortunately, they are not 100% up to Anthropic’s goals)
Dogmatism, while great, has its time and place, and with a thousand bad actors in the LLM space, pragmatism wins out.
I'm genuinely curious why they are so holy to you, when to me they look like just another tech company trying to make cash.
Edit: Reading some of the linked articles, I can see how Anthropic CEO is refusing to allow their product for warfare (killing humans), which is probably a good thing that resonates with supporting them
I don't think it's going to be as easy as you think to tell that they're becoming evil before it's too late, if this doesn't already raise alarm bells for you that this is the plan.
https://apnews.com/article/anthropic-hegseth-ai-pentagon-mil...
> I take significant responsibility for this change.
https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...
> Holden Karnofsky, who co-founded the EA charity evaluator GiveWell, says that while he used to work on trying to help the poor, he switched to working on artificial intelligence because of the “stakes”:
> “The reason I currently spend so much time planning around speculative future technologies (instead of working on evidence-backed, cost-effective ways of helping low-income people today—which I did for much of my career, and still think is one of the best things to work on) is because I think the stakes are just that high.”
> Karnofsky says that artificial intelligence could produce a future “like in the Terminator movies” and that “AI could defeat all of humanity combined.” Thus stopping artificial intelligence from doing this is a very high priority indeed.
https://www.currentaffairs.org/news/2022/09/defective-altrui...
He is just giving everyone permission to do bad things by saying a lot of words around it.
"move fast and break things" ?
Empty words. I would like to know one single meaningful way he will be held responsible for any negative effects.
Incredibly long and verbose. I'll stop short of accusing him of using an AI to generate slop, but whatever happened to people's ability to make short, strong, simple arguments?
If you can't communicate the essence of an argument in a short and simple way, you probably don't understand it in great depth, and clearly don't care about actually convincing anybody because Lord knows nobody is going to RTFA when it's that long...
At best, you're just trying to communicate to academics who are used to reading papers... Need to expect better from these people if we want to actually improve the world... Standards need to be higher.
* Our shareholders will probably sue us
Anthropic's Responsible Scaling Policy, the hard commitment to never train a model unless safety measures were guaranteed adequate in advance, lasted roughly 2.5 years (Sept 2023 to Feb 2026).
The half-life of idealism in AI is compressing fast. Google at least had the excuse of gradualism over a decade and a half.
Because at this point, it's too broad to be defined in the context of an LLM, so it feels like they removed a blanket statement of "we will not let you do bad things" (or "don't be evil"), which doesn't really translate into anything specific.
It increasingly feels like operating at that scale can require compromises I’m not comfortable making. Maybe that’s a personal limitation—but it’s one I’m choosing to keep.
I’d genuinely love to hear examples of tech companies that have scaled without losing their ethical footing. I could use the inspiration.
Also don’t take investment from anyone who isn’t fully aligned ethically. Be skeptical of promises from people you don’t personally know extremely well.
That may limit you to slower growth, or cap your growth (fine if you want to run a company and take home $2M/yr from it; not fine if you want to be acquired for $100M and retire). It may also limit you to taking out loans to fund growth that you can’t bootstrap to, which is a different kind of risky.
...only lately?
All it really takes to do some kind of crazy world-dominating thing is some simple mechanisms and base intelligence, which the machines already possess. Using basic tactics like coercion, spoofing, threats, financial leverage, an unsophisticated attacker could cause major damage.
For example, that Meta exec who had their email deleted. Imagine instead one email had a malicious prompt which the bot obeyed. That prompt simply emailed everyone in her contacts list telling them to do something urgently (and possibly prompting other bots who are reading those emails). You could pretty quickly do something like cause a market crash, a nationwide panic, or maybe even an international conflict with no "super intelligence" needed, just human negligence, short-sightedness, and laziness.
Examples would be things like saying there is a threat incoming and a CIA source said so. Another would be that everyone will be fired, Meta is going bankrupt, etc. It's very easy to craft a prompt like that and fire it off to all the execs you can find (or just fire off random emails with plausible-sounding stories). You only need to hit one, and you might set off a cascade.
They inserted themselves into the supply chain, and then the government told them that they'll be classified as a supply chain risk unless they get unfettered access to the tech. They knew what they were getting into, but didn't want the competitors to get their slice of the pie.
The government didn't pursue them, Anthropic actively pursued government and defense work.
Talk about selling out. Dario's starting to feel more and more like a swindler, by the day.
Public benefit corporation, hm?
However, Anthropic's business consists mostly of intellectual property, which is highly mobile. What if Anthropic were to go to Macron (France), for example, or Carney (Canada), or even Xi Jinping, and say "You give us work visas and support, we move to your land"?
Hell, isn't Canada (specifically Toronto) the birthplace of deep learning? Why stay in a hostile environment when the land of your birth is welcoming?
A pre-commitment means nothing unless you have the mechanisms in place to enforce it.
A pre-sacrifice would be more effective.
https://xcancel.com/elonmusk/status/2026181748175024510
I don't know where xAI got its training material from, but seeing Musk retweeting that is refreshing.
The AI startup has refused to remove safeguards that would prevent its technology from being used to target weapons autonomously and conduct U.S. domestic surveillance.
Pentagon officials have argued the government should only be required to comply with U.S. law. During the meeting, Hegseth delivered an ultimatum to Anthropic: get on board or the government would take drastic action, people familiar with the matter said.
https://www.staradvertiser.com/2026/02/24/breaking-news/anth...
I kind of wish they had forced the government's hand and made them do it. Just to show the public how much interference is going on.
They say it wasn't related. Like everything that has happened across tech/media: the company is forced to do something, then issues a statement about how it "wasn't related to the obvious thing the government just did".
Makes perfect sense!!
Or, more likely, adding the "core safety promise" was just them playing hardball with the government to get a better deal, and the government showed them it can play the same game.
* AI and states cannot peacefully coexist, and AI is not going to be stopped. Therefore, we must begin to deprecate states.
I think it's very unlikely that this is unrelated to the pressure from the US administration, as the anonymous-but-obvious-anthropic-spokesperson asserts.
We're at a point now where the nation states are all totally separate creatures from their constituencies, and the largest three of them are basically psychotic and obsessed with antagonizing one another.
In order to have a peaceful AI age, we need _much_ smaller batches of power in the world. The need for states that claim dominion over whole continents is now behind us; we have all the tools we need to communicate and coordinate over long distances without them.
Please, I pray for a gentle, peaceful anarchism to emerge within the technocratic leagues, and for the elder statesmen of the legacy states to see the writing on the wall and agree to retire with tranquility and dignity.
Humans are, by nature, forgetful and argumentative. Fourteen hundred years ago, the Qur'an said this unequivocally (20:115, 18:54, 22:8, 18:73). Not to moralize here, I'm just saying if camel-herders could build a medieval superpower out of nothing, they knew something we don't.
Any state or system that insists good humans are always nice, smart, cogent, and/or aware is doomed to fail. A Washington or a Cincinnatus that can get out of his own way (and that of society) is rare indeed, a one-in-a-billion soul. We shouldn't sit around and wait for that, while your run-of-the-mill dictator in a funny hat (or a funny toupée for that one orange fellow) has his way with us.
1. AI is military/surveillance technology in essence, like many other information technologies,
2. Any guarantee given by AI companies is void since it can be changed in a day,
3. Tech companies have no real control over how their technology will be used,
4. AI companies may seem over-valued with low profits if you think of AI as a civilian technology. But their investors probably see them as part of the defense (war) industry.
Given by anyone, actually.
Hegseth gives Anthropic until Friday to back down on AI safeguards
The safeguards that were dropped concern whether or not they will release a model based on safety.
The Friday deadline is about allowing their products to be used for mass surveillance and in autonomous weapons systems without a human in the loop.
Anthropic hasn't backed down on those, yet. But they are in a bad situation either way.
If they don't back down, they lose US government contracts, and the government gets to do what it wants anyway. It also puts them in a dangerous position with non-governmental bodies.
If they give in to the demands, then it puts all AI companies at risk of the same thing.
Personally, I think they should move to the EU. The recent EU laws align with Anthropic's thinking.
The structural problem is that once you've taken billions in VC, safety becomes a negotiable constraint rather than a core value. The board's fiduciary duty runs toward returns, not toward whatever was in the mission statement. PBC status doesn't change that in practice — there's basically zero enforcement mechanism.
What's wild is how fast the cycle has compressed. Google took maybe 15 years to go from "don't be evil" to removing it from the code of conduct. OpenAI took about 5 years from nonprofit to capped-profit to whatever they are now. Anthropic is speedrunning it in under 3. At this rate the next AI startup will launch as a PBC and pivot before their Series B closes.
> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
Pledges are a cynical marketing strategy aimed at fomenting a base politics that works to prevent such a regulatory regime.
Is the implication here that Anthropic admits they already can't meet their own risk and safety guidelines? Why else would they have to stop training models?
https://www.npr.org/2026/02/25/nx-s1-5725354/nurses-emigrate...
Anthropic's market cap is going to be huge when they go public. Why do it on Nasdaq when there are so many other exchanges in the world?
I can't help but think about how Google once had "Don't be evil" as their motto.
But the thing with for-profit companies is that when push comes to shove, they will always serve the love of money. I'm just surprised that in an industry churning through trillions, their price is $200 million.
The US is not the only country in the world so the idea that humanity as a whole could somehow regulate this process seemed silly to me.
Even if you got the whole US tech community and the US government on board, there are 6.7bn other people in the world working in unrelated systems, enough of whom are very smart
What would safety applied to the leading 3 mean to you anyway?
What gigantic, absolute pieces of s...
Not because of what they did, which is classic startup playbook, but because of the cynicism involved, particularly after all the fuss they've been making for years about safety. The company itself was founded, allegedly, to pursue that as a mission, as opposed to OpenAI.
"Hi all, that was a lie, we never really cared." They only missed the "dumb f***s" remark, a la Facebook.
I have not read “If Anybody Builds It, Everybody Dies” but I believe that's also its premise.
Current GenAI is extremely capable but also very weird. For instance, it is extremely smart in some areas but makes extremely elementary mistakes in others (cf. the Jagged Frontier). Research from Anthropic and OpenAI gives us surprising glimpses into what might be happening internally, how that does not necessarily correspond to the results it produces, and all kinds of non-obvious, striking things happening behind the scenes.
Like models producing different reasoning tokens from what they are really reasoning about internally!
Or models being able to subliminally influence derivative models through opaque number sequences in training data!
Or models "flipping the evil bit" when forced to produce insecure code and going full Hitler / SkyNet!
Or the converse, where models produced insecure code if the prompt includes concepts it considers "evil" -- something that was actually caught in the wild!
We are still very far from being able to truly understand these things. They behave like us, but don't necessarily “think” like us.
And now we’ve given them direct access to tools that can affect the real world.
Maybe we am play god: https://dresdencodak.com/2009/09/22/caveman-science-fiction/
I think the Dario of today is very different to the Dario 3 years ago.
You are just one new feature announcement from Anthropic/OpenAI away from irrelevance.
Same as it was when people built their businesses on top of AWS a decade ago.
https://apnews.com/article/anthropic-hegseth-ai-pentagon-mil...
1. Extremely granular ways to let users control network and disk access for apps (great if resource access can also be changed)
2. Make it easier for apps to work with these as well
3. I would be interested in a layer that intercepts the query at the OS/browser level before the CLI/web app even gets it. Could that prevent harm beforehand, or at least warn, or log queries for someone who reviews them later?
And most importantly: all of this via an excellent GUI with clear demarcations and settings, all well documented (Apple might struggle with documentation; LLMs might help them there).
My point is: why the hell are we waiting for these companies to be good folks? Why not put them behind a safety layer?
I mean, the CLI asks: can I access this folder? Run this program? Download this? But they can just do that if they want! Make them ask those questions the way apps on phones ask for location, mic, and camera access.
Basically an EDR.
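A minimal sketch of what that kind of gate could look like at the agent layer (a hypothetical illustration only; real enforcement has to live in the OS or an EDR, because anything running inside the agent's own process can simply be bypassed):

```python
# Hypothetical permission gate for an agent's shell access: deny by default,
# ask the user interactively, and remember grants per (action, resource) pair.
import shlex

class PermissionGate:
    def __init__(self):
        self.grants = {}  # (action, resource) -> bool

    def check(self, action, resource):
        key = (action, resource)
        if key not in self.grants:
            answer = input(f"Agent wants to {action} '{resource}'. Allow? [y/N] ")
            self.grants[key] = answer.strip().lower() == "y"
        return self.grants[key]

gate = PermissionGate()

def run_command(cmd):
    # Every shell invocation passes through the gate first, the way phone apps
    # must ask for location, mic, or camera access.
    program = shlex.split(cmd)[0]
    if not gate.check("execute", program):
        raise PermissionError(f"User denied execution of {program}")
    ...  # actually run the command here, e.g. via subprocess
```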
Are people really attempting to have LLMs replace vision models in robots, and trying to agentically make a robot work with an LLM?? This seems really silly to me, but perhaps I am mistaken.
The only other thing I could think of is real-time translation during special ops with parabolic microphones and AR goggles...
On the other hand, those organizations are operating in the best interest of Americans and the world right?
Surely, those agencies aren't just a trick of the rich people? Right?
Making promises in good times is a real minefield, hah.
And it will be, as Warren Buffett puts it, an "only when the tide goes out do you discover who's been swimming naked" moment.
I really miss the nerd profile who cared a lot more about tech and science, and a lot less about signaling their righteousness.
How did we get so religious/narcissistic so quickly and as a whole?
ok lol what a coincidence.
But setting aside the conspiracy, the article actually spells out the real reason pretty directly: Anthropic hoped their original safety policy would spark a "race to the top" across the industry. It didn't. Everyone else just ignored it and kept moving. At some point, holding the line unilaterally just means you're losing ground for nothing.
Even if it were ever done with good intentions, it is an open invitation for benefit hoarding and margin fixing.
Do you really want to create this future where only a select few anointed companies and some governments have access to super-advanced intelligent systems, where the rest of the planet is subjected to them and your own AI access is limited to benign, banal, ad-pushing, propaganda-spewing chatbots as you binge-watch the latest "aw my ballz"?
Netflix said that they'd never have live TV, or buy a traditional studio, or include ads in their content. Then they did all three.
All companies use principled promises to gain momentum, then drop those principles when the money shows up.
As Groucho Marx used to say: these are my principles, if you don't like them, I have others.
Dark times and darker forests.
That doesn't even make sense.
What stops one model from spouting wrongthink and suicide HOWTOs might not work for a different model, and fine-tuning things away uses the base model as a starting point.
You don't know the thing's failure modes until you've characterized it, and for LLMs the way you do that is by first training it and then exercising it.
"We promise we are not going to do __, except if our customers ask us to, in which case we absolutely will."
What is the point? Company makes a statement public, so what?
Not the first time this company has put words in the wind; see the Claude Constitution. It's almost like this company is built, from the ground up, on bullshit and slop.
The largest predictor of a company's behavior, and of that company's products in the long run, is its funding sources and income streams, which are conveniently left out of their "constitution". Mostly a waste of effort on their part.
It isn't about the right answers, rather the expected answers.
The intention behind starting this pledge and the conflict with the DoW might be sincere, but I don’t expect it to last long, especially with the company going public very soon.
The Amodeis have just proven that the threat of even slight hardship will make them throw any and all principles away.
They’re pointless if they just get removed once you get close to hitting them.
And all the major corps seem to be doing this style of PR management. It speaks of some pretty weapons-grade moral bankruptcy.
The concept of "having a contract with society" doesn't even formally exist because companies would never sign one.
There's so much focus on implementation and processes, and it really seems to treat the question of what even constitutes "misaligned" or "unethical" behavior as more or less straightforward, uncontroversial, and basically universally agreed upon.
Let's be clear: humans are not aligned. In fact, humans have not come to a common agreement on what it means to be aligned. Look around: the same actions are considered virtuous by some and villainous by others. Before we get to whether or not I trust Anthropic to stick to their self-imposed processes, I'd like to have a general idea of what their values even are. Perhaps they've made something they see as super ethical that I find completely unethical. Who knows.

The most concrete stances they take in their "Constitution" are still laughably ambiguous. For example, they say that Claude takes into account how many people are affected if an action is potentially harmful. They also say that Claude values "protection of vulnerable groups." These two statements trivially lead to completely opposing conclusions in our own population, depending on whether one considers the "unborn" to be a "vulnerable group". Don't get caught up in whether you believe this or not; simply realize that this very simple question changes the meaning of these principles entirely.

It is not sufficient to simply say "Claude is neutral on the issue of abortion." For starters, it is almost certainly not true. You can probably construct a question that is necessarily causally connected to the number of unborn children affected, and Claude's answer will reveal its "hidden preference." What would true neutrality even mean here anyway? If I ask it for help driving my sister to a neighboring state, should it interrogate me to see if I am trying to help her get to a state where abortion is legal? Again, notice that both helping me and refusing to help me could anger a not insignificant portion of the population.
This Pentagon thing has gotten everyone riled up recently, but I don't understand why people weren't up in arms the second they found out AIs were assisting congresspeople in writing bills. Not all questions of ethics are as straightforward as whether or not Claude should help the Pentagon bomb a country.
Consider the following when you think about more and more legislation being AI-assisted going forward, and then really ask yourself whether "AI alignment" was ever a thing:
1. What is Claude's stances on labor issues? Does it lean pro or anti-union? Is there an ethical issue with Claude helping a legislator craft legislation that weakens collective bargaining? Or, alternatively, is it ethical for Claude to help draft legislation that protects unions?
2. What is Claude's stance on climate change? Is it ethical for Claude to help craft legislation that weakens environmental regulations? What if weakening those regulations arguably creates millions of jobs?
3. What is Claude's stance on taxes? Is it ethical for Claude to help craft legislation that makes the tax system less progressive? If it helps you argue for a flat tax? How about more progressive? Where does Claude stand on California's infamous Prop 19? If this seems too in the weeds, then that would imply that whether or not the current generation can manage to own a home in the most populous state in the US is not an issue that "affects enough people." If that's the case, then what is?
4. Where does Claude land on the question of capitalism vs. socialism? Should healthcare be provided by the state? How about to undocumented immigrants? In fact, how does Claude feel about a path to amnesty, or just immigration in general?
Remember, the important thing here is not what you believe about the above questions, but rather the fact that Claude is participating in those arguments, and increasingly so. Many of these questions will impact far more people than overt military action. And this is for questions that we all at least generally agree have some ethical impact, even if we don't necessarily agree on what that impact may be.

There is another class of questions where we don't realize the ethical implications until much later. Knowing what we know now, if Claude had existed 20 years ago, should it have helped code up social networks? How about social games? A large portion of the population has seemingly reached the conclusion that this is such an important ethical question that it merits one of the largest regulation increases the internet has ever seen, in order to prevent children from using social media altogether. If Claude had assisted in the creation of those services, would we judge it as having failed its mission in retrospect? Or would that be too harsh and unfair a conclusion? But what's the alternative, saying it's OK if the AIs destroy society... as long as it's only by accident?
What use is a super intelligence if it's ultimately as bad at predicting unintended negative consequences as we are?
The narrative on social media, this site included, is to portray the closed western labs as the bad guys and the less capable labs releasing their distilled open weight models to the world as the good guys.
Right now a kid can go download an Abliterated version of a capable open weight model and they can go wild with it.
But let's worry about what the US DoD is doing or what the western AI companies absolutely dominating the market are doing because that's what drives engagement and clicks.