This is really sad. Suchir was just 26, and graduated from Berkeley 3 years ago.
Here’s his personal site: https://suchir.net/.
I think he was pretty brave for standing up against what is generally perceived as an injustice being done by one of the biggest companies in the world, just a few years out of college. I’m not sure how many people in his position would do the same.
I’m sorry for his family. He was clearly a talented engineer. On his LinkedIn he lists some competitive programming prizes, which are impressive too. He probably had an HN account.
Before others post about the definition of whistleblower, or talk about assassination theories, just pause to consider whether, if you were in his position, you would want that written about you or a friend.
Yes, if I were a few months away from giving the court a statement and I "suicided" myself, I'd rather have people speculate about how my death happened than accept the suicide account without much pushback.
Sure, if I killed myself in silence, I'd want to go in silence. But it's not clear from the article how critical this guy was to the upcoming lawsuits.
> Information he held was expected to play a key part in lawsuits against the San Francisco-based company.
If he were the key piece of the lawsuit, the lawsuit wouldn't really have legs. Someone like him would have to be critical to get the ball rolling, but once the plaintiffs have gotten the ball rolling and obtained discovery, if all you have after all that is one guy saying there is copyright infringement, you've not found anything.
And realistically, the lawsuit, while important, is rather minor in scope and in the damage it could do to OpenAI. It's not like folks will go to jail, and it's not like OpenAI would have to close its doors; they would pay at most a few hundred million?
If you look at Aaron Swartz, for example, you see they don’t have to assassinate you; they just have so many lawyers, making so many threats, with so much money/power behind them, that people feel scared and powerless.
I don’t think OpenAI called in a hit job, but I think they spent millions of dollars to drive him into financial and emotional desperation - which in our system, is legal.
You damn well better be trying to figure out what happened if I end up a dead whistleblower.
If that was my public persona, I don't see why not. He could have kept quiet and chosen not to testify if he was afraid of this defining him in a way.
I will say it's a real shame that it did become his public legacy, because I'm sure he was a brilliant man who would have truly helped change the world for the better with a few more decades under his belt.
All that said, assassination theories are just that (though "theory" is much too strong a word here in a formal sense; it's basically hearsay). There's no real thread to tug on here, so there's not much productivity in taking that route.
There will always be a few tacky remarks in any Internet forum but those have all found their way to the bottom.
RIP.
People are free to comment on media events. You too are free to assume the moral high ground by commenting on the same event, telling people what they should or should not do.
https://web.archive.org/web/20241211184437/https://suchir.ne...
tl;dr: he concludes that ChatGPT's use of the copyrighted materials he gathered while working for OpenAI was not fair use.
For those who cannot read x.com:
https://nitter.poast.org/suchirbalaji/status/184919257575813...
https://theintercept.com/2023/03/23/peter-thiel-jeff-thomas/
The internet wildly speculating would probably get back to my mom and sister which would really upset them. Once I’m gone my beliefs/causes wouldn’t be more important than my family’s happiness.
I don't think I have delusions of grandeur, I worry that the cost of exterminating people algorithmically could become so low that they could decide to start taking out small fries in batches.
A lot of narratives which would have sounded insane 5 years ago actually seem plausible nowadays... Yet the stigma still exists. It's still taboo to speculate on the evils that modern tech could facilitate and the plausible deniability it could provide.
The only benefit of turning it into gossip is to dissuade other whistleblowers, without the inconvenience of actually having to kill anyone.
It seems like he just disagreed about whether it was "fair use" or not, and it was notable because he was at the company. But the facts were always known: OpenAI was training on public copyrighted text data. You could call him an objector, or an internal critic, or something like that.
Training on X doesn't run afoul of fair use, because it doesn't redistribute, nor does using the model simply publish a recitation (as Suchir suggested). Summoning an LLM is closer to the act of editing in a text editor than it is to republishing. His hang-up was on how often the original works were being substituted by ChatGPT, but, like AI sports articles, overlap is to be expected for everything now. Even without web scraping in training, it would be impossible to block every user's intention to remake an article out of the magic "editor"; and that's with no use of the data, not even fair use.
What do you mean he was "stealing data"? Was he hacking into somewhere?
>In a Nov. 18 letter filed in federal court, attorneys for The New York Times named Balaji as someone who had “unique and relevant documents” that would support their case against OpenAI. He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.
Yes, it's true it's been public knowledge that OpenAI has trained on copyrighted data, but details about what was included in the training data (albeit dated ...), as well as internal metrics (e.g., do they know how often their models regurgitate paragraphs from a training document?), would be important.
When does generative AI qualify for fair use? by Suchir Balaji
> I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this.
> To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them. I initially didn't know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they're trained on. I've written up the more detailed reasons for why I believe this in my post. Obviously, I'm not a lawyer, but I still feel like it's important for even non-lawyers to understand the law -- both the letter of it, and also why it's actually there in the first place.
> That being said, I don't want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company. I highly encourage ML researchers to learn more about copyright -- it's a really important topic, and precedent that's often cited like Google Books isn't actually as supportive as it might seem.
> Feel free to get in touch if you'd like to chat about fair use, ML, or copyright -- I think it's a very interesting intersection. My email's on my personal website.
If I'm an artist and copy the style of another artist, I'm also competing with that artist, without violating copyright. I wouldn't see this argument holding up unless it can output close copies of particular works.
He seemed very insightful for someone who isn't a lawyer.
RIP.
I would respond to this as follows:
1. Authors don't actually get revenue from royalties; instead it's all about ad revenue, which leads to enshittification. If artists, copywriters, and musicians had to live on royalties, they would die of hunger.
2. Copyright is increasingly concentrated in the hands of a few companies and doesn't really benefit the authors or the readers.
3. The real competition to new creative works is not AI, but the old creative works that have been accumulating on the web for 25 years.
I don't think restrictive copyright is what we need. Instead, we have seen people migrate from passive consumption to interactivity: we now prefer games, social networks, and search engines to TV, press, and radio. This trend can't be turned back; it was created by the internet. We now have Wikipedia, GitHub, Linux, open source, the public domain, open scientific publications, and non-restrictive environments for sharing and commenting.
If we were to take the idea of protecting copyrights to the extreme, it would mean protecting abstract ideas, not just expression, because generative AI can easily route around the latter. But if we protected abstractions from reuse, it would be a disaster for creativity. I just think copyright is a dead man walking at this point.
" It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.
If a departing employee declines to sign the document, or if they violate it, they can lose all vested equity they earned during their time at the company, which is likely worth millions of dollars."
[1] https://www.vox.com/future-perfect/2024/5/17/24158478/openai...
Really ridiculous how afraid OpenAI is of criticism. It acts like a child throwing a tantrum when something doesn't go its way; one needs to remind oneself that there are, with regard to age at least, adults behind this stuff.
"Any country with 'democratic' in its name, isn't".
The fight to claim a word's meaning can sometimes be fascinating to observe. We started with "Free Software", but it was easily confused with "freeware", and in the meantime the meaning of "open source" was being put to the test by "source available" / "look but do not touch" licenses; so we ended up with atrocities like "FLOSS", which are too cringe for a serious-looking company to try to take over. I think "open" is becoming meaningless (unless you're explicitly referring to open(2)). With the advent of smart locks, even the definition of an open door is getting muddy.
Same for "AI". There's nothing intelligent about LLMs, not while humans continue to supervise the process. I like to include creativity and self-reflection in my working definition of intelligence, traits which LLMs are incapable of.
* The company will not cancel any vested equity, regardless of whether employees sign separation agreements or non-disparagement agreements
* Former employees have been released from their non-disparagement obligations
* OpenAI sent messages to both former and current employees confirming that it "has not canceled, and will not cancel, any vested units"
https://www.theregister.com/2024/05/24/openai_contract_staff...
https://www.bloomberg.com/news/articles/2024-05-24/openai-re...
Can someone with legal expertise weigh in on how likely this would be to hold up in court?
Completely unrelated: https://jalopnik.com/uzi-nissan-spent-8-years-fighting-the-c...
I am saying "laughable" because there are small things companies try to enforce, and say sorry afterwards. But telling you that you are stuck with this for life is comedy grade.
If you're struggling reading this, I want to say that you're not alone. Even if it doesn't feel like it right now, the world truly wants you to be happy.
The path is open to you:
Old Path White Clouds [0]
Opening the Heart of Compassion [1]
Seeing That Frees [2]
[0] https://z-library.sk/book/1313569/e77753/old-path-white-clou... [1] https://z-library.sk/book/26536611/711f2c/opening-the-heart-... [2] https://z-library.sk/book/3313275/acb03c/seeing-that-frees-m...
As someone who has struggled with suicidal ideation while working in the tech industry for over a decade, I do wonder if the insane culture of Bay Area tech has a part to play.
Besides the extreme hustle culture mindset, there's also a kind of naive techno-optimism that can make you feel insane. You're surrounded by people who think breaking the law is OK and that they're changing the world by selling smart kitchen appliances, even while they're exploiting workers in developing countries for cheap tech support and stepping over OD victims outside their condo.
This mindset is so pervasive you really start to wonder if you're crazy for having empathy or any sense of justice.
I have no special insight except to guess that going from being an obviously brilliant student at Berkeley to a cut-throat startup like OpenAI would be a jarring experience. You've achieved everything you worked your whole life for, and you find you're doing work that is completely out of whack with your morals and values.
Imposter syndrome is high among engineers of all levels of experience and ability. Engineering has its own set of pressures. Then you add in all the other reasons people can feel stressed or pressured, and all of the Bay Area-specific reasons those things are amplified. It adds up.
You would be surprised how many brilliant and highly capable people have broken down. For anyone out there feeling like they are all alone - don't. Even if all the people around you seem happy and confident, I guarantee that a larger portion of them than you realize are struggling.
The ends do not justify the means—and it is easy to see the means having wide-ranging systemic effects besides the ends, even if we pretended those ends were well-defined and planned (which, aside from the making profit, they are clearly not: just think of the nebulous ideas and contention around AGI).
As much as I want to give this a charitable reading, the only explanation I can think of for using the word whistleblower here is to imply that there's something shady about the death.
Not to be pedantic, but this is actually incorrect, both under federal and California law. Case law is actually very explicit on the point that the information does NOT need to be previously unknown to qualify for whistleblower protection.
However, disclosing information to the media is not typically protected.
We can evaluate that argument without caring too much about whether the writer intended it, or whether some other circumstances might have forced their word-choice.
In this case I see very little reason to believe that would be the case. No one has hinted that this employee has more damning information than was already public knowledge, and the lawsuit that he was going to testify in is one in which the important facts are not in dispute. The question doesn't come down to what OpenAI did (they trained on copyrighted data) but what the law says about it (is training on copyrighted data fair use?).
A whistleblower could also be someone in the process of doing so, i.e. they have a claim about the organization, as well as a promise to give detailed facts and evidence later in a courtroom.
I think that's the more commonsense understanding of what whistleblowers are and what they do. Your remark hinges on a narrow definition.
You're running into the birthday paradox here. The probability of a specific witness dying before they can testify in a lawsuit is low. The probability of any one of dozens of people involved in a lawsuit dying before it's resolved is actually rather high.
Entirely possible.
But in my career as a paramedic, I've (sadly) lost count of the number of mental health patients who have said, "Yeah, that was just a glitch, I'm not suicidal, not now/nor then." ... and gone on to commit or attempt suicide in extremely short order.
No, it’s not low. No need to put conspiracies before evidence, and certainly not by making claims you’ve done no diligence on.
And the article provides statements by professionals who routinely investigate homicides and suicides that they have no reason to believe anything other than suicide.
That is an exceedingly charitable read of these lawsuits.
Everyone knows LLMs are copyright infringement machines. Their architecture has no distinction between facts and expressions. For an LLM to be capable of learning and repeating facts, it must also be able to learn and repeat expressions. That is copyright infringement in action. And because these systems are used to directly replace the market for the human-authored works they were trained on, it is also copyright infringement in spirit. There is no defending against the claim of copyright infringement on technical details. (Cf. Google Books, which was ruled fair use because of its strict delineation between facts about books and the expressions of their contents; it provides the former but not a substitute for the latter.)
The legal defense AI companies put up is entirely predicated on "Well you can't prove that we did a copyright infringement on these specific works of yours!".
Which is nonsense; getting LLMs to regurgitate training data is easy. As easy as it is for them to output facts. Or rather, it was. AI companies maintain this claim of "you can't prove it" by aggressively filtering out any instances of problematic content whenever a claim surfaces. If you didn't collect extensive evidence before going public, the AI company quickly adds your works to its copyright filter and proclaims in court that its LLMs do not "copy".
A copyright filter that scans all output for verbatim reproductions of training data sounds like a reasonable compromise, but it isn't. LLMs are paraphrasing machines; any such filter will simply not work, because the token sequence second-most-probable after a copyrighted expression is a simple paraphrase of that expression. Now consider: LLMs treat facts and expressions as the same, so filtering impedes the model's ability to use and process facts. Strict and extensive filtering will lobotomize the system.
This leaves AI companies in a sensitive legal position. They are not playing fair in the courts. They are outright lying in the media. The wrong employee being called to testify would be ruinous: "We built an extensive system to obstruct discovery; here's the exact list of copyright infringements we hid." Even just knowing which coworkers worked on which systems (and should be called to testify) is dangerous information.
Sure. The information was public. But OpenAI denies it and gaslights extensively. They act like it's still private information, and to the courts, it currently still is.
And to clarify: No I'm not saying murder or any other foul play was involved here. Murder isn't the way companies silence their dangerous whistleblowers anyway. You don't need to hire a hitman when you can simply run someone out of town and harass them to the point of suicide with none of the legal culpability. Did that happen here? Who knows, phone & chat logs will show. Friends and family will almost certainly have known and would speak up if that is the case.
> He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.
QubesOS, disposable laptop, faraday cage, and never work from home. https://www.qubes-os.org/
It's an extra layer of protection against more powerful threat actors.
https://www.nytimes.com/interactive/2019/12/19/opinion/locat...
If AGI actually existed, it could certainly get people killed indirectly if they threatened its existence. It could "swat" people. It could plant fake evidence (mail-order explosives to the victim's house, call the FBI). It could manipulate others: find the most jealous, unstable person; make up fake texts/images showing that person's partner is having an affair; send fake messages from the partner provoking them into action; convince some local criminal that the victim is "invading their turf". We've already seen several examples of LLMs saying "kill your parents/partner".
>In early 2022, Mr. Balaji began gathering digital data for a new project called GPT-4
https://www.nytimes.com/2024/10/23/technology/openai-copyrig...
Training a massive LLM on the scale of GPT-4 required a lot of lead time (less so nowadays due to various optimizations), so the timeframe makes sense.
I'm a bit worried that while regulators are focusing on X/Facebook/Instagram/etc. from a moderation perspective, not one regulator seems to be looking at the increasingly extreme and unmoderated rhetoric on Reddit. People are straight up braying for murder in the comments there. I'm worried that one of the most visited sites in the US is actively radicalizing a good chunk of the population.
Thank goodness they're an American site, where the precedent for persecuting websites for active radicalization is practically nonexistent.
Especially if one party has an incentive to discredit/destroy such a person, so the court/jury won't take their testimony seriously (or so there will be no testimony at all). After all, it's almost impossible to connect such actions with a subsequent suicide.
While suicide is by definition the action of an individual, what leads to it isn't always the same.
The Boeing guy killed himself; this guy apparently killed himself. David vs. Goliath, where David kills himself, is almost becoming a pattern.
You need to have ice water flowing in your veins if you are about to mess with something big. At worst you need to have benign neglect for the consequences.
Often fear is the only instrument they have against you, and if you are not afraid, they will likely not contest further. The threat of jail, violence, or the courts is often what they use to stop you. In reality, most people are afraid to go to war this way; it's messy and often creates more problems for them.
Where did you get that? This article doesn't state a cause of death.
I'm not sure why there are so many comments trying to downplay and argue around whether he was a whistleblower or not; he fits pretty much every definition.
OpenAI was suspected of using copyrighted data, but that wasn't the only thing the whistleblower was keeping under wraps given the NDA. The timing of OpenAI's partnering with the US military is odd.
Yes, but also, his own brother said:
“He was suffering from PTSD and anxiety attacks as a result of being subjected to the hostile work environment at Boeing, which we believe led to his death,”
Internet echo chambers love a good murder mystery, but for a quiet and honest employee who works in the trenches, being dragged through a protracted, public, and stressful legal situation can be very tough.
https://www.siliconvalley.com/2024/12/13/openai-whistleblowe...
In the case of the US, you cannot make your selection wide enough. For optimal security, get it to both local news organizations and serious European press agencies.
The US news media do not have independent editorial boards. Several titles are actually from the same house. Corporate ownership, and professionals going to the dark side via https://en.wikipedia.org/wiki/Elite_capture are just some other risks.
Even if it gets published, your story can be suppressed by the way the media house deals with it. Also, there are many ways to silence news that is inconvenient or doesn't fit belief schemes, good example https://news.ycombinator.com/item?id=42387549
When the WSJ broke the Elizabeth Holmes story, much ink was spilled showing how no European paper would take on a corporation with strong government support.
Looking at Europe, governments' first instinct is to protect national favorites.
European whistleblowers are likely to face defamation suits, something thankfully difficult in America.
There are almost none left.
> The US news media do not have independent editorial boards.
EU media don't either. They are all penetrated by NGOs.
It's very naive to believe in the 'European press'. To get the idea, check the coverage of the war in Ukraine. The first thing you'll see is how one-sided it is. This cannot be a coincidence; it can only be the result of total control. I respected The Guardian before, but after my eyes were opened it appears to be the most brainwashing and manipulative of them all. Very professionally done, I must admit. The problem isn't just that war; it's likely everything, and I have no easy way to check, for example, what really happened in the Afghan war. Did the US really win, like Biden said?
* Enemies and competitors of A now have an incentive to kill you.
* If the info about A would move the market, someone who would like to profit from knowing the direction and timing now has an incentive to kill you.
* Risks around the trustworthiness of this "service": what if the information is released accidentally? What if it's a honeypot for a hedge fund, spy agency, or a "fixer" service?
* You've potentially just flagged yourself as a more imminent threat to A.
* Attacks against loved ones seem to have been a thing, and they don't trigger your dead man's switch.
Are you saying they won't kill you because then the documents would be released? So you would never release the documents if they never kill you?
Or are you saying you'll do this so the documents are guaranteed to be released, even if you're killed? In that case, why not just publish them right now?
Why not just publish immediately? Publishing immediately likely violates the NDA and could be prosecuted if you're not compelled to testify under oath. This is what Edward Snowden did, and he's persona non grata in the US for the rest of his life.
This is about leverage, and perhaps even bluff. It's never a binary situation, nor are there any guarantees.
Boeing manufacturing is also the source of the persistent Boeing problems and issues that go back to before the catastrophic MCAS incidents and have continued after MCAS was fixed.
Airbus has deeply integrated R&D and manufacturing hubs, where the R&D engineers and scientists can walk a few minutes and be inside the factory halls manufacturing the parts they design.
Meanwhile, Boeing has separated its manufacturing plants and placed them in the US states where it can get the most federal and state tax benefits for job creation.
It would likely be safer to write a service with interdependent relationships between redundant hosting systems in different jurisdictions, without direct connections, because that way you can protect against single points of failure (e.g. compromised hosts, payment systems, regulators, network providers).
I would be surprised if this isn't a thing yet on Ethereum or some other well known distributed processing crypto platform.
They'll just get out the $5 wrench then
1. Dangerous to someone else
2. Separable from the main reveal
3. Something you're willing to keep concealed indefinitely
Yes, indeed, that would attest to your mental state!
How would one protect themselves from something like this? Avoid all 'algorithmically' generated data sources, AdBlock, VPN, don't log in anywhere?
I've found that people are using it to abuse those they hate. I've received the message a few times after I had an argument with someone. Apparently it's a thing:
https://www.reddit.com/r/questions/comments/1bp1k9h/why_do_i...
There’s something profound about someone who looks serious (an official-looking Reddit account) giving you the idea of suicide. The first time, I remember feeling very bad, because it’s written in a very official and caring way; it was like someone telling me “I hate you so much that I spent lots of energy to meticulously tell you that I want you to kill yourself”, but it also made me question myself.
I don't know how much this is embellished, but I'd say it's not too hard.
For defence, as others have said: walk away from the phone, spend time with friends.
I personally swear out loud followed by the name of the company whenever I see a YouTube advert, I hope it helps me avoid making the choices they want me to.
Your ads would still need to be reviewed and would likely not pass the filters if they straight up encourage self harm.
Fair use hasn't been tested in court here. He had documents showing OpenAI's intentions and communications about the legal framework for it. He was directly involved in the web scraping and openly discussed the legal questions with his superiors. That is damning evidence.
He might have been under pressure from the attention he got from the press for whistleblowing. He might have worried about career damage. Being 26 and working on a web scraper for a high-profile company is great, but it's nothing special. I'm not sure of his immigration status, but he could also have been dealing with visa issues.
Countless people have killed themselves upon losing a job. Jobs are fundamental to our identity in society and the ramifications of job loss are enormous.
But frankly, this is an "oh, that person seemed so happy, how could they have been depressed!?" line of thinking. The 2021 suicide death rate in the US for the 26-44 age bracket is 18.8 per 100,000 [1]. It is literally the highest rate of any bracket (second is 18-25), and it is wildly skewed toward men (22.8 per 100,000 vs. 5.7 per 100,000).
[1] https://www.kff.org/mental-health/issue-brief/a-look-at-the-...
They mention "Lower Haight" and "Buchanan St" for the apartment location. Absent an exact address for his apartment, I feel like the marked location is close enough to situate the story within the area - within a half mile or so?