If you don't opt out by Apr 24 GitHub will train on your private repos

jffry2mo ago

It's unnecessarily splitting hairs.

> interaction data—specifically inputs, outputs, code snippets, and associated context [...] will be used to train and improve our AI models

So using Copilot in a private repo, where lots of that repo will be used as context for Copilot, means GitHub will be using your private repo as training data when they were not before.

tptacek2mo ago

No it isn't. Most people don't use Copilot, so this term change won't affect most people. You can reasonably be unhappy about it anyways (or unreasonably still be using Copilot in 2026), but it's still ultra-useful information for them to add to the discussion.


pverheggen2mo ago

Isn't this pretty standard, using your interaction data for training and making it opt-out? Claude Code, Codex, Antigravity etc. all do the same. Private repo doesn't make a difference as they have a local copy to work from.

munk-a2mo ago

The initial title and your reply are both too broad to be fully accurate. By April 24th Github will train on private repos (assuming a flag isn't set) but this change is limited to just non-Business/Pro users. So a number of private repos will be affected but it won't automatically affect all private repos (so my panic check on our corporate account wasn't necessary yet).

I am not certain if you're a spokesperson for github - but it's good to be careful in your language. Instead of "No we won't" a lead like "That isn't entirely accurate" would be more suitable. In the end both the original post title and your reply have ended up being misleading.

tadfisher2mo ago

> By April 24th Github will train on private repos

This statement itself is misleading. Also, GitHub probably should have seen this coming.

They are not doing what I initially thought, which is slurping up your private repo, wholesale, into its training set. You don't have to opt out of anything to prevent that.

They are slurping any context and input containing code from your private repo which is provided to them as part of using Copilot.

So, in addition to the opt-out setting, there is an even easier way to avoid providing them your private repository data to train AI models, and that's by continuing to not use Copilot.

andoando2mo ago

That's still pretty bad. It's no longer private if all your code goes into an LLM training set and can be resurfaced to everyone publicly.

Why would I ever use copilot on any code I'd want to be kept private? Labeling it a private repo and having a tiny clause in the TOS saying we can take your code and show it to everybody is just an outright lie

NewsaHackO2mo ago

I mean, you shouldn't send data to any SaaS LLM for code you want to be private, unless you have had them sign some sort of contract saying they will not train on your use. In fact, it is probably never a good idea to send anything you want to be private off premises unencrypted.

layer82mo ago

In the EU, opt-out is not a legally valid way to obtain the necessary consent. How do you plan to handle this?

booi2mo ago

probably by paying the fine and doing it anyway

x0x02mo ago

For personal data. I don't believe you can reasonably claim code is personal data any more than a hammer is your personal data.

otterley2mo ago

Hey Martin, can you please work with Product to significantly clarify what is meant by the following language in the settings? Because right now it's nearly impossible for a layperson (or even an average programmer) to understand what this means:

""" Allow GitHub to use my data for AI model training

Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement. """

If the reality is less scary than how it sounds, then the wording needs to be less scary-sounding. It may be that GitHub isn't training models on private repos, but the language certainly suggests that it is. The feedback we're seeing in this post is proof enough of that.

Finally, I read the Privacy Statement, and it's unclear what the applicable language is. "Inputs," "Outputs," and "Associated Context" are terms of art that have no matching definitions in the Statement. (The terms "Outputs" and "Associated Context" don't even appear in the Statement at all. Not even "train.") As an attorney I find this completely baffling.

saghm2mo ago

Yes, you will. This is what the setting says on my account when I clicked the link:

> model training

> Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement

Are you seriously trying to claim that the code isn't input, output, or associated context of Copilot operating on a private repo? What term do you think better applies to the code that's being read as input, used as context, and potentially produced as output?

ziml772mo ago

I don't like that they are training on any interactions with Copilot by default but training on something that you've put through Copilot yourself is much different than them just shoving all the private repos currently on Github into the training data.

Jolter2mo ago

If you are not willing to migrate out of GitHub, what you can do is to avoid using Copilot on your private repository.

wewtyflakes2mo ago

If Copilot later adds a feature like "Scan your repo for vulnerabilities using Copilot <opt-out>", then that would both fit your criteria, and the baiting outrage of the original poster, in one swoop! Of course, Microsoft would _never_ do that, right?

edelbitter2mo ago

> If you don’t use Copilot this will not affect you.

How does this work for a private repository with access granted to additional contributors? Which setting is consulted then?

daveguy2mo ago

Nice try. If you're training on "inputs" to Copilot then you are training on the private repos.

This suspect denial is why I will get my clients moved off of github.

grepfru_it2mo ago

Back in my day someone would post a HN article to the internal slack in order to sway conversation in their favor. Glad to see it's still happening! :D

BoredPositron2mo ago

Yes you do? If a user uses any form of copilot in one of his repos except ofc enterprise, says so right in the blog post. These aktshually corporate technicality defense posts aren’t helping, they just end up making you personally look a bit fishy.

SirensOfTitan2mo ago

Right, but it shouldn't be opt-out only to begin with. It's a dishonest pattern that relies on people not noticing. Honest use of data is a "Caesar's wife must be above suspicion" moment for me -- if this is how you're acting when engaging with customers explicitly, I don't trust you to resist the temptation to tap into my data privately. AI companies already have trained their models illegally against the intellectual property of all of humanity with little consent along the way.

Honestly, if you work at GitHub, maybe you should focus on your uptime -- it's awful.

mrdependable2mo ago

I think the problem is more with using PRIVATE repos. My letters are also private and I would be pretty pissed if the mail carrier was reading them. Why does GitHub think it has the right to do this?

languid-photic2mo ago

Appreciate the clarification. But, it's still not great.

To the PM behind this - developers are sensitive to this kind of thing. Just make it opt-in instead?

dataflow2mo ago

Say someone has a very sensitive secret (say, a Bitcoin private key) in their free private Github repo, and uses Copilot on that repo and touches the secret with it. Would you be willing to assure here that toggling that setting would not affect the likelihood of that secret leaking, and that that likelihood is also unaffected by whether the account is Business or Free?

mrits2mo ago

Thanks for confirming you train on our data

pokot02mo ago

Question. How does it work if I own a repository (opt out, don't use copilot) and I give access to someone else (user is opted in and uses copilot)? Do you train on his submissions of my code? How can you know that he has the right to share the code with you for training?

ClikeX2mo ago

How do you handle accounts that have copilot managed by an organisation? I've seen several cases where people cannot opt out their account because of the org connection (the option just isn't there in the settings). What happens to their account the moment they leave that org?

kingkandu2mo ago

Sorry doesn't help at all but you can still be useful - can you please tell us how many private repos belonging to "users of Free, Pro and Pro+ Copilot" who have used Copilot in the last 90 days exist in the github database?

Because microsuck is about to violate the law that many times

jawilson22mo ago

I'm in the process of moving all of my repos off of github and deleting that account.

Hope that helps.

_pdp_2mo ago

So you will train on data collected from free users working on GPL and copyrighted projects?

DougN72mo ago

And on users that don’t even use github, other than the required account to use CoPilot in Visual Studio.

https://news.ycombinator.com/item?id=37124188

johndough2mo ago

Under GDPR, opt-out is not considered informed consent, and repositories can contain personally identifiable information, which falls under GDPR. Do you think differently, or do you think ignoring the law will be worth it?

ziml772mo ago

Thanks for the clarification. The OP here made me think I missed something in both the blog post about the change and in the available settings.

gortok2mo ago

This is a distinction without a difference, according to the text of that enable/disable dialog,

> Allow GitHub to use my data for AI model training: Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement.

“Associated Context” is the repo. If I use copilot, I’m giving it access to my repo.

I don’t know all the ways copilot can be triggered, and I’m not certain that I could stop it from being triggered, given Microsoft’s past behaviors in slapping Copilot on everything that exists.

Jabrov2mo ago

Can't you just make it opt-in?

No? Because no one would opt-in, you say?

Wow. It's almost like this is a user-hostile feature that breaks the implicit promise behind a "private" repo.

nickvec2mo ago

I think you're well aware that people aren't upset at the distinction between training on Copilot data versus training on private repo data (at rest). People are upset because GH is using an opt-out model. Your response is disingenuous not to address this, and the "hope this helps" comes across as condescending (not sure if that was your intention.)

happytoexplain2mo ago

As others have pointed out, this is somewhat dishonest. Which is depressing, if you represent GitHub.

buildbot2mo ago

>Hope that helps

Honestly, what the fuck? This change was already pretty bad but this being the apparent corporate response is insane.

Done with Github and Microsoft after this. Just disgusting how little you care for users, ethics, or morals.

pesus2mo ago

Why not get user consent first?

irishcoffee2mo ago

I am aware of CUI data hosted on github by corporate entities. You’re saying you’ll essentially violate the entire point of CUI?

That’s fucking terrifying.

elAhmo2mo ago

Defaulting to opt-in is a malicious move, no matter how you present things.

ethanwillis2mo ago

"hope that helps"

Why the smug sarcastic attitude? nah, fuck github i'm out.

anarticle2mo ago

tl;dr: installed gitlab.

I'm not bidding against you to not train on my data.

inopinatus2mo ago

“Opt-out” is an egregiously toxic and unethical approach to consent and should be illegal everywhere that it isn’t already.

I didn’t think Github had much of a brand left to damage, but here we are.

kepano2mo ago· 24 in thread

I've been saying this since 2023

> If your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training — the incentives are too strong to resist

mememememememo2mo ago

Yes I think you are right. Even a super ethical company can be taken over. There may be exceptions but it is more luck. I work for an SP500 that absolutely won't do this and locks down prod access so a rogue staff can't do it. But if Larry or Zuck or Bezos buys them out, who knows.

Forgeties792mo ago

I worry about a post-Gabe valve for this reason.

miohtama2mo ago

Microsoft would never do this

(-:

random32mo ago

The “do it first, apologize later” will be the general principle with anything. It’s going to be hard and futile to prove even if they don’t do it through ToS first. Amazon has one of the largest corporate training sets out there:)

slowhadoken2mo ago

I’m still concerned about MS using the code I write on my laptop to train AI. Tinfoil hat wearing Linux users are starting to make a lot of sense to me.

qaadika2mo ago

It's been interesting the past year or so watching myself turn more and more into one of the tin-foil wearing linux users. I'm not sure how it happened, but self-hosting became more and more alluring and hyperfocusing on taking as much data as I can offline became worth spending entire weekends on.

I didn't become paranoid, everybody else didn't!

DougN72mo ago

I thought that’s more what the CoPilot change is really about - not your repo, but all the code CoPilot read while it is offering helpful completions, etc - so literally the code on your laptop. I cancelled my account.

account422mo ago

You don't need a tinfoil hat to see the value in having a computer you fully own as opposed to one where some company can push whatever they want and all you get to say is "yes master" or perhaps if you are really lucky, "maybe later".

b1122mo ago

It's not tinfoil, it's aluminum foil. I.. I mean, I heard it's that.

ekjhgkejhgk2mo ago

You're right, of course, and I find it frustrating that people are so thick as to not see your claim as obvious.

Stallman is always right.

itsdesmond2mo ago

Back in 2003 he was advocating for legalization of child sexual abuse material. In 2006 he said he was skeptical of the harm caused by “voluntary pedophilia”, a statement that presupposes that children can consent to sex with adults.

So I dunno bout that.

jamiek882mo ago

About technology.

About communication with other humans he’s pretty much always wrong.

Imagine we’d had a better communicator who wasn’t a gross toe nail picking troll fronting free software? It shouldn’t matter. Only the ideas should matter. But the reality is different.

worik2mo ago

> Stallman is always right.

Not really. Almost always right....

moralestapia2mo ago

Thank you for your service. We really need more "canaries in the mine" giving out early warnings of things that might not be evident on a first glance.

Any takes on what 2029 will look like? (related to this topic, ofc)

chistev2mo ago

Now this is sarcasm. Lol

hugodan2mo ago

and it is not end-to-end encrypted if you don't own the keys, avoid bullshit

cj2mo ago

Edit: Okay, sounds like you guys are pissed to the point where it seems like the pro tip here is to stop using GitHub.

Pro tip: sign up for the business/enterprise version when reasonable in price.

I do this with Google Workspace. You can also do it with GitHub.

(Google doesn’t train on Workspace, Github doesn’t train on business customers, etc)

worble2mo ago

Pro tip: You could instead spend that money to spin up a forgejo instance for as little as $2 a month https://www.pikapods.com/apps#development (not affiliated, just a happy customer)

Please don't reward these companies with money.

thot_experiment2mo ago

Probably don't reward extortion with money.

Lio2mo ago

An enterprise licence won't save you, Google, Microsoft, et al have happily been breaking copyright laws for years.

If the publishing industry can't win a case against the AI firms then you don't stand a chance when you finally find out they've been training on your private data the whole time.

They can tell you one thing and do the opposite and there's effectively nothing you can do about it. You'd be a fool to trust them.

saghm2mo ago

At the risk of stating the obvious, I don't think it makes sense to reward them with money for trying to pull a bait-and-switch on this.

groby_b2mo ago

Github's enterprise version "starts at" $21.99/seat, and requires you to "contact sales".

And I don't see any mention that that exempts you from being trained on. (Yes, the blog says you're still covered, but at that price I'd like to see a contract saying that)

margalabargala2mo ago

> Google doesn’t train on Workspace, Github doesn’t train on business customers, etc

...yet

[0]: https://forum.gitlab.com/t/can-i-opt-out-from-my-code-being-...

throwuxiytayq2mo ago

It's not a pro tip if it only fucks you over slightly later. How's the weather in Stockholm?

kristianp2mo ago· 16 in thread

What's a good alternative for free private repos?

eblume2mo ago

I've recently started hosting my own forgejo instance. It works so well! Free tailscale for connectivity. I expose mine over fly.io proxy, also free, but not to be done without caution.

Supermancho2mo ago

Gitlab?

Microsoft services are tech debt. I moved the moment they were acquired and never regretted it.

nottorp2mo ago

I opened gitlab.com and it starts with

"Finally, AI for the entire software lifecycle."

Not very trust inspiring, that.

Can I even have git hosting without anything else being crammed down my throat, or is it just like Microsoft?

mrweasel2mo ago

It's a fair question, but if you need private repos, I think you need to start considering a paid option, or self-host.

If it's really important to you that the repo is private, I'd self-host.

conductr2mo ago

Just spitballing, don’t use these tools myself, but isn’t this something that should be encrypted to really prevent them from training? I personally don’t trust anyone with my data when they pivot to building AI products yet claim my data wasn’t a part of that strategy. It’s too easy to hide/lie.

But it always seemed to me that the UI should run locally with encryption keys that are shared and the service just manages encrypted blobs of diffs that can roll from version to version of encrypted data and that’s about it. Granted I probably don’t know the full workflow, I typically am a single dev on simple projects where I don’t need 99% of the overhead these introduce.

piersj2252mo ago

I've not tried this, however https://github.com/AGWA/git-crypt

Apparently someone has developed something similar to this

Imustaskforhelp2mo ago

I would've recommended codeberg but codeberg isn't the finest to be recommended for free private repos.

I definitely feel like more can be done within this space and that there is space for more competitors (even forgejo instances for that matter)

sebastiennight2mo ago

GitLab would be a good bet here. We started on their free tier and used that for a couple of years, I was very happy with it. Not sure how the tiers might have evolved since.

And according to their PM and privacy policy, they're not training their models on your code[0].

pyjarrett2mo ago

It doesn't take much power or time to run your own local git server. My first one which lasted years was parts I mangled together from old computers from garage sales.

There's instructions on running a Git server in the git book: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protoco...

wuschel2mo ago

Sourcehut comes to my mind: https://sourcehut.org/

werdnapk2mo ago

I've been using gitosis to manage private repos for almost 2 decades now. It's extremely easy to host your own repositories.

I just looked up gitosis on github though and it was last updated 12 years ago.... still works for me though.

Overall, hosting your own repos is very easy.

bigstrat20032mo ago

I use Fossil for mine. Dead easy to set up, and while the workflow might not be great for public contributions like Github is, that doesn't matter on something where I'm the only user.

stephenr2mo ago

I've seen https://codefloe.com mentioned, can't say I've used it myself yet though.

bonestamp22mo ago

BitBucket.org (Atlassian)

JonChesterfield2mo ago

Any computer you have ssh access to.
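
For anyone wondering what that looks like in practice, here is a minimal sketch, not a vetted setup. It only builds the commands as strings so they can be reviewed before running. The helper name `self_host_commands`, the host `me@example.com`, the path `repos/project.git`, the remote name `private`, and the branch `main` are all placeholders assumed for illustration.

```python
import shlex


def self_host_commands(ssh_host: str, repo_path: str, branch: str = "main") -> list[str]:
    """Build the shell commands for hosting a private repo on an SSH box.

    Hypothetical helper: it returns strings rather than executing anything,
    so the commands can be inspected and adapted first.
    """
    # scp-style remote URL understood by git: host:path
    remote_url = f"{ssh_host}:{repo_path}"
    return [
        # 1. Create an empty bare repository on the remote machine.
        f"ssh {shlex.quote(ssh_host)} git init --bare {shlex.quote(repo_path)}",
        # 2. Register it as a remote (named "private" here) in the local clone.
        f"git remote add private {shlex.quote(remote_url)}",
        # 3. Push the branch and set it as the upstream.
        f"git push -u private {shlex.quote(branch)}",
    ]


if __name__ == "__main__":
    for command in self_host_commands("me@example.com", "repos/project.git"):
        print(command)
```

The last two commands are meant to be run from inside an existing local repository. This gives plain git hosting only: no web UI, issues, or pull requests.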

throwaway6137462mo ago

A cheap mini-pc in your closet.

hedayet2mo ago· 10 in thread

To Github's credit, they have been showing a banner consistently. To my discredit - I never bothered to read that banner until I saw this HN headline

nottorp2mo ago

How does that help if you don't go to the github site but just use git from the command line?

fph2mo ago

Can you use git's Copilot from the command line? If you can't, then you have nothing to opt out from.

lkbm2mo ago

They also sent an email.

tomwheeler2mo ago

And even if you read the banner on the site, the email they sent, and the announcement itself, you would not see instructions that mention the specific thing(s) you must change in order to opt out.

Sure, you can poke around in the settings and find one that you believe opts you out, but in lieu of clear and explicit instructions from GitHub, you'll have no way to find out. Only the possibility of finding out later that you guessed wrong.

jmward012mo ago

I've never seen the banner. Where does this show up?

arcanemachiner2mo ago

It's been on top of the web UI for 2 or 3 days now.

You might have closed it...

Just go to your account settings and find the opt-out option.

roegerle2mo ago

right up top. I'm not sure how anyone could miss it.

daveguy2mo ago

Probably have to have adblockers turned off.

_pdp_2mo ago

I have never seen any app reset/lose a setting before.

lkbm2mo ago

What are you referring to? I set this to "Disabled" months/years ago and it's retained the disabled setting.

landl0rd2mo ago· 5 in thread

This headline is false; it will not go take your private repos and dump them into a training dataset. Rather, GitHub will train on your copilot interactions with your private repos. If you do not use copilot, this makes no difference to you, though you should probably still turn it off.

dotancohen2mo ago

What if one of my contributors uses copilot?

computomatic2mo ago

Then GitHub will train on their inputs, which includes your code.

Doesn’t seem to leave non-enterprise projects with much choice but to ban contributors from using copilot (to whatever extent they can - company policy, etc.)

hirako20002mo ago

That's also my read of the flag. But if they can train co pilot on input, I don't see what prevents them from training copilot on the code itself. In a court case they would simply say the opt in meant we can train from input. That's all we did.

olejorgenb2mo ago

To be fair, they display it reasonably prominently in GitHub when you are logged in. Given that, I feel the post title falls under the click bait category. I was fully aware of the Co-pilot opt-out change, but still clicked due to the phrasing of the title.

ekjhgkejhgk2mo ago

I think this kind of nuance is useless or even harmful. That might be how it is now but they'll change it when you're not looking.

You see, coders have this reasoning flaw where they go "Oh I've understood the system, now I can work out all the ramifications of my actions", and then they get tricked at every step of the way.

parsimo20102mo ago· 3 in thread

Joke's on them, my private repos are total dog dookie. If nobody but me can see the code then I don't have to worry about style, structure, comments, or any other best practices.

You don't want an LLM trained on my private repos. Trust me.

aduwah2mo ago

I will join the club. +1 for ruining M$ AI with my garbage code

forinti2mo ago

Poisoning LLMs is an interesting path of resistance.

roegerle2mo ago

Well known running code has more weight than unknown code that may not run. I think it’s pointless.

SunshineTheCat2mo ago· 3 in thread

RIP all the people who have been paying Github for years and never happened to see the notice.

tedivm2mo ago

I think opt out is stupid, but the notice is on every page of github using their banner display right now. They've also blasted out emails.

flykespice2mo ago

At least they are being very upfront with it (I guess?), most companies just slickly add the clause in their routine TOS updates.

malfist2mo ago

And how many people who use git on github go to the website? I only do when my token has expired and I need to grab a new one to push again. Which is every 90 days. Github.com is mostly invisible infrastructure to me.

yonatan80702mo ago· 3 in thread

How do I opt out of this for my own private repos? I don't see anything related to this as I've got a ton of settings for Copilot itself (I have access to Copilot through my work org)

jamie_ca2mo ago

https://github.com/settings/copilot/features, it's near the bottom "Allow GitHub to use my data for AI model training"

forthac2mo ago

I believe it is under:

Settings->Copilot->Features->Privacy=>[ Allow GitHub to use my data for AI model training

Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement. ]

hedayet2mo ago

Under privacy.

> Allow GitHub to use my data for AI model training

uberman2mo ago· 2 in thread

If even one person in a repo does not disable this will copilot have full access to the repo? How can I determine if other members of my team have turned this off or not?

hirako20002mo ago

The same way you can't determine whether a team member pulling the repo dumped the code into a prompt.

It's convenient for MS to make this opt in by default for sure.

elAhmo2mo ago

It’s not convenient, it is a deliberate decision.

mxtbccagmailcom2mo ago· 2 in thread

Time to put adversarial code into GitHub to pollute the training set?

ethagnawl2mo ago

`:(){ :|:& };:`s all the way down.

encrypted_bird2mo ago

Ah, yes, the ol' Bobby Tables maneuver. Haha.

prmoustache2mo ago· 2 in thread

While I understand the network effect of github for public project, I don't really understand why one would want to use it for private repos.

There are tons of git providers including free ones that include full gitlab/gitea/forgejo to get similar features to github and there is nothing more easy to self host or host on a vps with near zero maintenance.

w10-12mo ago

Sorry, which ones support 2-GB private repositories and are supported by package managers?

artyom2mo ago

The same reason FreeBSD is great but eventually gets swapped for Linux at scale: commodity personnel.

You wouldn't believe the amount of people that would list Github, but not git, as a skill.

_pdp_2mo ago· 2 in thread

Rather than defending this absurd decision, GitHub could instantly win back trust by admitting they f*** up and reversing it entirely.

If they want to incentivise people to contribute their sources and copilot sessions, they could easily make it opt-in on a per-repository basis and provide some incentive, like an increased token quota.

This is not hard.

NegativeK2mo ago

AI is maximizing the move fast and break things approach, including not asking for permission from its userbase.

It's consistent with believing that AI is the future -- if a company doesn't perform really well, it loses that race. And if the userbase they piss off is also the userbase that's skeptical about AI, then they're not pissing off anyone that's relevant to the company winning.

Downside: Pissing off users is gross.

danaris2mo ago

The problem is, GitHub is owned by Microsoft, and Microsoft is desperately trying to shove AI into everything in hopes that it will save them.

jmward012mo ago· 2 in thread

They just lost my repos. I can not believe they snuck this in. My level of anger right now is far higher than I ever wanted to feel. I went to API access for anthropic, paying more in the process, to avoid them training on my code. And GH just -adds- this, without telling me? Without a prompt. They are dead to me.

ares6232mo ago

make sure you opt-out anyway before deleting your account. they'll probably train on some archived version if it sees your profile didn't opt-out at some point.

gverrilla2mo ago

honest question: is there any realistic mechanism that will make them accountable if let's say they just train on 100% of repos without regards to opt-ins? I operate under the premise these tech companies can do whatever they want and there's very little oversight.

sethops12mo ago· 2 in thread

When Louis Rossmann started describing tech leadership as having a "rapist mentality" I brushed him off as being sensationalist. But actions like this make me think more and more he's right. The product managers pushing for changes like this are despicable scum.

doubled1122mo ago

Even the way modern software phrases questions is rapey.

Imagine a man asking a woman “want to have sex? Or maybe later?” out of the blue, then asking her again every 3 days until she says “yes”

kingstnap2mo ago

There is this distinct lack of giving a shit about the user that you see coming through in a lot of big tech nowadays.

Take this extremely simple example about antenna pod. I can change the order and what buttons show up in the app nav bar. For example I can remove the "home" button or put other things there instead like playback history.

This is a small minor point of the bigger picture. Yet there is this distinct sense in which when using that app I don't feel like I'm beholden to some chain of management in some company deciding they get to decide what I get to do.

Like, it's almost unthinkable that the YouTube app would let you remove shorts or reorder the navigation bar and decide what you wanted to have there.

maplethorpe2mo ago· 1 in thread

What's the best way to poison my repos to sabotage LLM training? Asking for a friend.

NegativeK2mo ago

By migrating to another code forge and paying them so they're sustainable.

Which doesn't answer your question at all, but it is the metric they'll pay attention to. And it is the thing that actually addresses the underlying problem.

bonestamp22mo ago· 1 in thread

Thanks for the heads up, I assumed they had already done this with my data.

seanw4442mo ago

Probably did. Now comes the legal ass-covering.

livinglist2mo ago· 1 in thread

Thanks for posting this, I was never made aware of this by GitHub..

lkbm2mo ago

If you use Github, you should have an email from ~2 days ago with the subject "Important Update to GitHub Copilot Interaction Data Usage Policy". Easy to skip over assuming it's just one of a million privacy policy update emails.

If you don't use Github Copilot, this shouldn't affect you, and may be why you got no email. The current headline is fairly misleading--it's about Copilot usage, not private repos per se.

Esophagus42mo ago· 1 in thread

There’s a lot of furor in this thread, but people felt the same way when Google Street View came out. Eventually they worked through most of the thorny bits and people use Street View now.

I suspect MSFT is in a similar spot. If they don’t train on more data, they’ll be left behind by Anthropic/OAI. If they do, they’ll annoy a few diehards for a while, they’ll work through the kinks, then everyone will get used to it.

computomatic2mo ago

That comparison doesn’t hold at all. This would be equivalent to Google publishing photos of inside your home.

Or, perhaps more directly, training their image-gen models on your private Google Photos.

mrled2mo ago· 1 in thread

I'm curious about specific consequences of this. I tend to think the importance of code secrecy has always been exaggerated (there are specific exceptions like hedge fund strategies and malware), even more so now in this post-Claude world. Does anyone have specific things they're trying to avoid by opting out of this?

jawilson22mo ago

Algorithms and models for a proprietary trading system? My personal notes? The latex text of my phd thesis?

I will go screaming and kicking and fighting into this dystopian nightmare post-privacy shithole world that so many people seem fine with. If I have to move off of every service or technology to maintain some semblance of privacy so be it.

bolangi2mo ago· 1 in thread

Hah, github can have my crap code. Anyone trained on it will be in for a world of hurt :-)

Esophagus42mo ago

Can’t wait for copilot to start saying stuff like

// todo… remove this before it goes to prod lol

Sohcahtoa822mo ago· 1 in thread

I wonder how effective it would be to sabotage the training by publishing deliberately bad code. A FizzBuzz with O(n^2) complexity. A function named "quicksort" that actually implements bogosort. A "filter_xss" function that's a no-op or just does something else entirely.

The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn't take a significant amount of bad data to poison an LLM's training.

munk-a2mo ago

Probably extremely ineffective. It's an issue of scale: unless you automate the terrible code generation, and somehow make its style varied enough that it isn't easy to detect and eliminate wholesale, you just won't have the volume to significantly impact the result set.

I'm absolutely sure that there are state actors with gigantic budgets that are putting a lot of effort into similar attacks, though.

rakel_rakel2mo ago· 1 in thread

I'm looking forward to the class action lawsuit, even if only to establish a precedent!

I don't have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.

rrgok2mo ago

They would've done the math. Even with a class action they will come out positive. It's just another bill for them.

jokoon2mo ago· 1 in thread

weren't they already using repos for training?

darthwalsh2mo ago

Not private repos.

Now, anything that gets referenced in a copilot chat is fair game

jollyllama2mo ago· 1 in thread

It's not clear to me what happens to personal repos if you're getting Copilot for work, or where to disable it there.

djsavvy2mo ago

yeah, how can I view the settings on my own personal account if my employer is managing the copilot settings?

jambutters2mo ago· 1 in thread

Where does it say it will train on private? This seems like a security nightmare if it trains on hardcoded keys

chistev2mo ago

Having hardcoded keys is a security nightmare regardless.
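
For anyone worried about keys ending up in Copilot context, a pre-commit style check can catch the obvious cases. A minimal sketch in Python, assuming two illustrative patterns (the `AKIA` prefix used by AWS access key IDs and PEM private key headers); `PATTERNS` and `find_secrets` are hypothetical names, and dedicated secret scanners cover far more formats:

```python
import re

# Illustrative patterns only (an assumption for this sketch);
# real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Run over staged files before each commit, this would flag the line number and the kind of match, though a regex pass like this will miss anything that does not follow a fixed format.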

harikb2mo ago· 1 in thread

The UI options are also shady af. The setting reads

Enabled - "You will have access to this feature" as help text. Disabled - "You will not have access to this feature".

WTF does that mean?

gs172mo ago

I saw that too, it feels like it's worded to make it sound like it's mandatory for Copilot. Based on their blog post the "feature" is them training on your data.

jongjong2mo ago· 1 in thread

Wow. This is theft. Should be illegal! It's like if I own a vault storage business and I am keeping other people's gold in my vaults and then I just take all the gold for myself and claim that the customers should have opted out of me stealing their gold but they missed the deadline...

zelphirkalt2mo ago

This hints at something that, in my opinion, isn't discussed enough:

Say some personal data leaked into training data: where can I request surgical deletion of that data from the LLM? LLMs are being used not only for license washing, but also for PII washing and for ignoring consent. How will a service provider make sure to not ever have personal data in the training data set and fix earlier mistakes pertaining to personal data? Are they not obliged to have a way of deleting one's personal data? GDPR or something?

_bypa2mo ago· 1 in thread

Thanks for flagging this!

layer82mo ago

Note that “flagging” has a specific meaning on HN.

bdangubic2mo ago· 1 in thread

That training will be like “OMG this is horrible… WAIT I wrote this shit”

salawat2mo ago

God, there's always that moment when you see the most shit code on earth, just as you're typing "git blame" and you just start chanting "please don't be me".

ChrisArchitect2mo ago

[dupe] Discussion on source: https://news.ycombinator.com/item?id=47521799

https://github.com/settings/copilot/features

munk-a2mo ago

The only setting I'm seeing is on a per-user basis. Does anyone know how to blanket disable training on an organizational basis?

Is there any information about how much information from an organization managed repo may be trained on if an individual user has this flag enabled? Will one leaky account cause all of our source code to be considered fair game?

w10-12mo ago

The feature to opt out is at the bottom under privacy: "Allow GitHub to use my data for AI model training"

TIL: you cannot opt out of a copilot-pro subscription. How is it a subscription if I can't cancel?

(Honestly, who has time to evade all these traps? Or to migrate 150+ repos on 6+ machines...)

sedatk2mo ago

I have an individual GitHub Copilot Pro subscription and also am a member of an Enterprise account that has one of its GitHub Copilot Business seats assigned to me. The opt-out setting doesn't appear on my individual profile anymore. However, I want to be able to use individual GitHub Copilot subscription for my individual work, and it seems like I can't do it anymore as Enterprise has taken over all my preferences. What a mess.

GMoromisato2mo ago

I'm sure this is just me, but I don't mind if AI trains on my public or private repos. I suspect my imagination is just not good enough to come up with downsides.

So far it's been a benefit because coding agents seem to understand my code and can follow my style.

I don't store client data (much less credentials) in my repos (public or private) so I'm not worried about data leaks. And I don't expect any of my clients to decide to replace me and vibe code their way to a solution.

I do worry (slightly) about large company competitors using AI to lower their prices and compete with me, but that's going to happen regardless of whether anyone trains on my code. And my own increases in efficiency due to AI have made up for that.

jacamera2mo ago

Lots of hair splitting in the comments. The service is so unreliable at this point that I don’t trust them to not train on private repos even accidentally. You’re one vibe-coded PR away from having all your data scooped up regardless of any policy or intention.

AndrewKemendo2mo ago

I started self-hosting my own git on a DigitalOcean droplet with Gitea (1). It’s been an unbelievably fantastic and trivially easy-to-manage experience, and I can make repos public, invite contributors and do integrations … I see zero downsides

I see no reason to ever go back to holding my code elsewhere.

Don’t forget git is fairly new

When I first started doing production code it was pre-github so we used some other kind of repo management system

This is a perfect example of where they’re starting to cannibalize their base and now we have the ability to get away from them entirely.

(1) https://about.gitea.com/

endofreach2mo ago

How did people forget that github was purchased by that one company?

maxloh2mo ago

Context: https://github.com/orgs/community/discussions/188488

TLDR: As long as you aren't using Copilot, your code should be safe (according to GitHub).

  What data are you collecting?

  When an individual user has this setting enabled, the interaction data we may collect includes:

  - Outputs accepted or modified by the user
  - Inputs sent to GitHub Copilot, including code snippets shown to the model
  - Code context surrounding the user’s cursor position
  - Comment and documentation that the user wrote
  - File names, repository structure, and navigation patterns
  - Interactions with Copilot features including Chat and inline suggestions

bsza2mo ago

I've been encrypting my private git repos for a while because I had suspected they were going to do something like this.

https://github.com/flolu/git-gcrypt

It's very easy to set up and integrates nicely into git. Obviously only works if you don't need Actions or anything that requires Github to know what's in your repo (duh).

rrgok2mo ago

I'm gonna put a license fee on all my repos. 10% of revenue if my private repos have been used for AI training. 5% on all my other repos.

JonChesterfield2mo ago

Don't give your code to Microsoft if you don't want them to have your code.

This setting will make no difference to whether your code is fed into their training set. "Oops we accidentally ignored the private flag years ago and didn't realise, we are very sorry, we were trying to not do that".

torben-friis2mo ago

How's the codeberg experience nowadays? I think it's finally time to switch for me.

tartoran2mo ago

If you opt out Github will probably still train on your private repo. Just migrate.

shevy-java2mo ago

Microslop tries to make money off of our data on github. Not a big surprise though.

holoduke2mo ago

For 5 bucks you can host your own gitea with most GitHub functionality. I moved my 500 repos to it. Actions are working perfectly fine. I make daily snapshots on hetzner. Trust them for that backup part.

mxtbccagmailcom2mo ago

Time to place some adversarial code into GitHub to pollute training set?

wilsonjholmes2mo ago

At least they are finally being honest about the direction of the business. I have thought for a long while that they were already doing this and just not telling anyone...

134152mo ago

It is the feature "Allow GitHub to use my data for AI model training" that needs to be disabled. Right?

Or am I missing some trick / dark GUI pattern? Just want to make sure.

> https://github.blog/news-insights/company-news/updates-to-gi...

shifto2mo ago

In my case, co-pilot will be training on co-pilot code. I'm probably not alone so I don't think they're getting what they hope they're getting.

classified2mo ago

They steal your code to train their AI, and then they sell it back to you. Why didn't I think of that, I could be rich by now.

gafferongames2mo ago

If you guys didn't already realize that Microsoft was a garbage company in the 90s I really don't know what to say...

roegerle2mo ago

Do people not browse GitHub? All I’m reading is “I’m never at the web ui”.

I love falling into a rabbit hole looking at people’s projects

dalemhurley2mo ago

At least they are giving you the option to opt out, many other providers just trained on the source code.

hilti2mo ago

Oh - they didn't train silently already?! ;-) Going to move my repositories then next week.

mondainx2mo ago

Get ready for some dope code... ;)

api2mo ago

Not your storage, not your data (unless it's encrypted with keys you control).

VladVladikoff2mo ago

The most shocking part of this news to me is that they aren’t doing this already.

Uhhrrr2mo ago

Put an ORM in your private repo which randomly 1% of the time calls DROP TABLE.

frizlab2mo ago

Is there a way to disable training on repositories that are in organizations?

woodylondon2mo ago

jokes on them - all the code in all my repos are written by AI :)

moralestapia2mo ago

Is this the case even if you're a paid customer?

If so, this might be illegal.

totierne22mo ago

There is always other peoples ftp servers as Linus used to say.

hexage18142mo ago

If you opt out... they will also train on your private repos.

jonniebullie2mo ago

Any recommendations for light-use GitHub users?

piekvorst2mo ago

Personally, I don’t mind. Train however you want.

pokot02mo ago

while I agree, I understood this is only when you use copilot? if not, their communication is very misleading

yakbarber2mo ago

train on my private code? jokes on them

daft_pink2mo ago

is there an easy way to shift all your repos to gitlab or to private if you don’t use ci/etc?

uwagar2mo ago

why all u programmers cant make ur own website and host ur own git servers?

contingencies2mo ago

Thank you.

victorbjorklund2mo ago

Thanks for the heads up.

jpcrs2mo ago

Good luck to them, my private repos are probably some of the worst code humanity has produced.

ljm2mo ago

Never have I seen a company try so damn hard to make something a thing than Microsoft and Copilot.

And it is absolute dogshit. And offensive to actual copilots.

nitrogen992mo ago

So? It’s not like some human is spying on your private emails or chats. This is just code. Relax.

Ancalagon2mo ago

This is the worst year of enshittification I can recall. Literally everything is going to shit.

starkeeper2mo ago

So now CoPilot will be EVEN better at writing viruses, worms and malware!

tantalor2mo ago

"Don't touch my garbage!"

shamelessdev2mo ago

This is the exact reason I vibe coded “artifact”.

Not for commercial success, just wanted a git and github like experience for my new game project.

Then I started getting into features specific to game dev like moving away from LFS and properly diffing binaries.

paganartifact.com/benny/artifact

Mirror: GitHub bennyschmidt/artifact

leej1112mo ago

Based

shell0x2mo ago

Shouldn’t this be “Tell HN”?

martinwoodward2mo ago· 45 in thread

No we won’t. Details here https://github.blog/news-insights/company-news/updates-to-gi...

For users of Free, Pro and Pro+ Copilot, if you don’t opt out then we will start collecting usage data of Copilot for use in model training.

If you are a subscriber for Business or Enterprise we do not train on usage.

Hope that helps.

qaadika2mo ago

> Should you decide to participate in this program, the interaction data we may collect and leverage includes:

> - Outputs accepted or modified by you

> - Inputs sent to GitHub Copilot, including code snippets shown to the model

> - Code context surrounding your cursor position

> - Comments and documentation you write

> - File names, repository structure, and navigation patterns

> - Interactions with Copilot features (chat, inline suggestions, etc.)

> - Your feedback on suggestions (thumbs up/down ratings)

"should you decide to participate.."??? You didn't ask if I wanted to participate. You asked if I didn't.

I didn't get to decide to participate. I had to decide not to. You made me do work to prevent my privacy from being violated.

vscode-rest2mo ago

Do you use copilot?

jffry2mo ago

It's unnecessarily splitting hairs.

> interaction data—specifically inputs, outputs, code snippets, and associated context [...] will be used to train and improve our AI models

So using Copilot in a private repo, where lots of that repo will be used as context for Copilot, means GitHub will be using your private repo as training data when they were not before.

tptacek2mo ago

6 more replies

pverheggen2mo ago

munk-a2mo ago

tadfisher2mo ago

> By April 24th Github will train on private repos

This statement itself is misleading. Also, GitHub probably should have seen this coming.

They are not doing what I initially thought, which is slurping up your private repo, wholesale, into its training set. You don't have to opt out of anything to prevent that.

They are slurping any context and input containing code from your private repo which is provided to them as part of using Copilot.

So, in addition to the opt-out setting, there is an even easier way to avoid providing them your private repository data to train AI models, and that's by continuing to not use Copilot.

andoando2mo ago

That's still pretty bad. It's no longer private if all your code goes into an LLM training set and can resurface publicly for everyone.

NewsaHackO2mo ago

layer82mo ago

In the EU, opt-out is not a legally valid way to obtain the necessary consent. How do you plan to handle this?

booi2mo ago

probably by paying the fine and doing it anyway

x0x02mo ago

For personal data. I don't believe you can reasonably claim code is personal data any more than a hammer is your personal data.

otterley2mo ago

""" Allow GitHub to use my data for AI model training

Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement. """

saghm2mo ago

Yes, you will. This is what the setting says on my account when I clicked the link:

> model training

> Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement

ziml772mo ago

Jolter2mo ago

If you are not willing to migrate out of GitHub, what you can do is to avoid using Copilot on your private repository.

wewtyflakes2mo ago

edelbitter2mo ago

> If you don’t use Copilot this will not affect you.

How does this work for a private repository with access granted to additional contributors? Which setting is consulted then?

daveguy2mo ago

Nice try. If you're training on "inputs" to Copilot then you are training on the private repos.

This suspect denial is why I will get my clients moved off of github.

grepfru_it2mo ago

Back in my day someone would post a HN article to the internal slack in order to sway conversation in their favor. Glad to see it's still happening! :D

BoredPositron2mo ago

SirensOfTitan2mo ago

Honestly, if you work at GitHub, maybe you should focus on your uptime -- it's awful.

mrdependable2mo ago

I think the problem is more with using PRIVATE repos. My letters are also private and I would be pretty pissed if the mail carrier was reading them. Why does GitHub think it has the right to do this?

languid-photic2mo ago

Appreciate the clarification. But, it's still not great.

To the PM behind this - developers are sensitive to this kind of thing. Just make it opt-in instead?

dataflow2mo ago

mrits2mo ago

Thanks for confirming you train on our data

pokot02mo ago

ClikeX2mo ago

kingkandu2mo ago

Because microsuck is about to violate the law that many times

jawilson22mo ago

I'm in the process of moving all of my repos off of github and deleting that account.

Hope that helps.

_pdp_2mo ago

So you will train on data collected from free users working on GPL and copyrighted projects?

DougN72mo ago

And on users that don’t even use github, other than the required account to use CoPilot in Visual Studio.

https://news.ycombinator.com/item?id=37124188

johndough2mo ago

ziml772mo ago

Thanks for the clarification. The OP here made me think I missed something in both the blog post about the change and in the available settings.

gortok2mo ago

This is a distinction without a difference, according to the text of that enable/disable dialog,

> Allow GitHub to use my data for AI model training: Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement.

“Associated Context” is the repo. If I use copilot, I’m giving it access to my repo.

Jabrov2mo ago

Can't you just make it opt-in?

No? Because no one would opt-in, you say?

Wow. It's almost like this is a user-hostile feature that breaks the implicit promise behind a "private" repo.

nickvec2mo ago

happytoexplain2mo ago

As others have pointed out, this is somewhat dishonest. Which is depressing, if you represent GitHub.

buildbot2mo ago

>Hope that helps

Honestly, what the fuck? This change was already pretty bad but this being the apparent corporate response is insane.

Done with Github and Microsoft after this. Just disgusting how little you care for users, ethics, or morals.

pesus2mo ago

Why not get user consent first?

irishcoffee2mo ago

I am aware of CUI data hosted on github by corporate entities. You’re saying you’ll essentially violate the entire point of CUI?

That’s fucking terrifying.

elAhmo2mo ago

Defaulting to opt-in is a malicious move, no matter how you present things.

ethanwillis2mo ago

"hope that helps"

Why the smug sarcastic attitude? nah, fuck github i'm out.

anarticle2mo ago

tl;dr: installed gitlab.

I'm not bidding against you to not train on my data.

inopinatus2mo ago

“Opt-out” is an egregiously toxic and unethical approach to consent and should be illegal everywhere that it isn’t already.

I didn’t think Github had much of a brand left to damage, but here we are.

kepano2mo ago· 24 in thread

I've been saying this since 2023

mememememememo2mo ago

Forgeties792mo ago

I worry about a post-Gabe valve for this reason.

miohtama2mo ago

Microsoft would never do this

(-:

random32mo ago

slowhadoken2mo ago

I’m still concerned about MS using the code I write on my laptop to train AI. Tinfoil hat wearing Linux users are starting to make a lot of sense to me.

qaadika2mo ago

I didn't become paranoid, everybody else didn't!

DougN72mo ago

account422mo ago

b1122mo ago

It's not tinfoil, it's aluminum foil. I.. I mean, I heard it's that.

ekjhgkejhgk2mo ago

You're right, of course, and I find it frustrating that people are so thick as to not see your claim as obvious.

Stallman is always right.

itsdesmond2mo ago

So I dunno bout that.

jamiek882mo ago

About technology.

About communication with other humans he’s pretty much always wrong.

Imagine we’d had a better communicator who wasn’t a gross toe nail picking troll fronting free software? It shouldn’t matter. Only the ideas should matter. But the reality is different.

worik2mo ago

> Stallman is always right.

Not really. Almost always right....

moralestapia2mo ago

Thank you for your service. We really need more "canaries in the mine" giving out early warnings of things that might not be evident on a first glance.

Any takes on what 2029 will look like? (related to this topic, ofc)

chistev2mo ago

Now this is sarcasm. Lol

hugodan2mo ago

and it is not end-to-end encrypted if you don't own the keys, avoid bullshit

cj2mo ago

Edit: Okay, sounds like you guys are pissed to the point where it seems like the pro tip here is to stop using GitHub.

Pro tip: sign up for the business/enterprise version when reasonable in price.

I do this with Google Workspace. You can also do it with GitHub.

(Google doesn’t train on Workspace, Github doesn’t train on business customers, etc)

worble2mo ago

Pro tip: You could instead spend that money to spin up a forgejo instance for as little as $2 a month https://www.pikapods.com/apps#development (not affiliated, just a happy customer)

Please don't reward these companies with money.

thot_experiment2mo ago

Probably don't reward extortion with money.

Lio2mo ago

An enterprise licence won't save you, Google, Microsoft, et al have happily been breaking copyright laws for years.

If the publishing industry can't win a case against the AI firms then you don't stand a chance when you finally find out they've been training on your private data the whole time.

They can tell you one thing and do the opposite and there's effectively nothing you can do about it. You'd be a fool to trust them.

saghm2mo ago

At the risk of stating the obvious, I don't think it makes sense to reward them with money for trying to pull a bait-and-switch on this.

groby_b2mo ago

Github's enterprise version "starts at" $21.99/seat, and requires you to "contact sales".

And I don't see any mention that that exempts you from being trained on. (Yes, the blog says you're still covered, but at that price I'd like to see a contract saying that)

margalabargala2mo ago

> Google doesn’t train on Workspace, Github doesn’t train on business customers, etc

...yet

[0]: https://forum.gitlab.com/t/can-i-opt-out-from-my-code-being-...

throwuxiytayq2mo ago

It's not a pro tip if it only fucks you over slightly later. How's the weather in Stockholm?

kristianp2mo ago· 16 in thread

What's a good alternative for free private repos?

eblume2mo ago

I've recently started hosting my own forgejo instance. It works so well! Free tailscale for connectivity. I expose mine over fly.io proxy, also free, but not to be done without caution.

Supermancho2mo ago

Gitlab?

Microsoft services are tech debt. I moved the moment they were acquired and never regretted it.

nottorp2mo ago

I opened gitlab.com and it starts with

"Finally, AI for the entire software lifecycle."

Not very trust inspiring, that.

Can I even have git hosting without anything else being crammed down my throat, or it's just like Microsoft?

mrweasel2mo ago

It's a fair question, but if you need private repos, I think you need to start considering a paid option, or self-host.

If it's really important to you that the repo is private, I'd self-host.

conductr2mo ago

piersj2252mo ago

I've not tried this, however https://github.com/AGWA/git-crypt

Apparently someone has developed something similar to this

Imustaskforhelp2mo ago

I would've recommended codeberg but codeberg isn't the finest to be recommended for free private repos.

I definitely feel like more can be done within this space and that there is space for more competitors (even forgejo instances for that matter)

sebastiennight2mo ago

GitLab would be a good bet here. We started on their free tier and used that for a couple of years, I was very happy with it. Not sure how the tiers might have evolved since.

And according to their PM and privacy policy, they're not training their models on your code[0].

pyjarrett2mo ago

It doesn't take much power or time to run your own local git server. My first one which lasted years was parts I mangled together from old computers from garage sales.

There's instructions on running a Git server in the git book: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protoco...
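
One common setup is a bare repository reached over SSH. A rough sketch in Python that only builds and prints the commands involved, assuming a hypothetical host `me@example.org`, a repository path `repos/project.git`, and a remote name `selfhosted` (none of these come from the thread):

```python
import shlex


def bare_repo_commands(ssh_target: str, repo_path: str, branch: str = "main") -> list[str]:
    """Build the shell commands for hosting a repo on a machine you can SSH into."""
    remote_url = f"{ssh_target}:{repo_path}"
    return [
        # Create an empty bare repository on the remote machine.
        f"ssh {shlex.quote(ssh_target)} git init --bare {shlex.quote(repo_path)}",
        # Register it as a remote of the existing local repository.
        f"git remote add selfhosted {shlex.quote(remote_url)}",
        # Push the chosen branch and set it as the upstream.
        f"git push -u selfhosted {shlex.quote(branch)}",
    ]


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    for command in bare_repo_commands("me@example.org", "repos/project.git"):
        print(command)
```

The script runs nothing itself; the printed commands are meant to be reviewed and run by hand from inside the local repository.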

wuschel2mo ago

Sourcehut comes to my mind: https://sourcehut.org/

werdnapk2mo ago

I've been using gitosis to manage private repos for almost 2 decades now. It's extremely easy to host your own repositories.

I just looked up gitosis on github though and it was last updated 12 years ago.... still works for me though.

Overall, hosting your own repos is very easy.

bigstrat20032mo ago

I use Fossil for mine. Dead easy to set up, and while the workflow might not be great for public contributions like Github is, that doesn't matter on something where I'm the only user.

stephenr2mo ago

I've seen https://codefloe.com mentioned, can't say I've used it myself yet though.

bonestamp22mo ago

BitBucket.org (Atlassian)

JonChesterfield2mo ago

Any computer you have ssh access to.

throwaway6137462mo ago

A cheap mini-pc in your closet.

hedayet2mo ago· 10 in thread

To Github's credit, they have been showing a banner consistently. To my discredit - I never bothered to read that banner until I saw this HN headline

nottorp2mo ago

How does that help if you don't go to the github site but just use git from the command line?

fph2mo ago

Can you use git's Copilot from the command line? If you can't, then you have nothing to opt out from.

lkbm2mo ago

They also sent an email.

tomwheeler2mo ago

And even if you read the banner on the site, the email they sent, and the announcement itself, you would not see instructions that mention the specific thing(s) you must change in order to opt out.

jmward012mo ago

I've never seen the banner. Where does this show up?

arcanemachiner2mo ago

It's been on top of the web UI for 2 or 3 days now.

You might have closed it...

Just go to your account settings and find the opt-out option.

roegerle2mo ago

right up top. I'm not sure how anyone could miss it.

daveguy2mo ago

Probably have to have adblockers turned off.

_pdp_2mo ago

I have never seen any app reset/lose settings before.

lkbm2mo ago

What are you referring to? I set this to "Disabled" months/years ago and it's retained the disabled setting.

landl0rd2mo ago· 5 in thread

dotancohen2mo ago

What if one of my contributors uses copilot?

computomatic2mo ago

Then GitHub will train on their inputs, which includes your code.

Doesn’t seem to leave non-enterprise projects with much choice but to ban contributors from using copilot (to whatever extent they can - company policy, etc.)

hirako20002mo ago

olejorgenb2mo ago

ekjhgkejhgk2mo ago

I think this kind of nuance is useless or even harmful. That might be how it is now but they'll change it when you're not looking.

You see, coders have this reasoning flaw where they go "Oh I've understood the system, now I can work out all the ramifications of my actions", and then they get tricked at every step of the way.

parsimo20102mo ago· 3 in thread

Jokes on them, my private repos are total dog dookie. If nobody but me can see the code then I don't have to worry about style, structure, comments, or any other best practices.

You don't want an LLM trained on my private repos. Trust me.

aduwah2mo ago

I will join the club. +1 for ruining M$ AI with my garbage code

forinti2mo ago

Poisoning LLMs is an interesting path of resistance.

roegerle2mo ago

Well known running code has more weight than unknown code that may not run. I think it’s pointless.

SunshineTheCat2mo ago· 3 in thread

RIP all the people who have been paying Github for years and never happen to see the notice.

tedivm2mo ago

I think opt out is stupid, but the notice is on every page of github using their banner display right now. They've also blasted out emails.

flykespice2mo ago

At least they are being very upfront with it (I guess?), most companies just slickly add the clause in their routine TOS updates.

malfist2mo ago

yonatan80702mo ago· 3 in thread

How do I opt out of this for my own private repos? I don't see anything related to this as I've got a ton of settings for Copilot itself (I have access to Copilot through my work org)

jamie_ca2mo ago

https://github.com/settings/copilot/features, it's near the bottom "Allow GitHub to use my data for AI model training"

forthac2mo ago

I believe it is under:

Settings->Copilot->Features->Privacy=>[ Allow GitHub to use my data for AI model training

Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement. ]

hedayet2mo ago

Under privacy.

> Allow GitHub to use my data for AI model training

uberman2mo ago· 2 in thread

If even one person in a repo does not disable this will copilot have full access to the repo? How can I determine if other members of my team have turned this off or not?

hirako20002mo ago

The same way you can't determine whether a team member pulling the repo dumped the code into a prompt.

It's convenient for MS to make this opt in by default for sure.

elAhmo2mo ago

It’s not convenient, it is a deliberate decision.

mxtbccagmailcom2mo ago· 2 in thread

Time to put adversarial code into GitHub to pollute the training set?

ethagnawl2mo ago

`:(){ :|:& };:`s all the way down.

encrypted_bird2mo ago

Ah, yes, the ol' Bobby Tables maneuver. Haha.

prmoustache2mo ago· 2 in thread

While I understand the network effect of github for public project, I don't really understand why one would want to use it for private repos.

w10-12mo ago

Sorry, which ones support 2-GB private repositories and are supported by package managers?

artyom2mo ago

For the same reason that FreeBSD is great, but shops eventually transition to Linux at scale: commodity personnel.

You wouldn't believe the amount of people that would list Github, but not git, as a skill.

_pdp_2mo ago· 2 in thread

Rather than defending this absurd decision, GitHub could instantly win back trust by admitting they f*** up and reversing it entirely.

This is not hard.

NegativeK2mo ago

AI is maximizing the move fast and break things approach, including not asking for permission from its userbase.

Downside: Pissing off users is gross.

danaris2mo ago

The problem is, GitHub is owned by Microsoft, and Microsoft is desperately trying to shove AI into everything in hopes that it will save them.

jmward012mo ago· 2 in thread

ares6232mo ago

make sure you opt-out anyway before deleting your account. they'll probably train on some archived version if it sees your profile didn't opt-out at some point.

gverrilla2mo ago

sethops12mo ago· 2 in thread

doubled1122mo ago

Even the way modern software phrases questions is rapey.

Imagine a man asking a woman “want to have sex? Or maybe later?” out of the blue, then asking her again every 3 days until she says “yes”

kingstnap2mo ago

There is this distinct lack of giving a shit about the user that you see coming through in a lot of big tech nowadays.

Like, it's almost unthinkable that the YouTube app would let you remove Shorts or reorder the navigation bar and decide what you wanted to have there.

maplethorpe2mo ago· 1 in thread

What's the best way to poison my repos to sabotage LLM training? Asking for a friend.

NegativeK2mo ago

By migrating to another code forge and paying them so they're sustainable.

Which doesn't answer your question at all, but it is the metric they'll pay attention to. And it is the thing that actually addresses the underlying problem.

bonestamp22mo ago· 1 in thread

Thanks for the heads up, I assumed they had already done this with my data.

seanw4442mo ago

Probably did. Now comes the legal ass-covering.

livinglist2mo ago· 1 in thread

Thanks for posting this, I was never made aware of this by GitHub..

lkbm2mo ago

If you don't use Github Copilot, this shouldn't affect you, which may be why you got no email. The current headline is fairly misleading--it's about Copilot usage, not private repos per se.

Esophagus42mo ago· 1 in thread

There’s a lot of furor in this thread, but people felt the same way when Google Street View came out. Eventually they worked through most of the thorny bits and people use Street View now.

computomatic2mo ago

That comparison doesn’t hold at all. This would be equivalent to Google publishing photos of inside your home.

Or, perhaps more directly, training their image-gen models on your private Google Photos.

mrled2mo ago· 1 in thread

jawilson22mo ago

Algorithms and models for a proprietary trading system? My personal notes? The latex text of my phd thesis?

bolangi2mo ago· 1 in thread

Hah, github can have my crap code. Anyone trained on it will be in for a world of hurt :-)

Esophagus42mo ago

Can’t wait for copilot to start saying stuff like

// todo… remove this before it goes to prod lol

Sohcahtoa822mo ago· 1 in thread

The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn't take a significant amount of bad data to poison an LLM's training.

munk-a2mo ago

I'm absolutely sure that there are state actors with gigantic budgets that are putting a lot of effort into similar attacks, though.

rakel_rakel2mo ago· 1 in thread

I'm looking forward to the class action lawsuit, even if only to establish a precedent!

I don't have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.

rrgok2mo ago

They would've done the math. Even with a class action they will come out ahead. It's just another bill for them.

jokoon2mo ago· 1 in thread

weren't they already using repos for training?

darthwalsh2mo ago

Not private repos.

Now, anything that gets referenced in a copilot chat is fair game

jollyllama2mo ago· 1 in thread

It's not clear to me what happens to personal repos if you're getting Copilot for work, or where to disable it there.

djsavvy2mo ago

yeah, how can I view the settings on my own personal account if my employer is managing the copilot settings?

jambutters2mo ago· 1 in thread

Where does it say it will train on private? This seems like a security nightmare if it trains on hardcoded keys

chistev2mo ago

Having hardcoded keys is a security nightmare regardless.
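
A minimal sketch of the usual alternative, assuming a hypothetical environment variable named `EXAMPLE_API_KEY` (not something from this thread): read the secret at runtime instead of committing it, so it never sits in the repo or in any editor context sent to a tool.

```python
import os


def load_api_key(env_var: str = "EXAMPLE_API_KEY") -> str:
    """Return the secret stored in env_var; raise if it is unset or empty.

    EXAMPLE_API_KEY is a made-up name used only for illustration.
    """
    key = os.environ.get(env_var, "")
    if not key:
        # Fail loudly rather than falling back to a value baked into the source.
        raise RuntimeError(f"{env_var} is not set")
    return key
```

Failing on a missing variable is a deliberate choice here: a silent default would just be another hardcoded key.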

harikb2mo ago· 1 in thread

The UI options are also shady af. The setting reads

Enabled - "You will have access to this feature" as help text. Disabled - "You will not have access to this feature".

WTF does that mean?

gs172mo ago

I saw that too, it feels like it's worded to make it sound like it's mandatory for Copilot. Based on their blog post the "feature" is them training on your data.

jongjong2mo ago· 1 in thread

zelphirkalt2mo ago

This hints at something that, in my opinion, isn't discussed enough:

_bypa2mo ago· 1 in thread

Thanks for flagging this!

layer82mo ago

Note that “flagging” has a specific meaning on HN.

bdangubic2mo ago· 1 in thread

That training will be like “OMG this is horrible… WAIT I wrote this shit”

salawat2mo ago

God, there's always that moment when you see the most shit code on earth, just as you're typing "git blame" and you just start chanting "please don't be me".

ChrisArchitect2mo ago

[dupe] Discussion on source: https://news.ycombinator.com/item?id=47521799

https://github.com/settings/copilot/features

munk-a2mo ago

The only setting I'm seeing is on a per-user basis. Does anyone know how to blanket disable training on an organizational basis?

w10-12mo ago

The feature to opt out is at the bottom under privacy: "Allow GitHub to use my data for AI model training"

TIL: you cannot opt out of a copilot-pro subscription. How is it a subscription if I can't cancel?

(Honestly, who has time to evade all these traps? Or to migrate 150+ repos on 6+ machines...)

sedatk2mo ago

GMoromisato2mo ago

I'm sure this is just me, but I don't mind if AI trains on my public or private repos. I suspect my imagination is just not good enough to come up with downsides.

So far it's been a benefit because coding agents seem to understand my code and can follow my style.

jacamera2mo ago

AndrewKemendo2mo ago

I see no reason to ever go back to holding my code elsewhere.

Don’t forget git is fairly new.

When I first started doing production code it was pre-github, so we used some other kind of repo management system.

This is a perfect example of where they’re starting to cannibalize their base and now we have the ability to get away from them entirely.

(1) https://about.gitea.com/

endofreach2mo ago

How did people forget that github was purchased by that one company?

maxloh2mo ago

Context: https://github.com/orgs/community/discussions/188488

TLDR: As long as you aren't using Copilot, your code should be safe (according to GitHub).

  What data are you collecting?

  When an individual user has this setting enabled, the interaction data we may collect includes:

  - Outputs accepted or modified by the user
  - Inputs sent to GitHub Copilot, including code snippets shown to the model
  - Code context surrounding the user’s cursor position
  - Comments and documentation that the user wrote
  - File names, repository structure, and navigation patterns
  - Interactions with Copilot features including Chat and inline suggestions

bsza2mo ago

I've been encrypting my private git repos for a while because I had suspected they were going to do something like this.

https://github.com/flolu/git-gcrypt

It's very easy to set up and integrates nicely into git. Obviously only works if you don't need Actions or anything that requires Github to know what's in your repo (duh).

rrgok2mo ago

I'm gonna put a license fee on all my repos. 10% of revenue if my private repos have been used for AI training. 5% on all my other repos.

JonChesterfield2mo ago

Don't give your code to Microsoft if you don't want them to have your code.

torben-friis2mo ago

How's the codeberg experience nowadays? I think it's finally time to switch for me.

tartoran2mo ago

If you opt out Github will probably still train on your private repo. Just migrate.

shevy-java2mo ago

Microslop tries to make money off of our data on github. Not a big surprise though.

holoduke2mo ago

mxtbccagmailcom2mo ago

Time to place some adversarial code into GitHub to pollute the training set?

wilsonjholmes2mo ago

At least they are finally being honest about the direction of the business. I have thought for a long while that they were already doing this and just not telling anyone...

134152mo ago

It is the feature "Allow GitHub to use my data for AI model training" that needs to be disabled. Right?

Or am I missing some trick / dark GUI pattern? Just want to make sure.