Poor functionary creates political incident with humble template...sounds like a Gogol short story. "but it worked great for redirect.pizza!"
Btw I assume this recent thread was about the same feature:
Secret scanning is now available for free on public repositories - https://news.ycombinator.com/item?id=34007637 - Dec 2022 (70 comments)
Even though I'm a paid github customer, I had no idea they had a program called "secret scanning" and that it's actually beneficial.
So I obviously assumed they're letting China scan my private repos.
They really need to work on wording.
Fyi... this feature was also previously mentioned in the news for public repos: https://techcrunch.com/2022/12/15/github-brings-free-secret-...
>So I obviously assumed they're letting China scan my private repos.
To clarify, it's Microsoft/Github doing the scanning of private repos on behalf of the partners. They're just forwarding the tokens that match the partners' regexp.
Edit: how about dropping the corporatese and title it "github will now scan public repos for secret WeChat tokens"?
Assuming your question is not a joke...
The partner has to email the regex to secret-scanning@github.com for their approval. See the steps at: https://docs.github.com/en/developers/overview/secret-scanni...
Once it's in the scanning system, the partner receives JSON alert messages such as:
[
  {
    "token":"NMIfyYncKcRALEXAMPLE",
    "type":"mycompany_api_token",
    "url":"https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt",
    "source":"content"
  }
]
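For illustration, here is a minimal Python sketch of how a partner might consume an alert body shaped like the one above. The `revoke_token` helper and the `handle_alert` name are hypothetical stand-ins, not part of GitHub's API, and a real endpoint would also need to verify that the request genuinely came from GitHub, which this sketch omits.

```python
import json

# Sample body copied from the alert format quoted above.
SAMPLE_ALERT = """[
  {
    "token": "NMIfyYncKcRALEXAMPLE",
    "type": "mycompany_api_token",
    "url": "https://github.com/octocat/Hello-World/blob/12345600b9cbe38a219f39a9941c9319b600c002/foo/bar.txt",
    "source": "content"
  }
]"""


def revoke_token(token):
    """Hypothetical stand-in for the partner's own revocation logic."""
    return f"revoked {token[:4]}..."


def handle_alert(body):
    """Parse one alert body and revoke every token it reports.

    Returns a list of (type, url, revocation result) tuples.
    """
    results = []
    for match in json.loads(body):
        results.append((match["type"], match["url"], revoke_token(match["token"])))
    return results


if __name__ == "__main__":
    for entry in handle_alert(SAMPLE_ALERT):
        print(entry)
```

Note that the partner only ever sees the matched string and where it was found, which is why the shape of the regex matters so much in this discussion.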
So instead of "token":"NMIfyYncKcRALEXAMPLE", private repo owners would worry about a '.*' regex leaking full source code rather than API credentials, such as "token":"#include <stdio.h>\nmain(){\nprintf(\"hello world\");\n}".
The above scenario requires believing the following:
- Microsoft/Github is technically incompetent and an employee and/or their internal regex sanity checking tool will blindly accept open-ended regex like '.*'
- MS/Github will then allow that unbounded regex to leak petabytes of private source code out to China partners via the JSON "token:" response. (Github says they have 18+ petabytes of data and most of that is private repos: https://twitter.com/github/status/1569852682239623173)
If one believes their entire private repo source code is at risk of being copied to Tencent via the '.*' threat because the above scenario seems realistic, I assume the answer is to delete the repo.
Yes, that is nonsense.
1) secret scanning can be disabled (not even sure it's enabled by default). 2) the regexes are fairly specific, length limited, etc. 3) github is obviously reviewing regexes that are accepted.
Check the list of stuff supported: https://docs.github.com/en/code-security/secret-scanning/sec...
A bit sad, they don't publish the list of regexes, etc.
--------------
I added a similar thing to the package manager for Dart / Flutter, because we saw users accidentally publishing secrets. That code is public, it relies on regexes and entropy estimation:
https://github.com/dart-lang/pub/blob/eb8ee21a089ebe0f2c2dd8...
It was heavily inspired by the researchers in: https://www.ndss-symposium.org/wp-content/uploads/2019/02/nd...
Worth a read, and certainly provides motivation for Github to do this kind of work :D
(disclosure: I work for Google. The opinions stated here are my own)
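As a rough illustration of the "regexes and entropy estimation" idea, here is a small Python sketch. It is not the code from the Dart pub tool linked above; the candidate pattern, the function names, and the 4.0 bits-per-character threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical candidate pattern: long runs of token-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def shannon_entropy(s):
    """Bits of entropy per character, from observed character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def likely_secrets(text, threshold=4.0):
    """Return candidate strings whose per-character entropy exceeds threshold."""
    return [m for m in CANDIDATE.findall(text) if shannon_entropy(m) > threshold]


if __name__ == "__main__":
    print(likely_secrets("pad = aaaaaaaaaaaaaaaaaaaaaaaa"))
    print(likely_secrets("token = q7Zt2LxVb9RmK4sPw8Yc"))
```

The regex narrows the search to strings that look like tokens, and the entropy check then filters out repetitive filler that happens to match. Real tools tune both parts per credential type.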
Are the secret patterns all publicly available? Or are the secret scanning patterns themselves secret? Without public review, we cannot know what secrets they will obtain.
I for one do not trust GitHub/Microsoft to act in the interest of the average user. Their past actions disqualify them from receiving any benefit of the doubt.
Though perhaps that’s just my own bias on the subtle differences in the meanings of those words.
It's not scanning that they're doing in secret. "Credential scanning" removes the ambiguity.
But this is an excellent next step where they build an integration with these partners where, as soon as a secret is scanned, they can notify tencent/AWS/other providers automatically to instantly invalidate those keys before they’re abused.
That’s what’s novel here.
These "leaked" secrets GitHub forwards might be dissidents getting access without being tracked. It might not be a WeChat secret at all, who knows? They're not a trustworthy partner, nothing should be shared with this company.
And to the folks saying it's public information and they already have it: That makes no sense, then they don't need GitHub's help. Obviously GitHub is supporting their scanning efforts here.
GitHub has a global stream API for all public events,[1] but it is delayed by five minutes, precisely so that sensitive actions like revoking leaked tokens can be performed before the world sees them. That’s what the secret scanning program is about, and you would have known if you spent 1/3 of the time of your rant learning about it.
Edit: Additionally, for private repos, secret scanning is opt-in and only alerts owners.
[1] https://docs.github.com/en/rest/activity/events?apiVersion=2...
There isn't a country in the world which does this. But the details are also not the main point, it's how extremely restricted and controlled simple access to information or forums of free expression is for people in China. Tencent has party officials working within the company. This isn't a regular business as Westerners might imagine it, it's an extended part of the CCP just like any other large corporation under Xi.
Again, people are saying it's no big deal but why would GitHub help them at all? It's not a good cause.
My government requires me to have ID, which contains a photo and fingerprints, and you cannot get a SIM without ID. That's Germany, and it's true for many, many countries.
Does what? The only extra thing is the fingerprint, but literally every modern country requires ID registration and more. My government also knows this IP belongs exactly to me. Stop spouting nonsense.
Plus this is completely unrelated.
It was fascinating to see: a near total domination of WeChat everywhere and relatively hard onboarding for new accounts. That is unlike the West, where most services seek to streamline onboarding as much as possible. I guess that becomes an anti-feature when you have a total monopoly and _everyone_ has a WeChat account. I think it's a very effective (and very dystopian) form of control. P.S.: Signal worked without any problems for me, even on a Chinese SIM (one "trick" to get around most of the GFW was buying a HK SIM in HK. It works across China and has a lot fewer blocks, but for various reasons I got a China SIM too).
GitHub is available in China, why shouldn't they protect their Chinese users?
And the SIM card requirements have nothing to do with Tencent, have you tried getting a SIM in Germany? Impossible without government ID and an address. And there are a lot of services which you can't sign up for without German ID / address. As a foreigner I also can't easily open a bank account in the US.
The threat here, in the worst case, is associating a GitHub ID with a WeChat ID.
> We have partnered with Tencent WeChat to scan for their tokens and help secure our mutual users on all public repositories and private repositories with GitHub Advanced Security.
This is GitHub scanning private repos and telling WeChat about them.
WeChat can already scan public repos.
They are not already screwed if they’re publishing something to a private repo, it might be the wrong way to do it, but it doesn’t mean they’re already screwed.
If you don’t trust GitHub’s private repo security then why are you using it in the first place?
https://docs.github.com/en/code-security/secret-scanning/abo...
This is about preventing things like API keys from being published to code. That’s not a dissident use-case…
Is this whataboutism? Possibly – but what I'd actually like to happen is US-based companies are charged company-hurting fines for mismanaging PII like this (Twitter, for example, is currently openly planning to sell user phone data [1] that they previously gathered for security purposes).
All this to say, we can't reasonably call out other dystopian companies if the ones we use everyday are doing the exact same thing. So we should call out secret scanning from Meta [2] and (if it ever happens) Twitter as well.
----------------------------------------
[1] https://www.businessinsider.com/twitter-plans-to-force-users...
[2] https://developers.facebook.com/blog/post/2021/11/09/meta-jo...
"Leaked" here means "made public", i.e. "published such that literally anyone can use them", for example when burned into a commit of a public repo. Even for a dissident, publishing an API key or other credential where literally anyone can find it to use it, is almost assuredly a mistake. Because external scrapers can also find it there, such that the key will be inevitably picked up and fed into a botnet to abuse — at which point the ops staff at the service will notice the abuse and revoke the key, thus "burning" it as useful from the dissident's perspective.
If you store a secret on Github somewhere that only you and people you trust have access to, rather than everyone having access to it, then this is not considered a "leak", and so Github does not detect this as a "leaked secret." For example, commit data of private repos is not scanned for secrets (if it were, GitOps as a concept would be impossible!); nor is a repo's formal Actions Secrets store (part of a repo's configuration readable only by triggered Github Actions CI jobs).
Github's own secret-scanning here, is trying to catch the cases where a user has done something stupid by accident. Whether or not they reported secrets to third parties, they'd still be doing leaked-secret scanning of their own Github API keys, to ensure that people aren't accidentally trying to configure Github Actions by burning their Github Actions CI API key into the workflow itself. If they find such keys, they revoke them.
The point of Github's secret-scanning partner program, is that because Github is doing this leaked-secret scanning for their own purposes anyway, you (the partner) can sign up to be told when API keys of yours are accidentally made public as well.
> That makes no sense, then they don't need GitHubs help.
Ignoring for a moment that Github is a website, and so anyone can just crawl it—
Did you know? Github pushes the commit data of all public repos to BigQuery as a public research dataset: https://codelabs.developers.google.com/codelabs/bigquery-git.... Literally anyone can do their own "secret scanning" with a simple BigQuery query. It costs about $500 to run such a query, because the Github dataset is pretty large. It's not a price most SMEs would pay. But it's definitely a price attackers could be willing to pay. It's a lot cheaper than running your own web-spider infrastructure!
The difference with Github's own secret scanning is that it happens synchronously, on push of commits, whereas the ETL of commit data to BigQuery et al happens asynchronously, some time after commits happen. Tencent, like every other secret-scanning partner, depends on Github to stay ahead of any third-party attackers trying to scrape leaked credentials for use in botnets et al.
Also, FYI, you yourself can sign up to be a Github secret-scanning partner. You just need 1. a regex that uniquely identifies your secrets, so that Github can recognize them on push, and 2. a webhook URL to report them to. (https://docs.github.com/en/developers/overview/secret-scanni...)
And by the way, this isn't a hypothetical nice-to-have. I run an API SaaS — and not one that's even very large, in relative terms. But my own customers' accidentally-leaked secrets have been scraped from their Github repos and used by botnets already! Signing up as a Github secret-scanning partner is on my to-do list.
It lets WeChat revoke tokens that GitHub finds in public repositories.
“GitHub will forward access tokens found in public repositories to Tencent WeChat, who will notify affected users.”
Here’s what I just copied from the blog post without modification:
> We have partnered with Tencent WeChat to scan for their tokens and help secure our mutual users on all public repositories and private repositories with GitHub Advanced Security.
It’s not just public repos, it’s private repos too.
However, this is already a well established and useful thing. When you publish your AWS (for example) secrets to your public repo, it will scan it and stop it leaking before damage can be done. This is just the same for another service.
If I were forced to pick one government to share my secrets with, it would be the Chinese, because there's nothing they can do about it. My own government and its allies are infinitely more dangerous to me than such a foreign one.
Are you talking about the China that bought huge areas in ports around the world? The same one that has secret police stations as well?
Unless you live or visit there. Weren't there reports of China having concentration camps?
They wouldn’t even need to learn all that much about you as an individual. Just enough to match you with a cluster from their own population that they have infinite data on.
What makes you so sure about that?
I worry about China because there’s no internal checks to prevent them from doing anything.
Western governments and allies have a long culture of court systems and thinking about balancing constituent needs. That is eroding and becoming more dangerous to the extent western leaders are envious of dictatorial powers and trying to emulate Chinese totalitarianism, but there is a lot of institutional and cultural bulwark against it.
Any powerful totalitarian country should worry people. People underestimate the level of covert aggression in all facets of foreign involvement in regimes with no internal accountability.
What makes you think this?
Yep totally harmless.
> China operating over 100 police stations across the world with the help of some host nations, report claims
[0]: https://docs.github.com/en/enterprise-cloud@latest/code-secu...
I assume that the regex is `TC:[a-z0-9]{20}` or something uninteresting like that.
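To make the difference concrete, here is a short Python sketch comparing the pattern guessed above with the open-ended `.*` that others in the thread worry about. The guessed pattern is only the commenter's assumption, not Tencent's real one, and the sample source text and the `matches` helper are invented for the example.

```python
import re

# The commenter's guessed pattern, NOT Tencent's real one.
NARROW = re.compile(r"TC:[a-z0-9]{20}")
# The open-ended pattern the thread worries about.
BROAD = re.compile(r".*")

# Invented sample file contents containing one token-shaped string.
SOURCE = 'int main() { puts("hi"); }\nkey = "TC:abcdefghij0123456789"\n'


def matches(pattern, text):
    """Return every non-empty string the pattern would report for this text."""
    return [m.group(0) for m in pattern.finditer(text) if m.group(0)]


if __name__ == "__main__":
    print(matches(NARROW, SOURCE))  # only the token-shaped string
    print(matches(BROAD, SOURCE))   # every line of the file
```

A prefixed, fixed-length pattern reports only the token itself, while `.*` would report whole lines of source, which is exactly why a reviewer would reject it.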
So any string (which Github deems an access token) is forwarded to Tencent?
Or will Tencent share all their current access tokens with github?
.*
;-)
That said, could one also generate tokens and essentially DDOS the wechat org by having them inform their customers unnecessarily?
Your WeChat tokens? No, those should never be public, which is why this feature exists.
That GitHub reports that you leaked your WeChat tokens? That was announced just recently, hence the post.
That GitHub is giving WeChat your secrets? No, that is not what this is about, although the article title would make you think that.
So technically the answer to GP is 'yes'.
Where are you seeing a privacy or security risk?
In any case, this announcement changes nothing. If you trusted GitHub with something before that you wouldn't trust them with now, your mental model is wrong. GitHub might allow any kind of partner (customer?) to scan their private or public repos in any way they want without making it public. In other words, if you are someone this announcement is problematic to, you shouldn't have anything on GitHub in the first place.
GitHub is a private company with one dual obligation, to prolong its existence and keep increasing its profit margin.
It is not any sort of arbiter for morality - morality being an externality to its central obligation - so it cannot be relied upon to “do the right thing”.
So it is not in any position of authority that would enable it to “approve”, in the moral sense of the word. They can only “allow” for the regex to be run and the results sent off.
For example, the “right thing” for GH would be to increase profit, while for another entity might instead be to uphold its users’ privacy.
(You may think that it’s only for public repos, so they’re already made public, but isn’t GH here facilitating an aggressive collection and summation of information, that would otherwise be much more difficult and error-prone?)
The power of approval would rather come from an elected entity that would also determine who may request that such searches are executed, and which reasons would be valid.
Otherwise, we get a William Gibson-esque megacorp cyberspace future with clear but corporate Orwellian overtones.
Isn’t this obvious?
(I’m not being snarky at all - I’m genuinely asking: isn’t this glaringly and terrifyingly obvious?)
Well...
> Tencent
Here.
It's really the combination of Tencent and partnership that I find a problem. These things tend to lead to closer collaboration, and WeChat is a huge surveillance tool.
Sure they have access to public info anyway because everyone does. Just let them scan it themselves then.
And yes I'd feel almost the same if it was Facebook.
Also, Github now has code to recognise Tencent access tokens.
They already did. That's what public means. This is just an optimization to make it harder for WeChat access tokens to be inadvertently compromised without getting noticed.
If you're worried about the Chinese government having inappropriate influence over or access to various things outside China, that's in general a valid concern indeed, but facilitating credential scanning in public repositories really doesn't seem worrying.
For the last few years I've been running Git off my own servers with a cgit [0] frontend, and couldn't be happier.