I PM the secret scanning team at GitHub and wanted to mention what GitHub did behind the scenes here. GitHub scans every commit to a public repo for secrets one of our secret scanning partners may have issued. We forward those candidate secrets to the issuing partner, and they take action. In some cases they auto-revoke the secret (AWS normally does this, I believe), in some cases they notify the user, and in some cases the response is configurable.
I checked myself that GitHub detects these tokens - within 1 second of the commit GitHub had notified AWS and Slack of the leak. AWS and Slack will then have taken action and informed the token owner, which in this case is Thinkst Canary, rather than Andrzej (the OP). I believe AWS normally auto-revokes, but they may have a custom setup with Thinkst Canary's tokens that allows Thinkst to continue to monitor them even once compromised.
Finally, GitHub actually delays the indexing of our search by a couple of seconds to ensure that, for normal cases, our secret scanning partners have time to take action before anyone else can find the tokens.
We're always looking to make secret scanning at GitHub better, so feedback is always welcome. It's also fascinating (and validating!) to see what happens to exposed tokens.
* List of GitHub secret scanning partners: https://docs.github.com/en/free-pro-team@latest/github/admin...
* Thinkst Canary tokens: https://www.canarytokens.org/
I realize that the search space is huge for many token types, but it seems viable.
False positives are one of the big problems in secret scanning. Some partners issue credentials with patterns that make them very hard to distinguish from innocuous strings. For example, a Datadog token looks identical to a commit SHA. We would never block developers from pushing commits to GitHub just because they had 40 character hexadecimal strings in them!
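To make the false positive problem concrete, here is a minimal Python sketch. It assumes a token format that is exactly 40 lowercase hexadecimal characters, as described above; the regex and the function name are mine, not anything GitHub or a partner actually uses. The sample string contains a SHA-1 value of the kind you'd see in a changelog, and the pattern has no way to tell it apart from a credential.

```python
import re

# Assumption: the token format is exactly 40 lowercase hex characters,
# the same shape as a git commit SHA-1.
HEX40 = re.compile(r"\b[0-9a-f]{40}\b")


def find_hex40_candidates(text: str) -> list[str]:
    """Return every 40-character hex string found in text."""
    return HEX40.findall(text)


# An innocuous SHA-1 value mentioned in prose, not a credential.
changelog = "Reverts da39a3ee5e6b4b0d3255bfef95601890afd80709 due to a regression."

# The SHA is reported as a candidate secret: a false positive.
print(find_hex40_candidates(changelog))
```

Blocking a push on a match like this would reject perfectly ordinary commits, which is why the pattern alone isn't enough.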
GitHub's partnership approach works around the false positive problem by having the token issuer check whether a token is real and take action only if it is. However, this is a one-way communication from GitHub - the token issuer doesn't need to tell us whether the candidate secret we sent them was real or not, and in most cases we never know.
As a result, we can't replicate the zero false positive experience in a pre-receive hook (i.e., before the commit is pushed to GitHub). There would also be performance considerations from making 30+ http requests as part of a pre-receive hook.
In future, we are looking at creating a pre-receive hook solution that focuses on patterns that have a very low false positive rate. There are already some open source solutions that do this (links below) - in fact the OP linked to one from his Twitter thread. If/when GitHub offers it, it will definitely be opt-in, rather than opt-out!
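As a rough illustration of what a hook limited to low-false-positive patterns might do, the Python sketch below checks only the lines a unified diff adds and returns a non-zero exit code on a match. It is not GitHub's design: the single pattern (an AWS-style access key ID) and the function names are assumptions chosen for the example, and a real hook would read the pushed commits from git rather than take a diff string.

```python
import re

# Hypothetical high-confidence pattern: an AWS-style access key ID.
HIGH_CONFIDENCE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def added_lines(diff: str):
    """Yield the lines a unified diff adds, without the leading '+'.

    Lines starting with '+++' are treated as file headers and skipped,
    so added content that itself begins with '++' is missed by this sketch.
    """
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def hook_exit_code(diff: str) -> int:
    """Return 1 (reject) if any added line matches, else 0 (accept)."""
    if any(HIGH_CONFIDENCE.search(line) for line in added_lines(diff)):
        return 1
    return 0


SAMPLE_DIFF = """\
--- a/config.py
+++ b/config.py
@@ -1 +1,2 @@
 DEBUG = False
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
"""

print(hook_exit_code(SAMPLE_DIFF))
```

Looking only at added lines means that removing a leaked key from a file doesn't itself trip the check.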
I'll dig into it and make sure this is working and is fast - it's a critical time to do a full scan of the repo's git history.
If a secret is committed to a private repo then anyone with read access to that repo could use it. That might give those users more permissions than they're supposed to have. It's particularly a problem in large organisations, where thousands of developers may have access to a private repo, but should not necessarily have direct access to production infrastructure.
That said, the risk tradeoff when a secret is found in a private repo is different to when one is found in a public repo. If it's a personal private repo that no-one else has access to, the risk may be limited. If it's a corporate repo with hundreds of contributors, someone almost certainly wants to be aware of it. Even then, each organisation will want to respond in different ways, perhaps depending on who has access to the repo, and what access the leaked secret granted.
I'd be remiss not to say that GitHub has a beta offering for private repo secret scanning that we launched in May. It's a paid feature, targeted at large, security-conscious organisations, that scans your git history and each new commit for secrets and displays them in the GitHub UI.
Edit: I'm sorry, this came off as way more aggressive than I intended. I get why people use twitter to share stuff like this, but it's much harder to archive, find or reference in the future, not to mention it being much less readable than a simple webpage.
To anyone reading this, please consider publishing your findings on a blog as well as on twitter.
Thanks for writing about your experiment.
Twitter "threads" need to die.
I would even remove "threads". Sick of all the hate and fakedom.
I remember one time I installed Windows 95/98. I wanted the PC to be on internet but did not have a firewall for Windows. But I knew the internet address where I could get one.
So after installing Windows I took my chances, connected to the internet, downloaded the firewall asap, installed it, and was already too late. The PC was compromised within 10 minutes and I had to reinstall it.
For example, I recently configured something to use the Google calendar API from JavaScript on the client. It's fully safe to check in this key, since it is intended to be run in client-side JavaScript anyway, but I was still nagged about it.
"GitGuardian has detected the following Google Key exposed within your GitHub account."
My understanding was that they could use an API to check whether it was a real key, but perhaps that doesn't say whether it is a client-side or server-side key?
Docs:
https://developer.github.com/v3/activity/events/#list-public...
https://docs.github.com/en/free-pro-team@latest/rest/referen...
Then every day they email you to see if you made any progress rotating the keys.
I made this meme about it that my boss didn't find funny.
I giggled at the meme.
[1] https://developer.github.com/partnerships/secret-scanning/
e.g. https://github.com/aliostad/deep-learning-lang-detection
AWS uses an `AKIA` prefix for access keys (but none for secrets), SendGrid uses an `SG.` prefix on API keys, etc.
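A minimal Python sketch of why prefixes help: a scanner can anchor on the prefix instead of guessing from length and character set alone. The exact lengths and character classes below are my assumptions for illustration, not taken from either issuer's documentation, and the sample key is the placeholder value commonly used in AWS examples.

```python
import re

# Assumed patterns, for illustration only. The prefixes come from the
# comment above; lengths and character sets are guesses.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sendgrid_api_key": re.compile(r"\bSG\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,}"),
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in text."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


print(scan('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
print(scan("commit da39a3ee5e6b4b0d3255bfef95601890afd80709"))
```

The second call finds nothing, because a bare hex string carries neither prefix. The AWS secret key, which has no prefix, would still slip past a scanner built this way.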
For starters I recommend reading the "How Bad Can It Git" [1] and "Detecting and Mitigating Secret-Key Leaks in Source Code Repositories" [2] papers.
After that you can read "How I made $10K in bug bounties from GitHub secret leaks" [3] and some notable reports on HackerOne Hacktivity [4] [5] and [6]. This last one is interesting - leaking secrets is not only about code repositories! It's really about the entire toolset used for software development, hence secret scanning could (should?) be performed in other places such as CI/CD logs or even Slack messages [7].
Anyhow, back to code repositories. GitHub and GitLab both recognized secrets as a problem, so they came up with solutions. If you use GitHub you can easily integrate GitGuardian [8] into your workflow ($$$), but even if you don't, GitHub provides you with a Secret Scanning feature [9] (both are mentioned within the Twitter and HN threads). If you use GitLab you have a Secret Detection feature [10] at your disposal BUT in order to use it you need to set up Auto DevOps (that's why in my experiment GitLab didn't alert me - I just pushed commits to my public repo but didn't set up anything).
Apart from the built-in solutions provided by GitHub and GitLab, one can use tooling of their own choice. For this I'd recommend two types of solutions: proactive and reactive. For proactive security, as mentioned in the Twitter thread, you can use Talisman [11] as a pre-commit hook. For reactive security you can use GitLeaks [12] (used by GitLab) or similar tools - there are many of them but one stands out, namely truffleHog [13], which can sniff each and every commit across all branches (also used by GitLab).
What if you already committed a secret into a public repository? Start with revoking it and continue with this tutorial [14].
gl, hf.
[1] https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git...
[2] https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leak...
[3] https://tillsongalloway.com/finding-sensitive-information-on...
[4] https://hackerone.com/reports/716292
[5] https://hackerone.com/reports/396467
[6] https://hackerone.com/reports/496937
[7] https://github.com/PaperMtn/slack-watchman
[8] https://www.gitguardian.com/
[9] https://developer.github.com/partnerships/secret-scanning/
[10] https://docs.gitlab.com/ee/user/application_security/sast/#s...
[11] https://github.com/thoughtworks/talisman
[12] https://github.com/zricethezav/gitleaks
[13] https://github.com/dxa4481/truffleHog
[14] https://docs.github.com/en/free-pro-team@latest/github/authe...
BTW, GitHub (apart from GitGuardian) also has a Secret Scanning feature [1] that basically allows the provider to act on the leaked secret. Amazon is integrated and it should invalidate the key and inform the owner, but this also went to Thinkst, not me, so I don't know if it was actually invalidated and alerted.
[1] https://developer.github.com/partnerships/secret-scanning/