What will happen when you commit secrets to a public Git repo?

twitter.com

133 points · 0xad · 5y ago · 64 comments

64 comments

greysteil5y ago

Cool experiment!

I PM the secret scanning team at GitHub and wanted to mention what GitHub did behind the scenes here. GitHub scans every commit to a public repo for secrets one of our secret scanning partners may have issued. We forward those candidate secrets to the issuing partner, and they take action. In some cases they auto-revoke the secret (AWS normally does this, I believe), in some cases they notify the user, and in some cases the response is configurable.

I checked that GitHub detects these tokens myself - within 1 second of the commit GitHub had notified AWS and Slack of the leak. AWS and Slack will then have taken action and informed the token owner, which in this case is Thinkst Canary, rather than Andrzej (the OP). I believe AWS normally auto-revoke, but they may have a custom setup with Thinkst Canary's tokens that allows Thinkst to continue to monitor them even once compromised.

Finally, GitHub actually delays the indexing of our search by a couple of seconds to ensure that, for normal cases, our secret scanning partners have time to take action before anyone else can find the tokens.

We're always looking to make secret scanning at GitHub better, so feedback is, as always, welcome. It's also fascinating (and validating!) to see what happens to exposed tokens.

* List of GitHub secret scanning partners: https://docs.github.com/en/free-pro-team@latest/github/admin...

* Thinkst Canary tokens: https://www.canarytokens.org/

the_duke5y ago

Couldn't auto-revocation be used for a "DOS" attack of sorts by generating a lot of randomized tokens and pushing them to any repo?

I realize that the search space is huge for many token types, but it seems viable.

thdrdt5y ago

Selecting (reading) data is very fast most of the time. So if no token matches I don't think this will result in a DOS.

tobr5y ago

Why not refuse to publish a detected secret at all until the repo owner takes an action to allow it?

greysteil5y ago

There are a few considerations on that one, but one very practical reason is the developer experience of dealing with false positives.

False positives are one of the big problems in secret scanning. Some partners issue credentials with patterns that make them very hard to distinguish from innocuous strings. For example, a Datadog token looks identical to a commit SHA. We would never block developers from pushing commits to GitHub just because they had 40 character hexadecimal strings in them!
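To make the false-positive problem concrete, here is a minimal sketch, assuming a naive scanner that treats any standalone 40-character hexadecimal string as a candidate token. The `HEX40` pattern and `candidate_secrets` helper are hypothetical names for illustration, not GitHub's actual detection logic.

```python
import re

# Assumption: a "candidate secret" is any standalone run of exactly
# 40 lowercase hexadecimal characters.
HEX40 = re.compile(r"\b[0-9a-f]{40}\b")


def candidate_secrets(text: str) -> list[str]:
    """Return every 40-character hexadecimal string found in text."""
    return HEX40.findall(text)


# A full Git commit SHA-1 is also 40 hex characters, so an innocuous
# commit reference is flagged just like a real token would be.
commit_sha = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
line = f"Reverts commit {commit_sha} because of a regression."
print(candidate_secrets(line))
```

The pattern alone cannot tell the two apart, which is why the check is delegated to the token issuer.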

GitHub's partnership approach works around the false positive problem by having the token issuer check whether a token is real and take action only if it is. However, this is a one-way communication from GitHub - the token issuer doesn't need to tell us whether the candidate secret we sent them was real or not, and in most cases we never know.

As a result, we can't replicate the zero false positive experience in a pre-receive hook (i.e., before the commit is pushed to GitHub). There would also be performance considerations from making 30+ http requests as part of a pre-receive hook.

In future, we are looking at creating a pre-receive hook solution that focuses on patterns that have a very low false positive rate. There are already some open source solutions that do this (links below) - in fact the OP linked to one from his Twitter thread. If/when GitHub offers it, it will definitely be opt-in, rather than opt-out!

* https://github.com/thoughtworks/talisman/

* https://github.com/awslabs/git-secrets

watt5y ago

Think about it in terms of incentives and nudges.

0xadOP5y ago

Awesome, thanks for the background information!

dorfsmay5y ago

Do you also scan when a private repo is changed to public?

greysteil5y ago

I think so, and we 100% should do, but I just did a test and the secret I committed was still working a full minute after I converted the repo. Could be that the scan was in a queue, could be that it didn't trigger.

I'll dig into it and make sure this is working and is fast - it's a critical time to do a full scan of the repo's git history.

mackenzie-gg5y ago

GitGuardian scans on every event, this includes a public event (when a Repo is made public) and will alert if secrets are found within.

jakub_g5y ago

Why is secret scanning enabled only for public repos but not for private ones?

greysteil5y ago

Private repos need a different approach, but committing secrets to them can still be a problem.

If a secret is committed to a private repo then anyone with read access to that repo could use it. That might give those users more permissions than they're supposed to have. It's particularly a problem in large organisations, where thousands of developers may have access to a private repo, but should not necessarily have direct access to production infrastructure.

That said, the risk tradeoff when a secret is found in a private repo is different to when one is found in a public repo. If it's a personal private repo that no-one else has access to, the risk may be limited. If it's a corporate repo with hundreds of contributors, someone almost certainly wants to be aware of it. Even then, each organisation will want to respond in different ways, perhaps depending on who has access to the repo, and what access the leaked secret granted.

I'd be remiss not to say that GitHub has a beta offering for private repo secret scanning that we launched in May. It's a paid feature, targeted at large, security-conscious organisations, that scans your git history and each new commit for secrets and displays them in the GitHub UI.

Lex-20085y ago

Because it should be OK to commit secrets to private repos - that's why they're _private_, after all, right?

amelius5y ago

I suppose you could still XOR your secret S with a random bitstring B, then commit both S^B and B. Am I missing something?
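A rough sketch of that idea, assuming the scanner looks for AWS-style access key IDs (`AKIA` plus 16 characters, an assumed pattern) and that both halves are committed as hex. The key used is AWS's documented example value, and the names `xor_bytes`, `masked` and `pad` are made up for illustration.

```python
import re
import secrets

# Assumption: the scanner matches AWS-style access key IDs by pattern.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


secret = b"AKIAIOSFODNN7EXAMPLE"          # S: AWS's documented example key ID
pad = secrets.token_bytes(len(secret))    # B: random bitstring
masked = xor_bytes(secret, pad)           # S ^ B

# What would end up in the repo: both halves, hex-encoded.
committed = f"masked={masked.hex()}\npad={pad.hex()}\n"

# The pattern scanner finds nothing in the committed text...
found = AWS_KEY_ID.findall(committed)

# ...but anyone who reads the repo can recombine the halves.
recovered = xor_bytes(bytes.fromhex(masked.hex()), pad)
print(found, recovered)
```

So the split hides the secret from pattern matching only. With both S^B and B in the same repo, the secret is exactly as exposed to a human or a purpose-built bot as before.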

notRobot5y ago

Just use a fucking blog, man. I'm so sick of threads like this.

Edit: I'm sorry, this came off as way more aggressive than I intended. I get why people use twitter to share stuff like this, but it's much harder to archive, find or reference in the future, not to mention it being much less readable than a simple webpage.

To anyone reading this, please consider publishing your findings on a blog as well as on twitter.

0xadOP5y ago

Hey, OP here. I agree that a blog post would be more readable. In this particular case I just didn't expect it to catch fire. If I had, I would have spent more time on the form. I won't make that mistake again (i.e. in the future I will use a blog post as the main driver of such a Twitter thread).

viraptor5y ago

Don't worry, post wherever you want. Content existing somewhere is better than someone thinking it may take too much effort and not writing it in the first place. People may have their preferences about publishing platform, but telling someone off for not following that preference is not fair.

Thanks for writing about your experiment.

notRobot5y ago

Thank you!

sofixa5y ago

Or post it on Reddit or Medium or whatever if you can't be bothered with a blog.

Twitter "threads" need to die.

forgotmypw175y ago

Reddit and Medium are no better in terms of weight and complexity.

xeyownt5y ago

> Twitter "threads" need to die.

I would even remove "threads". Sick of all the hate and fakedom.

forgotmypw175y ago

You can use Nitter to make it a bit more readable and a bit less bloated. https://nitter.net/andrzejdyjak/status/1324360905237372929

thdrdt5y ago

It is amazing how fast and effective those bots are.

I remember one time I installed Windows 95/98. I wanted the PC to be on the internet but did not have a firewall for Windows. But I knew the internet address where I could get one.

So after installing Windows I took my chances, connected to the internet, downloaded the firewall asap, installed it, and was already too late. The PC was compromised within 10 minutes and I had to reinstall it.

thih95y ago

Where did the malicious code come from?

thdrdt5y ago

Well not from the firewall because it was highly trusted software. With the firewall there were also no problems.

It was just that there were some holes in Windows that were exploited by bots.

forgotmypw175y ago

More readable version:

https://nitter.net/andrzejdyjak/status/1324360905237372929

jefftk5y ago

I think secret detection is great overall, but the only times I've run into it are false positives with client side API keys that are by their nature public.

For example, I recently configured something to use the Google calendar API from JavaScript on the client. It's fully safe to check in this key, since it is intended to be run in client-side JavaScript anyway, but I was still nagged about it.

mackenzie-gg5y ago

It's a difficult challenge. Secrets detection is probabilistic, without checking the credentials it's nearly impossible to determine, with 100% accuracy, a true vs a false positive. But it has made big improvements. What detection solutions have you been using?

jefftk5y ago

I get automated emails that I didn't sign up for from GitGuardian:

"GitGuardian has detected the following Google Key exposed within your GitHub account."

My understanding was that they could use an API to check whether it was a real key, but perhaps that doesn't say whether it is a client-side or server-side key?

rwmj5y ago

Is there a way (outside GitHub) that adversaries can get access to the "full feed" of commits? I don't understand how the attackers can find a new key from all the changes that must go into GitHub across millions of repos, within 11 minutes.

oefrha5y ago

The firehose is simply the /events endpoint of GitHub API v3, a feed of all public events. It’s delayed by 5 minutes. Anyone has access (subject to rate limits of course, which is 5000/hr when authenticated?). You can even have a look at the response in your browser, without any authentication: https://api.github.com/events

Docs:

https://developer.github.com/v3/activity/events/#list-public...

https://docs.github.com/en/free-pro-team@latest/rest/referen...
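For a sense of how little it takes to read that feed, here is a minimal sketch using only the standard library. It assumes each event is a JSON object with a `type` field and a `repo.name` field; the function names and the `User-Agent` value are hypothetical, and the sample event is hand-written rather than real output.

```python
import json
import urllib.request

EVENTS_URL = "https://api.github.com/events"  # URL quoted in the comment above


def fetch_public_events(url: str = EVENTS_URL) -> list[dict]:
    """Fetch one page of public events (unauthenticated, rate limited)."""
    req = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def pushed_repos(events: list[dict]) -> list[str]:
    """Return the repo names of all PushEvent entries, in feed order."""
    return [
        e["repo"]["name"]
        for e in events
        if e.get("type") == "PushEvent" and "repo" in e
    ]


# Hand-written sample in the assumed shape, so nothing here needs the network.
sample = [
    {"type": "WatchEvent", "repo": {"name": "octocat/spoon-knife"}},
    {"type": "PushEvent", "repo": {"name": "octocat/hello-world"}},
]
print(pushed_repos(sample))
```

Calling `pushed_repos(fetch_public_events())` would do the same against the live feed. A scanner would then fetch the pushed commits for each repo and run its patterns over them.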

bostik5y ago

There are bots (some even run by security and threat intel companies) feeding off of the firehose. For a public display of one type of scanning functionality, take a look at shhgit[0,1].

0: https://www.shhgit.com/

1: https://github.com/eth0izzle/shhgit

rwmj5y ago

Is the firehose public or do these companies have a relationship with github? If the latter, I assume github doesn't give the firehose feed to attackers who are only looking for AWS keys.

halfjoking5y ago

You'll probably get an email from AWS that your account is compromised and you have 5 days to rotate your keys or your account could be terminated.

Then every day they email you to see if you made any progress rotating the keys.

I made this meme about it that my boss didn't find funny.

https://imgur.com/ZCUu9rr

0xadOP5y ago

Yes you will, but only because GitHub already recognised this class of problems and came up with their own solution [1]. Bear in mind that it works only for vendors that integrated, so while it's true for AWS it might not be for your FOO API.

I giggled at the meme.

[1] https://developer.github.com/partnerships/secret-scanning/

yyyk5y ago

In some cases, github will require you to remove the offending file from the commit history - or make the repo private.

e.g. https://github.com/aliostad/deep-learning-lang-detection

peterwwillis5y ago

This would make a good blog post. Maybe they should consider making one so we can have the article [in an easily readable/shareable/updateable form] after it's deleted from Twitter

0xadOP5y ago

OP here. I'm planning to do so, however it will require more work (better description of the problem, wider description of viable solutions, additional case studies). Most probably it will land on Medium and Dev.to.

remram5y ago

Couldn't we come up with a standard format for secret keys, that would make it obvious they are a secret and which service they're from? This would make scanners easier to implement, and would remove the requirement to partner with GitHub to get your key format supported.

AWS uses an `AKIA` prefix for access keys (but none for secrets), SendGrid uses an `SG.` prefix on API keys, etc.
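A small sketch of what prefix-based detection could look like. Only the `AKIA` and `SG.` prefixes come from the comment above; the lengths and character sets in the patterns are assumptions, and `PATTERNS` and `identify` are hypothetical names.

```python
import re

# Assumed patterns: the prefixes are from the comment, the rest is illustrative.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sendgrid_api_key": re.compile(r"\bSG\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def identify(text: str) -> list[tuple[str, str]]:
    """Return (service, matched_string) pairs for every pattern hit in text."""
    return [
        (service, match.group(0))
        for service, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


# AWS's documented example key ID, not a live credential.
print(identify("key=AKIAIOSFODNN7EXAMPLE"))
```

With a self-describing prefix, the scanner learns both that the string is a secret and which issuer to notify, without needing a per-vendor partnership to know the format.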

0xadOP5y ago

Greetings fellow Hackers! OP here. I see that my experiment got some traction which means more awareness should be spread about this class of bugs.

For starters I recommend reading "How Bad Can It Git" [1] and "Detecting and Mitigating Secret-Key Leaks in Source Code Repositories" [2] papers.

After that you can read "How I made $10K in bug bounties from GitHub secret leaks" [3] and some notable reports on HackerOne Hacktivity [4] [5] and [6]. This last one is interesting - leaking secrets is not only about code repositories! It's really about the entire toolset used for software development, hence secret scanning could (should?) be performed in other places, such as CI/CD logs or even Slack messages [7].

Anyhow, back to code repositories. GitHub and GitLab both recognized secrets as a problem, so they came up with solutions. If you use GitHub you can easily integrate GitGuardian [8] into your workflow ($$$) but even if you don't, GitHub provides you with its Secret Scanning feature [9] (both are mentioned within the Twitter and HN threads). If you use GitLab you have a Secret Detection feature [10] at your disposal BUT in order to use it you need to set up Auto DevOps (that's why in my experiment GitLab didn't alert me - I just pushed commits to my public repo but didn't set up anything).

Apart from built-in solutions provided by GitHub and GitLab, one can use tooling of their own choice. For this I'd recommend two types of solutions: proactive and reactive. For proactive security, as mentioned in the Twitter thread, you can use Talisman [11] as pre-commit hook. For reactive security you can use GitLeaks [12] (used by GitLab) or similar tools - there are many of them but one stands out, namely truffleHog [13] which can sniff each and every commit across all branches (also used by GitLab).
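To illustrate the proactive side, here is a minimal sketch of a pre-commit style check. It is a hypothetical stand-in, not how Talisman works internally: it assumes a single AWS-style key ID pattern and only looks at lines added in the staged diff. The names `SECRET`, `leaked_lines` and `main` are made up for illustration.

```python
import re
import subprocess
import sys

# Assumption: one illustrative pattern; real tools ship many more.
SECRET = re.compile(r"AKIA[0-9A-Z]{16}")


def leaked_lines(diff: str) -> list[str]:
    """Return added lines from a unified diff that match SECRET.

    Lines starting with '+++' are skipped as file headers.
    """
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+")
        and not line.startswith("+++")
        and SECRET.search(line)
    ]


def main() -> int:
    """Scan the staged changes; a non-zero return would block the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    hits = leaked_lines(diff)
    for hit in hits:
        print(f"possible secret in staged change: {hit}", file=sys.stderr)
    return 1 if hits else 0
```

Saved as an executable `.git/hooks/pre-commit` script that ends with `sys.exit(main())`, this would refuse the commit before the secret ever reaches the remote, which is the point of the proactive approach.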

What if you already committed a secret into the public repository? Start with revoking and continue with this tutorial [14].

gl, hf.

[1] https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git...
[2] https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leak...
[3] https://tillsongalloway.com/finding-sensitive-information-on...
[4] https://hackerone.com/reports/716292
[5] https://hackerone.com/reports/396467
[6] https://hackerone.com/reports/496937
[7] https://github.com/PaperMtn/slack-watchman
[8] https://www.gitguardian.com/
[9] https://developer.github.com/partnerships/secret-scanning/
[10] https://docs.gitlab.com/ee/user/application_security/sast/#s...
[11] https://github.com/thoughtworks/talisman
[12] https://github.com/zricethezav/gitleaks
[13] https://github.com/dxa4481/truffleHog
[14] https://docs.github.com/en/free-pro-team@latest/github/authe...

weejewel5y ago

Thanks for sharing. Did you also investigate what they actually did with the keys?

0xadOP5y ago

You mean adversaries? No. For token generation I used https://canarytokens.org/ so the only information I got was about the token being triggered, but not the context in which it was triggered.

BTW, GitHub (apart from GitGuardian) also has a Secret Scanning feature [1] that basically allows the provider to act on the leaked secret. Amazon is integrated and it should invalidate the key and inform the owner, but this also went to Thinkst, not me, so I don't know if it was actually invalidated and alerted.

[1] https://developer.github.com/partnerships/secret-scanning/

_the_inflator5y ago

Nice honey pot experiment.

0xadOP5y ago

Thanks!
