One difference between how GitLab and GitHub run their infrastructure is that GitLab doesn't keep reflogs, and uses git's default "gc" settings.
As a result they won't have the data in question anymore in many cases[1]. Well, I don't 100% know that for sure, but it's the default configuration of their software, and I'm assuming they use it like that themselves.
Whereas GitHub does keep reflogs, and runs "git repack" with the "--keep-unreachable" option. They don't usually delete git data unless someone bothers to manually do it, and they usually have data to reconstruct repositories as they were at any given point in time.
GitHub doesn't expose that to users in any way, although perhaps they'd take pity on some of their users after such an incident.
This isn't a critique of GitLab, just trivia about the storage trade-offs different major Git hosting sites have made, which might be informative to some other people.
I'm surprised no major Git hosting site has opted to provide such a "we have a snapshot of every version ever" feature. People would probably pay for it, you could even make them opt to pay for access to backups you kept already if they screwed things up :)
1. Well, maybe as disaster backups or something. But those are harder to access...
Certainly one should not ever keep using a secret once it has escaped into a Git repo, but I'm sure it happens quite frequently.
This should be a moot point because anyone (in IT) should realize that an accidentally committed secret is now 100% public for all eternity and needs to be rendered irrelevant to restore secure operations.
It also requires an attacker to know at least the partial SHA-1 anyway. It's infeasible to start brute-forcing that without being banned for DDoSing them, and if you know what the SHA-1 is you probably had access to the data already.
But yeah. It definitely creates security caveats peculiar to git, e.g. a hostile actor guessing that a force push in an IRC commit announcement clobbered secret data, and then accessing the old commit in the web UI.
Of course, automatic secret rotation is hard. Vault is a great help, but it can't be grafted onto everything. Good DevSecOps engineers are worth their weight in gold.
https://about.gitlab.com/handbook/engineering/infrastructure...
I know, back up everything at least twice. But still, when somebody loses one of your copies, they don’t get to say “it’s cool, no data was lost, you have other copies, right?”
I'm also on a team that runs an in-house enterprise GitLab instance for an S&P 100, so I have experience with it in that configuration, which I understand isn't different from what gitlab.com uses in this regard.
None of this is secret or some sort of insider knowledge. If you know how "git gc" works you can trivially observe most of the behavior of these hosting sites from the outside.
E.g. try pushing a commit and then view it at git{hub,lab}.com/YOU/PROJECT/commit/SHA-1. Then "push --delete" the branch that references it.
You'll find that you can still view it on both sites, even if when you clone the relevant repository you won't get that SHA-1. This is because it's expensive to do a reachability check before serving up the content, and the web frontends access the object store directly.
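To make the distinction concrete, here's a toy sketch (not how either site is implemented; the commit names and the dict-based "object store" are made up for illustration). A clone only gets what is reachable from a ref, while a direct object lookup sees everything still in the store:

```python
from collections import deque

def reachable_commits(refs, parents):
    """Return every commit reachable from any ref by following parent links."""
    seen = set()
    queue = deque(refs.values())
    while queue:
        sha = queue.popleft()
        if sha in seen:
            continue
        seen.add(sha)
        queue.extend(parents.get(sha, ()))
    return seen

# Hypothetical history: c2 sits on main, c3 was only on a branch that got deleted.
parents = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
object_store = set(parents)        # what a direct lookup by SHA-1 can still find
refs = {"refs/heads/main": "c2"}   # the topic branch was "push --delete"d

cloneable = reachable_commits(refs, parents)
orphaned = object_store - cloneable  # viewable by SHA-1, but never sent to a clone
```

The walk is the expensive part: it has to touch the whole history, which is why doing it on every web request is unattractive.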
Then if you e.g. keep making pushes sufficient to trigger a "gc --auto" and it's been longer than the relevant git "gc.*Expire" time(s) you can deduce that the site uses something close to git's default "gc" semantics, or not. If you do this on GitHub.com you'll find you can access the data for longer than that, possibly "forever".
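For reference, git's documented default for gc.pruneExpire is two weeks. A rough sketch of the deduction, assuming the host keeps no reflog and that the clock starts when the object became unreachable (real git looks at object mtimes, so this is only approximate; the function name is mine):

```python
from datetime import datetime, timedelta, timezone

# git's documented default for gc.pruneExpire ("2.weeks.ago").
PRUNE_EXPIRE = timedelta(weeks=2)

def may_be_pruned(unreachable_since, now, grace=PRUNE_EXPIRE):
    """True if a gc run at `now` would be allowed to drop the object."""
    return now - unreachable_since > grace

deleted_at = datetime(2019, 5, 1, tzinfo=timezone.utc)  # hypothetical date
after_one_week = may_be_pruned(deleted_at, deleted_at + timedelta(weeks=1))
after_one_month = may_be_pruned(deleted_at, deleted_at + timedelta(days=30))
```

If the commit is still viewable well after `may_be_pruned` turns true and a gc has plausibly run, the site is probably not using the defaults.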
Which is actually a thing relevant to data recovery in this case. If those impacted by this security incident have lost their data, but have some of the SHA-1s involved (e.g. because they were pasted in IRC) they might find they can still view that content on gitlab.com if they were to browse it in the commit/tree/blob view, and painfully recover it that way. They won't be able to clone it since neither site turns on uploadpack.allowAnySHA1InWant=true.
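A minimal sketch of that first step, assuming you have a chat log with commit announcements in it. The log line, YOU and PROJECT are placeholders, and the URL shape is just the one mentioned above:

```python
import re

# Abbreviated or full SHA-1s: 7 to 40 lowercase hex characters.
# This will also match hex-looking words and long numbers, so review the output.
SHA_RE = re.compile(r"\b[0-9a-f]{7,40}\b")

def commit_urls(chat_log, host, owner, project):
    """Build web-UI commit URLs for every distinct SHA-1-looking token in a log."""
    shas = []
    for sha in SHA_RE.findall(chat_log):
        if sha not in shas:
            shas.append(sha)
    return [f"https://{host}/{owner}/{project}/commit/{sha}" for sha in shas]

log = "<bot> YOU pushed 3f2a9c1 to master; forced update 3f2a9c1...9b8e7d6c5a4f"
urls = commit_urls(log, "gitlab.com", "YOU", "PROJECT")
```

From each commit page you'd then have to walk the tree and blob views by hand, which is the painful part.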
As for GitLab, it's open source, so you can easily check that.
Well, links to orphaned commits still work, and GitHub has recently started surfacing UI when you force push a branch.
Does this also apply to self hosted GitLab CE/EE? Also how does Gogs/Gitea handle this?
Of course if you self-host you can simply change the defaults in /etc/gitconfig (which is in /opt/... if you're using the omnibus package).
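As an illustration only (these values are my assumption of a "keep everything" setup, not what any hosting site runs), the relevant settings could be written with plain `git config --file`. The sketch just builds the commands so you can review them first:

```python
# "never" is a documented value for these expiry settings; whether you want it
# on a busy server is a separate question.
KEEP_EVERYTHING = {
    "gc.pruneExpire": "never",
    "gc.reflogExpire": "never",
    "gc.reflogExpireUnreachable": "never",
    "core.logAllRefUpdates": "true",  # bare repositories don't write reflogs by default
}

def config_commands(config_path, settings=KEEP_EVERYTHING):
    """Return one `git config` argv list per setting, targeting config_path."""
    return [
        ["git", "config", "--file", config_path, key, value]
        for key, value in settings.items()
    ]

commands = config_commands("/etc/gitconfig")
```

Each entry can be handed to `subprocess.run(cmd, check=True)` once you're happy with it. Point `config_path` at wherever your install actually keeps its system gitconfig.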
Circumventing this is trivial for an attacker.
Besides, without monetisation, you're relying on the goodwill of a surprisingly small number of people. I like to call this "Postel decentralisation" - in the early days of the internet before IANA was the bureaucracy it is today, a lot of functions which people might naively assume were decentralised were in fact done by hand by Jon Postel.
OSS is huge on HN, and a ton of HN users release OSS all the time. Yet, we all have bills to pay, and a lot of us look for ways to make money as well. Food and whatnot.
I'm not really sure what you're objecting to here? You make it sound like because a user talked about monetizing a feature to a hypothesized product that they're the same as a pharmaceutical company with life-needing medication forcing users to pay absurd amounts.
I agree that in certain scenarios how you monetize matters heavily. Yet, I can't help but feel that only applies to freedom and life-essentials. Things like basic internet access and medications.
But a git hosting service? In my view, you could open one and make it as colossally greedy as you like. It seems you disagree with this, can you voice your thoughts in more depth?
Thanks :)
Implementing such a feature would cost resources that someone would have to pay for. Storage costs would go up, it's not atypical that e.g. a repo that's 100MB on disk might be 1.5x or 2x that (or beyond) if you were keeping every version of every ref ever. Think e.g. accumulating throwaway topic branches with library imports you never ended up using.
So how do you pay for running such a thing, nevermind the initial development cost?
You could just make it "free", but then you'd need to roll the cost onto customers across the board. Or you could only enable such "backups" for opt-in paying customers, but most people aren't going to think to enable/pay for that, or think "I won't need this", until the day they do.
So wouldn't it be neat to have such a service on in the background, funded by high premiums to recover the data in case their backup version is your last option?
I've certainly permanently lost personal data by accident that I'd have paid hundreds of dollars to get back, nevermind someone for whom such a thing might be of critical business importance.
Think about it as being able to pay money after-the-fact to undo the car crash you just got into. With technology that becomes feasible in some cases, and in particular due to how git stores data & what people tend to store there it's relatively cheap compared to some other types of storage.
https://news.ycombinator.com/newsguidelines.html
(Nearly all such generalizations about HN users are just sample bias anyhow.)
For a long time GMT was a good reference point. Times have changed.
I used to work with a gentleman who would always schedule meetings on the phone as:
> Great, let's put that on the schedule for 2:00 o'clock Eastern Standard Time.
There was always a bit of officiousness to his tone and I think he just liked the idea of being precise.
And he certainly was precise. He was also off by an hour for half the year. Somehow no one ever missed a meeting, though.
I always sat on the other side of the room and ground my teeth.
It bugs me to no end when I have to select something like "-5:00 Eastern Time (US/Canada)" in those dialogs. I think a lot of people just don't care enough to truly understand time zones and there is enough flexibility in human communication to just absorb the endless ream of off-by-one-time-zone errors.
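To spell out the off-by-one: a quick sketch with fixed offsets (the date is arbitrary, picked only because it falls in summer). "2:00 Eastern Standard Time" said in July is, taken literally, 3:00 on everyone's wall clock:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time, UTC-5
EDT = timezone(timedelta(hours=-4), "EDT")  # Eastern Daylight Time, UTC-4

# What was literally said: 2:00 PM EST, on a summer date.
stated = datetime(2019, 7, 1, 14, 0, tzinfo=EST)

# What that instant looks like on an East Coast wall clock in July.
on_the_wall = stated.astimezone(EDT)
```

`on_the_wall` comes out as 15:00 EDT. Same instant, different label, and nobody notices because everyone silently reads "EST" as "whatever the East Coast is on right now".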
Whatever I assume, I might be off by an hour.
https://github.com/search?o=desc&q=1ES14c7qLb5CYhLMUekctxLgc...
If we dont receive your payment in the next 10 Days, we will make your code public or use them otherwise.
Too bad they don't make backups of users' repositories?
https://about.gitlab.com/handbook/engineering/infrastructure...
Looks like someone was scraping for `.git/config`
A better defense-in-depth strategy would be to scan each public repo for credentials, and act accordingly when credentials are discovered in repos. We are working on this strategy, currently.
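As a very rough idea of what such a scan could look for (a sketch of one pattern only, not a description of any real scanner): remote URLs with a password embedded in them, which is exactly what ends up in a scraped `.git/config`. The sample config and names are made up:

```python
import re

# scheme://user:password@host -- credentials embedded in the URL's userinfo.
CRED_URL = re.compile(r"https?://[^/\s:@]+:[^/\s:@]+@[^/\s]+")

def find_embedded_credentials(text):
    """Return the 1-based line numbers that contain a URL with user:password@."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if CRED_URL.search(line)
    ]

sample_config = (
    '[remote "origin"]\n'
    "\turl = https://alice:hunter2@gitlab.com/alice/project.git\n"
    '[remote "mirror"]\n'
    "\turl = https://github.com/alice/project.git\n"
)
hits = find_embedded_credentials(sample_config)
```

A real scanner would need many more patterns (API tokens, private keys and so on) and some way to deal with false positives.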
You could start with email warnings of suspicious activity and fine tune the model parameters based on feedback from false positives. But generally a login from a device that has no previous cookie, from an ASN the account has never used before, especially if that ASN is a known data center, that then immediately attempts a destructive action, should be a pretty big warning flag.
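Those signals are simple enough to sketch as a rule before any model is involved. Everything here (field names, the all-three rule, the example ASNs from the documentation range) is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class Event:
    has_known_cookie: bool   # has this device been seen on the account before?
    asn: int                 # network the request came from
    asn_is_datacenter: bool  # from some external list of hosting providers
    is_destructive: bool     # e.g. force push, repository deletion

def warning_flags(event, known_asns):
    """List which of the signals from the comment above are present."""
    flags = []
    if not event.has_known_cookie:
        flags.append("no previous cookie")
    if event.asn not in known_asns:
        flags.append("ASN never used by this account")
        if event.asn_is_datacenter:
            flags.append("ASN is a known data center")
    if event.is_destructive:
        flags.append("immediate destructive action")
    return flags

def should_warn(event, known_asns):
    """Warn only when new device, new ASN and destructive action coincide."""
    return (
        not event.has_known_cookie
        and event.asn not in known_asns
        and event.is_destructive
    )

known = {64496}
suspicious = Event(False, 64500, True, True)
routine = Event(True, 64496, False, True)
```

Starting with an email warning rather than a block keeps false positives cheap while the thresholds get tuned.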
The original title is "Critical security announcement: Suspicious git activity detected".
We are updating our title to better reflect what happened.
Yeah, until I go to my computer and use "git push" again. No?
Also gitsbackup.com is registered but has no A/MX records so...
Gitlab should really note that in their blog posts and emails to users. Just in case someone is thinking of paying the ransom.
To me, it seems like a good measure would be to mark deleted repos as "delete requested" then notify the users involved and give them a week or two to undo a total delete. Especially if it is an older repo with lots of commits.
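Something like this, as a minimal sketch. The two-week window and all the names are placeholders, and the notification step is left out:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE = timedelta(days=14)  # assumed "week or two" undo window

@dataclass
class Repo:
    name: str
    delete_requested_at: Optional[datetime] = None

    def request_delete(self, now):
        # Mark only; this is where the users involved would be notified.
        self.delete_requested_at = now

    def is_purgeable(self, now):
        """True once the grace period has fully elapsed."""
        return (
            self.delete_requested_at is not None
            and now - self.delete_requested_at >= GRACE
        )

    def undo_delete(self, now):
        """Cancel a pending delete; returns False if there is nothing to undo."""
        if self.delete_requested_at is None or self.is_purgeable(now):
            return False
        self.delete_requested_at = None
        return True

repo = Repo("YOU/PROJECT")
t0 = datetime(2019, 5, 1)
repo.request_delete(t0)
undone = repo.undo_delete(t0 + timedelta(days=3))
```

Only a background job that checks `is_purgeable` would ever remove data, so a stolen credential can't destroy anything instantly.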
I took a good look at how my personal tokens were used in Github and Gitlab.
- Enable 2FA.
- Enable commit signing with GPG. For the past 2-3 years, I have slowly moved to signing commits and tags. GPG keys take a lot of hygiene to work with (subkeys, revocation, etc.), but they definitely can help in a situation like this.
Git is a distributed VCS. If you have a repo cloned in a secure location (your server, Dev machine, etc), that is just as good as your Gitlab/hub hosted copy.
Also, someone else noted the ransom email domain has no MX or A records, so the instructions to email them won't work. They seem to be hoping someone will blindly pay the ransom.
It still suggests Gitlab's infrastructure (internally) was compromised: "Suspicious git activity detected on Gitlab"
Something like "Gitlab users' repos held for ransom" seems more appropriate.
For those who might not be aware: It's possible to configure your .git/config so that a single remote pushes to several different URLs.
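One way to set that up is `git remote set-url --add --push`, once per URL. A small sketch that only builds the commands (the URLs are placeholders):

```python
def multi_push_commands(remote, urls):
    """Return one `git remote set-url --add --push` argv list per push URL.

    Include the remote's original URL in `urls` too: once any push URL is
    configured, git pushes only to the configured push URLs.
    """
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, url]
        for url in urls
    ]

commands = multi_push_commands(
    "origin",
    ["git@gitlab.com:YOU/PROJECT.git", "git@github.com:YOU/PROJECT.git"],
)
```

Run each one from inside the repository (for example with `subprocess.run(cmd, check=True)`), and a plain `git push origin` then updates both hosts.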
What an idiotic strategy to take with git repositories. Every local copy is a complete and fully-functioning copy of not just the code, but all history, etc. It's a non-centralized protocol.