In any case, why not just relocate some vendor engineers on site for a bit? Or, better, why does the vendor not have a small presence in the corner?
Sounds like whatever "the db" is, it's probably some (objectively) small but very scary thing that's currently on fire, and people are trying to figure out how to put it out without crashing the plane or making too many waves internally, which is probably even harder. So asking about making vendor noises (useful as that may be) is probably going down the wrong path. In much the same way, this is probably not related to the outages (it may well be, but from the outside it's all coincidence anyway).
systemctl restart mysqld
(Or mariadb, if you pronounce "SQL" as "sequel")
https://github.blog/2022-03-23-an-update-on-recent-service-d...
0 4 * * * /etc/init.d/postgresql restart
I'll take an architect position as compensation, but only if there is equity.
Guide to incidents:
Step 1: Stop the bleeding.
Step 2: Prevent it in the future.
Doing Step 1 doesn't make you incompetent.
I also don't appreciate our builds freezing, becoming impossible to cancel, and then eating up hundreds of minutes.
I haven't used GA in a way where it actually cost me anything, but having minutes just tick away while you can't do anything is really stupid if that's the case.
Edit: Another sane solution would probably be to record outage periods and have Billing automatically reconcile for every customer when invoicing. This would require them to admit the outage durations however, so it may be flawed from a human perspective.
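That reconciliation could be as simple as prorating each invoice by recorded downtime. A hypothetical sketch (the function name, fee, and dates are all made up for illustration):

```python
from datetime import datetime, timedelta

def outage_credit(monthly_fee, period_start, period_end, outages):
    """Prorate a flat monthly fee by recorded outage time.

    outages: list of (start, end) datetimes as recorded by the provider.
    """
    period = period_end - period_start
    down = timedelta()
    for start, end in outages:
        # Clamp each outage window to the billing period.
        s = max(start, period_start)
        e = min(end, period_end)
        if e > s:
            down += e - s
    return monthly_fee * (down / period)

# Example: a $40/month plan with one 3-hour outage in March.
credit = outage_credit(
    40.0,
    datetime(2022, 3, 1), datetime(2022, 4, 1),
    [(datetime(2022, 3, 23, 10), datetime(2022, 3, 23, 13))],
)
```

The hard part, as noted, isn't the math; it's getting the provider to record honest outage windows in the first place.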
At what rate would you do these pings? I don't know how upgrading/downgrading works at GitHub, but if they do any sort of refund/credit when you downgrade, it seems like there are some interesting implications for abusing the system (e.g. upgrading/downgrading between pings for "free" service if the time between them is too long) versus performance (e.g. how do you update all users per ping in a timely manner if the time between them is too short?).
Would love to read up more on this approach; seems interesting!
I've created a new discussion in their feedback repo asking for this; three major outages in a week could really do with a post-mortem: https://github.com/github/feedback/discussions/13344
Of course, that assumes a future bug won't affect `timeout-minutes` itself.
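For context, `timeout-minutes` is a real GitHub Actions key that can cap runaway jobs; a sketch of a workflow using it (the job and steps here are hypothetical):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15        # whole job is cancelled after 15 minutes
    steps:
      - uses: actions/checkout@v3
      - run: make test
        timeout-minutes: 10    # caps can also be set per step
```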
As headcount goes up I think the inability to locally rewrite history into easily reviewable patches would be sorely missed. So it's git for team stuff and fossil for my own.
“Incident” alone makes me think something got hacked or leaked.
It's a nice way of putting it.
I've been trying to run GitHub Actions for a couple of hours now. They don't work at all. But apparently this means they do run, just in infinite time, hence == degraded performance. Nice.
GrubHub delivers
If I can get all the Github features I had as of ~2020, but on an instance that won't get hit by the public cloud/update bus, I would be exceptionally happy.
The only complaints we have are regarding availability. If we can fix that one problem, this is a perfect product in our view.
Oh dear. Not a good idea to go 'all in' on GitHub.
21 outage incidents in just 3 months. At this rate, the benefits of running your own Gitea or GitLab are starting to become competitive.
People want to know it isn’t their problem, that makes cloud computing (and things like GitHub) worth their weight in gold. I have real problems to solve I don’t want to deal with a git repo manager on top of that.
This has been my experience as well. I don't know if that means GitHub is being overly transparent about issues or I've just been lucky but I would hate if people punished services for being transparent and informative on their status pages.
These have been minor inconveniences for us - at worst. Most of the time it simply means people jump to something else then come back later in the day.
Failing tests and PR feedback cycles are more of a blocker to our team than these outages.
Most issues have a relatively narrow impact, but the impacted people _still_ benefit from seeing them listed.
When you host things yourself, you still have downtime. And, having worked with Github for over a decade, the actual disruption to my work from downtime is much less than if I had to host my own.
That being said: I briefly worked for a company that hosted its own source code control system. For us, as a small team, it wasn't worth it. The system was outdated and hosted in an insecure manner. No one ever did any "admin" work except the founder. He ran it because he had irrational fears of switching, not because of any tangible advantages over Github (and competitors).
Keep in mind that Github (and competitors) are often cheaper than the time needed to invest in hosting your own. (Estimate 10-20 hours a year of invested time. Calculate your hourly rate. Github and competitors are cheaper.) In order to come ahead, you need tangible benefits other than "I think I can have less downtime."
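The back-of-the-envelope math above is easy to run for yourself; a sketch with assumed numbers (the hourly rate is made up, and this ignores server costs):

```python
# Break-even estimate for self-hosting vs. a hosted service.
admin_hours_per_year = 15   # midpoint of the 10-20 hour estimate above
hourly_rate = 100           # assumed fully-loaded engineer cost, $/hour

# Annual cost of admin time alone, before hardware/hosting:
self_host_cost = admin_hours_per_year * hourly_rate
print(self_host_cost)
```

Compare that figure against your actual per-seat subscription cost to see which side of the break-even you land on.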
I certainly would not have this problem on self hosted instance, because it would not be behind CF. I'm sure I'd have other problems though. :)
All software is crap. You can either spend time fixing it yourself, or spend time begging some SaaS company/community online for fixes/help, with resolution times sometimes measured in months, all while you may not be able to use the product fully.
Also, with SaaS it will be constantly shifting under you. Things will be moved around, restyled, iconized, popupized, etc. This doesn't help productivity either. With self-hosting, you can at least avoid upgrading, if you dislike that kind of thing. Or choose FOSS software that values UX permanency/stability, which seems to be a really hard ask of a SaaS business.
2019 -> 39 Incidents
2020 -> 67 Incidents
2021 -> 86 Incidents
2022 -> 20 Incidents so far
Edit: Using linear regression, the predicted total for end of 2022 is 111 incidents.
So perhaps they are not exactly improving, but maybe there is some other way to normalize the data.
[1] https://github.blog/2019-11-06-the-state-of-the-octoverse-20...
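The prediction above can be reproduced with an ordinary least-squares fit over the 2019-2021 counts, extrapolated to 2022 (a sketch in plain Python):

```python
# Fit a line to the yearly incident counts and extrapolate to 2022.
years = [2019, 2020, 2021]
incidents = [39, 67, 86]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(incidents) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, incidents)) \
        / sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

pred_2022 = slope * 2022 + intercept
print(round(pred_2022))  # → 111
```

Note this is a fit over only three points, so the extrapolation is more of a talking point than a forecast.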
How many core incidents? The part that affects whether you can even push to and pull from a repo, and access issues and PRs? Because everything else is nice to have, but you can do work perfectly fine without them if they go down for a few hours.
Just off the top of my head, that's one thing you can do.
Are you kidding? The last 2 incidents were called "degraded performance". Where "degraded" meant I would get nothing but 500 errors accessing GitHub.com either via browser or git itself for the duration of the outage. How is this not lying?
If you're using GitHub in Europe or Asia, it's not uncommon for GitHub to be offline for many hours before they acknowledge anything.
But going 'all in' on GitHub just doesn't make any sense anymore.
[0] https://hn.algolia.com/?dateRange=all&page=1&prefix=true&que...
Also, quite a few of the non-profits behind the projects you mentioned have multi-million dollar budgets that they can use to administer their git instance, if needed. I don’t think “if they can do it, you can” is a strong argument for those.
Gitlab just seems better for actually running a software project.
After all, you don't even need Gitea for pure Git hosting. If you have a server with SSH access, just init a bare repo in a directory, push to that, and you're ready to go. No web UI needed.
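A minimal sketch of that setup; the "server" here is simulated with local paths so the commands can be run as-is, but over SSH the remote would be something like `user@host:repos/myproject.git` (all paths are illustrative):

```shell
# On the server this would be: ssh user@host 'git init --bare repos/myproject.git'
rm -rf /tmp/demo-remote.git /tmp/demo-work
git init -q --bare /tmp/demo-remote.git     # bare repo: no working tree, just a push target

# In a working copy: make a commit, point at the remote, and push.
git init -q /tmp/demo-work
cd /tmp/demo-work
echo "hello" > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"
git remote add origin /tmp/demo-remote.git  # over SSH: user@host:repos/myproject.git
git push -q origin HEAD
```

That's the whole "hosting" stack; anything a git client can SSH to is a git server.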
The reason I'm still using GitHub is not code hosting. It's collaboration.
Used to do that years ago for my personal projects. Honestly does the trick.
If you don't want or need those things, bare git repos are fine and certainly easier to support (not that Gitea's that hard, though a few issues/PRs I've noticed have caused me more than a little concern about the overall quality of the project).
In the wake of the Okta breach, I think we will see a reversal of the centralization trend.
Oh, stop the drama. Fine. Set up your GitLab.
Git is distributed. GitHub is very much not.
assuming that would be flawless, which it wouldn't
No need, just use Codeberg.org instead. It runs Gitea and is a free collaboration platform (+ git hosting) for free projects. FOSS/OSS projects should really consider alternatives to GitHub and GitLab, especially when there are much more FOSS/OSS-friendly platforms around.
I find myself regularly asking this — about every major SaaS used for critical ops stuff like this.