GitHub incident – now resolved

(githubstatus.com)

73 points · xvello · 3y ago · 75 comments

pilif · 3y ago · 11 in thread

When I decided to move away from self-hosted git ages ago and then from Jenkins to GA two years ago, reliability was a huge factor in my decision because Github, I supposed, would be much better at keeping their infrastructure running than I am at keeping ours.

Turns out the uptime of both our git server and even the Jenkins instance beat GitHub by far and while the former only cost a marginal amount of CPU time on infrastructure I was running anyways, GitHub is a noticeable expense for us.

Of course it still saves me from the panic attacks every time I'm compelled to press the "Update now" button in Jenkins because either I do nothing and get my instance RCEd or I do press the button and who knows what plugin update will break which part of our setup, but while that was a constant fear in my mind, the amount of downtime caused by Jenkins plugin updates was zero whereas what GitHub is doing lately is way, way, way worse than zero.

I'm starting to get frustrated and like I presume many other paying users, I think I'm at a point where I feel like we should get partial refunds of our subscription money given the very spotty uptime all year now.

_joel · 3y ago

PSA:

Never expose Jenkins to the public internet; make sure it's only reachable via VPN. If you need webhooks, there are services that broker webhooks while Jenkins calls out to them (i.e. no exposed ports). Even so, if you do have to use native webhooks, at least lock it down to the upstream's IP range(s).

Ideally have a dev Jenkins to test all the things first before hitting upgrade on your prod instance and killing some plugins (hell, even better if it's all IaC and you can just spin up a Jenkins host per env, but ££$$$££/time etc).
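As a rough illustration of the "lock it down to the upstream's IP range(s)" advice, the check amounts to a CIDR membership test. This is only a sketch: the ranges below are reserved documentation prefixes, not any real provider's addresses, and `ALLOWED_RANGES` and `is_allowed_source` are hypothetical names. In practice this filtering would more likely live in a firewall or reverse proxy than in application code.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real webhook
# source ranges. Substitute the ranges your upstream actually publishes.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed_source(remote_addr: str) -> bool:
    """Return True only if remote_addr parses and falls inside an allowed range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable address: reject rather than guess
    return any(addr in network for network in ALLOWED_RANGES)
```

Anything that fails to parse is rejected, so a malformed header or address never slips through by accident.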

NhanH · 3y ago

Nowadays, Tailscale or Cloudflare Access + Tunnel work amazingly well for private services that you might need to access from an untrusted network. So the need to keep them up to date can be delayed a lot more (of course, Jenkins is a special case since it might be pulling and executing untrusted code, but I think that is something you need to care about even without security issues specific to Jenkins itself).

sascha_sl · 3y ago

We have open tickets about the SLA being broken for Q1 and Q2. They've been open for a while, and we're out of ways to escalate them (despite enterprise).

And GitHub's SLA is not great to begin with: 10% of spend refund at 3 nines (99.9%) and 25% of spend at 2 nines (99%).
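To put those thresholds in perspective, here is a small back-of-the-envelope sketch. It assumes a 90-day quarter and reads the tiers above as "10% credit when measured uptime drops below 99.9%, 25% when it drops below 99%"; that reading, and the function names, are assumptions made for illustration and not the wording of the actual SLA.

```python
QUARTER_MINUTES = 90 * 24 * 60  # assumption: a 90-day quarter


def downtime_budget_minutes(uptime_pct: float,
                            period_minutes: int = QUARTER_MINUTES) -> float:
    """Minutes of downtime allowed in the period before uptime_pct is missed."""
    return period_minutes * (100.0 - uptime_pct) / 100.0


def credit_pct(measured_uptime_pct: float) -> int:
    """Credit tier as described in the comment above (assumed interpretation)."""
    if measured_uptime_pct < 99.0:
        return 25
    if measured_uptime_pct < 99.9:
        return 10
    return 0
```

Under these assumptions, 99.9% leaves roughly 130 minutes of downtime per quarter, and 99% leaves about 21.6 hours.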

AtNightWeCode · 3y ago

In general, don't use Jenkins. It is a mediocre piece of software with an even worse ecosystem. It self-implodes every now and then. Big tech companies build a lot of custom stuff to get it to work correctly. Once that is done, sure, it works. I am not a fan of GitHub Actions, but hosting Git and Jenkins yourself is not really a serious option. Pick another CI platform in that case.

blown_gasket · 3y ago

This comment makes a lot of assertions without any backing data.

How is it mediocre? Is it because of the CVEs that have been released in the prior years? I recall GitLab also having quite a bad week of CVEs in February[1].

How is it a bad ecosystem? If this is about plugins in order to do things, I actually like this framework - it lets there be specific owners for portions of the open source development.

Self-implodes? This seems like it would be tracked as a bug. I've encountered an instance where Jenkins wouldn't start due to a crypto issue but that was due to a bug and all I needed to do was install a patch.

I think that using Jenkins can be thought of as a serious option if, like anything else, you follow security protocols, i.e. don't allow public access, maintain RBAC standards, have a maintenance schedule.

[1] https://about.gitlab.com/releases/2022/02/25/critical-securi...

highmastdon · 3y ago

I have always been happy using GitLab's CI/CD tooling [1]. Also, the integration with the source code this way is like GitHub with GitHub Actions.

[1] https://docs.gitlab.com/ee/ci/yaml/gitlab_ci_yaml.html

tapoxi · 3y ago

GitHub is running a complex, planet-scale product. I think they've crossed the threshold where doing it yourself is more likely to be reliable for some use cases.

We've been running GitLab on GKE for the past three years, no problems outside of initial migration pains.

fartcannon · 3y ago

Also: "you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services."

https://www.microsoft.com/en-ca/servicesagreement/upcoming.a...

Pryde · 3y ago

Am I missing something, or is GitHub distinctly not listed in the Covered Services section of that services agreement?

mey · 3y ago

https://docs.github.com/en/site-policy/github-terms/github-t...

GitHub has a separate policy...

iso1631 · 3y ago

> When I decided to move away from self-hosted git ages ago and then from Jenkins to GA two years ago, reliability was a huge factor in my decision because Github, I supposed, would be much better at keeping their infrastructure running than I am at keeping ours.

Oh you sweet summer child

benburwell · 3y ago · 7 in thread

I really want to like GitHub Actions. But it feels like every time I'm trying to get something done, they are broken.

hatf0 · 3y ago

The problem that I've found is that no other CI/CD provider has feature parity with GitHub Actions & its integration with Git. Sure, there are external providers (Travis CI, Buildkite, etc.) but none of them feel like they have the polish of GitHub Actions. GitLab & Azure DevOps also don't compare at all - I've migrated whole organizations off of both because they just don't feel polished / break rather frequently. So I'm personally stuck with GitHub, simply because no other company provides anything better.

cookiecaper · 3y ago

GitLab CI is definitely competitive with GitHub Actions. What's the specific feature set you're missing? "Polish" is pretty vague, especially in the context of a post that laments GitHub Actions' poor uptime record.

AtNightWeCode · 3y ago

Not at all true. GH Actions are about as simple as they can be and can be replaced by almost any other commercial service that does the same. Also, any CI/CD tool can integrate with GH.

_joel · 3y ago

GitLab IMHO is definitely at parity, if not better. I've had loads of success with it at a few places now, using basic stuff right up to full Auto DevOps with custom buildpacks for Elixir/Phoenix.

Tainnor · 3y ago

I have the exact opposite experience. GitHub CI is probably the worst CI I've used so far (except for custom homegrown messes), and GitLab CI by far the best.

hatware · 3y ago

Wouldn't it be nice, 2 years into these hiccups, if GitHub could explain why the same problem keeps coming up?

There's a point where it's funny, and we're way past that.

Queue29 · 3y ago

They did shed some light on their internal struggles: https://github.blog/2022-03-23-an-update-on-recent-service-d...

4khilles · 3y ago · 5 in thread

Highly recommend https://builds.sr.ht. I've been using it for ~2 years, never had an issue.

gabrielgio · 3y ago

Just recently I found out about sourcehut. The project is refreshing. I don't know if there is something similar on the market, but I enjoy how simple and straightforward it is to use.

hatf0 · 3y ago

LMAO, the entire front page of SourceHut is broken.. that's not a good look...

gabrielgio · 3y ago

What is broken? Both https://sourcehut.org/ and https://builds.sr.ht/ render just fine for me.

eatonphil · 3y ago

How is it broken? sr.ht? It seems to work for me...

mcint · 3y ago

Probably not the best use of archive resources, but you could store a snapshot of it, so HN commenters can believe you.

fartcannon · 3y ago · 4 in thread

Not only is it not more reliable than your own server, using them launders your IP through Copilot, which they then sell to you and your competitors.

notriddle · 3y ago

This is axe grinding. Please stop. HN has already had several threads related to Copilot. Take your grievances there, instead.

fartcannon · 3y ago

Hey, does your website accountkiller work for HN? They seem to be rather strict about their accounts.

oneepic · 3y ago

Source? In other places, it is stated they only use public data, such as from public repositories. See the FAQ at the bottom here: https://github.com/features/copilot/

fartcannon · 3y ago

That's what I'm looking for. The source - but more so the exact words - that says they're legally allowed to do that. Because even if it's publicly viewable, it's still my intellectual property. I want to read the part that says, "even though it's your IP, we have the right to launder it through Copilot and sell it back to you/others".

eatonphil · 3y ago · 2 in thread

What's annoying about this is that the PR doesn't even say it's trying to run tests. It says everything is passing and just doesn't list the actions.

For a second I thought someone must have deleted the actions yaml files.

This is a dangerous failure mode.

https://github.com/multiprocessio/dsq/pull/82

Screenshot here: https://twitter.com/phil_eaton/status/1542168020516216832

OJFord · 3y ago

As in you have it configured to prevent merge until they passed, they're not running at all, and it's allowing merges?

eatonphil · 3y ago

That's right. This button shouldn't be green. It's not just that the actions aren't running; the service that reports that actions exist must be down too. That is a bad design. They should still report that the actions exist even if they can't run them. This PR button shouldn't be green/pressable.
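The behavior being asked for is usually called failing closed. A minimal sketch of that idea follows; it is not how GitHub implements merge gating, and `can_merge`, the `"success"` status string, and the use of `None` for "the status service could not be reached" are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence


def can_merge(required: Sequence[str],
              reported: Optional[Mapping[str, str]]) -> bool:
    """Fail-closed merge gate (hypothetical).

    `reported` maps check name -> status, or is None when the status
    service itself is unavailable.
    """
    if reported is None:
        # No information is not the same as "all passing": block the merge.
        return False
    # A required check that is missing from the report counts as not passed.
    return all(reported.get(name) == "success" for name in required)
```

For example, `can_merge(["test"], {})` returns `False`: a required check that never reported is treated the same as one that failed, which is the opposite of the green button described above.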

zippergz · 3y ago · 1 in thread

Title says "now resolved" but the link says the incident is ongoing and they "are actively working on a mitigation."

xvello (OP) · 3y ago

Agreed, the admins re-titled it too early. Git operations and API requests are back to green, but other subsystems are still impacted.

brunojppb · 3y ago · 1 in thread

The company I currently work for is in the process of migrating from Jenkins to GitHub Actions. Even with all the problems that GitHub has, it has been, by far, a much better experience. With all these issues and trade-offs, GitHub Actions in combination with the GitHub UI has still been a net positive in all aspects.

Jenkins is slow and a nightmare to maintain. It became a huge ball of mud that nobody wants to touch. Just keeping the lights on is a large burden for the infrastructure team.

rglullis · 3y ago

A bit of a false dichotomy: there are other open source CI alternatives that are way more modern than Jenkins: Drone/Woodpecker, GitLab, sourcehut all come to mind.

freedomben · 3y ago

It seems a little early to call it "now resolved." I'm still seeing issues. If it's gotta munch through a queue or something, it would be helpful to announce that info.

Edit: It's just the HN title that says "now resolved." The GitHub status page says:

> We have identified the source of disruptions and are actively working on a mitigation. The systems are in recovery and services are returning to green.

synu · 3y ago

I have been so happy to be on GitLab again after some time working at a company that used GitHub. The issues and epics in particular are so much better, and CI seems to be more reliable.

rglullis · 3y ago

So, it seems like there is a partial outage on GitHub at least once a week. For how long are the CTOs and engineering managers all around going to continue to accept this?

Who shouldn't at the very least donate a bit to the various open source CI solutions, as a way to have some kind of hedge?

rvz · 3y ago

Again? Last time that happened was 9 days ago. [0] Just like I said before, at least twice a month GitHub Actions, Pages or something else goes down.

Each time this happens, it makes no sense to go all in on GitHub. Perhaps companies like ARM, and projects like Wine [1], ReactOS [2], etc already went with self-hosting or have a failsafe solution to fall back on.

[0] https://news.ycombinator.com/item?id=31815918

[1] https://www.phoronix.com/scan.php?page=news_item&px=Wine-Git...

[2] https://github.com/reactos/reactos#code-mirrors

debarshri · 3y ago

I think it is not just GitHub Actions; GitHub in general is experiencing degraded performance [1].

[1] https://www.githubstatus.com/

jerryjerryjerry · 3y ago

Hmm, that doesn't sound good... this kind of incident can impact not just developers but also business-insight applications that heavily analyze GitHub activity and projects in real time, like this one (https://ossinsight.io/).

Any way to minimize the impact of such GitHub incidents on everyone's daily projects and business?

mfashby · 3y ago

I don't really use GitHub Actions (like, ever), but it's the default setup for a Terraform provider I'm working on, and the very minute I queue a bunch of jobs, the system goes down. Interesting.

mario_kart_snes · 3y ago

My actions are simply not running.

zwilliamson · 3y ago

Maybe they need to hire high end talent? Anyone have experience with their recruiting process?
