The status page says all is well, though: https://www.githubstatus.com/. Hilarious.
Good reason why companies shouldn't be using Twitter/X for status updates anymore!
Also probably a class action suit lurking somewhere in there eventually.
Now 4 out of 10 services are marked as "Incident", yet most of the others are also completely dead.
Migration to a new host takes another 15 seconds thanks to both zfs and containers.
I don't know how many GitHub downtime reports I've seen during that time; we're probably into the high dozens by now.
I've been moving most of my projects off of GitHub and into Gitea, and will continue to do so.
We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
So long as I can fetch/commit to my repos, pretty much everything else is of secondary, tertiary, or no real importance to me.
(At work, I do indeed have systems running that monitor 200 statuses from client project homepages, almost all of which show better than 99.999% uptime. And they are practically useless. Most of them also monitor "canary" API requests, which I strive to keep at 99.99% but sometimes fail to hold even at 99.9%, the very best (and most expensive) SLA we'll commit to.)
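For a sense of scale, here's a rough sketch of how much downtime each of those targets actually allows. It assumes a 30-day month and a 365-day year; real SLAs define their own measurement windows and exclusions, so treat the numbers as ballpark.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the downtime budget for an availability target given in percent.
# Assumes a 30-day month and a 365-day year (illustrative, not contractual).
downtime_budget() {
  awk -v pct="$1" 'BEGIN {
    unavailable = (100 - pct) / 100          # fraction of time allowed down
    month_min   = unavailable * 30 * 24 * 60
    year_min    = unavailable * 365 * 24 * 60
    printf "%s%%: %.1f min/month, %.1f min/year\n", pct, month_min, year_min
  }'
}

for target in 99.9 99.99 99.999; do
  downtime_budget "$target"
done
```

By that arithmetic, 99.9% leaves roughly 43 minutes a month, 99.99% about 4, and "five nines" well under a minute.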
I don't think GitHub has recovered from the monthly incidents that keep occurring. Quite frankly, the expectation that something will go down at GitHub every month shows how unreliable the service is, and this has been happening for years.
I guess this 4-year-old prediction post about self-hosting and not going all-in on GitHub really aged well after all [0]
I remember a time when systems would boast about their "five nines" uptime. It was before anything "cloud" appeared.
People use this page for guidance. I guess now we know how much it can be trusted.
- an incident is declared internally at GitHub
- the support/incident team submits a new status page entry (with details on the impacted service(s))
- incident is worked on internally
- incident fixed
- page updated
- retro posted
Even AWS now seems to have some automation for its various services per region. But it doesn't automatically show issues, because an issue could be at the customer level or affect only a subset of customers, e.g. those in region foo, in AZ bar, on service version zed vs. zed - 1. So they chose not to display issues that only affect subsets.
I do agree it would be nice to have logins for the status page and then get detailed metrics based on customerid or userid. Someone start a company to compete with statuspage.
Hope I don't appear in the incident report.
So, you're both responsible and not responsible at the same time :)
> Hope I don't appear in the incident report.
Appearing in an incident report with your HN username could be pretty funny...
I had a github page that was public, but it was made private and the DNS config was removed. Fast forward to today. I made the private repo public again and forced a deploy of the page without making a new commit. It said the DNS config was incomplete, so I tweaked it and hit "check again" and github went down.
Probably unrelated, but the timing was spooky.
I started something like this, but with Twitter: it monitors the latest tweets for a custom word combo and raises a server alert when it finds one. I found it hilarious. Will post the source once GitHub is back up.
Both seem to be doing too much all at once. But it really is worse with GitHub, if this is what Microsoft stewardship means: incidents every single week, and guaranteed every month, for years.
Anyways. #hugops for the GitHub team.
What’s the point of bringing up twitter? It is strange to seek victimhood for a petulant billionaire. Of course, it is worse with GitHub because GitHub actually provides useful functionality.
Good opportunity to think about mirroring your repos somewhere else like Gitea or Gitlab.
Another reason is that MS may enter a phase where it asks you to pay just to read from GitHub (via rate limiting).
When you would usually create a PR, you use `git format-patch` to create a patch file and send that to whoever is going to merge it.
They create a branch and use `git am` to apply the patch to it, review the changes, and merge it to main.
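A minimal, self-contained sketch of that round trip, run in a throwaway directory. The repo layout, file names, and identities are placeholders; in practice the patch would usually travel by email (e.g. with `git send-email`) rather than through a shared directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
# Placeholder identity so commits work without any global git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Maintainer's repository with one commit on main.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Contributor clones, commits a change, and exports it as a patch file
# (writes 0001-Expand-greeting.patch into $work/patches).
git clone -q "$work/upstream" "$work/contrib"
cd "$work/contrib"
echo "hello, world" > README
git commit -q -am "Expand greeting"
git format-patch -1 -o "$work/patches" >/dev/null

# Maintainer applies the patch on a review branch, then merges it into main.
cd "$work/upstream"
git checkout -q -b review
git am -q "$work/patches"/*.patch
git checkout -q main
git merge -q --ff-only review
```

After it runs, `main` in the upstream repo has the contributor's commit with its original author and message intact, which is the main thing `git am` gives you over copying files by hand.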
It is nice that git supports multiple remotes, though. It feels good to know that `git push` might not work for my project right now, but I know `git push srht` will get the code off of my laptop.
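A sketch of that setup, with a local bare repo standing in for the second host (the paths and the `srht` name are placeholders; a real sourcehut remote would be an SSH or HTTPS URL). `origin` deliberately points at a path that doesn't exist, to simulate the outage.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Local bare repo standing in for the second host (placeholder for a sourcehut URL).
git init -q --bare "$work/srht.git"

git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/main
echo "code" > main.txt
git add main.txt
git commit -q -m "Work in progress"

# origin deliberately points at a path that doesn't exist, simulating the outage.
git remote add origin "$work/github-is-down.git"
git remote add srht "$work/srht.git"

# The push to origin fails, but the second remote still gets the code off this machine.
git push -q origin main 2>/dev/null || echo "origin unreachable, pushing to srht instead"
git push -q srht main
```

If you'd rather have a single `git push` hit every host, `git remote set-url --add --push origin <url>` attaches extra push URLs to one remote; note that the first `--push` URL replaces the default, so add the original URL that way as well.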
Well, that's how it was designed to work! The whole point of Git is that it's a distributed version control system, and doesn't need to rely on a centralized source of truth.
I also had to set up a bidirectional mirror back when bandwidth to some countries was restricted. We would push and pull as normal, and a job would keep our mainline in sync.
It is sad that most organizations forget that git is distributed by nature. We often get requests to set up VPNs and all sorts of craziness, when a simple push to a bare mirror would suffice. You don't even need anything running other than SSH.
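Roughly what the one-way case looks like: the mirror is just a bare repository, and keeping it current is one `git push --mirror`. This sketch uses local paths; the SSH host and path in the comments are hypothetical. A bidirectional setup like the one described above would additionally need a job that fetches from both sides and handles divergence, which this doesn't attempt.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Primary repository with one commit on main and a tag.
git init -q "$work/primary"
cd "$work/primary"
git symbolic-ref HEAD refs/heads/main
echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1"
git tag v1

# The mirror is just a bare repo. On a real host this might be
#   ssh git@mirror.example.com git init --bare /srv/git/app.git
# with MIRROR_URL=ssh://git@mirror.example.com/srv/git/app.git (hypothetical host).
git init -q --bare "$work/mirror.git"
MIRROR_URL="$work/mirror.git"

# Push all refs (branches and tags) so the mirror matches the primary exactly;
# refs deleted locally are deleted on the mirror too.
sync_mirror() {
  git push -q --mirror "$MIRROR_URL"
}

sync_mirror
echo "v2" > app.txt
git commit -q -am "Release 2"
sync_mirror   # a cron job or CI step could run this after each push
```

Because `--mirror` force-updates and prunes refs, the mirror is best treated as read-only; anything pushed to it directly would be overwritten on the next sync.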
The real reason not to use github anyway though is that it's terrible (the basic "github model" for doing code review was basically made up on the back of a napkin IMO)
https://www.bleepingcomputer.com/news/security/github-action...
That happened to me a while back with an app listing that was almost 10 years old because the server I was hosting the policy on went down. Ironically, I switched it to Github pages so it wouldn't happen again.
give the poor github ops folks a second to get things moving.
You ideally do not want to be deciding whether to update a status page during the first few minutes of an incident; if the process is manual, bean counters inevitably get involved to delay declaring downtime, or to avoid declaring it at all.
It's more likely that the threshold is kept a bit higher than a couple of minutes to reduce false-positive rates, not because updates are manual.
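A minimal sketch of that kind of threshold, with hysteresis so a single failed probe doesn't flip the page back and forth. The threshold values and the `ok`/`fail` probe format are assumptions for illustration, not how GitHub's or Statuspage's automation actually works.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical thresholds: declare "degraded" only after 3 consecutive failed
# probes, and "normal" again only after 3 consecutive successful ones.
FAIL_THRESHOLD=3
RECOVER_THRESHOLD=3

# Args: probe results in time order, each "ok" or "fail".
# Prints the new status each time it changes.
status_from_probes() {
  local status="normal" fails=0 oks=0 result
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      fails=$((fails + 1)); oks=0
    else
      oks=$((oks + 1)); fails=0
    fi
    if [ "$status" = "normal" ] && [ "$fails" -ge "$FAIL_THRESHOLD" ]; then
      status="degraded"; echo "$status"
    elif [ "$status" = "degraded" ] && [ "$oks" -ge "$RECOVER_THRESHOLD" ]; then
      status="normal"; echo "$status"
    fi
  done
}

# prints "degraded", then "normal"
status_from_probes fail ok fail fail fail ok ok ok
```

With these thresholds a lone blip (`fail ok fail ok`) produces no status change at all, while a sustained outage produces one `degraded` and, later, one `normal`. Raising the thresholds trades detection latency for fewer false positives.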
This is a pretty good place to check. The lag is pretty minimal traditionally.
At the time of posting everything is broken.
Nix: barfs voluminous errors I've never seen before
Me: whaaaat the farrrrk
* nixos updates are pulled from a github repo
They also made attempts at IPFS, but never finished them.
The static content on the error page might also be served from Akamai or Cloudflare.
the images on the page are all just base64 encoded right into the html
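That's the data URI scheme: the bytes are base64-encoded and inlined, so the page can render without any further requests to a backend that's down. A quick way to produce one from the shell; the MIME type and the tiny text payload are placeholders (a real error page would inline image bytes).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build an RFC 2397 data URI from stdin; $1 is the MIME type, e.g. image/png.
# tr strips the line wrapping that GNU base64 inserts every 76 characters.
data_uri() {
  printf 'data:%s;base64,%s' "$1" "$(base64 | tr -d '\n')"
}

# prints data:text/plain;base64,aGk=
printf 'hi' | data_uri text/plain
echo
```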
```
Received a 503 error. Data returned as a String was: <!DOCTYPE html> <!--
Hello future GitHubber! I bet you're here to remove those nasty inline styles, DRY up these templates and make 'em nice and re-usable, right?
Please, don't. https://github.co...
```
That's where it's cut off on my screen.
Curious what the link is :)
I like to think, someone did.
What made me laugh, though, were the "X is functioning normally" messages immediately followed by "X is degraded, continuing to monitor", then right back to "normal" again, popping up over and over, all within the same 30-second span... made me giggle.
Here is a good article on how to prepare for situations like that, when GitHub is down: https://gitprotect.io/blog/github-restore-and-github-disaste...
Update - Issues is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:19 UTC
Update - Git Operations is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:19 UTC
Update - Packages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:18 UTC
Update - Copilot is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:13 UTC
Update - Pages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:12 UTC
Seeing it all kind of went sideways at the same time, my money is on the typical load balancer config rollout snafu.
"As part of a routine configuration deploym..." [splat]
Time to consider self-hosting like the old days instead of this weekly chaos at GitHub.
> We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back. Aug 14, 2024 - 23:29 UTC
Seems like they’re back up though. Or at least the Rust blog is back up.
"Why isn't this project done yet?"
"Didn't you hear? GitHub is down!"
and I get to go out for a long lunch
Good luck to the devs and dev-analogues involved in getting the ship righted.
[error] [auth] Response content-type is text/html; charset=utf-8 (status=503)
fatal: unable to access: 502
cli, web, and iOS app :-/
Senior: Ah, found it! Let's just roll back one revision on the db.
Newguy: Let me fix this! `kubectl rollout undo ... --to-revision=1`
Newguy: OK, started rollback to revision one!
Senior: Uh-oh..