In any case, why not just relocate some vendor engineers on site for a bit? Or, better, why does the vendor not have a small presence in the corner?
Sounds like whatever "the db" is, it's probably some (objectively) small but very scary thing that's currently on fire, and people are trying to figure out how to put it out without crashing the plane or making too many waves internally, which is probably even harder. So asking about making vendor noises (useful as that may be) is probably going down the wrong path. In much the same way, this is probably not related to the outages (it may well be, but from the outside it's all coincidence anyway).
systemctl restart mysqld
(Or mariadb, if you pronounce "SQL" as "sequel")
https://github.blog/2022-03-23-an-update-on-recent-service-d...
0 4 * * * /etc/init.d/postgresql restart
I'll take an architect position as compensation, but only if there is equity.
Guide to incidents:
Step 1: Stop the bleeding.
Step 2: Prevent it in the future.
Doing Step 1 doesn't make you incompetent.
I also don't appreciate our builds freezing, becoming impossible to cancel, and then eating up hundreds of minutes.
I haven't used GA in a way where it actually cost me anything, but having minutes just tick away while you can't do anything is really stupid if that's the case.
Edit: Another sane solution would probably be to record outage periods and have Billing automatically reconcile for every customer when invoicing. This would require them to admit the outage durations however, so it may be flawed from a human perspective.
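That reconciliation could be as simple as prorating each invoice by recorded downtime. A hypothetical sketch (the function name, fee, and dates are all made up for illustration):

```python
from datetime import datetime, timedelta

def outage_credit(monthly_fee, period_start, period_end, outages):
    """Prorate a flat monthly fee by recorded outage time.

    outages: list of (start, end) datetimes as recorded by the provider.
    """
    period = period_end - period_start
    down = timedelta()
    for start, end in outages:
        # Clamp each outage window to the billing period.
        s = max(start, period_start)
        e = min(end, period_end)
        if e > s:
            down += e - s
    return monthly_fee * (down / period)

# Example: a $40/month plan with one 3-hour outage in March.
credit = outage_credit(
    40.0,
    datetime(2022, 3, 1), datetime(2022, 4, 1),
    [(datetime(2022, 3, 23, 10), datetime(2022, 3, 23, 13))],
)
```

The hard part, as noted, isn't the math; it's getting the provider to record honest outage windows in the first place.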
At what rate would you do these pings? I don't know how upgrading/downgrading works at GitHub, but if they do any sort of refund/credit when you downgrade, it seems like there are some interesting implications for abusing the system (e.g. upgrading/downgrading between pings for "free" service if the time between them is too long) versus performance (e.g. how do you update all users per ping in a timely manner if the time between them is too short?).
Would love to read up more on this approach; seems interesting!
I've created a new discussion in their feedback repo asking for this; three major outages in a week could really do with a post-mortem: https://github.com/github/feedback/discussions/13344
Of course, that assumes a future bug won't affect `timeout-minutes` itself.
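For context, `timeout-minutes` is a real GitHub Actions key that can cap runaway jobs; a sketch of a workflow using it (the job and steps here are hypothetical):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15        # whole job is cancelled after 15 minutes
    steps:
      - uses: actions/checkout@v3
      - run: make test
        timeout-minutes: 10    # caps can also be set per step
```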
As headcount goes up I think the inability to locally rewrite history into easily reviewable patches would be sorely missed. So it's git for team stuff and fossil for my own.
“Incident” alone makes me think something got hacked or leaked.
It's a nice way of putting it.
I've been trying to run GitHub Actions for a couple of hours now. They don't work at all. But apparently this means they do run, just in infinite time, hence == degraded performance. Nice.
GrubHub delivers
If I can get all the Github features I had as of ~2020, but on an instance that won't get hit by the public cloud/update bus, I would be exceptionally happy.
The only complaints we have are regarding availability. If we can fix that one problem, this is a perfect product in our view.
Oh dear. Not a good idea to go 'all in' on GitHub.
21 outage incidents in just 3 months. At this rate, the benefits of running your own Gitea or GitLab are starting to become competitive.
People want to know it isn’t their problem, that makes cloud computing (and things like GitHub) worth their weight in gold. I have real problems to solve I don’t want to deal with a git repo manager on top of that.
This has been my experience as well. I don't know if that means GitHub is being overly transparent about issues or I've just been lucky but I would hate if people punished services for being transparent and informative on their status pages.
These have been minor inconveniences for us - at worst. Most of the time it simply means people jump to something else then come back later in the day.
Failing tests and PR feedback cycles are more of a blocker to our team than these outages.
Most issues have a relatively narrow impact, but the impacted people _still_ benefit from seeing them listed.
When you host things yourself, you still have downtime. And, having worked with Github for over a decade, the actual disruption to my work from downtime is much less than if I had to host my own.
That being said: I briefly worked for a company that hosted its own source code control system. For us, as a small team, it wasn't worth it. The system was outdated and hosted in an insecure manner. No one ever did any "admin" work except the founder. He ran it because he had irrational fears of switching, not because of any tangible advantages over Github (and competitors).
Keep in mind that Github (and competitors) are often cheaper than the time needed to invest in hosting your own. (Estimate 10-20 hours a year of invested time. Calculate your hourly rate. Github and competitors are cheaper.) In order to come ahead, you need tangible benefits other than "I think I can have less downtime."
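The back-of-the-envelope math above is easy to run for yourself; a sketch with assumed numbers (the hourly rate is made up, and this ignores server costs):

```python
# Break-even estimate for self-hosting vs. a hosted service.
admin_hours_per_year = 15   # midpoint of the 10-20 hour estimate above
hourly_rate = 100           # assumed fully-loaded engineer cost, $/hour

# Annual cost of admin time alone, before hardware/hosting:
self_host_cost = admin_hours_per_year * hourly_rate
print(self_host_cost)
```

Compare that figure against your actual per-seat subscription cost to see which side of the break-even you land on.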
I certainly would not have this problem on self hosted instance, because it would not be behind CF. I'm sure I'd have other problems though. :)
All software is crap. You can either spend time fixing it yourself, or spend time begging some SaaS company/community online for fixes/help, with resolution times sometimes measured in months, all while you may not be able to use the product fully.
Also, with SaaS it will be constantly shifting under you. Things will be moved around, restyled, iconized, popupized, etc. This doesn't help productivity either. With self-hosting, you can at least avoid upgrading, if you dislike that kind of thing. Or choose FOSS software that values UX permanency/stability, which seems to be a really hard ask of a SaaS business.
2019 -> 39 Incidents
2020 -> 67 Incidents
2021 -> 86 Incidents
2022 -> 20 Incidents so far
Edit: Using linear regression, the predicted total for end of 2022 is 111 incidents.
So perhaps they are not exactly improving, but maybe there is some other way to normalize the data.
[1] https://github.blog/2019-11-06-the-state-of-the-octoverse-20...
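The prediction above can be reproduced with an ordinary least-squares fit over the 2019-2021 counts, extrapolated to 2022 (a sketch in plain Python):

```python
# Fit a line to the yearly incident counts and extrapolate to 2022.
years = [2019, 2020, 2021]
incidents = [39, 67, 86]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(incidents) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, incidents)) \
        / sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

pred_2022 = slope * 2022 + intercept
print(round(pred_2022))  # → 111
```

Note this is a fit over only three points, so the extrapolation is more of a talking point than a forecast.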
How many core incidents? The part that affects whether you can even push to and pull from a repo, and access issues and PRs? Because everything else is nice to have, but you can do work perfectly fine without them if they go down for a few hours.
Just off the top of my head, that's one thing you can do.
Are you kidding? The last 2 incidents were called "degraded performance". Where "degraded" meant I would get nothing but 500 errors accessing GitHub.com either via browser or git itself for the duration of the outage. How is this not lying?
If you're using GitHub in Europe or Asia, it's not uncommon for GitHub to be offline for many hours before they acknowledge anything.
But going 'all in' on GitHub just doesn't make any sense anymore.
[0] https://hn.algolia.com/?dateRange=all&page=1&prefix=true&que...
Also, quite a few of the non-profits behind the projects you mentioned have multi-million dollar budgets that they can use to administer their git instance, if needed. I don’t think “if they can do it, you can” is a strong argument for those.
Gitlab just seems better for actually running a software project.
After all, you don't even need Gitea for pure Git hosting. If you have a server with SSH access, just init a bare repo in a directory, push to that, and you're ready to go. No web UI needed.
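A minimal sketch of that setup; the "server" here is simulated with local paths so the commands can be run as-is, but over SSH the remote would be something like `user@host:repos/myproject.git` (all paths are illustrative):

```shell
# On the server this would be: ssh user@host 'git init --bare repos/myproject.git'
rm -rf /tmp/demo-remote.git /tmp/demo-work
git init -q --bare /tmp/demo-remote.git     # bare repo: no working tree, just a push target

# In a working copy: make a commit, point at the remote, and push.
git init -q /tmp/demo-work
cd /tmp/demo-work
echo "hello" > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"
git remote add origin /tmp/demo-remote.git  # over SSH: user@host:repos/myproject.git
git push -q origin HEAD
```

That's the whole "hosting" stack; anything a git client can SSH to is a git server.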
The reason I'm still using GitHub is not code hosting. It's collaboration.
Used to do that years ago for my personal projects. Honestly does the trick.
If you don't want or need those things, bare git repos are fine and certainly easier to support (not that Gitea's that hard, though a few issues/PRs I've noticed have caused me more than a little concern about the overall quality of the project).
In the wake of the Okta breach, I think we will see a reversal of the centralization trend.
Oh, stop the drama. Fine. Set up your GitLab.
Git is distributed. GitHub is very much not.
assuming that would be flawless, which it wouldn't
No need, just use Codeberg.org instead. It runs Gitea and is a free collaboration platform (+ git hosting) for free projects. FOSS/OSS projects should really consider alternatives to GitHub and GitLab, especially when there are much more FOSS/OSS-friendly platforms around.
I find myself regularly asking this — about every major SaaS used for critical ops stuff like this.