(Edited now that the status page has been updated).
I was under the impression that GitLab uses gitlab.com for their own work. Surely someone would have noticed within seconds that it was down?
Why have the misleading "updated a few seconds ago" text if it doesn't update on complete failure? :)
The delay in updating the status page is a result of our Incident Management process [0]. We have a Communications Manager on Call (CMOC) who leads communication throughout an incident; one of their responsibilities is updating the status page. The gap between noticing the issue and updating the status page is the time it takes for the CMOC to get alerted, assess the situation, and write the communication that is shared on the status page.
I'm not sure how the "updated a few seconds ago" messages are generated but I'll try to find out once the incident has been resolved.
0 - https://about.gitlab.com/handbook/engineering/infrastructure...
At first glance it looks like everything is operational with no issues.
Also, most alerting systems check multiple times before declaring a public outage; often 2 to 3 failures, some seconds apart, are needed.
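For illustration, here is a minimal sketch of that kind of debounce as an external checker. The URL, interval, and threshold are all placeholder assumptions, not how any particular monitoring product actually works:

```sh
#!/bin/sh
# Minimal external health check: only declare an outage after
# 3 consecutive failures, spaced 30 seconds apart.
# URL, threshold, and interval are placeholders.
URL="https://example.com/health"
THRESHOLD=3
INTERVAL=30

fails=0
while true; do
  if curl -fsS --max-time 5 "$URL" > /dev/null 2>&1; then
    fails=0
  else
    fails=$((fails + 1))
    echo "check failed ($fails/$THRESHOLD)"
  fi
  if [ "$fails" -ge "$THRESHOLD" ]; then
    echo "declaring outage"  # page someone / update the status page here
    fails=0
  fi
  sleep "$INTERVAL"
done
```

A single failed probe is often just a network blip, which is exactly why these systems wait for consecutive failures before going public.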
1. External engineers will start to automate recovery/mitigation processes around your status page if it has real-time status.
2. You now need to bug-test your status page thoroughly because of #1. It basically becomes an actual API (see the sketch below).
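To make #1 concrete, here is the kind of thing people end up writing against a status page. The endpoint, the JSON shape, and the mirror URL are purely hypothetical, invented for illustration and not any real status provider's API:

```sh
#!/bin/sh
# Hypothetical automation built on top of a status page: poll a
# status endpoint and fail over to a mirror when it reports trouble.
# STATUS_URL and the .status field are made up for this sketch.
STATUS_URL="https://status.example.com/api/status.json"

state=$(curl -fsS "$STATUS_URL" | jq -r '.status')
if [ "$state" != "operational" ]; then
  echo "primary degraded ($state), switching remote to mirror"
  git remote set-url origin git@mirror.example.com:team/repo.git
fi
```

Once scripts like this exist in the wild, a bug in the status page (stale data, a wrong state) triggers real failovers, which is why it has to be tested like any other API.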
I guess status pages should now have a way to pull data from a public, crowd-sourced status page?
https://status.gitlab.com/ is updated. Edit: https://status.gitlab.com/pages/incident/5b36dc6502d06804c08...
Maybe some common servers?
Just look at GNOME [0]. They are doing it right.
GitLab is a perfect example. They had database issues and had to restore from backups already.
But I have fat-fingered a lot of self-hosted stuff in my time.
Also, at gitlab.com's scale, the problems they face are very different from those of a typical deployment.
It is like maintaining your own car versus using the train.
On average, if you can fix your car (or hire a good mechanic, i.e. consulting), you will probably have a better experience than when public transport breaks down and you are powerless to do anything about it.
I would rather run a business that depends on my car than on the train.
Yes, I could also fix it if the server were mine, but more than likely I'll be busy doing my actual job (which does not involve fiddling with self-hosted GitLab instances), so I'll take my chances with the GitLab engineering team. They do fix things, and my being busy, asleep, sick, or travelling has no impact on their response. I intend to keep it that way.
Spoken as someone who has never taken a train, I suppose? Transit at scale can handle maintenance much better than a single vehicle and/or mechanic, and it does so proactively and on a schedule. And when things get really bad (catastrophic failure of some component you can't just "fix" on the spot), public transit will organise a backup (a new train or a bunch of buses) to get you to your destination.
Need to do a launch? Build it and push it.
Need to share a change with someone so they can review it? `git diff` and send a patch via email. Want to use a server? Spin up a server, add users and keys, and push up to it.
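As a sketch of that workflow (the repo paths, hostnames, and email addresses are placeholders):

```sh
# Share a change for review as an emailed patch.
git format-patch -1 HEAD                        # writes 0001-....patch
git send-email --to=reviewer@example.com 0001-*.patch

# Or host it yourself: a bare repo on any box you can SSH into.
ssh user@myserver.example.com 'git init --bare ~/repos/project.git'
git remote add origin user@myserver.example.com:~/repos/project.git
git push -u origin main
```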
GitLab, GitHub and these hosted solutions haven't always existed. They're convenient, but not an OMGWTF moment... unless of course you don't have backups.
Can you link the issue please? :)
For context, Prometheus and observability will be handled with Opstrace in the future [0]. I'd like to learn about your use-case and see which troubles you have been running into. Thanks!
grrr... I am stuck with my job now... :(