- youtube returning error
- gmail returning 502
- docs returning 500
- drive not working
status page now reflecting outage: https://www.google.com/appsstatus
--------------
services look to be restored.
Edit: Services were down from ~12:55pm to ~1:52pm, which is 57 minutes. Thanks hiby007
[1] https://workspace.google.com/intl/en/terms/sla.html
[2] https://en.wikipedia.org/wiki/High_availability#Percentage_c...
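For anyone wanting to put the 57 minutes in "nines" terms, here is a rough sketch of the arithmetic. The 31-day month and the 99.9% monthly target are assumptions for illustration, not figures quoted from the links above, and the function names are made up.

```python
# Back-of-envelope availability math for a single outage.
# Assumptions (not from the thread): a 31-day month and a 99.9% monthly target.

def availability_percent(downtime_minutes: float, days_in_month: int = 31) -> float:
    """Monthly availability, in percent, given total downtime in minutes."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


def allowed_downtime_minutes(target_percent: float, days_in_month: int = 31) -> float:
    """Downtime budget, in minutes, that a monthly availability target permits."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    outage = 57  # minutes, from the edit above
    print(f"availability: {availability_percent(outage):.3f}%")
    print(f"budget at 99.9%: {allowed_downtime_minutes(99.9):.2f} minutes")
```

Under those assumptions a single 57-minute outage already exceeds a 99.9% monthly budget, with no other downtime counted.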
And I have never seen them load so fast before - gmail progress bar barely seen for a fraction of a second whereas I am more used to seeing it for multiple seconds (2-3 sec) until it loads.
I observe the same anecdotal speedup for other sites... drive, youtube, calendar. I wonder if they are throwing all the hardware they have at their services or I am encountering underutilized servers since it is not fixed for everyone.
It is nice to experience (even if it is short-lived) how snappy Google services would be if they weren't so multi-tenanted.
Of course it will - at least, it had better - but what if it doesn't? And if it does, are you going to take countermeasures in case it happens again, or is it just going to be 'back to normal' again?
/s - for now ;)
I already imagined the only solution now was to write a medium post and hope it gets some traction on hackernews and google support steps in. Thinking to myself I was an idiot for knowing all this and still thinking it wouldn't happen to me.
And even though it turns out to be an outage, it gave me a bad enough feeling to start using a domain name I own for my email.
all green, which does not reflect reality for me (e.g. Gmail is down)
edit: shows how incredibly difficult introspection is
Did Maria Christensen make a mistake when adding "return true;"?
(This is a joke referencing Tom Scott's 2014 parable[1] about the danger of designing a system with a single point of failure. Tom tells the fictional tale of a high-level employee at Google adding "return true;" to the global "handleLogin()" function.)
[1] https://www.youtube.com/watch?v=y4GB_NDU43Q (you might need to open this link in a private window...)
Give people a bonus when things didn’t break, not only when there is a superhero that fixes broken things. Then you’re rewarding fragile systems that need superheroes.
Edit: As I comment it looks like things are coming back! Timing or what...
"Gmail is temporarily unable to access your Contacts. You may experience issues while this persists."
Disclaimer - Googler here whose workplace chat is not working.
Don't forget to #HugOps all the people who've been woken up on a Monday morning with this! Hope this gets resolved soon :)
(delivery temporarily suspended: lost connection with aspmx.l.google.com[2a00:1450:400c:c04::1b] while sending DATA command)
Then Google said my accounts did NOT exist which made me feel very uneasy. Banned? Lost all my data? OMG.
Then I got error "502 That's all we know" after entering my passwords.
Finally I realized it couldn't be just me, so I opened HN and confirmed my suspicions.
It's all up and running now but I was really scared.
I'm really curious what happened because it surely looks like Google has a single point of failure in their authentication mechanism which is just horribly wrong. The company has over two billion users - a situation/configuration like this just shouldn't be even theoretically possible.
news.google.com returning 500, finance.google.com returning 502
"The outage started shortly before noon UK time, with Google sites returning server errors when visited.
Users around the world reported problems with Gmail, Google Drive, the Android Play Store, Maps, and more.
...
Despite the widespread outage, Google's service dashboard for its services reported no errors."
Cheers to all the small SaaS businesses out there keeping their services up and running without much of a hiccup all year round.
EDIT: You can permanently refresh and see the score increase by 2-3 points each second. Wow.
EDIT 2: 1800 after one hour, but a few points were probably lost due to HN downtime. Being quick to post is important, guys;-)
I'm pretty sure there will be some internal conferences at Google after this to make sure infrastructure problems can't propagate across the entire company and world at this rate even in the event of a sysop fatfingering a console...
Ah btw HN is being crushed by the load lol.
https://news.ycombinator.com/item?id=20132520 Maybe this beast kicked the bucket.
And now because of that it seems HN is also hurting.
I'm sorry internet.
Cheers all you selfhosted-FOSS-alternatives! Time to bump up those Patreon contributions...
Has anyone on a "platinum"/"enterprise" google workspace plan received any relevant communication about what's happening and ETA on uptime?
Launched YouTube app on the Roku and was prompted to sign in. Opened browser on PC, entered "activation code" from YouTube app, after entering my Google account username on the login page, it presented me with the reassuring "Couldn't find your Google Account" message.
Tried logging in to Gmail directly with the same effect.
Thanks to the Twitters, I realized that Google hadn't canceled me, specifically (apparently they decided to temporarily cancel everyone!).
FWIW, I typically get directed to their Chicago datacenter(s).
(Note: 7 A.M. on the U.S. east coast on a Monday morning; at least they have impeccable timing!)
This isn't true, which makes it more complicated as we try to figure out what systems are down, since their status board is usually faked.
gcloud compute instance-groups list-instances [removed] --zone=us-central1-c | awk '{print $1}' | grep -v NAME
ERROR: (gcloud.compute.instance-groups.list-instances) Some requests did not succeed:
 - Internal Error
I'm sure it's stressful right now. But someday, these engineers will look back and retell the stories about how it happened and the lessons they learned. Hopefully with a laugh.
Running: timeout 60 gcloud compute instance-groups list-instances [removed] --zone=us-central1-c | awk '{print $1}' | grep -v NAME
ERROR: (gcloud.compute.instance-groups.list-instances) Some requests did not succeed:
 - Internal Error
a bot is flagging ips as abusive. it should clear up later.
I was deploying some test applications and kubectl started complaining about the gcloud auth helper exiting with non-zero errors. I tried to launch Cloud Shell from the website and nothing happened.
The web application, a site that does not rely on any external API, is running fine.
Btw, that's funny you listed all the google services exactly in the order I found them unavailable: first youtube, then gmail, then docs
EDIT: Looks like they logged everyone out while they fix the auth issue, presumably so people can use YouTube and other stuff that doesn't necessarily need login?
South Africa
Crisis averted, was not locked out. Gotta take out all the data though when it gets back up.
After that I realised how frequently you can get down time
They NEVER show red, or, oddly, even go down themselves during outages. (The last time Amazon went down springs to mind.)
Not because I use it as a live failover for Google services, but because these outages are good reminders of the mortality of online services.
I was so scared for a while, then I found that all of my family members have the same issue. Then checked HN and breathed a sigh of relief :)
:D
Did some asshole forget to renew a certificate or run the cron job to renew?
New norm, let's have status page, but make it green all the time :-). Surprisingly consistent practice.
But I have it configured with some of the recommendations from Privacytools.io
* DuckDuckGo instead of Google Search
* Bitchute / LBRY / Odyssee instead of YouTube
* Nextcloud instead of Google Drive
World economic output is ~$150mn/minute
Current world population is 7.8bn.
Google has 4.8bn users. That's approx 61% of the population.
Let's assume about 50% of the users were impacted = 2.4bn (which is 30% of the population).
So, the loss could have been about $50mn/minute.
This is without taking the SLA into consideration. There would be losses incurred on that too, wouldn't there?
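The same back-of-envelope estimate as a quick sketch, using only the figures quoted above. The implicit assumption that economic output is spread evenly per person is a very crude one, and the constant and function names are just for illustration.

```python
# Rough loss-per-minute estimate using the figures from the comment above.
# Crude assumption: world output is spread evenly across the population.

WORLD_OUTPUT_MN_PER_MINUTE = 150.0  # $ millions per minute
WORLD_POPULATION_BN = 7.8
GOOGLE_USERS_BN = 4.8
IMPACTED_FRACTION = 0.5  # assumed share of users affected


def estimated_loss_mn_per_minute(
    output_mn: float = WORLD_OUTPUT_MN_PER_MINUTE,
    population_bn: float = WORLD_POPULATION_BN,
    users_bn: float = GOOGLE_USERS_BN,
    impacted_fraction: float = IMPACTED_FRACTION,
) -> float:
    """Scale world output per minute by the impacted share of the population."""
    impacted_bn = users_bn * impacted_fraction
    impacted_share = impacted_bn / population_bn
    return output_mn * impacted_share


if __name__ == "__main__":
    loss = estimated_loss_mn_per_minute()
    print(f"estimated loss: ${loss:.1f}mn per minute")
```

With these inputs the figure comes out slightly under the $50mn/minute quoted, since 2.4bn of 7.8bn is a little under 31%.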
Or is it ahem airgapped?
Youtube & Gmail are down for me.
Something something someone else's computer.
> Did they try to fix them by inverting a binary tree?
>> Yeah maybe implementing a quick LRU cache on the nearest whiteboard will help them out here
>> Did they try checking what shape their manhole cover is?
>> Dev ops was too busy out counting all the street lights in the United States
[1] https://www.reddit.com/r/programming/comments/kcwqij/every_s...
;)