https://status.cloud.google.com/incident/compute/19003
The status page reports all green; however, the outage is affecting YouTube, Snapchat, and thousands of other users.
We're having what appears to be a serious networking outage. It's disrupting everything, including unfortunately the tooling we usually use to communicate across the company about outages.
There are backup plans, of course, but I wanted to at least come here to say: you're not crazy, nothing is lost (to those concerns downthread), but there is serious packet loss at the least. You'll have to wait for someone actually involved in the incident to say more.
There's some irony in that.
So memegen is down?
Can confirm with Gmail in Europe. Everything works but it's sluggish (i.e. no immediate reaction on button clicks).
Shouldn't that outage system be aware when service heartbeats stop?
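For what it's worth, heartbeat-based detection is simple in principle. Here's a minimal sketch, not how Google's status system actually works: the `HeartbeatMonitor` class, its method names, and the 30-second timeout are all hypothetical. Note that it only helps if the monitor itself sits outside the failing network.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: flags services whose last heartbeat is too old."""
    timeout_s: float
    last_seen: dict = field(default_factory=dict)

    def beat(self, service, now):
        # Record the time (in seconds) at which a service last checked in.
        self.last_seen[service] = now

    def stale(self, now):
        # A service is stale when its last heartbeat is older than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30)
monitor.beat("compute", now=0)
monitor.beat("storage", now=50)
print(monitor.stale(now=60))
```

In this sketch only "compute" is reported, since its last heartbeat is 60 seconds old while "storage" checked in 10 seconds ago.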
Could this be a solar flare?
AWS, on the other hand, has given us very few problems. When we do have an issue with an AWS service, we're able to quickly get an engineer on the phone who, thus far, has been able to explain exactly what our issue is and how to fix it.
GCP has quarterly-ish global blackouts, and generally on the data plane at that, which makes them significantly more severe.
I'm overall happy with it, but if I needed to run a service with a 99.95% uptime SLA or higher, I wouldn't rely solely on GCP.
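To put that 99.95% figure in perspective, here's a quick back-of-the-envelope sketch. The function name is made up, and the 30-day month is an assumption; actual SLAs define their own measurement period.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Downtime budget, in minutes, for a given uptime percentage.

    Assumes a flat period of `period_days` days (30 by default).
    """
    total_minutes = period_days * 24 * 60
    return (100.0 - sla_percent) / 100.0 * total_minutes


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% uptime allows {allowed_downtime_minutes(sla):.1f} min of downtime")
```

Under those assumptions, 99.95% leaves roughly 21.6 minutes of downtime per 30-day month, so a single multi-hour outage uses up several months of budget.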
AWS refunded me in the first reply on the same day!
The GCP sales rep just copy-pasted a link to a self-support survey that essentially told me, after a series of YES or NO questions, that they can't refund me.
So why not just tell your customers like it is? Google Cloud is super strict when it comes to billing. I have called my bank to do a chargeback and put a hold on all future billing with GCP.
I'm now back on AWS and still on the Free Tier. Apparently the $300 trial with Google Cloud did not include some critical products. The AWS Free Tier makes it super clear, and even so I sometimes leave something running and discover it on my invoice.
I've yet to receive a reply from Google and it's been a week now.
I do appreciate other products such as Firebase but honestly for infrastructure and for future integration with enterprise customers I feel AWS is more appropriate and mature.
I think it's weird to say you get credit in dollars and then not be able to spend it on everything. That's not how money works. But that's the way hosting providers work and afaik it's quite well known. Especially with a large sum of "free money", even if it's not well known, it was on you to check any small print.
>I have called my bank to do a chargeback
You're issuing a chargeback because you made a mistake and spent someone else's resources? And you're admitting to this on HN? I'm not a lawyer, but that sounds like fraud and / or theft to me.
The infinite money spout that is Google Ads has created a situation in which devs are at Google just to have fun - there really is no incentive to maintain anything because the money will flow regardless of quality.
Source: I interned at Google.
I don’t miss being on pager duty one bit. I see it looming in my headlights, sadly.
... but not for everybody now.
Nothing you or I or the pager can do will speed that up.
I am aware some bosses won't believe that and I am not trying to make light of it. But there really isn't much else to do except wait.
From that linked page:
"Customer Must Request Financial Credit
In order to receive any of the Financial Credits described above, Customer must notify Google technical support within thirty days from the time Customer becomes eligible to receive a Financial Credit. Customer must also provide Google with server log files showing loss of external connectivity errors and the date and time those errors occurred. If Customer does not comply with these requirements, Customer will forfeit its right to receive a Financial Credit. If a dispute arises with respect to this SLA, Google will make a determination in good faith based on its system logs, monitoring reports, configuration records, and other available information, which Google will make available for auditing by Customer at Customer’s request."
I would pay a premium for a cloud provider happy to give 100 percent discount for the month for 10 minutes downtime, and 100 percent discount for the year for an hour's downtime.
It's always interesting to see these outages at large cloud providers spider out across the rest of the internet; a lot of the world depends on Google to stay up.
Yup, I'm trying to check the Associated Press News right now and it's having trouble connecting to "storage.googleapis.com".
Why are they operating one with a different networking infrastructure from the other?
Since the original Google infrastructure was developed specifically for the first kind of services, the cloud org still has problems adapting it to its needs.
I hope they come back. This is still pretty scary
Incident #19008 began at 2019-06-02 12:48. https://status.cloud.google.com/incident/cloud-networking/19...
Incident #19009 began at 2019-06-02 12:53. https://status.cloud.google.com/incident/cloud-networking/19...
Times are US/Pacific
Better than the monthly outage from Azure.
The networking incident looks like the one to follow for updates now.
The interesting thing is that a couple of minutes before everything went wrong, kubectl returned an "error: You must be logged in to the server (Unauthorized)" error.
According to https://twitter.com/bgp4_table, we have just exceeded 768k Border Gateway Protocol routing entries, which may be causing some routers to malfunction.
[21:55:19] POP< +OK send PASS
[21:55:19] POP> PASS ********
[21:55:21] POP< +OK Welcome.
[21:55:21] POP> STAT
[21:55:21] POP< -ERR [SYS/TEMP] Temporary system problem. Please try again later.
It's amazing how far-reaching outages can be these days.
This is a networking issue, and your data is safe. Cloud SQL stores instance metadata regionally, so it shares a failure domain with the data it describes. When the region is down or inaccessible, instances are missing from the list results, but that doesn't say anything about the instance availability from within region.
is what I’ve heard so far. east seems to be OK, and Europe too
I can see my GKE clusters in one region but not in another, so I am guessing it's the former.
Looks like we'll need a cluster in each region going forward...
I'm not seeing anything at 12:47.
Next update is in about 25 minutes.
404 - Impressive
Cloud services live and die by their reputation, so I'd be shocked if Google ever tried to get out of following an SLA contract based on a technicality like that. It would be business suicide, so it doesn't seem like something to be too worried about?
Edit: just got one email from the downtime, so perhaps my initial conclusion was incorrect
WARNING: The following zones did not respond: us-west2, us-west2-a, southamerica-east1-c, us-west2-b, southamerica-east1, us-east4-b, us-east4, us-east4-a, northamerica-northeast1-c, northamerica-northeast1-b, us-west2-c, southamerica-east1-b, northamerica-northeast1, southamerica-east1-a, northamerica-northeast1-a, us-east4-c. List results may be incomplete.
Luckily for us eu-west1 seems to be working normally.
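If you want to boil a warning like the one above down to the affected regions, here's a rough sketch. It assumes the message format shown above and that zone names are a region name plus a single-letter suffix; `unresponsive_regions` is a made-up helper, not part of gcloud.

```python
import re

# A zone is assumed to look like "<region>-<letter>", e.g. "us-west2-a".
ZONE = re.compile(r"^([a-z]+-[a-z]+\d+)-[a-z]$")
MARKER = "did not respond:"


def unresponsive_regions(warning):
    """Return the sorted, de-duplicated regions named in the warning text."""
    if MARKER not in warning:
        return []
    # Keep only the comma-separated names, dropping the trailing sentence.
    listing = warning.split(MARKER, 1)[1].split(". ", 1)[0]
    regions = set()
    for name in listing.split(","):
        name = name.strip().rstrip(".")
        if not name:
            continue
        match = ZONE.match(name)
        # Zones collapse to their region; bare region names are kept as-is.
        regions.add(match.group(1) if match else name)
    return sorted(regions)


sample = ("WARNING: The following zones did not respond: us-west2, us-west2-a, "
          "us-east4-b, us-east4. List results may be incomplete.")
print(unresponsive_regions(sample))
```

Run against the full warning, this should collapse the sixteen entries into four regions, which makes it easier to see that whole regions, not individual zones, were unreachable.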
Pretty much every service is down
Systems that fail 'open'...
So I searched for "gmail down" on bing and I got some results [1]. But searching on Google for "gmail down" does not return any results [2].
[1] https://www.bing.com/news/search?q=gmail+down&qs=n&form=QBNT
[2] https://www.google.com/search?q=gmail+down&source=lnms&tbm=n...
There appear to be some irregularities on consumer services as well that are surely related; YouTube was behaving a bit oddly for me.
The impact seems to be cascading down from just GCE to other services as well; that status page certainly does not reflect the reality of the situation. You can't even sign into GCP right now, and things that run on GCE, like App Engine, seem impacted.
One thing with gmail though. When it's down it's similar to a snow storm if you only do business in a city. Everyone is impacted and everyone understands a missed deadline is unavoidable.
[1] For those not old enough to know what I mean read this: https://www.ibm.com/ibm/history/ibm100/us/en/icons/personalc...
>We will provide an update at 16:00 US/Pacific.
It's 16:22 and no updates have been posted. A bit unprofessional.
It might be something security related if it triggers a mandatory identity confirmation.
edit: I tried to send myself a mail from another account and it worked, but out of 4 or 5 mail checks at least two failed with the same error.
[23:44:27] POP< -ERR [SYS/TEMP] Temporary system problem. Please try again later.
The problem seems much more complex.
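Side note on the `-ERR [SYS/TEMP]` reply in the logs above: `SYS/TEMP` is a POP3 extended response code (RFC 3206) that marks a failure as temporary, so a client can reasonably retry later instead of giving up. A minimal sketch of detecting it, assuming you have the raw server reply without the timestamp prefix my client adds; the function name is made up:

```python
import re

# Matches an error reply carrying a bracketed response code, e.g. "-ERR [SYS/TEMP] ...".
_RESP_CODE = re.compile(r"^-ERR\s+\[([A-Z0-9/-]+)\]", re.IGNORECASE)


def is_temporary_failure(reply):
    """True if a POP3 reply is an error tagged with the SYS/TEMP response code."""
    match = _RESP_CODE.match(reply.strip())
    return bool(match) and match.group(1).upper() == "SYS/TEMP"


print(is_temporary_failure("-ERR [SYS/TEMP] Temporary system problem. Please try again later."))
print(is_temporary_failure("+OK Welcome."))
```

The first call should report a temporary failure and the second should not, which fits the pattern of some mail checks succeeding and others failing.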
I wanted to upload a video of the project to YouTube and add a link to it in the report. YouTube takes a long time to process the video, and then says it's unavailable.
I go to Vimeo: it's down.
I upload the video to Dropbox, and copy its link to the report.
But my report was a Google doc. And when I tried to export it as PDF (which I had not done yet) it couldn't do it. I never hated google more.
Eventually the video went through to YouTube, and I could export the PDF on the third try, but this really made me conscious of my dependence on Google.
> Error: Download failed: server returned code 502. URL: https://storage.googleapis.com/chromium-browser-snapshots/Li...
The cloud components may be directly affected, but for consumers there's nothing that tells you which consumer-facing services are having issues.
The status page says GCS is fine but that's highly unlikely.
Scary stuff. What happens when Murphy's law decides to crash things even more?
> We are investigating an issue with Google Compute Engine. We will provide more information by Sunday, 2019-06-02 12:45 US/Pacific.
The next update is at 12:59. Just ... no.
GCE, GKE, BQ, Pub/Sub, GAE
asia-south1 us-west1 us-central1 us-west2