Tell HN: AWS appears to be down again

879 points | thadjo | 4y ago | 468 comments

Anyone else seeing this?

207 comments · 108 top-level

rexreed4y ago· 9 in thread

It's not just AWS - check the down reports: https://downdetector.com/

Cloudflare having some significant issues as well on certain domains.

It's possible people are reporting the issue as CloudFlare because that's whose error page they see when a box on EC2 is unreachable.

jgrahamc4y ago

No, we are not. But customers who use AWS are having trouble.

nerdjon4y ago

The list of affected services is a bit all over the place, especially since I highly doubt Xbox Live or Halo is running on AWS.

the_pwner2244y ago

HN was also (briefly) down around that same time (roughly 1 hour ago from now).

PragmaticPulp4y ago

DownDetector is showing everything down during that period, including Google.

I suspect DownDetector itself suffered some outages during this period, which it shows as outages of every service it monitors.

buryat4y ago

downdetector.com relies on user complaints, so it’s unreliable, as people can blame anything

ren_engineer4y ago

some sort of widescale attack would be the only explanation right?

forgotpwd164y ago

This looks weird. At the same time all those services had a spike in outage reports.

chasd004y ago

can confirm i have multiple salesforce instances down.

QuiiBz4y ago· 8 in thread

I tried to monitor service status using https://stop.lying.cloud, but it is also hosted on AWS, and down too.

civilized4y ago

If they're monitoring AWS downtime they might want to rethink this.

synergy204y ago

AWS should monitor itself from Azure or GCP, even DO or Linode makes more sense.

Eating your own dog food shows confidence, but monitoring is a different matter: there you need to use anything but your own dog food.

QuinnyPig4y ago

Yeah, I homed https://stop.lying.cloud out of us-west-2. Oops.

saganus4y ago

How does this service work?

It seems to have all the look and feel of AWS, and somehow has more up to date info than the official AWS status page?

tinco4y ago

Now that they're back up they're not reporting any problems, how is it supposed to work? It looks like it is just repeating the status reported on the Amazon status page.

taormina4y ago

I mean, sounds like it's working as intended then?

mishftw4y ago

Funny I didn't know that and assumed it was okay

moneywoes4y ago

That’s hilarious

tomlagier4y ago· 7 in thread

I wonder if AWS will make more or less money from these outages?

Will large players flee because of excessive instability? Or will smaller players go from single-AZ to more expensive multi-AZ?

My guess is that no-one will leave and lots of single-AZ tenants who should be multi-AZ will use this as the impetus to do it.

Honestly, having events like this is probably good for the overall resilience of distributed systems. It's like an immune system, you don't usually fail in the same way repeatedly.

andy_ppp4y ago

* Free chaos monkey installed in every AZ

kenhwang4y ago

If my company is any indication, they're going to make more money since everyone will simply check the multi-AZ or multi-region checkboxes they didn't before and throw more money at the problem instead of doing proper resiliency engineering themselves.

jorblumesea4y ago

No one just "moves off" AWS. Once your apps are spaghetti coded with lambdas, buckets and all sorts of stuff, it's basically impossible to get off. More than likely, as you noticed, it will increase spending since multi-AZ/multi-region will become the norm.

s_dev4y ago

>I wonder if AWS will make more or less money from these outages?

There is no possibility that outages are good for AWS. Nor is there more money to be made from "publicity" of the outages.

urthor4y ago

The actual answer?

In the next 5 calendar years the bottom line will still grow.

However, the brand damage means they permanently lose market share, which impacts their growth ceiling.

gjvr4y ago

I would not go with multiple Availability Zones within the same infra/cloud provider...

ransom15384y ago

"Or will smaller players go from single-AZ to more expensive multi-AZ"

Yes! When you have a service interruption, pay 2x more! With a region down I am sure other regions won't have any interruptions either! /s

earthboundkid4y ago· 7 in thread

HOST THE GODDAMN STATUS PAGE ON AZURE FOR FUCKS SAKE.

There is zero excuse for this shit. Be professional. Acknowledge reality. It is logically impossible to run your own status page. Trying to do so just wastes everyone else on the internet's time when you have an outage.

jeroenhd4y ago

They should host their status page on IPFS instead. If you're never going to change the contents of your status page, you might as well put it into immutable storage!

Aldipower4y ago

If the status page is down, you know the system is down. Mission accomplished. Go ahead.

bnt4y ago

Isn't it on S3 or something? And a few years ago we had that whole S3 is down situation and the status page was also down? xD

r3trohack3r4y ago

I don’t understand, are folks looking at a different status page than me?

This morning we saw some weird behavior in us-west-2, our traffic just _vanished_. I thought: there is no way this is us.

Went to https://status.aws.amazon.com/

Top of the board showed “Internet Connectivity Issues (Oregon)”

And that was that. The board worked exactly as it should - it immediately explained my missing traffic and kept me up-to-date with the status of the outage on their side.

tommek40774y ago

They should automatically update as well. Currently it is a static "all green" page that might be manually changed if a manager gives his go-ahead. Insane.

aaronharnly4y ago

Seriously.

boopboopbadoop4y ago

You don’t even know what the problem is yet. Stop shouting solutions.

dijit4y ago· 6 in thread

Is it AWS or could it be an ISP?

AWS seems to be working for me, but I’ve worked with clients in the US and spectrum internet tended to drop connections to us sporadically, which looks like an outage to our clients but is something we obviously can’t control.

tw044y ago

If it's a network issue, it's on their side. I've verified from centurylink, comcast, cogent, he.net, at&t, and verizon - all of them are having issues. This isn't like: Cox is having an outage and just can't get to AWS.

ukyrgf4y ago

I have an outage way over in the southeast, looks to be affecting the major monopoly ISP. Can't get a tech to our data center until 2PM.

banana_giraffe4y ago

Things were working during the event, but connectivity was pretty messed up

https://imgur.com/a/VsrS0JZ

(This is two similarly spec'd boxes on us-east-2 and us-west-2). Looking at GeoIP of connecting clients, the only pattern I can see is the region itself.

adriand4y ago

I'm wondering the same thing. We have stuff hosted in us-west-2 and multiple people across the US are reporting that our systems are down, however our system is working fine for me here, which is near Toronto.

albatross134y ago

Currently we're seeing 40k ms response times from CloudFront distributions, we can't hit PagerDuty (probably runs on AWS), etc.

I guess it could be an ISP thing but I guess we're all assuming 80/20.

treis4y ago

It was an AWS networking issue: 90%+ packet loss when pinging Google & Facebook.

jrs2354y ago· 5 in thread

I checked their health status page. All is good. /s

https://downdetector.com/status/aws-amazon-web-services/

tyingq4y ago

They did add an update, faster than last time:

"7:42 AM PST We are investigating Internet connectivity issues to the US-WEST-2 Region."

https://status.aws.amazon.com/

Edit: They added US-WEST-1:

"7:52 AM PST We are investigating Internet connectivity issues to the US-WEST-1 Region."

Edit: Found root cause, maybe?

"8:01 AM PST We have identified the root cause of the Internet connectivity to the US-WEST-1 Region and have taken steps to restore connectivity. We have seen some improvement to Internet connectivity in the last few minutes but continue to work towards full recovery."

"8:01 AM PST We have identified the root cause of the Internet connectivity to the US-WEST-2 Region and have taken steps to restore connectivity. We have seen some improvement to Internet connectivity in the last few minutes but continue to work towards full recovery."

sgt4y ago

Ok, so it can't be down then. This is proof!

ornornor4y ago

Yep, when it loads, it's all green. "nine nines!!!"

Cort3z4y ago

Downdetector is just a statistics page; it does not actually detect downtime, and it is in no way AWS's status page.

drcongo4y ago

What does downdetector run on?

TekMol4y ago· 5 in thread

It is surprising that their status page is down too:

https://status.aws.amazon.com

Their CDN, CloudFront, always works reliably for me. Couldn't they put the status page on CloudFront?

mhitza4y ago

Takes minutes to update a CloudFront distribution (they say around 5 minutes in their blog post from last year when speed was improved [1]). I think they might want to be able to change it to "everything's back to normal" in an instant, based on the SLA argument I've seen thrown around last time an AWS region was down.

[1] https://aws.amazon.com/blogs/networking-and-content-delivery...

electroly4y ago

The status page is working great for me. Did they make it multi-region after the last failure? I'm on the east coast.

hericium4y ago

Works for me. It's the usual static page with everything green.

drcongo4y ago

Not working for me either in the UK.

clavicat4y ago

Down for me, as well.

iso16314y ago· 4 in thread

Must be a Y in the day.

It amazes me how many projects exist that don't even have multi-region capability, let alone no single point of failure

electroly4y ago

Multi-region is difficult and expensive, and a lot of projects aren't that important. Most of our infrastructure just isn't that vital; we'd rather take the occasional outage than spend the time and money implementing the sort of active-active multi-region infrastructure that a "correct" implementation would use. We took the recent 8 hour us-east-1 outage on the nose and have not reconsidered this plan. It was a calculated risk that we still believe we're on the right side of. Multi-AZ but single-region is a reasonable balance of cost, difficulty, and reliability for us.

staticassertion4y ago

IDK, don't you end up with a bunch of extra costs? Like you're going to literally pay more money because now you have cross region replication charges, and then you're going to pay a latency cost, and then you may end up needing to overprovision your compute, etc.

All to go from, idk, 99.9% uptime to 99.95% (throwing out these numbers)? The thing is when AWS goes down so much of the internet goes down that companies don't really get called out individually.
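
For a sense of scale, the two placeholder figures above can be turned into a yearly downtime budget. This is only illustrative arithmetic on the commenter's made-up percentages, not a measurement of any real service; the helper name `downtime_hours_per_year` is invented for this sketch.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a service can be down and still meet the target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# The two example targets come from the comment above; they are not real SLAs.
for pct in (99.9, 99.95):
    print(f"{pct}% availability -> {downtime_hours_per_year(pct):.2f} hours of downtime per year")
```

Halving the allowed downtime saves only a few hours a year at this level, which is the trade-off the comment is weighing against replication, latency, and overprovisioning costs.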

ahallock4y ago

You're saying that as if it's a walk in the park to set up and not cost prohibitive, in terms of opportunity cost and budget, especially for smaller companies.

Spivak4y ago

This might be a multi-region problem. Auth0 as an example has three US regions and two of them are down.

thadjoOP4y ago· 4 in thread

obligatory comment about status page showing seas of green: https://status.aws.amazon.com

tubignaaso4y ago

The status page appears to be down now as well.

arsome4y ago

For this kind of thing it's usually better to just use a user-driven site like: https://downdetector.ca/status/aws-amazon-web-services/

Some users are clueless, but the clueless users average out over time and the spikes make it clear when there are actual issues.
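
One way to read "the spikes make it clear" is as a simple outlier test on report counts. The sketch below is an assumption about how such a check could work, not a description of how Downdetector actually does it; the function name, the threshold of 3 standard deviations, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def is_spike(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Return True if `current` sits more than `threshold` sample standard
    deviations above the mean of `history` (a plain z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu  # flat baseline: any increase counts
    return (current - mu) / sigma > threshold


# Hypothetical user-report counts per interval during normal operation.
baseline = [12, 9, 15, 11, 10, 13, 8, 14]
print(is_spike(baseline, 16))   # ordinary noise from "clueless" reports
print(is_spike(baseline, 400))  # a burst of reports during an outage
```

A test like this cannot tell a real outage from a failure in the reporting site itself, which is the concern raised elsewhere in the thread about every service spiking at once.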

gregmfoster4y ago

Love to see the manually updated status page not updating

rpadovani4y ago

For me, it's down as well

sam0x174y ago· 3 in thread

They really need to stop requiring SVPs or higher to show non-green status on the status page, as other HNers have revealed in last week's AWS post. It's effectively not a status page, and they could probably be sued if it can be demonstrated that X service was down but the status page showed green (since the SLA is based on status page). Should be automated and based on sample deployments running in every region and every service. And they should use non-AWS instances to do the sampling, so they can actually sample when, say, we experience the obligatory black friday us-east-1 outage every year.
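
A minimal sketch of the "sample deployments probed from outside" idea, assuming each region exposes some health URL that a machine on another provider can fetch. The endpoint URLs and function names below are hypothetical, and nothing here reflects how AWS builds its status page.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, connection failures, timeouts, and malformed URLs all count as "down".
        return False


def region_status(
    endpoints: dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> dict[str, str]:
    """Map each region to "green" or "red" based on one probe of its canary URL."""
    return {region: ("green" if probe(url) else "red") for region, url in endpoints.items()}


# Hypothetical canary deployments, one per region (example.com is a placeholder domain).
CANARIES = {
    "us-west-1": "https://canary-us-west-1.example.com/health",
    "us-west-2": "https://canary-us-west-2.example.com/health",
}
```

Calling `region_status(CANARIES)` from a host outside the provider being measured would give a status that does not depend on anyone approving a colour change. A real system would probe repeatedly and from several vantage points before flipping a region to red.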

lljk_kennedy4y ago

I think SVP / GM approval is only needed for yellow / red status. From my time in AWS Support, the Support Oncall and Call Leader / GM delegate worked to approve green-i posts.

ceejayoz4y ago

They were much faster than usual about updating the AWS Status page.

vineyardmike4y ago

> we experience the obligatory black friday us-east-1 outage every year.

Is this a thing?

adnauseum4y ago· 3 in thread

Seems like ever since Microsoft bought AWS, it's been going down an awful lot.

endisneigh4y ago

> Seems like ever since Microsoft bought AWS, it's been going down an awful lot.

What?

masterof04y ago

Didn't know Tim Dillon was hanging out here on HN.

exdsq4y ago

Haha wtf?

dhruvarora0134y ago· 3 in thread

Looks like it's taken down SendGrid, NPM, Twitch, Auth0 so far

hericium4y ago

PlayStation Network went down at the same time.

cyral4y ago

Stripe as well

ents4y ago

Notion as well

yawnxyz4y ago· 3 in thread

Vercel is down too.

My sites run on Cloudflare and Vercel, and I can't even log in to those right now.

I'm curious — what does Hacker News run on? It seems impervious to any kind of downtime...

qeternity4y ago

> I'm curious — what does Hacker News run on? It seems impervious to any kind of downtime...

On a dirty, disgusting dedicated server.

ceejayoz4y ago

HN definitely gets overloaded at times, including during big outages when everyone stampedes here. I got a bunch of "sorry, we can't serve your request" a little while back.

Pobody's nerfect.

hn_throwaway_694y ago

DNS A record suggests a dedicated server from this company:

https://www.m5hosting.com/
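
For anyone wanting to repeat the lookup, here is a small sketch that asks the system resolver for a hostname's IPv4 addresses. It only covers the A-record step; mapping the resulting address to a hosting company needs a separate WHOIS or reverse-DNS lookup, which is not shown. The helper name `a_records` is invented here.

```python
import socket


def a_records(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port) for IPv4.
    return sorted({info[4][0] for info in infos})
```

Running `a_records("news.ycombinator.com")` returns whatever addresses are published at the time of the call, so the answer may differ from what the commenter saw.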

cebert4y ago· 2 in thread

This outage is extremely frustrating to me. My company hosts all our apps in gov cloud. Gov Cloud West 1 is also down, but the AWS Gov Cloud status page indicates that everything is healthy and green. I thought AWS's incident response to the East outage last week was that they'd update the status page to better reflect reality.

Gov Cloud Status Page: https://status.aws.amazon.com/govcloud

texasviking4y ago

We are in the same boat. Finally updated "We are investigating Internet connectivity issues to the US-GOV-WEST-1 Region"

chasd004y ago

i had multiple govcloud hosted salesforce instances down but they appear to be coming back up now.

nic_wilson4y ago· 2 in thread

We are seeing issues with requests to Auth0, which I believe is hosted on AWS and has historically gone down when AWS has had issues

romanhotsiy4y ago

We see issues with Auth0 too. Other AWS services we use seem to be working fine so far (us-east-1)

ramesh314y ago

Auth0 went down for us as well right when AWS did. At least it's not like those two systems run our entire company...

iamricks4y ago· 2 in thread

How much do you guys think these frequent outages will affect their market share in cloud products?

Is this enough of a push for organizations to actually move over their infrastructure to other providers?

ceejayoz4y ago

Not at all.

The other cloud providers have had their own outages.

pm904y ago

It’s prompted discussions of building multi regional services in my org but not multi cloud. They would have to really really really screw up for that to happen… maybe be down for like a week or something.

cblconfederate4y ago· 2 in thread

Reminder that the internet was literally invented to avoid this kind of nuclear attack. But I guess people are herdish animals and prefer to die as a group

throw_m2393394y ago

More like ultimately all these companies buy into a certain form of vendor lock-in and they have no competence or willingness to migrate or even consider the competition. It starts with "oh I'm just renting a remote virtual server" and in no time it's "Oh, all my stack is tied to AWS proprietary products" because convenience. That's what Amazon wants.

doublepg234y ago

Seems like the Internet level networking is quite robust at this point.

baskethead4y ago· 2 in thread

It sounds like their systems design interviews aren’t rigorous enough.

tyingq4y ago

I'm guessing lots of people fled us-east-1 for us-west-2, after the last outage, and overwhelmed something there.

pm904y ago

At this point they should hire specifically for config management and rollout.

Mostly /s; I wish the aws engineers the best of luck through this.

CoastalCoder4y ago· 2 in thread

Asking as a non-cloud-developer: why would Crunchyroll's recovery [0] lag so much behind AWS's recovery [1]?

[0] https://downdetector.com/status/crunchyroll/

[1] https://downdetector.com/status/aws-amazon-web-services/

spenczar54y ago

I don't know for sure, but this is generally common because caches get cold.

A lot of websites use a cache in front of databases (or template rendering engines, or many other systems). That cache might evict entries based on time - after 5 minutes, the entry is considered invalid.

But that means that if you have no traffic for 10 minutes, the cache completely empties. Then when traffic returns, it all skips the cache and actually triggers a real hit to the backend - which is now overwhelmed with traffic. The cache protects the backend in normal behavior, but now it's not doing its job, so the backend has many more requests than usual.

In the worst case, those requests are enqueued in a big serial sequence... but the ones at the back of the queue may time out. The client may do something like say "it's taken me 5 seconds and I still don't have a response - I'll abort and retry!" and now you have even _more_ traffic to deal with.

So cold caches and retries can conspire to keep a service down for a long time even after the root cause is fixed.
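
The retry half of this is commonly softened with exponential backoff plus random jitter, so clients that all timed out together do not all retry together. The sketch below shows that general technique under assumed parameters (base delay, cap, attempt count); it is not anything Crunchyroll or AWS is known to run, and it does nothing about warming the cache itself.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> list[float]:
    """Full-jitter backoff: delay i is drawn uniformly from [0, min(cap, base * 2**i))."""
    return [rng() * min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    **backoff_kwargs,
) -> T:
    """Call fn up to `attempts` times, sleeping a jittered delay after each failure."""
    for delay in backoff_delays(attempts - 1, **backoff_kwargs):
        try:
            return fn()
        except Exception:
            sleep(delay)  # wait longer (on average) after each failed attempt
    return fn()  # final attempt: let any exception propagate to the caller
```

Compared with "abort after 5 seconds and retry immediately", the growing and randomized delays spread the returning traffic out, which gives a cold backend a chance to refill its cache.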

Nexxxeh4y ago

Crunchyroll seems to barely work at the best of times, and when it does, it's still a mess.

All sorts of issues still unresolved for years, including the ridiculously annoying "Finishes playing season English sub, autoplays first season of German dub, which then gets stuck". Still no profiles (nerfing their super-premium offering). Auto-resume points are unreliable, the Android app is hot garbage at dealing with network disruption...

I can only imagine their back-end is mostly Visual Basic running on a single AWS-powered VM.

aaronharnly4y ago· 1 in thread

Everyone who spent the past week migrating from us-east-1 to us-west-2: this joke is on you. :)

DarthNebo4y ago

"US-EAST-1 or bust" being manifested right now.

gitfan864y ago· 1 in thread

I'm so glad that I'm not still the CTO of a startup. I would be getting dozens of e-mails from people without engineering backgrounds asking "Are we multi-cloud", "why didn't you make us multi-cloud"?

necovek4y ago

Well, why didn't you? :)

The response is that this actually works well enough, so the investment required has not pushed anyone to do it (with that meaning building the core infrastructure to make that easy).

waynecochran4y ago· 1 in thread

There was a brief period of time back in the early 90's where I felt I understood how Linux worked -- the kernel, startup scripts, drivers, processors, boot tools, etc... I could actually work on all levels of the system to some degree. Those days are long gone. I am far removed from many details of the systems I use today. I used to do a lot of assembly programming on multiple systems. Today I am not sure how most of the systems work in much detail.

cle4y ago

To an extent, this is one of the goals, to free up engineers to work on higher level things. Whether it meets that goal in some cases is debatable, and it’s certainly not ideal for us engineers who like to get to the bottom of things.

turtlebits4y ago· 1 in thread

That was fun. Badges weren't working (daily checkin required) so the front desk had to manually activate them.

Slack wasn't sending messages and Pagerduty was throwing 500's.

api4y ago

... because you need to contact a server 1000 miles away to issue badges in your building.

This cloud-for-everything-even-local-devices thing is both hilarious and sad.

I wonder if anyone had trouble doing their dishes or laundry today, because I'm sure someone thought dish washers and washing machines needed cloud.

myth_drannon4y ago· 1 in thread

That's the price of PIP culture and burning out your devs. Now no one wants to work at Amazon and they can only hire new grads.

Throwawayaerlei4y ago

I hear they do get people who want to be able to get experience at AWS's scale, there's only a few places for that.

The thing that really gets me is the reports from the last major outage a few days ago about how pervasive lying inside the company is. This really doesn't work well for engineering and we're possibly seeing the results of that. We should certainly expect to see that becoming visible the more time goes on without a major cultural shift. Which given that the guy who ran AWS now runs all of Amazon.com....

ceejayoz4y ago· 1 in thread

We're having troubles in us-west-2.

Discourse is reporting trouble, too. https://twitter.com/DiscourseStatus/status/14711403698992906...

supermathie4y ago

us-west-1 also seems offline, but us-east-1 (ironically) seems fine

wirelesspotat4y ago· 1 in thread

AWS status page shows an update:

> AWS Internet Connectivity (Oregon): 7:42 AM PST We are investigating Internet connectivity issues to the US-WEST-2 Region.

Source: https://status.aws.amazon.com

alvis4y ago

Oh. not again...

mattjaynes4y ago· 1 in thread

Tangentially related: On Friday Backblaze and B2 were down for 10+ hours to update their systems for the log4j2 vulnerability. Seemed noteworthy for the HN crowd and I posted a link to their announcement when the outage began. However, the post was quickly flagged and disappeared. Genuinely curious, why is announcing some outages ok and others not?

qaq4y ago

What would be the ratio of HNers who are Backblaze customers vs. those who are AWS customers? I bet the Backblaze number is small enough that Backblaze employees on HN can downvote you enough for it to matter.

hdjjhhvvhga4y ago· 1 in thread

An honest question. Why do you guys use AWS instead of dedicated servers? It's terribly expensive in comparison, nowadays equally complex, scalability is not magic and you need proper configuration either way, plus now the outages become more and more common. Frankly, I see no reason.

dsr_4y ago

Once you have committed to a certain way of doing things, the transition costs can be very high.

Let's consider RockCo and CloudCo. They both provide a B2B SAAS that is mostly used interactively during the working day, and mostly used via API calls for the rest of the working week. Demand is very much lower on weekends. Both RockCo and CloudCo were founded with a team of six people: a CEO who does sales, a CTO who can do lots of technology things, three general software developers, and one person who manages cloud services (for CloudCo) or wrangles systems and hosting (for RockCo).

In the first year, CloudCo spends less on computing than RockCo does, because CloudCo can buy spot instances of VMs in a few minutes and then stop paying for them when the job is done. RockCo needs a month to significantly change capacity, but once they've bought it, it is relatively cheap to maintain.

In the second year, they are both growing. CloudCo buys more average capacity, but is still seeing lots of dynamic changes. RockCo keeps growing capacity.

In the third year, they're still growing. CloudCo is noticing that their bills are really high, but all of their infrastructure is oriented to dynamic allocation. They start finding places where it makes sense to keep more VMs around all the time, which cuts the costs a little. RockCo can't absorb a dynamic swing, but their bills are now significantly lower every month than CloudCo's bills, and the machines that they bought two years ago are still quite competitive. A four year replacement cycle is deemed reasonable, with capacity still growing. And bandwidth for RockCo is much cheaper than the same bandwidth for CloudCo.

Who's going to win?

Well, you can't tell. If they both got unexpectedly sudden growth surges, RockCo might not have been able to keep up. If they both got unexpected lulls, CloudCo might have been able to reduce spending temporarily. RockCo spent more up front but much less over the long term. CloudCo could have avoided hiring their cloud administrator for several months at the beginning. RockCo's systems and network engineer is not cheap. And so on, and so forth.
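
The trade-off in the RockCo/CloudCo story can be sketched as a toy cost model: the cloud bill follows demand hour by hour, while dedicated hardware is sized for the peak and paid for around the clock. Every number below (demand shape, hourly prices) is made up for illustration, and the model ignores staff, bandwidth, and hardware replacement, all of which the comment says matter.

```python
def cloud_cost(hourly_demand: list[int], price_per_vm_hour: float) -> float:
    """Elastic model: pay only for the VMs running in each hour."""
    return sum(hourly_demand) * price_per_vm_hour


def dedicated_cost(hourly_demand: list[int], price_per_server_hour: float) -> float:
    """Fixed model: provision for peak demand and pay for it every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_server_hour


# Hypothetical week: busy during working hours, API traffic otherwise, quiet weekends.
weekday = [10] * 8 + [4] * 16
weekend = [1] * 24
week = weekday * 5 + weekend * 2

# Hypothetical prices: the elastic VM costs more per hour than an amortized server.
print(f"CloudCo: {cloud_cost(week, 0.10):.2f} per week")
print(f"RockCo:  {dedicated_cost(week, 0.03):.2f} per week")
```

With these invented prices RockCo comes out cheaper, but raising the peak-to-average ratio or narrowing the price gap flips the result, which matches the comment's conclusion that you can't tell in advance who wins.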

navidkhn14y ago· 1 in thread

My personal health dashboard on AWS shows "InternetConnectivity operational issue us-west-2"

[07:42 AM PST] We are investigating Internet connectivity issues to the US-WEST-2 Region.

iJohnDoe4y ago

Probably a silly question, but what are you using to get this info?

codercotton4y ago· 1 in thread

"Everything is fine." - https://status.aws.amazon.com

rytrix4y ago

Everything *is* fine now. The status page previously reflected an issue much quicker than last time.

jcoder4y ago· 1 in thread

This is new… Siri hasn’t been able to connect for me since this began

ghawkescs4y ago

Same thing here.

account7584y ago· 1 in thread

AWS Global Accelerator not working correctly anymore as well, connections dropped worldwide. Seems like it is managed from us-west-2 and not redundant.

electroly4y ago

This comment taught me about the existence of Global Accelerator and, somewhat ironically given the context, we decided to deploy it today. Pretty neat! I'll have to keep in mind that I learned about it because of a worldwide outage :) Thanks!

samgranieri4y ago· 1 in thread

At least this still works: https://livemap.pingdom.com/

fy204y ago

Partially, the stats on the right are wrong. For me it shows:

Website outages in the past hour 86,967

Lowest 16,208

Average 16,208

Highest 16,209

johnisgood4y ago· 1 in thread

And I kept getting "We're having some trouble serving your request. Sorry!" on HN for the past 10 minutes or something.

edoceo4y ago

Traffic flood to this site for status reports on AWS

branon4y ago· 1 in thread

Yup. Having issues with IT Glue and Duo here.

rd04y ago

Duo issues here as well.

TheFragenTaken4y ago· 1 in thread

At least Twitch.tv (Amazon subsidiary) and npmjs.com seem to be affected.

AustinDev4y ago

Yeah, I'm getting 2000 player errors in the Twitch video player.

Jamie99124y ago· 1 in thread

Twitch seems to have recovered, is it back now for everyone?

bloaf4y ago

Still getting errors in Houston

edit: some streams back up, chat still buggy as of 09:55 local time

edit2: appears to be back ~10:00 local time

streetcat14y ago· 1 in thread

Remember, every 12 secs take one 9.

skj4y ago

eh?

zonkd12344y ago· 1 in thread

Yes. Having issues as of a few minutes ago reaching us-west-2 EC2.

zonkd12344y ago

us-west-2 EC2 looks like it just came back online.

yabones4y ago

Yep, it's broken again. I was trying to install some Thunderbird extensions, and stuff started breaking halfway through. Never thought of an AWS outage borking my mail client I guess...

lukeqsee4y ago

We lost all public IPv6 in the Linode Newark DC.

This appears to be cross-provider.

Edit: We have IPv6 back.

kp195_4y ago

We're having issues connecting to our EC2 bastions and accessing the us-west-1 dashboard too

EDIT: Cognito auth seems down for us too

EDIT2: our ALBs are timing out as well

EDIT3: us-west-1 looks like it's working now!

Zelphyr4y ago

I think it's time to face the fact that we all have too many of our eggs in the AWS basket.

nickjj4y ago

I'm seeing outages on us-west-2 too. Customer facing traffic being served through Route53 -> ALB -> EC2 is down and CLI tools are failing to connect to AWS too.

zedpm4y ago

Wow, yeah, us-west-1 AND us-west-2 are reporting connectivity issues. I'm guessing this is related to the Auth0 outage that's currently going on too.

andrew_4y ago

Root logins are suffering some kind of "captcha outage." The buzz has just begun https://twitter.com/search?q=aws%20captcha&src=typed_query

gz54y ago

looks specific to certain (possibly AWS hosted or partially dependent) services such as Auth0:

https://status.auth0.com/

e.g. our services running on AWS are fine right now, but new sessions dependent on Auth0 are not.

alberth4y ago

It appears AWS Status Page is hosted at AWS [0].

Seems like a really bad idea.

[0] https://hostingchecker.com/

300bps4y ago

I'm on us-east-1 and everything is fine for me including:

* EC2 instances

* AWS Workspaces

* FSx for Windows

* AWS Directory Service

* S3 Buckets

11235813214y ago

Yes, all our stuff in west-2 went down at 7:15 PT.

xondono4y ago

At what point are these outages a sign that something inside AWS is deeply broken and pretty much unfixable?

theverything4y ago

Slack seems to be having issues too.

yottalove4y ago

Even as a software engineer, I think I could build from primitive materials a couple of battery operated transceivers to replace the signal flags or horsemen for critical communications. A little basic physics and materials science goes a long way.

pjf4y ago

Kentik data on the outage: https://twitter.com/DougMadory/status/1471162450649223173

BTCOG4y ago

Can't use MFA right now to get into multiple instances due to this outage.

NicoJuicy4y ago

I get the feeling that havoc will break out when a tornado reaches us-east-1

commandlinefan4y ago

"Hey boss, that thing that took down us-east-1... that can't take down us-west-1 next week, can it?"

"No, no, of course not"

"Should I check?"

"No, don't waste time checking, get back to your TPS reports"

tubignaaso4y ago

Seeing this on us-west-1. us-east-1 appears to be functioning for us.

iJohnDoe4y ago

Yes, seeing it too.

Seems to be down in a major way. Lots of various AWS services are down. However, so many things depend on AWS that it could just be that EC2 is down and it is causing a ripple effect.

tmvnty4y ago

Some npmjs.com pages are returning 503 Service Unavailable for us

CodinM4y ago

I fucking swear to God.

wenbin4y ago

ListenNotes.com has servers running on us-west-2.

One issue is that outbound requests from our servers in us-west-2 time out. Other than that, it seems that we are running ok so far.

devin4y ago

Can someone please update the title to be broader than AWS?

tuzemec4y ago

Is that related to the current NPM status (https://status.npmjs.org/)?

rpadovani4y ago

Systems Manager in eu-central-1 is giving us some issues now, but I am not sure about its internal architecture, so maybe it needs some US resources?

alecr954y ago

Yep, we're also having issues. Hosted on us-west-2

markbnj4y ago

Our systems that talk to S3 in CA and OR are timing out trying to open SSL connections. AWS lists outages in these regions on their status page.

sheepdog4y ago

I can't log on to the console for us-east-1. But our api gateway seems to be working, so I guess production is still up...for now...

anpat4y ago

My monitoring is on fire, flipping red to green every minute because of connectivity issues with every single LB in us-west-2.

evilhackerdude4y ago

4 hours in, our AWS IoT endpoint (not ATS, Symantec) in us-west-2 is still down according to monitoring, PHD and support.

ramesh314y ago

Auth0 down as well, right at the same time. There goes any sort of productivity today. Whole company in firefighting mode.

stevenhubertron4y ago

Yeah. It's inconsistent but a number of my production servers appear to be down. Along with my New Relic logging.

oriettaxx4y ago

AWS appears to be expensive again

mtschopp4y ago

Could it be related to a Log4j issue?

Graffur4y ago

I thought the whole point of AWS was that you could fail over to a different location?

menmob4y ago

7:42 AM PST We are investigating Internet connectivity issues to the US-WEST-2 Region.

monkeybutton4y ago

I really appreciate seeing these threads. Lets me know I haven't lost it.

mysql4y ago

It's bad that I come here first to see if I am crazy or AWS is actually down.

rakem4y ago

proof: https://twitter.com/thedrunkneteng/status/147114428947652608...

aswinmohanme4y ago

Couldn't access Notion, so came to check HN, and boom here is the answer.

Sholmesy4y ago

Yup, seeing this on us-west-1

phgn4y ago

This also seems to affect NPM, I can't install packages locally :/

swaraj4y ago

Our IaaS vendor, Aptible, reports us-west-1 is down / throwing errors

wirelesspotat4y ago

We're seeing AWS issues with us-west-2 at [medium-sized tech company]

blueside4y ago

The vehement defenders of AWS are starting to remind me of the cryptobros

mgbmtl4y ago

QuickBooks Online seems to be down, and they seem to be hosted on AWS.

8K832d7tNmiQ4y ago

Twitch video streaming is also down right now:

HTTP Error 500 internal server error

the_iceman4y ago

Confirmed experiencing significant issues in US-WEST-1 as well

mrsuprawsm4y ago

Seems like this is affecting Dropbox paper, at least for me.

gregmfoster4y ago

Down for us (graphite.dev) as well, running on us-west-2

imhoguy4y ago

I guess it is all about Log4Shell patching in a rush.

rychco4y ago

Tsheets is also down so I can’t clock my hours LOL

niks21124y ago

We are having issues with us-west-1 and us-west-2

the_iceman4y ago

Experiencing significant issues in US-WEST-1

dannyw4y ago

Prime video down for me. Australia.

rwalk4y ago

Yup, trouble in us-west-2 for us.

curtisblaine4y ago

The npm registry is down too.

RunOutOfMemory4y ago

out of memory again. ;<

alvis4y ago

Oh man. Not again!

justinc86874y ago

us-west-2 stuff is down for me too

qwertyuiop_4y ago

Log4jammed ?

moneywoes4y ago

Back up

yellowsir4y ago

npmjs has problems too :(

1 more reply

prakashqwerty4y ago

leetcode.com is also down

1 more reply

alvis4y ago

Oh man, not again!

clavicat4y ago

We are barbarians occupying a city built by an advanced civilization, marveling at the hot baths but knowing nothing about how their builders kept them running. One day, the baths will drain and anyone who remembers how to fill them up will have died.

26 more replies

robthebrew4y ago

https://nolandda.org/images/memes/nuke_from_orbit.gif

belter4y ago

AWS Outage Analysis - December 15, 2021:

https://www.thousandeyes.com/blog/aws-outage-analysis-decemb...

https://azycqgvwjz.share.thousandeyes.com/view/tests/?roundI...



2 more replies

saganus4y ago

How does this service work?

It seems to have all the look and feel of AWS, and somehow has more up to date info than the official AWS status page?

1 more reply

tinco4y ago

Now that they're back up they're not reporting any problems, how is it supposed to work? It looks like it is just repeating the status reported on the Amazon status page.

1 more reply

taormina4y ago

I mean, sounds like it's working as intended then?

mishftw4y ago

Funny I didn't know that and assumed it was okay

moneywoes4y ago

That’s hilarious

tomlagier4y ago· 7 in thread

I wonder if AWS will make more or less money from these outages?

Will large players flee because of excessive instability? Or will smaller players go from single-AZ to more expensive multi-AZ?

My guess is that no-one will leave and lots of single-AZ tenants who should be multi-AZ will use this as the impetus to do it.

Honestly, having events like this is probably good for the overall resilience of distributed systems. It's like an immune system, you don't usually fail in the same way repeatedly.

andy_ppp4y ago

* Free chaos monkey installed in every AZ

1 more reply

kenhwang4y ago

1 more reply

jorblumesea4y ago

s_dev4y ago

>I wonder if AWS will make more or less money from these outages?

There is no possibility that outages are good for AWS. Nor is there more money to be made from "publicity" of the outages.

2 more replies

urthor4y ago

The actual answer?

In the next 5 calendar years the bottom line will still grow.

However, the brand damage means they permanently lose market share, which impacts their growth ceiling.

gjvr4y ago

I would not go with multiple Availability Zones within the same infra/cloud provider...

ransom15384y ago

"Or will smaller players go from single-AZ to more expensive multi-AZ"

Yes! When you have a service interruption, pay 2x more! With a region down I am sure other regions won't have any interruptions either! /s

earthboundkid4y ago· 7 in thread

HOST THE GODDAMN STATUS PAGE ON AZURE FOR FUCKS SAKE.

jeroenhd4y ago

They should host their status page on IPFS instead. If you're never going to change the contents of your status page, you might as well put it into immutable storage!

Aldipower4y ago

If the status page is down, you know the system is down. Mission accomplished. Go ahead.

bnt4y ago

Isn't it on S3 or something? And a few years ago we had that whole S3 is down situation and the status page was also down? xD

1 more reply

r3trohack3r4y ago

I don’t understand, are folks looking at a different status page than me?

This morning we saw some weird behavior in us-west-2, our traffic just _vanished_. I thought: there is no way this is us.

Went to https://status.aws.amazon.com/

Top of the board showed “Internet Connectivity Issues (Oregon)”

And that was that. The board worked exactly as it should - it immediately explained my missing traffic and kept me up-to-date with the status of the outage on their side.

tommek40774y ago

It should update automatically as well. Currently it is a static "all green" page that only gets changed manually once a manager gives the go-ahead. Insane.

aaronharnly4y ago

Seriously.

boopboopbadoop4y ago

You don’t even know what the problem is yet. Stop shouting solutions.

3 more replies

dijit4y ago· 6 in thread

Is it AWS or could it be an ISP?

tw044y ago

ukyrgf4y ago

I have an outage way over in the southeast, looks to be affecting the major monopoly ISP. Can't get a tech to our data center until 2PM.

banana_giraffe4y ago

Things were working during the event, but connectivity was pretty messed up

https://imgur.com/a/VsrS0JZ

(This is two similarly spec'd boxes on us-east-2 and us-west-2). Looking at GeoIP of connecting clients, the only pattern I can see is the region itself.

adriand4y ago

1 more reply

albatross134y ago

Currently we're seeing 40k ms response times from CloudFront distributions, we can't hit PagerDuty (probably runs on AWS), etc.

I guess it could be an ISP thing but I guess we're all assuming 80/20.

1 more reply

treis4y ago

It was an AWS networking issue: 90%+ packet loss pinging Google & Facebook.

jrs2354y ago· 5 in thread

I checked their health status page. All is good. /s

https://downdetector.com/status/aws-amazon-web-services/

tyingq4y ago

They did add an update, faster than last time:

"7:42 AM PST We are investigating Internet connectivity issues to the US-WEST-2 Region."

https://status.aws.amazon.com/

Edit: They added US-WEST-1:

"7:52 AM PST We are investigating Internet connectivity issues to the US-WEST-1 Region."

Edit: Found root cause, maybe?

7 more replies

sgt4y ago

Ok, so it can't be down then. This is proof!

ornornor4y ago

Yep, when it loads, it's all green. "nine nines!!!"

2 more replies

Cort3z4y ago

Downdetector is just a statistics page; it does not actually detect downtime, and it is in no way AWS's status page.

drcongo4y ago

What does downdetector run on?

1 more reply

TekMol4y ago· 5 in thread

It is surprising that their status page is down too:

https://status.aws.amazon.com

Their CDN, CloudFront, always works reliably for me. Couldn't they put the status page on CloudFront?

mhitza4y ago

[1] https://aws.amazon.com/blogs/networking-and-content-delivery...

1 more reply

electroly4y ago

The status page is working great for me. Did they make it multi-region after the last failure? I'm on the east coast.

1 more reply

hericium4y ago

Works for me. It's the usual static page with everything green.

1 more reply

drcongo4y ago

Not working for me either in the UK.

clavicat4y ago

Down for me, as well.

iso16314y ago· 4 in thread

Must be a Y in the day.

It amazes me how many projects exist that don't even have multi-region capability, let alone no single point of failure

electroly4y ago

1 more reply

staticassertion4y ago

All to go from, idk, 99.9% uptime to 99.95% (throwing out these numbers)? The thing is when AWS goes down so much of the internet goes down that companies don't really get called out individually.
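To put rough numbers on that comparison, here is a minimal sketch that converts an availability percentage into allowed downtime per year. The two percentages are the commenter's own made-up figures, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year permitted at a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.95):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% works out to about 8.76 hours a year and 99.95% to about 4.38 hours, so the extra engineering effort buys back a little over four hours annually.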

1 more reply

ahallock4y ago

You're saying that as if it's a walk in the park to set up and not cost prohibitive, in terms of opportunity cost and budget, especially for smaller companies.

1 more reply

Spivak4y ago

This might be a multi-region problem. Auth0 as an example has three US regions and two of them are down.

thadjoOP4y ago· 4 in thread

obligatory comment about status page showing seas of green: https://status.aws.amazon.com

tubignaaso4y ago

The status page appears to be down now as well.

2 more replies

arsome4y ago

For this kind of thing it's usually better to just use a user-driven site like: https://downdetector.ca/status/aws-amazon-web-services/

Some users are clueless, but the clueless users average out over time and the spikes make it clear when there are actual issues.
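The "spikes stand out from the noise" idea can be illustrated with a small sketch. This is not how Downdetector actually works; it is an assumed, simplified rule that flags the current report count when it sits well above a recent baseline. The function name, the threshold, and the sample numbers are all hypothetical.

```python
from statistics import mean, pstdev


def is_spike(history, current, threshold=3.0):
    """Return True if `current` is far above the baseline formed by `history`.

    `history` is a list of recent report counts per interval (assumed data).
    The spread is floored at 1.0 so a perfectly flat baseline does not turn
    every tiny increase into a spike.
    """
    baseline = mean(history)
    spread = max(pstdev(history), 1.0)
    return current > baseline + threshold * spread


if __name__ == "__main__":
    quiet_hours = [4, 6, 5, 5, 4, 6]   # background noise from confused users
    print(is_spike(quiet_hours, 7))     # slightly above normal
    print(is_spike(quiet_hours, 120))   # sudden flood of reports
```

A handful of mistaken reports only nudges the baseline, while a real outage produces a count far outside it, which is the effect the comment describes.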

gregmfoster4y ago

Love to see the manually updated status page not updating

rpadovani4y ago

For me, it's down as well

1 more reply

sam0x174y ago· 3 in thread

lljk_kennedy4y ago

I think SVP / GM approval is only needed for yellow / red status. From my time in AWS Support, the Support Oncall and Call Leader / GM delegate worked to approve green-i posts.

1 more reply

ceejayoz4y ago

They were much faster than usual about updating the AWS Status page.

3 more replies

vineyardmike4y ago

> we experience the obligatory black friday us-east-1 outage every year.

Is this a thing?

adnauseum4y ago· 3 in thread

Seems like ever since Microsoft bought AWS, it's been going down an awful lot.

endisneigh4y ago

> Seems like ever since Microsoft bought AWS, it's been going down an awful lot.

What?

1 more reply

masterof04y ago

Didn't know Tim Dillon is hanging out here on HN.

exdsq4y ago

Haha wtf?

dhruvarora0134y ago· 3 in thread

Looks like it's taken down SendGrid, NPM, Twitch, Auth0 so far

hericium4y ago

PlayStation Network went down at the same time.

cyral4y ago

Stripe as well

ents4y ago

Notion as well

yawnxyz4y ago· 3 in thread

Vercel is down too.

My sites run on Cloudflare and Vercel, and I can't even log in to those right now.

I'm curious — what does Hacker News run on? It seems impervious to any kind of downtime...

qeternity4y ago

> I'm curious — what does Hacker News run on? It seems impervious to any kind of downtime...

On a dirty, disgusting dedicated server.

1 more reply

ceejayoz4y ago

HN definitely gets overloaded at times, including during big outages when everyone stampedes here. I got a bunch of "sorry, we can't serve your request" a little while back.

Pobody's nerfect.

1 more reply

hn_throwaway_694y ago

DNS A record suggests a dedicated server from this company:

https://www.m5hosting.com/

cebert4y ago· 2 in thread

Gov Cloud Status Page: https://status.aws.amazon.com/govcloud

texasviking4y ago

We are in the same boat. Finally updated "We are investigating Internet connectivity issues to the US-GOV-WEST-1 Region"

chasd004y ago

i had multiple govcloud hosted salesforce instances down but they appear to be coming back up now.

nic_wilson4y ago· 2 in thread

We are seeing issues with requests to Auth0, which I believe is hosted on AWS and has historically gone down when AWS has had issues

romanhotsiy4y ago

We see issues with Auth0 too. Other AWS services we use seem to be working fine so far (us-east-1)

1 more reply

ramesh314y ago

Auth0 went down for us as well right when AWS did. At least it's not like those two systems run our entire company...

iamricks4y ago· 2 in thread

How much do you guys think these frequent outages will affect their market share in cloud products?

Is this enough of a push for organizations to actually move over their infrastructure to other providers?

ceejayoz4y ago

Not at all.

The other cloud providers have had their own outages.

1 more reply

pm904y ago

cblconfederate4y ago· 2 in thread

Reminder that the internet was literally invented to avoid this kind of nuclear attack. But I guess people are herd animals and prefer to die as a group

throw_m2393394y ago

doublepg234y ago

Seems like the Internet level networking is quite robust at this point.

baskethead4y ago· 2 in thread

It sounds like their systems design interviews aren’t rigorous enough.

tyingq4y ago

I'm guessing lots of people fled us-east-1 for us-west-2, after the last outage, and overwhelmed something there.

1 more reply

pm904y ago

At this point they should hire specifically for config management and rollout.

Mostly /s; I wish the aws engineers the best of luck through this.

1 more reply

CoastalCoder4y ago· 2 in thread

Asking as a non-cloud-developer: why would Crunchyroll's recovery [0] lag so much behind AWS's recovery [1]?

[0] https://downdetector.com/status/crunchyroll/

[1] https://downdetector.com/status/aws-amazon-web-services/

spenczar54y ago

I don't know for sure, but this is generally common because caches get cold.

So cold caches and retries can conspire to keep a service down for a long time even after the root cause is fixed.
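One common way to keep retries from piling onto a recovering service is exponential backoff with jitter. The sketch below is a generic illustration, not anything Crunchyroll or AWS is known to run; `call_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying on any exception with capped, jittered backoff.

    The wait before retry number n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)], so clients that failed together
    do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_backend():
        # Simulates a backend whose cache is still cold for the first two calls.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("backend still warming up")
        return "ok"

    print(call_with_backoff(flaky_backend))
```

Spreading retries out this way gives caches time to warm up instead of hitting the backend with synchronized bursts of traffic.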

1 more reply

Nexxxeh4y ago

Crunchyroll seems to barely work at the best of times, and when it does, it's still a mess.

I can only imagine their back-end is mostly Visual Basic running on a single AWS-powered VM.

aaronharnly4y ago· 1 in thread

Everyone who spent the past week migrating from us-east-1 to us-west-2: this joke is on you. :)

DarthNebo4y ago

"US-EAST-1 or bust" being manifested right now.

gitfan864y ago· 1 in thread

necovek4y ago

Well, why didn't you? :)

The response is that this actually works well enough, so the investment required has not pushed anyone to do it (with that meaning building the core infrastructure to make that easy).

waynecochran4y ago· 1 in thread

cle4y ago

2 more replies

turtlebits4y ago· 1 in thread

That was fun. Badges weren't working (daily checkin required) so the front desk had to manually activate them.

Slack wasn't sending messages and Pagerduty was throwing 500's.

api4y ago

... because you need to contact a server 1000 miles away to issue badges in your building.

This cloud-for-everything-even-local-devices thing is both hilarious and sad.

I wonder if anyone had trouble doing their dishes or laundry today, because I'm sure someone thought dish washers and washing machines needed cloud.

2 more replies

myth_drannon4y ago· 1 in thread

That's the price of PIP culture and burning out your devs. Now no one wants to work at Amazon and they can only hire new grads.

Throwawayaerlei4y ago

I hear they do get people who want to be able to get experience at AWS's scale, there's only a few places for that.

ceejayoz4y ago· 1 in thread

We're having troubles in us-west-2.

Discourse is reporting trouble, too. https://twitter.com/DiscourseStatus/status/14711403698992906...

supermathie4y ago

us-west-1 also seems offline, but us-east-1 (ironically) seems fine

wirelesspotat4y ago· 1 in thread

AWS status page shows an update:

> AWS Internet Connectivity (Oregon): 7:42 AM PST We are investigating Internet connectivity issues to the US-WEST-2 Region.

Source: https://status.aws.amazon.com

alvis4y ago

Oh. not again...

mattjaynes4y ago· 1 in thread

qaq4y ago

hdjjhhvvhga4y ago· 1 in thread

dsr_4y ago

Once you have committed to a certain way of doing things, the transition costs can be very high.

In the second year, they are both growing. CloudCo buys more average capacity, but is still seeing lots of dynamic changes. RockCo keeps growing capacity.

Who's going to win?

navidkhn14y ago· 1 in thread

My personal health dashboard on AWS shows "InternetConnectivity operational issue us-west-2"

[07:42 AM PST] We are investigating Internet connectivity issues to the US-WEST-2 Region.

iJohnDoe4y ago

Probably a silly question, but what are you using to get this info?

1 more reply

codercotton4y ago· 1 in thread

"Everything is fine." - https://status.aws.amazon.com

rytrix4y ago

Everything *is* fine now. The status page previously reflected an issue much quicker than last time.

jcoder4y ago· 1 in thread

This is new… Siri hasn’t been able to connect for me since this began

ghawkescs4y ago

Same thing here.

account7584y ago· 1 in thread

AWS Global Accelerator not working correctly anymore as well, connections dropped worldwide. Seems like it is managed from us-west-2 and not redundant.

electroly4y ago

samgranieri4y ago· 1 in thread

At least this still works: https://livemap.pingdom.com/

fy204y ago

Partially, the stats on the right are wrong. For me it shows:

Website outages in the past hour 86,967

Lowest 16,208

Average 16,208

Highest 16,209

johnisgood4y ago· 1 in thread

And I kept getting "We're having some trouble serving your request. Sorry!" on HN for the past 10 minutes or something.

edoceo4y ago

Traffic flood to this site for status reports on AWS

branon4y ago· 1 in thread

Yup. Having issues with IT Glue and Duo here.

rd04y ago

Duo issues here as well.

TheFragenTaken4y ago· 1 in thread

At least Twitch.tv (Amazon subsidiary) and npmjs.com seem to be affected.

AustinDev4y ago

Yeah, I'm getting 2000 player errors in the Twitch video player.

Jamie99124y ago· 1 in thread

Twitch seems to have recovered, is it back now for everyone?

bloaf4y ago

Still getting errors in Houston

edit: some streams back up, chat still buggy as of 09:55 local time

edit2: appears to be back ~10:00 local time

1 more reply

streetcat14y ago· 1 in thread

Remember, every 12 secs take one 9.

skj4y ago

eh?

2 more replies

zonkd12344y ago· 1 in thread

Yes. Having issues as of a few minutes ago reaching us-west-2 EC2.

zonkd12344y ago

us-west-2 EC2 looks like it just came back online.

1 more reply

yabones4y ago

Yep, it's broken again. I was trying to install some Thunderbird extensions, and stuff started breaking halfway through. Never thought of an AWS outage borking my mail client I guess...

lukeqsee4y ago

We lost all public IPv6 in the Linode Newark DC.

This appears to be cross-provider.

Edit: We have IPv6 back.

kp195_4y ago

We're having issues connecting to our EC2 bastions and accessing the us-west-1 dashboard too

EDIT: Cognito auth seems down for us too

EDIT2: our ALBs are timing out as well

EDIT3: us-west-1 looks like it's working now!

Zelphyr4y ago

I think it's time to face the fact that we all have too many of our eggs in the AWS basket.

nickjj4y ago

I'm seeing outages on us-west-2 too. Customer facing traffic being served through Route53 -> ALB -> EC2 is down and CLI tools are failing to connect to AWS too.

zedpm4y ago

Wow, yeah, us-west-1 AND us-west-2 are reporting connectivity issues. I'm guessing this is related to the Auth0 outage that's currently going on too.

andrew_4y ago

Root logins are suffering some kind of "captcha outage." The buzz has just begun https://twitter.com/search?q=aws%20captcha&src=typed_query

gz54y ago

looks specific to certain (possibly AWS hosted or partially dependent) services such as Auth0:

https://status.auth0.com/

e.g. our services running on AWS are fine right now, but new sessions dependent on Auth0 are not.

alberth4y ago

It appears AWS Status Page is hosted at AWS [0].

Seems like a really bad idea.

[0] https://hostingchecker.com/

300bps4y ago

I'm on us-east-1 and everything is fine for me including:

* EC2 instances

* AWS Workspaces

* FSx for Windows

* AWS Directory Service

* S3 Buckets

11235813214y ago

Yes, all our stuff in west-2 went down at 7:15 PT.

xondono4y ago

At what point are these outages a sign that something inside AWS is deeply broken and pretty much unfixable?

theverything4y ago

Slack seems to be having issues too.

yottalove4y ago

pjf4y ago

Kentik data on the outage: https://twitter.com/DougMadory/status/1471162450649223173
