I think this breaks the site guidelines. Worse, I don't think the other people are wrong: being in a different region implies being in a different availability zone.
That is, I've read the comments to say "they're not only in different AZs, they're in different regions". You seem determined to pick a reading that lets you feel smugly superior about your level of knowledge, and then cast out digs at other people based on that presumed superiority.
Availability zones do not map across regions. AZs are specific to a region. Different regions have differing numbers of AZs. us-east-1 has 3. IIRC ap-southeast-1 has 2.
> That is, I've read the comments to say "they're not only in different AZs, they're in different regions"
So I've read. The earlier example about STS that someone brought up was incorrect; both I and another commenter linked to the doc with the correct information.
> You seem determined to pick a reading that lets you feel smugly superior about your level of knowledge, and then cast out digs at other people based on that presumed superiority.
You obviously feel very strongly about this. You've replied to my parent twice now. You're right that the parenthetical was harsh but I wouldn't say it's uncalled for.
Every one of these outage threads descends into a slew of easily defensible complaints about cloud providers. The quality of these discussions is terrible. I spend a lot of time at my day job (and as a hobby) working on networking-related things. Understanding the subtle guarantees offered by AWS is a large part of my day-to-day. When I see people here make easily falsifiable comments full of hearsay ("I had a friend of a friend who works at Amazon and they did X, Y, Z bad things") and use that to drum up a frenzy, it flies in the face of what I do every day. There are lots of issues with cloud providers as a whole and AWS in particular, but to get to that level of conversation you need to understand what the system is actually doing, not just get angry and guess why it's failing.
> Availability zones do not map across regions. AZs are specific to a region. Different regions have differing numbers of AZs. us-east-1 has 3. IIRC ap-southeast-1 has 2.
Right. So if you are in a different region, you are by definition in a different availability zone.
> You obviously feel very strongly about this. You've replied to my parent twice now. You're right that the parenthetical was harsh but I wouldn't say it's uncalled for.
Yeah, I really thought about it, and you're just reeking of unkindness. And the people above that you're replying to and mocking are not wrong.
> Every one of these outage threads descends into a slew of easily defensible complaints about cloud providers. The quality of these discussions is terrible. I spend a lot of time at my day job (and as a hobby) working on networking-related things. Understanding the subtle guarantees offered by AWS is a large part of my day-to-day.
If you're unable to be civil about this, maybe you should avoid the threads. Amazon seeks to avoid common-mode failures between AZs (and thus regions). This doesn't mean that Amazon attains this goal. And the larger point: as I'm sure you're aware, building a distributed system that attains higher uptimes by crossing multiple AZs is hard and costly and can only be justified in some cases.
I've got >20 years of experience in building geographically distributed, sharded, and consensus-based systems. I think you are being unfair to the people you're discussing with. Be nice.
there is a distinction between azs within a region vs azs in different regions. the overwhelming majority of services are offered regionally and provide slas at that level. services are expected to have entirely independent infrastructure for each region, and cross-regional/global services are built to scope down online cross regional dependencies as much as possible.
the specific example brought up (cross regional sts) is wrong in the sense that sts is fully regionalized as evidenced by the overwhelming number of aws services that leverage sts not having a global meltdown. but as others mentioned in a lot of ways it’s even worse because customers are opted into the centralized endpoint implicitly.
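To make the centralized-versus-regional distinction concrete, here is a minimal sketch of the two hostname shapes a client can end up talking to. It assumes the standard commercial-partition hostnames (`sts.amazonaws.com` for the global endpoint, `sts.<region>.amazonaws.com` for regional ones). The helper name `sts_endpoint` is hypothetical, and nothing here calls AWS.

```python
from typing import Optional

# Assumption: commercial-partition hostnames; other partitions use different suffixes.
GLOBAL_STS_HOST = "sts.amazonaws.com"


def sts_endpoint(region: Optional[str] = None) -> str:
    """Return the STS URL for a region, or the global endpoint if no region is given.

    Hypothetical helper for illustration only; it just builds a string.
    """
    if region is None:
        # The centralized endpoint that clients may be using implicitly.
        return f"https://{GLOBAL_STS_HOST}"
    # The regionalized endpoint, scoped to a single region.
    return f"https://sts.{region}.amazonaws.com"


if __name__ == "__main__":
    print(sts_endpoint())             # global endpoint
    print(sts_endpoint("us-east-1"))  # regional endpoint
```

SDKs generally expose a setting for which of the two is used (for example the `AWS_STS_REGIONAL_ENDPOINTS` environment variable). The default varies by SDK and version, so check the documentation for the one you use.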
I didn't read my tone as uncivil, just harsh. I guess it came across harsher than intended. I'll try to cool it a bit more next time, but I have to say it's not like the rest of HN is taking this advice to heart when they're criticizing AWS. I realize that this isn't a defense (whataboutism), but I guess it's fine to "speak truth to power" or something? Anyway, point noted and I'll try to keep my snark down.
> Amazon seeks to avoid common-mode failures between AZs (and thus regions). This doesn't mean that Amazon attains this goal. And the larger point: as I'm sure you're aware, building a distributed system that attains higher uptimes by crossing multiple AZs is hard and costly and can only be justified in some cases.
Right, so which common-mode failures are occurring here? What I'm seeing in this thread and previous AWS threads is a lot of hearsay. Stuff like "the AWS console isn't loading" or "I don't have that problem on Linode!" or "the McDonald's app isn't working so everything is broken thanks to AWS!" I'd love to see a postmortem document, like this, actually uncover one of these common-mode failures. Not because I doubt they exist (any real system has bugs and I have no doubt a real distributed system has real limitations); I just haven't seen it borne out in real-world experience at my current company and other companies I've worked at which used AWS pretty heavily.
While I don't work at AWS, my company also publishes an SLA and we refund our customers when we dip below that SLA. When an outage, SLA-impacting or not, occurs, we spend a _lot_ of time getting to the bottom of what happened and documenting what went wrong. Frequently it's multiple things that go wrong which cause a sort of cascading failure that we didn't catch or couldn't reproduce in chaos testing. Part of the process of architecting solutions for high scale (~ billions/trillions of weekly requests) is to work through the AWS docs and make sure we select the right architecture to get the guarantees we seek. I'd like to see evidence of common-mode failures and the defensive guarantees that failed in order to show proof of them, or proof positive through a dashboard or something, before I'm willing to malign AWS so easily.
> And the larger point: as I'm sure you're aware, building a distributed system that attains higher uptimes by crossing multiple AZs is hard and costly and can only be justified in some cases.
Sure, if you're not operating high-reliability services at high scale, it's true, you don't need cross-AZ or cross-region failover. But if you chose, through balance sheet or ignorance, not to take advantage of AWS's reliability features, then you shouldn't get to complain that AWS is unreliable. Their guarantees are written on their SLA pages.
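As a back-of-the-envelope sketch of what redundancy buys on paper: the arithmetic below assumes zone failures are fully independent, which is exactly the assumption a common-mode failure breaks. The 99.5% figure and the function names are made up for illustration and are not taken from any AWS SLA.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Availability of a service that is up whenever at least one zone is up.

    Assumes zone failures are statistically independent (no common-mode failures).
    """
    if not 0.0 <= per_zone <= 1.0:
        raise ValueError("per_zone must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only when every zone is down at the same time.
    return 1.0 - (1.0 - per_zone) ** zones


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.995, n)  # 0.995 is an illustrative figure
        print(n, a, downtime_hours_per_year(a))
```

With these made-up numbers, one zone at 99.5% works out to roughly 43.8 hours of downtime a year, and two independent zones to well under an hour. Any correlation between zone failures eats into that gain, so the interesting question is how far real deployments fall short of the independent case.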