I don't think anyone was especially happy with it. I think AWS, as an entity, is probably just as unhappy as its customers.
I was happy with their response, and I was happy with it during the outage last year, too. They're adapting, and I believe they're constantly getting better. It's still a pretty new thing, this utility computing service. You can't reasonably expect them to expect the unexpected. I'm quite sure that even if they didn't apply the "5 whys," or make them publicly available, they are doing something to address the control plane issue.
I'm confident that the service will improve. Some things just need to be battle hardened.
The whole problem with AZs is that they're geographically close together, so a major weather event is pretty likely to mess you up across all of them. Remember the disk latency spikes from that little earthquake?
It costs more, but the truly compelling feature of AWS is that they operate regions all over the world, all usable under a common paradigm (for most services). I keep my main operation in the east and fail over to the west if there's a significant event. It's a little more labor intensive, but it works.
If you compare the recovery time (ballpark, feel free to break down the timelines in your copious amounts of free time):
2011:
12:47AM, Apr 21 - Event started, API impaired across all Availability Zones
12:00PM, Apr 21 - API recovered in non-affected zones
"Customers also experienced elevated error rates until Noon PDT on April 21st when attempting to launch new EBS-backed EC2 instances in Availability Zones other than the affected zone."
12:30PM, Apr 22 - Nearly all volumes in affected zone restored
"all but about 2.2% of the volumes in the affected Availability Zone were restored by 12:30PM PDT on April 22nd"
6:15PM, Apr 23 - API restored for affected zone
"At 6:15 PM PDT on April 23rd, API access to EBS resources was restored in the affected Availability Zone."
2012:
8:04PM, Jul 2 - Some number of racks lose power due to drained UPSs
9:10PM, Jul 2 - API restored
"8:04pm PDT to 9:10pm PDT, customers were not able to launch new EC2 instances, create EBS volumes, or attach volumes in any Availability Zone in the US-East-1 Region. At 9:10pm PDT, control plane functionality was restored for the Region."
2:45AM, Jul 3 - Vast majority of volumes restored to customers
"By 2:45am PDT, 90% of outstanding volumes had been turned over to customers."
http://aws.amazon.com/message/65648/
http://aws.amazon.com/message/67457/
Yes, I'm painting with broad strokes here, and feel free to argue the details (we always do). But I do think this at least shows some improvement, which answers the previous poster's question.
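For the ballpark arithmetic, here's a quick Python sketch (my own, not from either post-mortem) that turns the milestones above into elapsed time from the start of each event. The milestone labels are shorthand, all times are PDT as given above, and the two volume milestones aren't the same metric (all but ~2.2% in 2011 vs 90% in 2012), so treat the comparison as rough.

```python
from datetime import datetime

# Timestamps are naive PDT wall-clock times, written in 24-hour form.
FMT = "%Y-%m-%d %H:%M"

def elapsed(start: str, end: str) -> str:
    """Return the gap between two timestamps as 'Hh MMm'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"

# Milestones transcribed from the timelines above (labels are shorthand).
timelines = {
    "2011": ("2011-04-21 00:47", {
        "API recovered in non-affected zones": "2011-04-21 12:00",
        "Nearly all volumes restored": "2011-04-22 12:30",
        "API restored in affected zone": "2011-04-23 18:15",
    }),
    "2012": ("2012-07-02 20:04", {
        "Control plane (API) restored": "2012-07-02 21:10",
        "90% of volumes restored": "2012-07-03 02:45",
    }),
}

for year, (start, milestones) in timelines.items():
    for label, end in milestones.items():
        print(f"{year}  {label}: {elapsed(start, end)}")
```

That works out to 11h 13m, 35h 43m and 65h 28m for the 2011 milestones, versus 1h 06m and 6h 41m for 2012.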
AWS effectively lost its control plane for an entire region as a result of a failure within a single AZ. This was not supposed to be possible.
I find it funny how we assume that if we don't architect across multiple AZs or regions, we shouldn't be surprised when our service goes down because of an AWS failure, but that if we do, we're "pretty safe". And then Amazon itself experiences a failure spanning AZs that started with a single-AZ failure.
"The control planes for EC2 and EBS were significantly impacted by the power failure" in a single AZ.
Neither of these things is a reason for the disruption; they're side effects of it. There's not much "why" happening in the article altogether.