{ "errorCode" : "InternalError" }
When I attempt to use the AWS Console to view S3.
Apologies if you find this to be in poor taste, but GCS directly supports the S3 XML API (including v4):
https://cloud.google.com/storage/docs/interoperability
and has easy to use multi-regional support at a fraction of the cost of what it would take on AWS. I directly point my NAS box at home to GCS instead of S3 (sadly having to modify the little PHP client code to point it to storage.googleapis.com), and it works like a charm. Resumable uploads work differently between us, but honestly since we let you do up to 5TB per object, I haven't needed to bother yet.
Again, Disclosure: I work on Google Cloud (and we've had our own outages!).
The timeline, as observed by Tarsnap:
First InternalError response from S3: 17:37:29
Last successful request: 17:37:32
S3 switches from 100% InternalError responses to 503 responses: 17:37:56
S3 switches from 503 responses back to InternalError responses: 20:34:36
First successful request: 20:35:50
Most GET requests succeeding: ~21:03
Most PUT requests succeeding: ~21:52
For legacy customers, it's hard to move regions, but in general, if you have the chance to choose a region other than us-east-1, do that. I had the chance to transition to us-west-2 about 18 months ago and in that time, there have been at least three us-east-1 outages that haven't affected me, counting today's S3 outage.
EDIT: ha, joke's on me. I'm starting to see S3 failures as they affect our CDN. Lovely :/
A pyrrhic victory... ;)
[0] - http://status.hrpartner.io
EDIT UPDATE: Well, I spoke too soon - even our status page is down now, but not sure if that is linked to the AWS issues, or simply the HN "hug of death" from this post! :)
EDIT UPDATE 2: Aaaaand, back up again. I think it just got a little hammered from HN traffic.
The dashboard not changing color is related to S3 issue.
See the banner at the top of the dashboard for updates.
So it's not just a joke... S3 being down actually breaks its own status page!
"We are investigating increased error rates for Amazon S3" translates to "We are trying to figure out why our mission critical system for half the internet is completely down for most (including some of our biggest) customers."
https://twitter.com/homakov/status/836649802842591232
I've been fuzzing S3 parameters last couple hours...
And now it's down.
(Yes it sucks and yes we're working on fixing it. We hate slow software too!)
CloudFront is currently experiencing problems with requesting objects from Amazon S3.
Can I also say I am constantly disappointed by AWS's status page: https://status.aws.amazon.com/ It seems that whenever there is an issue, it takes a while to update. Sometimes all you see is a green checkmark with a tiny icon noting some issue. Why not make it orange or something? Surely they must have some kind of external monitor on these things that could be integrated here?
edit: Since posting my comment they added a banner of
"Increased Error Rates
We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region."
However S3 still shows green and "Service is operating normally"
Somewhere a sysadmin is having to explain to a mildly technical manager that AWS services are down and affecting business critical services. That manager will be chewing out the tech because the status site shows everything is green. Dishonest metrics are worse than bad metrics for this exact reason.
Any sysadmin who wasn't born yesterday knows that service metrics are gamed relentlessly by providers. Bluntly there aren't many of us, and we talk. Message to all providers: sysadmins losing confidence in your outage reporting has a larger impact than you think. Because we will be the ones called to the carpet to explain why <services> are down when <provider> is lying about being up.
But if you go to your personal health dashboard (https://phd.aws.amazon.com/phd/home#/dashboard/open-issues) they report an S3 operational issue event there.
Edit: Mine is reporting region us-east-1
Edit 2: And now the event disappeared from my personal health dashboard too. But we are still experiencing issues. WTH.
* Slack file sharing no longer works, hangs forever (no way to hide the permanently rolling progress bar except quitting)
* Github.com file uploads (e.g. dropping files into a Github issue) don't work.
* Imgur.com is completely down.
* Docker Hub seems to be unavailable. Can't pull/push images.
they just now put up a box at the top saying "We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region."
increased error rates? really?
Amazon, everything is on fire. you are not fooling anyone
edit: in the future, please subscribe to @MyFootballNow for timely AWS service status updates https://pbs.twimg.com/media/C5xdm9_WMAAY7y_.jpg:large
Through some dumb luck (and desire to procrastinate a bit), I opened HN and, subsequently, the AWS status page and actually read the US-EAST-1 notification.
HN saves the day.
"Increased API Error Rates - 9:52 AM PST We are investigating increased error rates in the US-EAST-1" "S3 operational issue - us-east-1"
What else should I add?
I'm curious how much $ this will lose today for the economy. :)
Many aws SDK libs don't remove \n for you.
(I hope it wasn't me who broke it lol)
You would have to host your own software which can also fail, but then at least you could do something about it. For example, you could avoid changing things during critical times of your own business (e.g. a tradeshow), which is something no standard provider could do. You could also dial down consistency for the sake of availability, e.g. keep a lot of copies around even if some of them are often stale - more often than not this would work well enough for images.
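A minimal sketch of the "stale copies are good enough" idea above, assuming a hypothetical `fetch` callable that reads from the primary store and raises when it is unavailable. The class name, method names, and in-memory dict are illustrative only and not any particular library's API:

```python
import time
from typing import Callable, Dict, Tuple


class StaleTolerantStore:
    """Prefer fresh data, but serve the last known copy if the origin fails."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # hypothetical origin reader, e.g. an object-store GET
        self._copies: Dict[str, Tuple[bytes, float]] = {}  # key -> (data, saved_at)

    def get(self, key: str) -> Tuple[bytes, bool]:
        """Return (data, is_stale). Re-raise if the origin fails and no copy exists."""
        try:
            data = self._fetch(key)
        except Exception:  # real code would catch only the origin's error types
            if key in self._copies:
                return self._copies[key][0], True
            raise
        # Origin answered: remember this copy for the next outage.
        self._copies[key] = (data, time.time())
        return data, False
```

A caller can use the `is_stale` flag to decide whether an old image is acceptable or whether to show an error instead; that trade-off is the availability-over-consistency choice described above.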
https://jdorfman.posthaven.com/medium-bitcoin-660x493-dot-jp...
The irony
[1]: https://aws.amazon.com/s3/details/
[2]: https://en.wikipedia.org/wiki/High_availability#Percentage_c...
Well good thing I have my backups on [some service that happens to also use S3 as a backend].
As someone who's really only a yellow belt (assuming you're all black belts!), just so I understand ('cos I'm cacking myself!) ...
I'm seeing the same issue. Does this mean there's a problem with Amazon? I can't access either of my S3 accounts even if I change the region, and I'm concerned it may be something I've done wrong, and deleted the whole lot. It was working yesterday!!!
Would be massively grateful for a heads up. Thanks in advance.
"Believe" is not inspiring.
(I think the AM means PM)
"500 The server encountered an error processing your request." message
It appears to be impacting gotomeeting, I get this error when trying to start a 12pm meeting here:
CloudFront is currently experiencing problems with requesting objects from Amazon S3.
Edit: ironically, my missed 12pm meeting was an Azure training session.
There is something to be said about not being located in the region where everything gets launched first, and where most the customers are not [imo all the benefits of the product, processes and people, but less risk].
Good luck to everyone impacted by this...crappy day.
- https://status.aws.amazon.com/
- https://medium.statuspage.io/
- http://status.filestack.com/
- http://www.trellostatus.com/
AMZN stock down $3.45 (0.41%).
[1] http://www.nbcchicago.com/news/national-international/Amazon...
The only services my team uses directly are EC2 and RDS, and I'm thinking of moving RDS over to EC2 instances.
We are entirely portable. We can move my entire team's infrastructure to a different cloud host really quickly. Our only dependency is a Debian box.
I flipped the switch today and cloned our prod environment, including VPN and security rules, over to a commodity hosting provider.
Change the DNS entry for the services, and we were good to go. We didn't need to do anything because everyone was freaking out about everything else being down. But our internal services were close to unaffected.
At least for my team.
Obviously, we aren't Trello or some of the other big people affected. And we don't have the same needs they do. But setting up the DevOps stuff for my team in the way that I think was correct to begin with (no dependencies other than a Debian box) really shined today. Having a clear and correct deployment strategy on any available hardware platform really worked for us.
Or at least it would have if people weren't so upset about all our other external services being down that they paid no attention to internal services.
Lock-in is bad, mmkay?
If your company is the right size, and it makes sense, do the extra work. It's not that hard to write agnostic scripts that deploy your software, create your database, and build your data from a backup. This can be a big deal when some providers are flipping out.
All-your-junk-in-one-place is really overrated, in my opinion. Be able to rebuild your code and your data at any given point in time. If you don't have that, I don't really know what you have.
Hearing reports of EBS down as well.
From http://status.aws.amazon.com/ Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
[edit- looks like they do have a pretty heavy reliance on S3, per https://github.com/WhisperSystems/Signal-Server/blob/master/... and various other sources.]
As part of the release they wanted to make sure everybody gets a chance to see "red" metrics.
https://atom-installer.github.com/
Is there a part of this hosted on S3? I cannot open Atom anymore; it keeps crashing on the check for updates screen...
[1] http://www.inc.com/sonya-mann/amazon-web-services-outage.htm...
In the last couple of minutes that forum post has gone from not existing to 175 views and 9 posts.
Amazon Elastic Compute Cloud (N. Virginia) Increased Error Rates 11:38 AM PST We can confirm increased error rates for the EC2 and EBS APIs and failures for launches of new EC2 instances in the US-EAST-1 Region. We are also experiencing degraded performance of some EBS Volumes in the Region.
Amazon Elastic Load Balancing (N. Virginia) Increased Error Rates
Amazon Relational Database Service (N. Virginia) Increased Error Rates
Amazon Simple Storage Service (US Standard) Increased Error Rates
Auto Scaling (N. Virginia) Increased Error Rates
AWS Lambda (N. Virginia) Increased Error Rates
In the meantime, EC2, ELB, RDS, Lambda, and autoscaling have all been confirmed to be experiencing issues.
https://www.google.com/finance?chdnp=0&chdd=0&chds=1&chdv=0&...
When I go to my orders I get "There's a problem displaying some of your orders right now. If you don't see the order you're looking for, try refreshing this page, or click "View order details" for that order."
It seems that Amazon is eating its own dog food.
[1] https://www.forbes.com/sites/ryanwhitwam/2017/02/28/amazon-s...
"Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue."
It shows up in the event log now too.
For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.
http://alessandrobender.com.br/wp-content/uploads/2015/07/fi...
Amazon Web Services (@awscloud), 8 minutes ago: The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.
Increased API Error Rates
09:52 AM PST We are investigating increased error rates in the US-EAST-1 Region.
Event data
Event: S3 operational issue
Status: Open
Region/AZ: us-east-1
Start time: February 28, 2017 at 6:51:57 PM UTC+1
End time: -
Event category: Issue
S3 promises four nines of availability (11 nines of durability), so today we got about 3-4 years worth of downtime in one fell swoop. Oops.
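For anyone who wants to check that arithmetic, here is a rough sketch. It assumes the four-nines figure is applied over a 365-day year (the actual SLA terms may be measured differently), and it takes the outage window from the Tarsnap-observed timestamps quoted elsewhere in this thread (last successful request 17:37:32, first successful request 20:35:50). The function names are made up for illustration:

```python
from datetime import datetime

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed at the given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def years_of_budget(outage_minutes: float, availability: float) -> float:
    """How many years of downtime budget a single outage consumes."""
    return outage_minutes / downtime_budget_minutes(availability)


# Outage window as observed by Tarsnap (same day, so the date is irrelevant).
last_ok = datetime.strptime("17:37:32", "%H:%M:%S")
first_ok = datetime.strptime("20:35:50", "%H:%M:%S")
outage_minutes = (first_ok - last_ok).total_seconds() / 60

budget = downtime_budget_minutes(0.9999)
years = years_of_budget(outage_minutes, 0.9999)
print(f"budget: {budget:.2f} min/year, outage: {outage_minutes:.1f} min, "
      f"years of budget used: {years:.2f}")
```

Under those assumptions the yearly budget is about 52.56 minutes and the total-failure window alone uses a bit under three and a half years of it, which is in line with the "3-4 years" estimate; counting the partial recovery period would push it higher.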
At least now we can see all the network failures in full RGB.
$ s3cmd ls
WARNING: Retrying failed request: / ([Errno 60] Operation timed out)
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: / ([Errno 60] Operation timed out)
WARNING: Waiting 6 sec...
https://status.aws.amazon.com/
Half the internet is down; the data center in Virginia, the one with the cloud, is apparently totally dead. Enjoy the cloud bullshit :)
I'd rather my app load but appear broken so I can show my own status rather than just shutting down every single app...
Interesting tweet from last month.
Technology leads to technology (and wealth) monopolies, in other words: more centralization. Which has always been bad.
Just like with Cloudflare leaking highly sensitive data all over the Internet, a couple of days ago.
Source: https://status.aws.amazon.com/
After two hours, they have finally updated their dashboard.
"The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."
:(
It seems their status page is hosted ... as an S3 static website.
Well that explains all the green checkmarks /s
This is bullshit if you're using an S3 origin in your distribution.
Upon the fields of barley
You'll forget the sun in his jealous sky
As we walk in fields of green
2. People push updates as fast as possible to fix security
3. No tests, so everything blows up
Slack image uploads are hanging.
Oh wait. The site sits on S3. Never mind.
https://aws.amazon.com/blogs/aws/new-cross-region-replicatio...
They are consistent for me.
edit: for the year, it only takes 52.57 minutes
Is it related to S3??
slack file services down too
Seems cloud computing still has a lot to learn.
Then they started dropping rocks on S3 and who is laughing now?
Increased Error Rates
Update at 11:35 AM PST: We have now repaired the ability to
update the service health dashboard. The service updates
are below. We continue to experience high error rates with
S3 in US-EAST-1, which is impacting various AWS services.
We are working hard at repairing S3, believe we understand
root cause, and are working on implementing what we believe
will remediate the issue.
Amazon hosted their status page on their failing service, ouch. Now they fixed the status page, after more than one hour.
The dashboard not changing color is related to S3 issue.
See the banner at the top of the dashboard for updates.
https://twitter.com/awscloud/status/836656664635846656