Apologies if you find this to be in poor taste, but GCS directly supports the S3 XML API (including v4):
https://cloud.google.com/storage/docs/interoperability
and has easy to use multi-regional support at a fraction of the cost of what it would take on AWS. I directly point my NAS box at home to GCS instead of S3 (sadly having to modify the little PHP client code to point it to storage.googleapis.com), and it works like a charm. Resumable uploads work differently between us, but honestly since we let you do up to 5TB per object, I haven't needed to bother yet.
Again, Disclosure: I work on Google Cloud (and we've had our own outages!).
It's not in bad taste, despite other comments saying otherwise. We need to recognize that competition is good, and Amazon isn't the answer to everything.
I think there is little GCP does better than AWS. Pricing is better on paper, but performance per buck seems to be on par. Stability is a lot worse on GCP, and I don't just mean service outages like this one (which they had their fair share) but also individual issues like instances slowing down or network acting up randomly. Also lack of service offerings like no PostgreSQL, functions never leaving alpha, no hosted redis clusters etc... Support is also too expensive compared to AWS.
Management interfaces are better on GCP and sustained use discount is a big step up against AWS reservations. Otherwise, I think AWS works better.
Sounds like it basically coincides with Diane Greene coming on board to run the show -- which is great news for all of us with increased competition on not just the technical front but also support (which is often the deal maker/breaker)
https://medium.com/@jim_dowling/reflections-on-s3s-architect...
Our production Cloud SQL started throwing errors that we could not write anything to the database. We have Gold support, so quickly created a ticket. While there was a quick reply, it took a total of 21+ hours of downtime to get the issue fixed. During the downtime, there is nothing you can do to speed this up - you're waiting helplessly. Because Cloud SQL is a hosted service, you can not connect to a shell or access any filesystem data directly - there is nothing you can do, other than wait for the Google engineers to resolve the problem.
When the Cloud SQL instance was up&running again, support confirmed that there is nothing you can do to prevent a filesystem crash, it "just happens". The workaround they offered is to have a failover set up, so it can take over in case of downtime. The worst part is that GCS refused to offer credit, as according to their SLA this is not considered downtime. The SLA [1] states: "with respect to Google Cloud SQL Second Generation: all connection requests to a Multi-zone Instance fail" - so as long as the SQL instance accepts incoming connections, there is no downtime. Your data can get lost, your database can be unusable, your whole system might be down: according to Google, this is no downtime.
TL;DR: make sure to check the SLA before moving critical stuff to GCS.
Managing stateful services is still difficult but we are starting to see paths forward [1] and the community's velocity is remarkable.
K8s seems to be the wolf in sheep's clothing that will break AWS' virtual monopoly on IaaS.
[0] We (gravitational.com) help companies go "multi-region" or on-prem using Kubernetes as a portable run-time.
[1] Some interesting projects from this comment (https://news.ycombinator.com/item?id=13738916)
* Postgres automation for Kubernetes deployments https://github.com/sorintlab/stolon
* Automation for operating the Etcd cluster:https://github.com/coreos/etcd-operator
* Kubernetes-native deployment of Ceph: https://rook.io/
In addition to Rook, Minio [1] is also working to build an S3 alternative on top of Kubernetes, and the CNCF Landscape is a good way of tracking projects in the space [2].
[0] https://kubernetes.io/ [1] https://www.minio.io/ [2] https://github.com/cncf/landscape
Disclosure: I'm the executive director of CNCF, which hosts Kubernetes, and co-author of the landscape.
I remember reading somewhere in the K8s documentation that it is designed such that nodes in a single cluster should be as close as possible, like in the same AZ.
The S3 keys it produces are tied to your developer account. This means that if someone gets the keys from your NAS, he will have access to all the Cloud Storage buckets you have access to (e.g your employer's).
I use Google Cloud but not Amazon. Once I wanted a S3 bucket to try with NextCloud (then OwnCloud). I was really frightened to produce a S3 key with my google developer account.
As another option, you can continue using the XML API and switch out only the auth piece to Google's OAuth system while changing nothing else.
There's a lot more detail available at: https://cloud.google.com/storage/docs/migrating
Disclaimer: I work on Google Cloud Storage.
Kicked the tires, not impressed at all. Notes went missing from the interface could only get them back after manually digging through folders via FTP.
I also want to personally thank Solomon (@boulos) for hooking me up with a Google Cloud NEXT conference pass. He is awesome!
https://cloud.google.com/compute/docs/load-balancing/http/us...
Been trying to get one for IO (can't attend NEXT unfortunately)
Disclosure: I don't work for google but have an upcoming interview there.
Your Egress prices are quite a bit more compared to CloudFront for sub 10TB (.12/GB vs .085/GB).
The track record of s3 outages vs time your up and sending Egress seems like S3 wins in cost. If all your worried about is cross region data storage, your probably a big player and have AWS enterprise agreement in place which offsets the cost of storage.
As to our network pricing, we have a drastically different backbone (we feel its superior, so we charge more). But as you mention CloudFront, the right comparison is probably Google Cloud CDN (https://cloud.google.com/cdn/) which has lower pricing than "raw egress".
if you are api compatible with s3, could you make it easy /possible to work with google storage inside spark?
remember i may or may not run my spark on Dataproc.
It took me about 15 minutes to spin up the instances on Google Cloud that archive these objects and upload them to Google Storage. While we didn't have access to any of our existing uploaded objects on S3 during the outage, I was able to mitigate not having the ability to store any future ongoing objects. (our workload is much more geared towards being very very write heavy for these objects)
It it turns out this cost leveraging architecture works quite well as a disaster recovery architecture.
There are a large number of people out there looking intently at ACD's "unlimited for $60/yr" and wondering what that really means.
I recently found https://redd.it/5s7q04 which links to https://i.imgur.com/kiI4kmp.png (small screenshot) showing a user hit 1PB (!!) on ACD (1 month ago). If I understand correctly, the (throwaway) data in question was slowly being uploaded as a capacity test. This has surprised a lot of people, and I've been seriously considering ACD as a result.
On the way to finding the above thread I also just discovered https://redd.it/5vdvnp, which details how Amazon doesn't publish transfer thresholds, their "please stop doing what you're doing" support emails are frighteningly vague, and how a user became unable to download their uploaded data because they didn't know what speed/time ratios to use. This sort of thing has happened heaps of times.
I also know a small group of Internet archivists that feed data to Archive.org. If I understand correctly, they snap up disk deals wherever they can find them, besides using LTO4 tapes, the disks attached to VPS instances, and a few ACD and GDrive accounts for interstitial storage and crawl processing, which everyone is afraid to push too hard so they don't break. One person mentioned that someone they knew hit a brick wall after exactly 100TB uploaded - ACD simply would not let this person upload any more. (I wonder if their upload speed made them hit this limit.) The archive group also let me know that ACD was better at storing lots of data, while GDrive was better at smaller amounts of data being shared a lot.
So, I'm curious. Bandwidth and storage are certainly finite resources, I'll readily acknowledge that. GDrive is obviously going to have data-vs-time transfer thresholds and upper storage limits. However, GSuite's $10/month "unlimited storage" is a very interesting alternative to ACD (even at twice the cost) if some awareness of the transfer thresholds was available. I'm very curious what insight you can provide here!
The ability to create share links for any file is also pretty cool.
Not only is webpagetest.org a google product but it's also much better suited for the minute by minute billing cycle of google cloud compute. For any team not needing to run hundreds of tests an hour the cost difference between running a WPT private instance on EC2 versus on google cloud compute could easily be in the thousands of dollars.
Just saying, it gets you a foot in the door.