For the longest time we tried to convince people that they should have an off-Amazon archive of their S3 data ... we even ran ads to that effect in 2012[1].
The (obvious) reason this isn't compelling is the cost of egress. It's just (relatively) too expensive to offload your S3 assets to some third party on a regular basis.
So if R2 is S3 with no egress, suddenly there is a value proposition again.
Further, unlike in 2012, in 2021 we have really great tooling in the form of 'rclone'[2][3] which allows you to move data from cloud to cloud without involving your own bandwidth.
[1] The tagline was "Your infrastructure is on AWS and your backups are on AWS. You're doing it wrong."
That doesn't appear to be what they're doing, they don't seem to have changed their existing operating model at all:
> R2 will zero-rate infrequent storage operations under a threshold — currently planned to be in the single digit requests per second range. Above this range, R2 will charge significantly less per-operation than the major providers. Our object storage will be extremely inexpensive for infrequent access and yet capable of and cheaper than major incumbent providers at scale.
What I read this as is "we won't bill you until your traffic spikes, then you'll pay us, oh how you'll pay us"
Transparent bandwidth pricing would be a far more interesting announcement. This is the second post I've seen from CloudFlare in recent months throwing bricks at AWS over bandwidth pricing, while failing to mention CloudFlare bandwidth is some of the most expensive available.
The announcement says that they're eliminating bandwidth charges three times.
I don't know the whole economics around cloud storage and bandwidth, so maybe this is unrealistic pricing and your suspicions are well founded. However, Backblaze seems to be offering storage at $0.005/GB and bandwidth at $0.01/GB. Cloudflare is charging 3x more than Backblaze for the storage and $0 for the bandwidth. Given that Cloudflare's bandwidth costs are probably lower than Backblaze's, that doesn't seem so unreasonable - but I could be very wrong.
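A quick back-of-the-envelope check of that comparison (the rates are the ones quoted in these comments; the workload numbers are invented for illustration):

```python
# Illustrative monthly bill: rates from the comments above, workload invented.
B2_STORAGE, B2_EGRESS = 0.005, 0.01   # $/GB-month, $/GB
R2_STORAGE, R2_EGRESS = 0.015, 0.0    # announced R2 rate, zero egress

def monthly_cost(storage_gb, egress_gb, storage_rate, egress_rate):
    return storage_gb * storage_rate + egress_gb * egress_rate

stored, served = 1_000, 5_000  # 1TB stored, 5TB served out per month
b2 = monthly_cost(stored, served, B2_STORAGE, B2_EGRESS)
r2 = monthly_cost(stored, served, R2_STORAGE, R2_EGRESS)
print(f"Backblaze: ${b2:.2f}/mo, R2: ${r2:.2f}/mo")
# At these rates, once monthly egress exceeds stored volume,
# R2's higher storage price pays for itself.
```

The break-even is simple: at these rates the extra $0.01/GB-month of R2 storage equals Backblaze's $0.01/GB egress charge, so serving your full stored volume once per month is the crossover point.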
I think Cloudflare probably sees R2 as something that is sustainable, but creates demand for their enterprise products. You start The NextBigThing with R2 and suddenly your application servers are under attack. You have a relationship with Cloudflare, you're used to their control panel, you trust them, and when you're at the scale that you're getting attacked like this you can drop $10,000/mo because you're bringing in a bunch of revenue - $10,000/mo is less than 1 software engineer in the US.
R2, in a certain way, can be a marketing tool. "Come use our S3 competitor with free bandwidth rather than getting locked into AWS's transfer pricing." 6-12 months go by and you're substantially larger and want more complex stuff and you're already getting emails from Cloudflare about their other offerings, you see them in the control panel, etc.
It seems like Cloudflare might be trying to move in on AWS's market. R2 is an easy way for them to do it. It seems like S3 has high margins. Competing storage services can be a fraction of the cost per GB and AWS's bandwidth markup is incredibly high. If you're looking to attack a competitor's market, it seems like going after one of their highest-margin product could make the most sense. Again, R2 becomes a marketing tool for future cloud offerings.
Part of Cloudflare's strategy might be targeting things that they see very high margins on and being willing to accept lower margins. If something has 50% margins and you're willing to accept 20% margins, you're still doing pretty great. Plus, over time, the cost of hardware comes down and you can keep your prices at the same level once people are happily inside your ecosystem and don't want to deal with migrations.
> CloudFlare bandwidth is some of the most expensive available
It sounds like you might have gotten burned by something with Cloudflare. I don't have any horror stories, but I'm always interested in new data points if you have them.
Isn't B2 from Backblaze already filling that need? I mean, more choice is always better for sure, but considering R2's goal really seems to be a CDN more than a backup space, and their money maker does feel like it's in the CDN part, not the storage part... I feel like trusting them to store it long-term without using the CDN part is a little bit risky.
Eg: my employer already has stuff on Cloudflare, so using their services is just as easy as pulling their Terraform provider. OTOH, for Backblaze, I'd have to go through the whole evaluation process, security and legal compliance, etc etc...
Maybe I'm reading this wrong, but the data does pass through the machine where rclone is running. rclone does support remote-to-remote transfers[0], but I believe only for remotes of the same type (ie S3 to S3).
You can follow that howto with any two cloud endpoints if you wanted to.
>> This cheaper price doesn’t come with reduced scalability. Behind the scenes, R2 automatically and intelligently manages the tiering of data to drive both performance at peak load and low-cost for infrequently requested objects. We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
The amount of effort it takes to understand and account for S3 Intelligent Tiering is somewhat mind-blowing so to get rid of all of that (and the corresponding fees) would be really nice and TheWayThingsShouldBe™ for the customer -- on top of that most users just don't even know S3 Intelligent Tiering exists so it'll be great if Cloudflare just handles that automatically.
We at https://vantage.sh/ (disclosure, I'm the Co-Founder and CEO) recently launched a cross-provider cost recommendation for CloudFront Egress to Cloudflare which was really popular and I can imagine doing something similar for S3 -> R2 once it is live and we are able to vet it.
When looking at object storage, tail latency is probably the single most overlooked metric, and the most material differentiator between providers after bandwidth costs. Don't sweat the cent spent on storing an object; worry about the cost of the 6,000,000 copies of it you'll ship after it's stored.
As for bandwidth, CloudFlare becomes uninteresting the moment your account starts to see any real consumption, even AWS are easier to negotiate with.
* R2 will support the same object sizes as S3? We have 500GB+ objects and could go to 1TB per object.

* R2 will support HTTP Range GETs, right?
Egress bandwidth for objects on S3 is the biggest line item on the AWS bill for a company I work for, by an order of magnitude, and this will just wipe it off for the most part.
Yes to range requests. Current object limit is smaller than that, but we don't have a fundamental restriction there. Shoot me an email to gmckeon [at] cloudflare.
I tried using Backblaze 8 months ago as a much cheaper replacement for Amazon S3 (especially with B2 and CF's free data transfer partnership) and ran into a limitation on B2.
I had a scenario where my users can upload images from the browser to Backblaze. I wanted the ability to control the file name of the uploaded file. I don't want a user to be able to modify the network request to upload the file with a different file name, nor do I want users to be able to upload files with names which would overwrite existing files.
B2 didn't let me upload files with a specific filename in the pre-signed URL.
For example there's this API:
  curl https://apiXXX.backblazeb2.com/b2api/v2/b2_get_upload_url \
    -H 'Authorization: XXXXXXXXXXXXXXXXXXXXXXXXX=' \
    -d '{"bucketId": "XXXXXXXXXXXXXXX"}'
which gave me an upload URL where I can upload the file to.
And then there's this:
https://www.backblaze.com/b2/docs/b2_upload_file.html
which lets users upload files to that URL.
But this allowed my users to upload file with any name they want. It would also allow them to overwrite existing files (from other users).
My question is more from a security point of view, so preventing one user from overwriting another user's content is crucial. For example, let's say you right click on an image from someone else on Facebook and get the actual image's file name. Now you try to upload an image on Facebook and you edit the network request in the browser's inspector tool to use the image file name which you got from another user. Facebook obviously prevents this in their own way using pre-signed URLs which include the filename in the signature.

However, on Backblaze, if I try this, the "pod" URL which is received doesn't include any file name signature. The pod URL is just where the image gets stored on their end. A user can easily edit the network request and modify the "X-Bz-File-Name" header to another user's filename. This would be a major security vulnerability if I went with Backblaze.

As a workaround, right now it seems like users would first have to upload files to my own server, then my server would have to upload them to Backblaze to avoid this issue. This sounded like a hassle.
Amazon S3 solves this problem using createPresignedPost, which includes a signature of the filename in the URL. I contacted Backblaze's support and got a response that their S3 API doesn't support createPresignedPost:
https://www.reddit.com/r/backblaze/comments/kzszym/is_backbl...
You can read B2's staff's response to my question:
https://www.reddit.com/r/backblaze/comments/l0c9s7/is_there_...
Is there a way to prevent this on R2? Something where the link provided by b2_get_upload_url (or whatever R2's equivalent will be) only works for a specific file name?
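For what it's worth, the general mechanism S3's createPresignedPost relies on - binding the object key into an HMAC signature the client can't alter - can be sketched in a few lines. This is a toy scheme to show the idea, not AWS's actual SigV4 policy signing, and all names here are made up:

```python
import hashlib, hmac, time

SECRET = b"server-side-secret"  # stays on the server, never sent to browsers

def sign_upload(bucket: str, key: str, expires: int) -> str:
    """Mint an upload token bound to one exact object key."""
    msg = f"{bucket}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify_upload(bucket: str, key: str, expires: int, sig: str) -> bool:
    """Server-side check before accepting the PUT/POST."""
    if time.time() > expires:
        return False  # token expired
    return hmac.compare_digest(sign_upload(bucket, key, expires), sig)

expires = int(time.time()) + 300  # valid for 5 minutes
sig = sign_upload("user-images", "user42/avatar.png", expires)

assert verify_upload("user-images", "user42/avatar.png", expires, sig)
# Changing the file name invalidates the signature, so one user can't
# redirect an upload onto another user's object:
assert not verify_upload("user-images", "victim/avatar.png", expires, sig)
```

The point is that the filename participates in the signed message, so a tampered X-Bz-File-Name-style header fails verification - which is exactly what the B2 pod URL flow was missing.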
At Taloflow (https://www.taloflow.ai), (disclosure: I'm the CEO/Cofounder) we provide buying insights for cloud object storage (and soon other IaaS/PaaS). We will definitely be adding Cloudflare R2 to the mix.
Video files are large. Although ~20 cents per video streamed for a small website is manageable (S3, Cloud Storage, Azure...), it's the potential for abuse that could drive my bill up that terrifies me, which is why I decided to stick with Hetzner VMs and their 20TB of free egress.
https://www.cloudflare.com/en-gb/products/cloudflare-stream/
Both of these services significantly reduce the file size with a re-encode, even if they promote an "imperceptible quality loss". They seem to be more suited to high-traffic on-demand streaming for websites, promotional material, etc.
>If your monthly egress data transfer is greater than your active storage volume, then your storage use case is not a good fit for Wasabi’s free egress policy
>If your use case exceeds the guidelines of our free egress policy on a regular basis, we reserve the right to limit or suspend your service
Unless I am misreading, wasabi can shut you down if your egress is high.
This product seems like a good fit for their Cloudflare Workers.
2 - Cloudflare is going heavy on egress costs, where they have a lot of spare capacity compared to the big 3 cloud providers, and they are putting their "weight" to good use:
https://blog.cloudflare.com/bandwidth-alliance-charges-forwa...
Egress/ingress costs are also important for pricing, fyi.
3 - Cloudflare has a known good reputation for their services ( eg. Uptime and costs) + an active community. ( Eg. Check their blog )
Tbh I didn't know Wasabi, will check them out though.
4 - Their datacenters are at the edge. Hard to compete on that physical advantage in terms of speed.
Ps. Yes, I could be considered a fan, similar to AMD in the past (and still am, fyi).
https://wasabi-support.zendesk.com/hc/en-us/articles/3600594...
Taking the have-a-much-better-product route to siphoning use from AWS is particularly ambitious. I hope it works out. AWS has had it a little too easy for too long.
And there is a lot of things that can be improved - like egress price, better support, get rid of regions completely, etc...
> Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service. [1]
--
[1] 2.8 Limitation on Serving Non-HTML Content https://www.cloudflare.com/terms/
If R2 can be approximately as reliable on ingest as AWS/GCS/Azure is, but without the egress fees of the other major providers, then $0.015/GB-month seems like a pretty good deal.
Backblaze, DO Spaces simply were not designed for this in the first place.
Strategy Letter V, commoditize your competitor's advantages!
> We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
Cloudflare has a clear strategy: Be the simplest cloud platform to deploy to. It has been a breeze as a small dev shop adopting their tech. AWS started with the startups, but have since long struggled to keep up that simplicity in face of supporting what must be a dizzying array of customer requirements. Remains to be seen how Cloudflare fares in that regard. I like my Golang better than Rust.
> Cloudflare R2 will include automatic migration from other S3-compatible cloud storage services. Migrations are designed to be dead simple.
Taking a leaf out of AWS Database Migration Service and its free transfers from elsewhere into Redshift/RDS/Aurora/OpenSearch. Niice.
> ...we designed R2 for data durability and resilience at its core. R2 will provide 99.999999999% (eleven 9’s) of annual durability, which describes the likelihood of data loss... R2 is designed with redundancy across a large number of regions for reliability.
S3 goes up to 16 9s with cross-region replication... so I'm wondering why R2 is still at 11 9s? Maybe the multi-region tiering is just acceleration (à la S3 Transfer Acceleration) and not replication?
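To put those durability figures in perspective, here's a rough calculation of what eleven vs. sixteen 9s means for a large bucket, under the simplifying assumption that per-object losses are independent:

```python
def p_any_loss(n_objects: float, durability_nines: int) -> float:
    """Chance of losing at least one object in a year, assuming independent
    per-object losses at the advertised durability level."""
    p_loss = 10.0 ** -durability_nines  # 11 nines -> 1e-11 annual loss prob
    return 1.0 - (1.0 - p_loss) ** n_objects

# With a billion objects stored:
print(f"11 nines: {p_any_loss(1e9, 11):.2e}")  # roughly 1 in 100
print(f"16 nines: {p_any_loss(1e9, 16):.2e}")  # roughly 1 in 10 million
```

So at a billion objects, 11 nines means about a 1% chance of losing something in a year versus roughly one in ten million at 16 nines - a real difference, though both are dwarfed in practice by account-level risks like deletion bugs or billing mishaps.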
> ...bind a Worker to a specific bucket, dynamically transforming objects as they are written to or read from storage buckets.
This is huge, if we could open objects in append-mode. Something that's expensive to do in S3 (download -> append -> upload) even after all these years.
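For reference, the expensive append workaround mentioned above looks like this - every "append" re-ships the whole object. The dict is a stand-in for a real bucket client; against S3/R2 the two steps would be a full GET and a full PUT:

```python
# "Append" on an object store today: full download, append, full re-upload,
# so each append costs bandwidth proportional to the whole object.

def append_object(store: dict, key: str, data: bytes) -> None:
    existing = store.get(key, b"")   # GET the current object (all of it)
    store[key] = existing + data     # PUT the whole thing back

bucket = {}
append_object(bucket, "sensor.log", b"reading-1\n")
append_object(bucket, "sensor.log", b"reading-2\n")
assert bucket["sensor.log"] == b"reading-1\nreading-2\n"
```

A Worker bound to the bucket could at least move this read-modify-write next to the data, but a true server-side append would remove the O(object size) cost entirely.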
> For example, streaming data from a large number of IoT devices becomes a breeze with R2. Starting with a Worker to transform and manipulate the data, R2 can ingest large volumes of sensor data and store it at low cost.
Okay, where do I sign-up?
> R2 is currently under development...
Oh.
On a serious note, I'm wondering about the signed URLs and ACL capabilities of the Cloudflare offering, because this is something we use.
I'm also interested: does R2 replace S3 and CloudFront at the same time? That'd be nice, and one less headache.
Seems like they are going to automatically replicate data to other regions. Something like tebi.io has been doing for a long time already: it is a geo-distributed S3-compatible storage that replicates data across the world to reduce read/write latency and increase speed.
If it is done right, this might increase download speeds by a lot especially for big infrequently accessed files.
The main question: how can Cloudflare make this into a sustainable business?
* cost/gb is cheaper or same as s3, gcp, azure
* no egress charges to customers, but they still have to pay for transit when they cross an AS!
What is the hidden angle Cloudflare is using here?
The offer is no doubt competitively priced, but it's also likely much more lucrative than people using Cloudflare as a CDN for their S3/B2/Azure/whatever.
Eg. What could be the hidden angle of providing domains at cost ( which they do).
I think it's a matter of having a more complete cloud suite and expanding their portfolio while gaining market share.
Clarification: I don't think they do this at cost. Just cheaper than the big 3.
They seem to learn a lot by centralizing a lot of the internet's traffic through their core. Perhaps it makes their DDoS protection more robust when they can train on what "normal usage" looks like for the longer tail.
Perhaps they are building moats around that business and expecting a future when Fortune1000 requires their cdn.
It's a net gain, with a storage price higher than B2 or Wasabi, and also moves your data to their platform for more ecosystem lock-in.
That's their advantage -- a global network that only exits to the local AS.
Can somebody TL;DR what does being member of the "bandwidth alliance" mean?
Cloud providers normally have huge profits on egress (outgoing) data and not on incoming data (to attract you to give them your data).
This additionally incentivizes customers to stay within a certain cloud and not use competing products, since egress costs to a 3rd party make the price difference moot.
Cloudflare (I think) started the Bandwidth Alliance to make these prices more fair. As in: work together, have direct network connections, and reduce customer billing.
AWS didn't want to join as they are n°1 currently: https://blog.cloudflare.com/aws-egregious-egress/
https://www.protocol.com/enterprise/cloudflare-r2-storage-aw...
- Daily inventory report of the contents of the bucket in parquet format
- Signed URLs
- Lifecycle policies based on tags (but to be honest just a policy that isn't restricted to suffix/prefix would be amazing)
- Bucket policies (restrict object key structure)
Lots of these work well enough in AWS but are often lacking in some regards with annoying restrictions.
Looks like an amazing product, good luck!
What's the catch? Imagine a few cases. Let's assume s3 volume rate of $50/TB.
-I post 1gb video file on Reddit. 100k downloads / month: $5k
-I make a 1gb desktop app. I have 100k downloads / month: $5k
-I post 100gb data file on Github. 10K downloads / month: $50k.
Would I pay $0 on R2? And would there be throttling / rate-limiting?
[Edit: Added more realistic examples]
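The arithmetic behind those scenarios, for anyone who wants to plug in their own numbers (using the $50/TB rate assumed above):

```python
# Egress bill for "one object, many downloads", at the $50/TB rate above.
def egress_bill(object_gb: float, downloads: int, rate_per_tb: float = 50) -> float:
    return object_gb * downloads / 1000 * rate_per_tb  # GB -> TB, then price

assert egress_bill(1, 100_000) == 5_000     # 1GB video, 100k downloads
assert egress_bill(100, 10_000) == 50_000   # 100GB file, 10k downloads
```

On R2 the egress term goes to zero by definition; the open question in this subthread is whether request fees or rate limiting kick in instead once traffic spikes.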
Most people don't even have 100k followers. The follower-to-click ratio isn't going to be 100% in any situation. Click-to-full-download ratio is going to be low too, especially so at 100GB. A lot of people don't have that much free space on their devices even!
I think this scenario is firmly in outlier land, and thus not really relevant to cloudflare's calculations.
I'm hoping this means what I think it means, that write latencies will be minimal across the globe, since writes will be persisted and ack'd at the closest region and then eventually consistently propagated to other regions?
If so, curious what would happen in a scenario where a region requests an object that has been persisted at another region but not yet propagated? Will it result in a 404 or is the system smart enough to route the request to the region that has the file at the cost higher latency?
From my research so far into S3's cross region replication, the latter behavior doesn't seem possible out of the box since requests have to specify a single region (S3 experts, please do correct me if I'm wrong), so I'm hoping CloudFlare with its deep expertise in managing a global network can differentiate here. Even if it's not offered out of the box, due to the lack of egress costs, it's a lot more feasible to build in this behavior in the application layer with R2 by just racing requests across several regions and taking the one that resolves first (or at all), so very promising regardless.
Also, would love to hear some numbers on what kinds of write latency to expect. From my experience so far, S3 writes for tiny files in a single region take on the order of 50ms ish even for clients in close physical proximity, which is serviceable for my use case, but seems higher than it needs to be (and every little bit I can shave off on latency helps tremendously for what I'm building). Really looking forward to seeing what the CloudFlare solution is capable of here.
Lastly, S3 didn't advertise and guarantee strong read-after-write consistency for same region read/write until late last year. Will R2 offer this out of the gate?
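The "racing requests" idea from above is straightforward to sketch with stub fetchers standing in for region-specific GETs (region names and latencies here are invented; zero egress fees are what make the losing requests cheap):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_from(region: str, key: str) -> str:
    # Stub for an HTTP GET against a region-specific endpoint.
    # us-east answers fastest; ap-south hasn't received the replica yet.
    latency = {"us-east": 0.01, "eu-west": 0.05, "ap-south": 0.02}[region]
    time.sleep(latency)
    if region == "ap-south":
        raise FileNotFoundError(key)  # the not-yet-replicated 404 case
    return f"{key}@{region}"

def race_get(regions: list, key: str) -> str:
    """Fire the same GET at every region, return the first success."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(fetch_from, r, key) for r in regions]
        for fut in as_completed(futures):
            if fut.exception() is None:
                return fut.result()
    raise FileNotFoundError(key)  # every region 404'd

print(race_get(["us-east", "eu-west", "ap-south"], "logo.png"))
```

A region that returns a 404 simply loses the race to one that has the object, which gives you the "route to wherever the file is" behavior at the application layer, at the cost of extra in-flight requests.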
- Writes and reads always go to the closest region, which keeps network latency low. You can define different metadata write concern levels for each bucket; this way you can define how fast or how consistent your writes will be on a global scale. You can even make them asynchronous, meaning that once the data transfer is complete, metadata write and propagation are performed without you waiting for them to complete.
- If you write to one region and someone is trying to read that object from another region - it will be possible as soon as metadata is replicated to that region (usually it takes less than 500ms). If data is not yet replicated to that region, then it will be read from another region. If the data is partially replicated, then that part will be read from the local storage and the rest of the data from another region. Additionally, Tebi supports synchronous replication allowing almost instant data propagation across the world.
- Write latency depends on metadata replication write concern - it can be faster than AWS or slower, you can configure it yourself.
- You can define where you want to store your data and how many copies you need in each region.
It's eventually consistent for global replication, but additionally there should be a consistent index of things that are available in other regions? I suppose that's plausible. Seems to defeat a lot of what they're avoiding.
Nothing stopping you from doing a scatter-gather on a 404 based on some heuristic in your application code, though.
We support per-object TTLs, so this should work!
> Behind the scenes, R2 automatically and intelligently manages the tiering of data to drive both performance at peak load and low-cost for infrequently requested objects. We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
Is it built on Ceph's S3 compatibility?
Your durability numbers imply erasure coding. Is that the case?
Well, you can't really build inexpensive, large scale, reliable storage without erasure coding. So that's probably a given.
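For intuition, the simplest possible erasure code - a single XOR parity shard, RAID-5 style - already shows how a lost shard is rebuilt from the survivors. Real systems use Reed-Solomon with multiple parity shards to survive several simultaneous failures, but the recovery idea is the same:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

shards = [b"AAAA", b"BBBB", b"CCCC"]   # equal-sized data shards
parity = reduce(xor_bytes, shards)     # single parity shard

# Shard 1 is lost; XOR the survivors with the parity to rebuild it:
rebuilt = reduce(xor_bytes, [shards[0], shards[2], parity])
assert rebuilt == shards[1]
```

The storage-efficiency win is why everyone does it: 3 data shards + 1 parity costs 1.33x the raw data, versus 3x for full triple replication at a comparable single-failure tolerance.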
(Workers KV is great but there's a ton of times that you just need an actual relational database)
Although egress (outbound network) can be a significant part of object storage expenses, if you are reading and writing small objects, per-request expenses can be much bigger. Cloudflare indicates that for low request rates there won't be any request fees, but doesn't state what they will charge for high request rates.
My article points out that the best deal when working with high request rates is to use services that don't charge per request such as DigitalOcean, Linode, and Vultr. If it's S3 that you want, even Amazon has recently joined the budget club with Lightsail Object Storage which has monthly plans of $1, $3, and $5 (250 GB storage and 500 GB egress) with no per-request fees.
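To make the per-request point concrete, here's rough monthly request-fee math at S3's published list rates (about $0.005 per 1,000 PUTs and $0.0004 per 1,000 GETs at the time; treat these as illustrative):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds

def monthly_request_cost(reads_per_s: float, writes_per_s: float,
                         get_per_k: float = 0.0004,
                         put_per_k: float = 0.005) -> float:
    """Monthly request fees for a steady rate (ignores storage and egress)."""
    reads = reads_per_s * SECONDS_PER_MONTH
    writes = writes_per_s * SECONDS_PER_MONTH
    return reads / 1000 * get_per_k + writes / 1000 * put_per_k

# 100 tiny-object reads/s and 10 writes/s is modest traffic, yet the
# request fees alone come to a couple hundred dollars a month.
print(f"${monthly_request_cost(100, 10):,.2f}/month")
```

For small-object workloads that line item easily dominates the storage bill, which is why R2's still-unannounced above-threshold request pricing matters as much as the zero egress.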
Does that mean automatic caching across regions? Low-latency read access everywhere without an extra CDN in front of it?
We're still deciding on how we want to handle caching. We integrate with Workers, so manually caching is always possible. The catch is we're currently building for strong consistency - if we added a cache in front, we'd weaken that - so it will likely be a configuration option we add later.
We all knew that the big players are really maximising their profits on the egress charges, so I can see that this left some potential for someone to step in. No egress charges at all still sound a bit too good to be true, but it would be nice as that's just one parameter less to think about.
Another interesting aspect is Cloudflare Workers. As far as I can tell they're not a full replacement for something like AWS Lambda if, e.g., I need to do a bit heavier stuff on the data in R2. Being able to do heavier processing close to the actual data would be really interesting as well.
That is why SeaweedFS added a gateway to remote object stores: https://github.com/chrislusf/seaweedfs/wiki/Gateway-to-Remot... , which asynchronously writes local changes to the cloud. If there is enough local capacity, there should be no egress cost.
Hopefully, this can change the common pattern, to really treat the cloud object store as a backup.
We'd stick R2 in front of an S3 bucket and wipe off the biggest portion of our bill.
S3 does absolutely have a ton of other stuff like events, lifecycle, Glacier, Lambda etc and is plugged into the AWS ecosystem, so I doubt we’re exiting it completely. But this is a solid option as a front for egress from S3.
We're fully integrated with Workers, so you can write a Worker that calls additional logic when a request is made to the bucket's url.
We have notifications, where a change to the bucket invokes a worker with a specific event, on our roadmap, but have a bunch of other work prioritized ahead of them.
It's like going to a restaurant, asking if they have shrimp scampi, and getting a reply that you can go to a supermarket and buy the ingredients and make the dish and bring it back to the restaurant to have with your meal.
Just tell me you don't have shrimp scampi.
How will other providers respond to this now?
AWS, GCP, and others do not really pay much for egress themselves. Those super-high egress charges are pretty ridiculous.
How frequent is infrequent? In our case it's "never unless other backups fail" and for that S3 Glacier Deep Archive is still cheaper ($0.00099 per GB).
https://www.protocol.com/enterprise/cloudflare-r2-storage-aw...
So... I still need to test, but if that is true it can be a game changer. Of course, $0.00099 is almost free, but if everything is done automagically it will be awesome.
This is a little bit above your quoted paragraph
Of course the other angle is that Cloudflare one day ends up like Google, who ended up like Microsoft, who ended up like IBM
As for right now though, cooooool
However, still totally worth it, easily. Just saying they have a healthy B2B business model.
"Our vision for R2 includes multi-region storage that automatically replicates objects to the locations they’re frequently requested from."
Like egress pricing, S3 starting at $5/million writes and $0.40 per million reads feels excessive.
Disclaimer: I haven't used it, but planning to, since I already use their VPS.
This announcement is noteworthy because there’s no egress fees. Spaces still charges 1c per gigabyte for egress. Not comparable in that aspect.
we are still living in 2007?
What happens to your Object Storage buckets when Cloudflare has an outage? - https://filebase.com/blog/what-happens-when-my-cloud-goes-do...
Similarly, 3-2-1 is a backup strategy and the pricing appears to already include multiple copies using the same mechanism so the correct calculation would be the cost of R2 plus whatever _different_ mechanism you choose for disaster recovery purposes such as on-premise storage or a completely different provider.
1. For example, if you use Cloudflare as your CDN / DNS provider and they have a catastrophic failure, the fact that your storage is inaccessible is just another facet of the same issue.
Is this going to be content-neutral, like Cloudflare was when fronting ISIS websites?
Or is this going to be fine-until-bad-PR, like when Cloudflare decided to stop hosting The Daily Stormer?
There is a special kind of lock-in when it comes to object storage, as generally you use something like this when the data is too big to store another copy of locally or at another provider. It's not like you can easily maintain provider independence, and if Cloudflare decides one day that some of your UGC in a bucket isn't something they want to host, what happens then?
Is the data lost forever because your account is nuked? Is there a warning or grace period?
I am hesitant to put any large amount of data into a service without a crystal clear statement on this, so that I can know up front whether or not a business needs to maintain a second, duplicate object store somewhere else for business continuity.
If Cloudflare in practice is going to nuke the account the moment your site ends up hosting something objectionable, this DR requirement (a second provider that also stores all objects) needs to be factored into a customer's costs. (It may be that the bandwidth savings still make it worth it to use Cloudflare even with double storage.)
It's a mistake to rely on a clear statement when you can't afford to lose your data. Stuff happens all the time... mistakes, malware, an expired credit card, etc. Independently of the provider you decide to use, I'm not sure if a backup is optional in your case.
You raise an interesting point regarding object storage being a relatively unique type of lock-in.