The file storage move from S3 to Magic Pocket is detailed in these blog posts:
https://blogs.dropbox.com/tech/2016/03/magic-pocket-infrastr...
https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pock...
https://blogs.dropbox.com/tech/2016/07/pocket-watch/
The network backbone is talked about here:
https://blogs.dropbox.com/tech/2017/09/infrastructure-update...
Reminds me of the infamous "SSL added and removed here! :^)".
Does anyone know if there are any additional protections from snooping on their internal network?
edit: phrasing
> We use TLS 1.2 and a PFS cipher suite at both our origin data centers and proxies. Additionally, we’ve enabled upstream certificate validation and certificate pinning on our proxy servers. This helps ensure that the edge proxy server knows it’s talking to our upstream server, and not someone attempting a man-in-the-middle attack.
(N.B.: I work on security at Dropbox, and consulted on this design)
[1]: https://blogs.dropbox.com/tech/2016/11/infrastructure-update...
In this post they say specifically it is about the handshake:
https://blogs.dropbox.com/tech/2017/06/evolution-of-dropboxs...
I don't know if they use SSL/TLS to their upstreams; I'm just saying that terminating at the edge doesn't mean that's the end of all SSL/TLS. It is totally normal to terminate SSL/TLS at the edge (pretty much anyone using an HTTPS load balancer or CDN does it), but the LB or CDN can still use SSL/TLS to the upstreams and verify the upstreams' certificates.
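If you haven't seen upstream certificate validation and pinning before, here's a minimal Python sketch of the idea; the host name and fingerprint are placeholders I made up, not anything from Dropbox's actual setup:

    import hashlib
    import socket
    import ssl

    # Placeholder upstream and pin; illustrative only, not Dropbox's configuration.
    UPSTREAM_HOST = "upstream.internal.example"
    PINNED_SHA256 = "<expected SHA-256 fingerprint of the upstream leaf certificate>"

    ctx = ssl.create_default_context()            # verify the chain against trusted CAs
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older than TLS 1.2

    with socket.create_connection((UPSTREAM_HOST, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=UPSTREAM_HOST) as tls:
            der = tls.getpeercert(binary_form=True)
            if hashlib.sha256(der).hexdigest() != PINNED_SHA256:
                raise ssl.SSLError("upstream cert does not match the pinned fingerprint")
            # Safe to forward the request upstream from here.

An edge proxy does the same thing in spirit: require a modern TLS version, validate the upstream's chain, and additionally check the certificate against a pin it already knows.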
https://www.wired.com/2016/03/epic-story-dropboxs-exodus-ama...
For companies where the value lies in the utility of a service that can't be easily replicated, you have pricing power to make the convenience of AWS worth the expense.
Edit: Amortization/Depreciation can *generally* be used interchangeably.
Realistically, the valuation multiples on EBITDA for mature SaaS businesses are different from those for datacenter operators, and someone building a financial model for Dropbox would now likely take a blend of these into consideration. So EBITDA is by no means the end-all be-all here.
EDIT: And taxation is a different story altogether, as in either the cloud or own-hardware case, Dropbox can write off expenses or depreciation respectively.
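To make the "blend" concrete, a toy example (every multiple and weight here is invented purely for illustration):

    # Toy blended-multiple calculation; all numbers are made up.
    saas_multiple, dc_multiple = 20.0, 10.0  # EV/EBITDA for a SaaS business vs. a datacenter operator
    saas_weight, dc_weight = 0.7, 0.3        # rough split of the business between the two profiles
    blended = saas_weight * saas_multiple + dc_weight * dc_multiple
    print(blended)  # 17.0x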
I don't think this article is news. The editorialized title on HN is unfortunate and I wish a mod could change it. The article talks about the reasons Dropbox decided to move 500PB of storage into our own data centers. Also, as the article mentions, "They still use AWS for some workloads".
Edit: Here is another HN submission of the same piece with the correct headline. https://news.ycombinator.com/item?id=15257331
You could just as easily say it's amazing that Amazon allows a competitor like Netflix to run on its platform. The reality is that the relationship is more complex than that.
AWS doesn't even let you do BGP. If you want to use them for a CDN, you're locked into their network and their blend of ridiculously overpriced bandwidth.
They might be using AWS for the canonical store of data, but the sanity of using "the cloud" goes out the window the second you need to ship a lot of traffic to the public internet.
[1] https://media.netflix.com/en/company-blog/how-netflix-works-...
Let's assume they have 12 buildings at each region with 500 racks in each: 6k racks.
Let's assume that S3 is 1/4 of their infrastructure.
Let's assume you can put 60 * 12TB disks in 4U and they use 52U racks. That's 13 * 60 * 12TB per cabinet, or 9.4PB per rack.
1500 racks at 9.4PB raw capacity with RF=2.1, so best case that's 1500 * 9.4 / 2.1 ≈ 6700PB at a site.
So 500PB would be 7% of the space.
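The same arithmetic as a quick script, with the assumed constants called out (none of these are Dropbox's real numbers):

    # Back-of-the-envelope capacity check; every constant is an assumption from above.
    racks_total = 12 * 500             # 12 buildings per region * 500 racks each
    racks_s3 = racks_total // 4        # assume S3 is 1/4 of the fleet -> 1500 racks

    tb_per_4u = 60 * 12                # 60 drives * 12TB each per 4U chassis
    chassis_per_rack = 52 // 4         # 13 chassis in a 52U rack
    raw_pb_per_rack = tb_per_4u * chassis_per_rack / 1000  # ~9.4PB raw per rack

    rf = 2.1                           # assumed replication factor
    usable_pb = racks_s3 * raw_pb_per_rack / rf
    print(round(usable_pb))              # ~6700 PB usable
    print(round(500 / usable_pb * 100))  # 500PB is ~7% of that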
They're probably growing 5-15% per month like most hot companies now, so that's 1-2 months' growth.
Of course these numbers are certainly off. But probably in the right order of magnitude.
But who knows...
I personally never asked for that much but I'd speculate that they won't give you that in any single region.
1. Network throughput on the appliance is fast, but at an enterprise level the 10GbE cards used become a bottleneck for transactions because of how the software hypervisor scales the data.

2. Power consumption of the appliances in a rack-mount environment is too high, and it forces rack space to stay empty because of the facility's power-per-rack limitations.

3. The software hypervisor scales the stack vertically and relies on the software to load balance horizontally. In a high-transaction environment, performance becomes dependent on the software to scale instead of the natural horizontal distribution that can be set up on the hardware out of the box. Standard multi-purpose storage arrays scale horizontally with very little overhead beyond traditional software storage management. I only found one company whose software does not force the stack to be vertical, but it fails to meet reasonable performance in network/power.
Streaming petabytes of data to keep a dynamic constant (a static overall storage requirement whose data life cycle changes via retention rules) becomes very hard with premade hardware.
Does anyone have any recommendations or has attempted a similar exodus from S3 that they can share?
Granted, I stopped paying attention to tape drive capacities a while ago, but the upcoming LTO-8 standard will "only" support 13TB/tape, so you'd need 38,000 of them to hold 500 PB.
I've seen some higher density announcements in the 200 - 300TB/tape range, but couldn't find any products available.
AWS does offer a "Snowmobile" [1] product that can hold 100PB on disks in a tractor trailer.
[1] https://aws.amazon.com/blogs/aws/aws-snowmobile-move-exabyte...
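Sanity-checking those logistics numbers (the figures come from the comments above, not from vendor spec sheets I've verified):

    # 500 PB via tape vs. Snowmobile, using the figures cited above.
    total_tb = 500 * 1000            # 500 PB in TB
    lto8_tb_per_tape = 13            # cited LTO-8 capacity per tape
    snowmobile_pb = 100              # capacity of one Snowmobile trailer

    print(round(total_tb / lto8_tb_per_tape))  # ~38,000 tapes
    print(500 // snowmobile_pb)                # 5 Snowmobile trips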
And I did have a flashback to my first hard drive, all 220 glorious megabytes of it, and a great deal more expensive than "a decent lunch", and got a sweet hit of that yeah... I am in the future, aren't I? feeling.
Moving 500PB on a live application is not a trivial task.
I know that Dropbox doesn't care about my tiny dollars and all. But why not let customers pay for what they use? This "constant growth" bullshit is probably the reason they don't care.
Who believes this?
It was, as everyone could easily notice, about wanting to reduce cost. Which is fine; everyone does it, so why not admit that it was the primary impetus?
If I were running any larger company, I would not want to outsource control over it to other, even bigger companies.
However, once you've had Amazon tell you "no" when you try to get more resources, and you've had your business go offline for hours while you wait for news from Amazon, you may want more control.
It's very possible that they did this all on their own provided they rented cabinets/cages in an existing facility like Equinix, which is extremely common. Then they do not need to manage power/generators, fiber into the building, or any other data center necessities.
It does not take very many people to do the wiring inside a cage, especially considering Dropbox has been doing this over the span of a few years. If you've ever been physically in a data center (I've personally visited a few Equinix facilities), it's mostly one person from a company working in a cage, wiring everything. I've rarely seen multiple people do wiring, and that same person will come back every day thereafter to continue working until the job is done.
What they are doing sounds entirely feasible. If money is no object in regards to equipment, with 3 data centers, I'd honestly say you only need 4 competent people to get the job done.
3 on-site. 1 remote/office.
From there, the more the merrier. A guy to lift the equipment up as well is nice sometimes - can be pretty heavy!
Here's an article from last year with a lot more detail - https://www.wired.com/2016/03/epic-story-dropboxs-exodus-ama...
(b) Without CAF, GC and AWS pricing is ridiculous - somewhat akin to paying $9.99/lb for a Perdue chicken a day before the expiration date
(c) Without the scale of AWS and GC one can get vendor prices at Google/Amazon MFN + 10% any day or +5% after schmoozing.
1500 sounds like a lot, but keep in mind how many customers they have, and in how many countries they sell their services.
https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pock...
Also, as mentioned elsewhere in the comments here, it should be retitled to "Why Dropbox decided to drop AWS"
On the AMP page, it only catches one: https://imgur.com/a/P7mww
Though it's less of a problem with self-hosted AMP, it's still a problem because it's still tech from a company that could choose to improve the web, and could very well do so with Chrome's market share, but instead decided to replace it with its own thing.
I have good reasons not to trust Google: over the years they have provided a worse experience for Firefox on their own services, and it's still annoyingly true on Android.
So the solution is SO bad that the "amp" tag even becomes part of the URL?!
Now THAT is really weird ... well, aside from me disliking AMP anyway, that ... hmmm. We can send people to the moon, probably soon elsewhere, but ... we have to denote that ... we use AMP by ... an appended string called 'amp'.
Google should be ashamed of this "engineering".
The title of this HN link leads you to believe this is recent, but the article makes it seem like a multi-year effort that, in fact, could have finished a long time ago.
The timeline is really, really significant here. Did they initiate this back in 2013? 2015?
My company has seen a remarkable shift in trust for online services over the last two years. Was Dropbox ahead of this curve? Or were they one of the ones who brought this change in mentality about?
This blog article is crap because, without a basic sense of _when_, it could be that they were riding the coattails of other industry leaders. Or that they were leading the charge.
Dropbox has moved user data from S3 to its own colocated data centers over the past few years, and is doing compute in those data centers too. The compute actually existed for quite a long time: in the past you'd be talking to a Dropbox-run server, which would connect back to S3 to retrieve data.
Dropbox is definitely still an AWS customer, just not a major S3 or EC2 customer anymore. For example, all transactional email uses SES, and DNS is hosted on Route 53.