“Let's talk about a hypothetical public-facing service”

(reddit.com)

439 points · kgm · 10y ago · 67 comments

34 comments · 11 top-level

smacktoward · 10y ago · 8 in thread

> You have a moment where you envision the future of virtualized storage and think about how great it will be when storage is nearly free and outsourceable and you can stop buying disks from Amazon every few months

The future is now! Instead of you sending money to Amazon and them sending you disks, they keep the disks and you send them the money anyway.

Progress! :-D

ThePhysicist · 10y ago

Sadly, depending on what you want to accomplish with your storage, the future is still not here. For example, if you want to share a virtual disk with multiple servers through NFS, you will still need to manually create an NFS high-availability cluster using e.g. DRBD on most cloud providers (e.g. EC2 or Azure). To be fair, MS Azure offers fully managed SMB shares as a service, but the transfer speed is quite low (60 MB/s in the best case), so they are not very useful for most tasks.

mrud · 10y ago

AWS has EFS [1], which should allow you to share one FS with multiple servers. It is still in preview but could help address these needs.

[1] http://aws.amazon.com/efs/pricing/

objectivefs · 10y ago

For sharing a filesystem between servers without managing your own high-availability cluster, you could try our ObjectiveFS [1].

[1] https://objectivefs.com

idunno246 · 10y ago

EC2 has EFS at least, no idea if it's any good.

windsurfer · 10y ago

If the disks only last so long, is it really so terrible that Amazon gets to keep the broken outdated junk?

Sanddancer · 10y ago

Yes. One can build a system that holds a petabyte of data, fits in less than one rack, and has multiple redundancy for about a hundred grand these days. Multiplied by three locations, that'll get you to $300,000. Add in another $35,000 for three years of hosting at three different datacenters, and you get to $335,000. Sounds like a lot, right?

Amazon's even more expensive. Storing one petabyte on Amazon, at the slowest, cheapest, Glacier level, will cost you $10,000/month, so at about 10 months, you've paid a hundred grand. At thirty-six months, you've paid three hundred sixty grand. Plus, you have no hardware to show for it, while Amazon sells that broken outdated junk to a reseller to regain some of their investment. Had you hosted it yourself, you could have sold all that equipment off to refurbishers and resellers and recovered at least a portion of your investment. So yeah, it is pretty terrible to choose Amazon.
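
The arithmetic in this comment can be checked with a short sketch. The dollar figures are the ones quoted above (a hundred grand per site, three sites, $35,000 of hosting over three years, $10,000/month for the cheapest tier); the function names are made up for illustration, and resale value, staff time, and bandwidth are deliberately left out.

```python
def self_hosted_cost(per_site=100_000, sites=3, hosting_total=35_000):
    """Up-front hardware for every site plus hosting for the whole period."""
    return per_site * sites + hosting_total


def cloud_cost(monthly=10_000, months=36):
    """Cumulative pay-as-you-go storage bill over the same period."""
    return monthly * months


def months_to_reach(target, monthly=10_000):
    """First whole month at which the cumulative cloud bill reaches `target`."""
    return -(-target // monthly)  # ceiling division on integers


if __name__ == "__main__":
    print("self-hosted, 3 years:", self_hosted_cost())
    print("cloud, 3 years:      ", cloud_cost())
    print("months until cloud bill matches one site:", months_to_reach(100_000))
    print("months until cloud bill matches self-hosting:",
          months_to_reach(self_hosted_cost()))
```

Under these assumptions the two totals are within about 7% of each other at the three-year mark, so the conclusion depends heavily on the costs the sketch leaves out.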

nadams · 10y ago

> you can stop buying disks from Amazon every few months

This is something that has been puzzling me. Many years ago I purchased 4x 2TB 5900 RPM drives for a 4 bay ReadyNAS (about $300 for drives plus ReadyNAS). They have been spinning nonstop for ~4 years [1] and I haven't had to replace a single one. Not even an increase in errors to signal that a drive is going.

Yet - I've worked on a SAN that cost hundreds of thousands of dollars and would have to replace a disk about every month.

Granted the disks in SANs probably spin faster (thus faster data access/lower MTBF) - but that high failure rate seems rather suspicious to me.

[1] http://i.imgur.com/NRQCedj.png

ChuckMcM · 10y ago

AFR buddy, AFR. AFR is the annual failure rate; it's typically 2-5% of the population per year. With 4 drives you don't have a large enough set to see this in action; every day you're just in danger of losing a drive by a small statistical amount.

In the Blekko cluster we have just under 10,000 drives. We have two 20-drive 'boxes' (40 drives) from Western Digital. As drives fail we pull replacements from the 'new/refurbished' box, and we put the dead ones in the outgoing box. When we get up to 20 we RMA them in bulk: 20 go out, 20 more come in. That becomes the new 'new/refurbished' box.

It really isn't SAN vs non-SAN; it is all statistics.

That said, if you're running your ReadyNAS with RAID 10 (mirrored drives in a RAID 0 config) you may find some unpleasantness when a drive does fail. Statistically you have a 1/10 chance of not being able to re-silver the mirror for a 5900 RPM desktop SATA drive. That gets a bit painful.
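
To make the "it is all statistics" point concrete, here is a rough sketch of how the same AFR looks for 4 drives versus roughly 10,000. It assumes drive failures are independent and that the AFR is constant from year to year, which real fleets only approximate; the function names are illustrative and the 2% and 5% rates are the range quoted above.

```python
def expected_failures_per_year(drives, afr):
    """Average number of drive failures per year across the whole population."""
    return drives * afr


def p_no_failure(drives, afr, years=1):
    """Chance that not a single drive fails over `years`, assuming independence."""
    return (1 - afr) ** (drives * years)


if __name__ == "__main__":
    for afr in (0.02, 0.05):
        print(f"AFR {afr:.0%}")
        print("  4 drives, 4 years, P(no failure):",
              round(p_no_failure(4, afr, years=4), 3))
        print("  10,000 drives, expected failures per year:",
              round(expected_failures_per_year(10_000, afr)))
```

With four drives, going years without a replacement is an unremarkable outcome; with thousands of drives, replacements become a routine weekly chore at the same per-drive rate.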

intrasight · 10y ago · 5 in thread

Got a good chuckle. Such is life in IT. Gotta admit I missed the "Extremely Massive Corporation" hint.

glass- · 10y ago

It reminds me of Douglas Crockford's talk where he mentioned that a company asked him for an exemption to the "do no evil" clause in the JSON License, and he said he didn't want to name the company as that would embarrass them so he would instead give their initials: IBM.

MichaelGG · 10y ago

It's not embarrassing; their lawyers are just being cautious, as befits a company with huge customers like theirs. The "do no evil" clause is stupid and utterly ambiguous. If Crockford becomes a fundie, then is any gay-related org in violation of the license?

It's his right to make up ridiculous licenses. As with the sisterware license (you can use the software if you send me a pic of your sister if you have one), people shouldn't take it seriously and should avoid code licensed like that. That Crockford doesn't get this is either him trolling or being clueless.

myth_buster · 10y ago

The comment in question

https://www.youtube.com/watch?v=-hCimLnIsDA

microcolonel · 10y ago

I laughed so hard while reading that.

tekromancr · 10y ago

Care to elaborate?

zatkin · 10y ago · 4 in thread

I was hoping they would stay offline. It's really disappointing to see a good service turn into money-wringing desperation.

pjc50 · 10y ago

I was thinking similar thoughts from the other direction - that they have turned to money wringing desperation might mean there is no money to get the thing back up. If the situation turns out to be "it's dead Jim, we need to spend several million on a replacement EMC filer or shut down forever", it might be gone along with all the valuable projects hosted there.

In some ways the bundleware served like a SMART warning, causing people to back up and migrate their projects.

morganvachon · 10y ago

The way I discovered SF was down was when I was trying to install some Slackbuilds and kept getting MD5 mismatches on the source downloads. I noticed I was getting the same MD5 for several different packages, so I tried to manually grab a package and that's when I hit SF's error page (which sbopkg was happily downloading and trying to pass off as somesourcepkg.tar.gz).

I found myself in the odd position of hoping SF would come back up so I could finish what I was doing, when I'd normally welcome the news that they had shut down.
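
The symptom described above, several different packages sharing one MD5, is easy to detect mechanically. The following is only a sketch using Python's standard `hashlib`; it does not reflect how sbopkg actually works, and the helper names and sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a downloaded payload."""
    return hashlib.md5(data).hexdigest()


def verify(data: bytes, expected_md5: str) -> bool:
    """True if the payload matches the checksum published for the package."""
    return md5_hex(data) == expected_md5.lower()


def shared_digests(downloads):
    """Map each digest seen under more than one file name to those names.

    Distinct source tarballs sharing a digest is a strong hint that the
    server returned the same error page for all of them.
    """
    by_digest = defaultdict(list)
    for name, data in downloads.items():
        by_digest[md5_hex(data)].append(name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


if __name__ == "__main__":
    error_page = b"<html><body>Service unavailable</body></html>"
    downloads = {
        "foo-1.0.tar.gz": error_page,  # hypothetical package names
        "bar-2.3.tar.gz": error_page,
        "baz-0.9.tar.gz": b"real tarball bytes",
    }
    for digest, names in shared_digests(downloads).items():
        print(digest, "was served for", names)
```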

pascalmemories · 10y ago

Sadly I too found it was screwed a few days ago when trying to pull some source code hosted there (ironically so I would have my own copy in case things went wrong there...). I just got their 'Disaster Recovery Mode' notice and a subset of binary packages available.

Lots of Google searching for what's wrong with SourceForge revealed nothing (other than complaints about their packaged installers/crapware). Now I have the answer to the question I was really asking.

I just need to wait and see if the developers of the packages I need will somehow migrate to GitHub so I can get the source...

stuaxo · 10y ago

It does still seem to be offline for a whole bunch of projects - I like to check in on dosemu from time to time and it is fucked!

http://www.dosemu.org/

What's happened to the mailing list archives for all these projects, are they just down the toilet now? And of course the source, this is bloody terrible!

rosser · 10y ago · 3 in thread

...Extremely Massive Corporation...

I knew EMC storage was utter shit when, upon attempting to create a new RAID group, I realized that the configuration tool's default was to stripe across drives within a shelf, not to create stripes that span shelves.

Worse, to create the more fault-tolerant, shelf-spanning RAID volumes, one must manually add drives, one by one to the array, in a process that involves about 44 (slight hyperbole) clicks per disk.

And then there was the fact that the configuration tool was Windows-only.

Yeah, screw those guys.

angry_octet · 10y ago

Actually it sounds like the EMC software was trying to save you from something which is a bad practice. If you want raid 1+0 it is better to have two disks on the same controller/chassis in raid 0, then raid 1 (or raid 5/6) across chassis. Otherwise a chassis failure will take out both leaves of your raid mirror.

Having said that, EMC is stupidly overpriced bloat.

rosser · 10y ago

Did I really need to spell out the geometry of the RAID I was creating to avoid this kind of nitpicky follow-up?

It was a 20-something disk RAID 10 [1], arranged so that every mirrored pair of disks spanned different enclosures, in order to mitigate the failure of any one shelf — that is, interleaving mirrors across controllers and shelves, exactly as you suggest I should have done — and further, such that any one shelf failing only affects the mirrors that had disks on that shelf.

EMC's software wanted to allocate the drives from two shelves, with an unequal number of drives per shelf. It was just grabbing the next however many disks, linearly.

So, no, they weren't trying to balance the mirror across enclosures or controllers. They just weren't thinking.

[1] By "RAID 10", I mean "striped mirrors" — that is, create a bunch of mirrors that span shelves and then stripe across them — not "mirrored stripes" which is what you appear to be suggesting, with "it is better to have two disks on the same controller/chassis in raid 0, then raid 1".

A striped mirror is recommended in everything I've ever read on the subject, because it puts the redundancy at the lowest level of the array's geometry.

Using a mirrored stripe, on the other hand, means that when one disk fails, any other disks striped with it, still presumably perfectly functional, can't be used; the controller must instead read from and write to the mirror. If a disk in that mirror subsequently fails, you've lost data — and remember that when striping, the chance of failure is multiplied by the number of disks in the stripe.
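
The difference between the two geometries can be counted directly. This sketch enumerates every possible pair of failed disks in a 20-disk array and checks which pairs lose data under each layout. It assumes, as described above, that a mirrored stripe with one dead disk is unusable as a whole; it ignores shelves, controllers, and rebuild windows, and the disk numbering and function names are invented for illustration.

```python
from itertools import combinations


def striped_mirrors_lost(failed, n):
    """RAID 1+0: disks 2k and 2k+1 form mirror k; lost if a whole mirror is gone."""
    return any({2 * k, 2 * k + 1} <= failed for k in range(n))


def mirrored_stripes_lost(failed, n):
    """RAID 0+1: disks 0..n-1 are stripe A, n..2n-1 are stripe B.

    Lost as soon as each stripe contains at least one failed disk.
    """
    hit_a = any(d < n for d in failed)
    hit_b = any(d >= n for d in failed)
    return hit_a and hit_b


def fatal_pairs(lost, n):
    """Count two-disk failures that lose data, out of all possible pairs."""
    pairs = [set(p) for p in combinations(range(2 * n), 2)]
    return sum(lost(p, n) for p in pairs), len(pairs)


if __name__ == "__main__":
    n = 10  # 20 disks in total
    print("striped mirrors (fatal, total): ", fatal_pairs(striped_mirrors_lost, n))
    print("mirrored stripes (fatal, total):", fatal_pairs(mirrored_stripes_lost, n))
```

In this model a striped mirror is lost only when the second failure hits the partner of the first failed disk, whereas a mirrored stripe is lost when the second failure lands anywhere in the other stripe.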

EDIT: Footnote.

kragen · 10y ago

I guess a "shelf" shares some piece of hardware, such as a controller card or communication backplane, which could itself fail and thus disable access to all the drives on that shelf, which a RAID 4 or 5 stripe that spans shelves could survive?

simonebrunozzi · 10y ago · 2 in thread

"Let's talk about a hypothetical public-facing service that offers tools for collaboration, revision control, and software publishing." - it was hard not to notice that he was referring to SourceForge. :)

Spivak · 10y ago

You know the title of the Reddit post was "Sourceforge is down due to storage problems, no ETA", right?

skrause · 10y ago

Especially since it was posted in a thread about SourceForge.

galoppini · 10y ago · 1 in thread

SourceForge has posted info about current infrastructure and service restoration activity at http://sourceforge.net/blog/sourceforge-infrastructure-and-s...

[Disclosure: I work for SourceForge]

M2Ys4U · 10y ago

>[Disclosure: I work for SourceForge]

My condolences.

eCa · 10y ago

> made by an Extremely Massive Corporation who until now you've generally respected.

As usual, one's respect for BigCompany is inversely correlated to one's use of their products.

nitrogen · 10y ago

One can hope that if this hypothetical public-facing service never returns, they will ship the backup tapes to the Internet Archive instead of Honest Bob's Social Data Mining and Market Manipulation.

amelius · 10y ago

> It writes a full 32 bits of numeric user ID to its filesystem, but to save a few bytes it only stores 16 bits of group IDs. Some engineer probably thought that'd be enough for anybody.

I'm having the same issue with the number of hardlinks, which, for Linux ext4 systems, is limited to 65000.
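
As an illustration of why a 16-bit field is a problem for group IDs, the sketch below shows what happens if a larger ID is simply masked down to 16 bits. Whether the device in the story truncated, wrapped, or rejected such IDs is not stated, so the masking behavior here is an assumption, and the function names are hypothetical.

```python
MAX_16_BIT = 0xFFFF  # largest value a 16-bit unsigned field can hold (65535)


def fits_in_16_bits(gid: int) -> bool:
    """True if the group ID can be stored in a 16-bit field without loss."""
    return 0 <= gid <= MAX_16_BIT


def truncate_to_16_bits(gid: int) -> int:
    """Keep only the low 16 bits, as a naive narrowing store would (assumed)."""
    return gid & MAX_16_BIT


if __name__ == "__main__":
    for gid in (1000, 65535, 65536, 70000):
        print(gid, "fits" if fits_in_16_bits(gid) else "does not fit",
              "-> stored as", truncate_to_16_bits(gid))
```

Under that assumption, any two group IDs that differ by a multiple of 65536 become indistinguishable once stored.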

bootload · 10y ago

'all the knowledge about how to handle this moody piece of hardware is lost, like tears in the rain'

Offworld, Roy Batty reference.

chris_wot · 10y ago

Yeah, they dealt with EMC. That's never going to go well.

(continues reading)
