Like other technical fads, everyone will probably come back to servers they can reach out and touch when needed, sooner or later.
This was the role of shared hosting in the past. Several years ago, everybody realized that having root is better. Now, instead of colocating two servers and negotiating transit and dealing with remote hands, you can spin up two Linodes for $40 and have enough power to build anything. Critical mass? Add three more. You're not waiting for a shipment of servers to the datacenter to handle a sudden load from a positive mention on HN.
Saying that the cloud is a fad and we should all own our gear does two things: (a) increases humanity's carbon footprint, since most organizations never utilize hardware to their full potential, and (b) guarantees that only those with significant capital to buy a fleet, a cage, and power will ever compete in the Internet space, which is where we were many years ago. It is very arguable that the cloud is progress, and everybody sitting on the sidelines calling it a "fad" is scared by it.
Jeremy Edberg of Reddit had a good comment later in that thread, to someone who paralleled the cloud to electricity generation:
http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...
What sucks is, my remarks really depend on what you define "cloud" as, which -- partially thanks to Microsoft television commercials -- is currently up in the air.
So you're right that the cloud is great for startups. It is not so great for established stuff.
Reddit for example doesn't seem to have one and seems quite stuck.
You are wrong.
> everyone will probably come back to servers they can reach out and touch when needed, sooner or later.
No they won't, because most of us don't want to be managing hardware, ever.
There are definitely hiccups, but I can't imagine many guys running an internet-heavy business going forward are seriously going to say "let's build out our own datacenter rather than solve the issues with the cloud" unless they're doing something really, really, specialized.
Personally, if I'm going to be operating a large computing environment, I'd rather stick 80% of my workload in a cloud environment and pay someone to deal with utilities, buildings, hardware, etc.
The remaining 20% may require a "higher touch" setup at a colo or a facility that I control. The smaller I can make that 20%, the less I need to spend on setting up and maintaining infrastructure.
There ain't nothin' new under the sun...
Can anyone tell me if MySQL fares any better than Postgres on a single EBS volume? I wouldn't assume it does but I shouldn't be making assumptions.
http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...
That being said, we get more bang for our buck by spreading our data across many small databases that don't need much tuning beyond upping the memory defaults. The EC2 cloud isn't great for the uber-server, but it's halfway decent for many small servers.
The team develops a reputation internally for being glorified firefighters, and has trouble recruiting. More senior engineers eventually flee (having, well, choice in the matter), leaving a team heavy with junior talent and no seasoned gurus leading the way.
The company is also growing at ludicrous speed, and hiring is difficult. When the product is in such a painful state, attrition from the team is high, and with slow hiring you are barely countering attrition (exacerbating the junior talent problem), and not even close to growing the team to be in a position to take care of the problem for good.
I suspect this is an industry-wide problem though, and is hardly unique to this place.
I'm surprised Reddit ever thought AWS would be a good platform to host on. You don't bitch about it, you create the best system you can and if something doesn't work, then you need to do more work. If you don't want to put in the work, then AWS is wrong for you. You don't see Heroku bitching about AWS, rather they made the thing work for them with great engineers.
If I were faced with EBS performance issues, I would see this as a big red flag, consider EBS unsuitable for the application and avoid it, rather than carrying on with such a workaround.
Honestly, since EBS vols are supposedly not tied to a single disk, the raiding should be done on Amazon's end. That it isn't is telling.
In fact there is a handy package called ec2-consistent-snapshot (https://launchpad.net/ec2-consistent-snapshot) that will manage this for you!
http://searchcloudcomputing.techtarget.com/news/1516269/Amaz...
Even worse than the EBS performance is that Amazon does not offer any shared storage solution between EC2 instances. You have to cobble together your own shared storage using NFS and EBS volumes, making it sucky to the Nth power.
EC2 is fine for Hadoop-style distributed work loads, and distributed data stores that can tolerate eventual consistency, that's all good. But for production database applications requiring constant and reliable performance, forget it.
Since RDS really is EC2 + EBS, they couldn't provide any real assurances it performed better than our own installation.
We ended up moving off of AWS as a whole. After several discussions about how we can continue to scale, the ultimate answer was without AWS.
EC2 is great for distributed stuff, but when you need something that is heavy on I/O, for instance, it is a big problem. At scale, working around AWS's performance problems ends up costing more than going elsewhere.
Amazon seems more flexible, since you buy block storage (EBS) independent of instances. If you have an application that needs a massive amount of data, but only a little RAM and CPU, you can do it.
Rackspace, on the other hand, ties storage to instances. If you only need the RAM and CPU of the smallest instance (256 MB RAM) but need more than the 10 GB of disk space that provides, you need to go for a bigger instance, and so you'll probably end up with a bigger base price than at Amazon.
On the other hand, the storage at Rackspace is actual RAID storage directly attached to the machine your instance is on, so it is going to totally kick Amazon's butt for performance. Also, at Amazon you pay for I/O (something like $0.10 per million operations).
Looking at our existing main database and its usage, at Amazon we'd be paying more just for the I/O than we now pay for colo and bandwidth for the servers we own (not just the database servers...our whole setup!).
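To put a rough number on the I/O charge, here is a back-of-the-envelope sketch. It takes the "something like $0.10 per million operations" figure above at face value and assumes a 30-day month; the IOPS values are hypothetical, not measurements from anyone's database.

```python
# Rough monthly EBS I/O cost, assuming a flat price per million operations.
# The price is the approximate figure quoted in the thread; the sustained
# IOPS values below are made-up illustrations.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def monthly_io_cost(avg_iops, price_per_million=0.10):
    """Return dollars per month for a sustained average I/O rate."""
    total_ops = avg_iops * SECONDS_PER_MONTH
    return total_ops / 1_000_000 * price_per_million


if __name__ == "__main__":
    for iops in (100, 1000, 5000):
        print(f"{iops:>5} IOPS sustained -> ${monthly_io_cost(iops):,.2f}/month")
```

The cost scales linearly with the sustained rate, so a busy database can plausibly run up an I/O bill that rivals a small colo setup, which is the point being made above.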
The big lesson we've taken away from our investigation so far is that Amazon is different from Rackspace, and both are different from running your own servers. Each of these three has a different set of capabilities and constraints, and so a solution designed for one will probably not work well if you just try to map it isomorphically to one of the others. You don't migrate to the cloud--you re-architect and rewrite to the cloud.
Amazon provides plenty of opportunities to mitigate for this, such as providing multiple availability zones. Reddit, if you read the original blog post, wasn't designed for that - it was designed for a single data centre.
OTOH, the variability of EBS performance is real, and frustrating. If you do a RAID0 stripe across 4 drives, you can expect around 100 MB/sec sustained, modulo hiccups that can bring it down by a factor of 5. On a compute cluster instance (cc1.4xlarge) it's more like up to 300 MB/sec if you go up to 8 drives, since they provision more network bandwidth and seem to be able to cordon it off better with a placement group.
The comments on reddit indicated hiccups more on the order of 10x and, sometimes, 100x.
Either way, the issue is that the more drives you add to your RAID0, the more often one of those drives experiences a "hiccup," and kills the performance of the entire volume.
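A quick sketch of why wider stripes stall more often. It assumes each volume hiccups independently with the same probability during any given interval, which is optimistic for volumes that share infrastructure, and the 1% figure is purely hypothetical.

```python
# Probability that a RAID0 stripe is stalled by at least one slow member.
# Assumption: each volume hiccups independently with probability p_hiccup
# in a given time window. p_hiccup = 0.01 is an illustrative guess only.


def p_stripe_stalled(n_volumes, p_hiccup):
    """Chance that at least one of n_volumes is hiccuping in the window."""
    return 1.0 - (1.0 - p_hiccup) ** n_volumes


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(f"{n:>2} volumes: stalled {p_stripe_stalled(n, 0.01):.1%} of windows")
```

Since a stripe runs at the speed of its slowest member, the stall rate grows almost linearly with the number of volumes while p_hiccup is small.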
Even in your own data centre, if your FC fabric goes wonky, your whole SAN is hosed.
Never trust critical parts of your business to others.
But you're still sharing the same hardware as everyone else and it's still just commodity hardware.
[1]: http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...
Hard drives are unreliable and they certainly don't fail independently of one another - but their failures are much closer to independent than those of EBS volumes.
With physical drives and n-parity RAID you drastically reduce the rate of data loss. This is because although failures are often correlated, it's quite unlikely to have permanent failure of 3 drives out of a pool of 7 within 24 hours. It happens, but it is very rare.
With EBS, your 7 volumes might very well be on the same underlying RAID array. So you have no greater durability by building software RAID on top of that. If anything, it potentially decreases durability.
You could utilize snapshots to S3, but is that really a good solution? It seems that deploying onto EBS at any meaningful scale is a recipe for guaranteed data loss. RAID on physical disks isn't a great solution either, and there is no substitute for backups - but at least you can build a 9 disk RaidZ3 array that will experience pool failure so rarely that you can more safely worry about things like memory and data bus corruption.
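The "3 out of 7 within 24 hours" intuition can be sketched with a binomial tail. This assumes drives fail independently with the same daily probability, which the comment itself notes is not quite true, so treat the result as a lower bound; the 0.1% daily failure rate is a hypothetical number chosen only to show the shape of the math.

```python
# Chance of losing at least k of n drives in one window, assuming
# independent failures with identical probability p (an idealization).
# Requires Python 3.8+ for math.comb.
from math import comb


def p_at_least_k_failures(n, k, p):
    """Binomial tail: P(at least k of n drives fail in the window)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p_daily = 0.001  # hypothetical per-drive daily failure probability
    print(f"7 drives, >=3 fail: {p_at_least_k_failures(7, 3, p_daily):.2e}")
    print(f"9 drives, >=4 fail: {p_at_least_k_failures(9, 4, p_daily):.2e}")
    # If all volumes sit on one backing array, parity buys nothing:
    # the pool is lost whenever that one array is lost.
    print(f"single shared array: {p_daily:.2e}")
```

The last line is the EBS worry in a nutshell: if the volumes share one underlying array, the loss probability stays at whatever that array's failure rate is, no matter how much software parity is layered on top.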
If you would like to know more please send me an email: prakash [at] cedexis.com
Although you would hope that the storage components of AWS's cloud were highly reliable, I think the main benefit is not single instance reliability but being able to recover faster because of quickly available hardware.
Both types of drives CAN and DO fail, so RAID-10, fail over, and replication are a must have.
http://blogs.sun.com/marchamilton/entry/a_brilliant_argument... "Cloud Storage Will Be Limited By Drive Reliability, Bandwidth ... The key feature of ZFS enabling data integrity is the 256-bit checksum that protects your data."
ZFS will do nothing about this, for instance: "More recently we also discovered that these disks will also frequently report that a disk transaction has been committed to hardware but are flat-out lying." All it can do is tell you the data you want isn't there to be read - like any filesystem would.
I'm not immediately planning to move out of AWS, but the trouble with EBS has certainly got me thinking about other options and has made me much less inclined to make an increased commitment to AWS.
"Elastic" is AWS's claim to fame, but I am not seeing it.
Trying to resize an EMR cluster (which is half the point of having an EMR cluster instead of buying our own hardware) generates the cryptic error "Error: Cannot add instance groups to a master only job flow" that is not documented anywhere.
(Why would Amazon even implement a "master only job flow", which serves no purpose at all?)
Also, if you want Pig you should complain about it vocally on the EMR forum. That is the best way to get them to listen to you.
Overall, EC2 is a very impressive offering, for which I commend Amazon. At times, I've been so frustrated that I'm ready to switch, but they fix things just quickly enough that I never quite get around to it. In the end, I'm willing to accept that what they're doing is hard, there will be mistakes, and it's worth suffering to get the flexibility and cost-effectiveness that EC2 offers.
200 GB really isn't all that big of a database. It shouldn't have to be this hard.
The guys from Reddit also spoke about their use of EC2. Apparently they are running entirely on m1 instances which suffer from notoriously poor EBS performance relative to m2 and cc1/cg1 instances.
(Edit: I hadn't considered the possibility of somehow killing all my instances through human error. Ouch. That probably warrants one slave on EBS per AZ.)
We released a Dropbox-like product to sync, and the back-end is on EBS. Yesterday we twice saw a device fill up to 7GB, and as it got closer to full it became slower and slower. We did not have any instrumentation/monitoring in place, and we immediately suspected it was something on our end.
We (wrongly?) assumed reliability and (decent) performance from AWS.
This kind of complaint reminds me of people who buy a product that does A very well, but then they trash it in reviews for not doing B. It was never advertised as doing B, but you'd never know that from the complaining.
We tried GoGrid and they lost or crashed our server instance.
I've personally used Rackspace, so far so good, but I've only been doing development on it.