This bit us once. Someone ran `shutdown -h now` out of habit on an instance that was only supposed to be rebooted, and it came back without its data, because a shutdown from inside the VM is treated the same as "stop", and "stop" on an instance with ephemeral storage means "delete all my data". Since the command was issued from inside the VM, none of the warnings that the EC2 console would have displayed ever appeared.
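For what it's worth, a guard against that kind of accident could look roughly like the sketch below. The function names `classify_shutdown` and `safe_shutdown` and the `ALLOW_HALT` variable are made up for illustration, and the flag handling is deliberately simplistic (it does not understand combined flags such as `-rf`), so treat it as a starting point rather than a finished tool.

```shell
# Hypothetical helper: decide what a shutdown invocation would do.
# Prints "reboot" for -r/--reboot, "other" for -c/-k (cancel / warn only),
# and "halt" for everything else, including -h, -P, and -H.
classify_shutdown() {
    for arg in "$@"; do
        case "$arg" in
            -r|--reboot) echo "reboot"; return 0 ;;
            -c|-k)       echo "other";  return 0 ;;
        esac
    done
    echo "halt"
}

# Hypothetical wrapper: refuse anything that halts the machine unless
# ALLOW_HALT=1 is set, because a halt from inside the VM stops the instance.
safe_shutdown() {
    if [ "$(classify_shutdown "$@")" = "halt" ] && [ "${ALLOW_HALT:-0}" != "1" ]; then
        echo "refusing to halt: this stops the instance and ephemeral data is lost" >&2
        echo "use 'shutdown -r now' to reboot, or set ALLOW_HALT=1 to override" >&2
        return 1
    fi
    # Only reached for reboots, cancels, or explicit overrides.
    command shutdown "$@"
}
```

Something like `alias shutdown=safe_shutdown` in the admin's interactive shell profile would wire it in. It only protects interactive use and is no substitute for keeping anything important off ephemeral disks.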
Amazon's position on ephemeral storage was shockingly unacceptable and unprofessional. They claimed that, for security purposes, they had to scrub the physical storage as soon as the stop button was pressed, which is a complete cop-out. Of course they can't reallocate that chunk of the disk to the next instance while your stuff is on it, but they could've implemented a small cooldown period between stopping, scrubbing, and reallocating the disk, so that there would at least be a panic button and accidental reboots-as-shutdowns wouldn't destroy data. The only reason they didn't do that is that they didn't want to expand their infrastructure to accommodate it. Very sloppy, and not at all OK. That's not how you treat customer data.
Fortunately, AWS has mostly moved on; EBS-backed root volumes are the default now, though I believe some instance types still come with ephemeral instance-store volumes.
>I also learned that RDS is not truly HA in the case of upgrading servers, both minor and major upgrade. I've tested major upgrade and saw DB connection unavailable up to 10 min. In some minor version upgrades both primary and secondary had to be taken down.
You need multi-AZ for true HA. Failover within the same AZ has a small delay, as you've noted.
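For reference, switching an existing instance to multi-AZ with the AWS CLI looks roughly like this. The wrapper function `enable_multi_az`, the `DRY_RUN` variable, and the instance identifier `mydb` are assumptions for illustration; the underlying `aws rds modify-db-instance` call with `--multi-az` and `--apply-immediately` is the real command. By default the sketch only prints the command so it can be reviewed first.

```shell
# Hypothetical wrapper around the AWS CLI call that enables multi-AZ.
# With DRY_RUN=1 (the default here) it prints the command instead of running it.
enable_multi_az() {
    # Replace the function's arguments with the full command line.
    set -- aws rds modify-db-instance \
        --db-instance-identifier "$1" \
        --multi-az \
        --apply-immediately
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `enable_multi_az mydb` prints the command; `DRY_RUN=0 enable_multi_az mydb` would actually execute it, assuming the CLI is installed and has credentials.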
>I still enjoy using RDS, the service is stable and quick to use, but just make sure you have the habit of backuping and take serious ownership and responsibility of the database.
As many others in this thread have said, AWS and other cloud providers aren't a silver bullet. Competent people are still needed to manage these sorts of things. GitLab most likely would not have fared any better under AWS.