[1] Keeping buckets locked down and allowing direct client -> S3 uploads
[2] Using ALIAS records for easier redirection to core AWS resources, instead of CNAMEs.
[3] What's an ALIAS?
[-] Using IAM Roles
[4] Benefits of using a VPC
[-] Use '-' instead of '.' in S3 bucket names that will be accessed via HTTPS.
[-] Automatic security auditing (damn, entire section was eye-opening)
[-] Disable SSH in security groups to force you to get automation right.
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...
[2] http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Cre...
[3] http://blog.dnsimple.com/2011/11/introducing-alias-record/
Also, S3 buckets cannot scale infinitely; that's a huge myth. http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tip...
If you don't have an elastic workload and are keeping all of your servers online 24/7, then you should investigate dedicated hardware from another provider. AWS really only makes sense ($$) when you can take advantage of the ability to spin up and spin down your instances as needed.
If we went with all of our own dedicated hardware, or cheaper instances from a different cloud provider, then we'd miss out on ELB and have slower and more expensive communication to and from S3, not to mention that services like Elastic Beanstalk make deploying to EC2 instances very easy compared with rolling your own deployment system. And for those who don't want to bother with administering databases and cache machines, RDS and ElastiCache are going to be cheapest and fastest if your instances are EC2.
So yeah I agree that EC2 is expensive, but the benefits of living fully within the Amazon ecosystem are pretty large.
I can see a lot of benefit to using S3 without EC2, but after that, I'm not sure what else would be possible. Care to elaborate more?
Can you use their queues and database tools w/o using EC2? (If you are using a VPC, maybe?)
If all you need is a server that is up 24/7, rent it by the month. You don't need much information to make an educated choice, since they are pretty much all cheaper than EC2.
I doubt there are many founders who are technically informed enough to know about Amazon Web Services, but don't know about the other big 3 (Digital Ocean, Linode, Rackspace). If you truly don't, then you must not be a tech company, and I have a hard time believing a non-tech company without any technical founders would even know about AWS.
[0]: http://jud.me/post/65621015920/hardened-ssl-ciphers-using-aw...
There are other subtleties which make roles hard to work with. The same policies can have different effects for roles and users (e.g., permission to copy from other buckets).
IAM Roles can be useful, especially for bootstrapping (e.g. retrieving an encrypted key store at start-up), but only use them if you know what you're doing.
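For the bootstrap case, the sketch below shows the kind of narrowly-scoped policy you'd attach to such a role: read access to a single object and nothing else, so a compromised instance can fetch its key store but nothing more (the bucket and key names here are made up for illustration):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bootstrap-bucket/keystore.jks.encrypted"
    }
  ]
}
```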
Conversely, tips like disabling SSH have negligible security benefit if you're using the default EC2 setup (private key-based login). It's really quite useful to see what's going on in an individual server when you're developing a service.
Also, it does matter whether you put a CDN in front of S3. Even when requesting a file from EC2, CloudFront is typically an order of magnitude faster than S3. Even when using the website endpoint, S3 is not designed for web sites and will serve 500s relatively frequently, and does not scale instantly.
Is blocking 169.254.169.254 important because it could potentially give users access to your instance's metadata service? I'd be interested to hear more about securing EC2 with regard to IAM roles; you seem to have lots of experience in that area.
The disabling SSH tip wasn't really about security (I agree that it has negligible security benefit), it's more about quickly highlighting parts of your infrastructure that aren't automated. It's often tempting to just quickly SSH in and fix this one little thing, and disabling it will force you to automate the fix instead.
The CDN info has been mentioned elsewhere too; lots of things I didn't know. I'll be updating the article soon to add all of the points that have been made. Thanks for the tips!
I make sure all HTTP requests in my (Java) application go through a DNS resolver that throws an exception if: ip.isLoopbackAddress() || ip.isMulticastAddress() || ip.isAnyLocalAddress() || ip.isLinkLocalAddress()
The last clause captures 169.254.169.254. Of course, many libraries use their own HTTP client, so it's easy to make a mistake.
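The same checks are available in Python's stdlib `ipaddress` module; a rough equivalent of the Java clauses above (a sketch, not the commenter's actual code) looks like:

```python
import ipaddress

def is_forbidden_address(host_ip: str) -> bool:
    """Reject resolved addresses that an outbound HTTP client should never
    fetch: loopback, multicast, unspecified (0.0.0.0), and link-local.
    The link-local check is the one that blocks 169.254.169.254
    (the EC2 instance metadata endpoint)."""
    ip = ipaddress.ip_address(host_ip)
    return (ip.is_loopback or ip.is_multicast
            or ip.is_unspecified or ip.is_link_local)
```

As with the Java version, this only helps if every HTTP client in the application actually goes through the resolver that applies it.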
I'm trying to bring my usage of IAM roles down to 0 as a matter of policy. Currently, I'm only using an IAM role to retrieve an encrypted Java key store from S3 (key provided via CloudFormation) and encrypted AWS credentials for other functions (keys contained in the key store). I'd be happier to bootstrap using CloudFormation with credentials that are removed from the instance after start-up.
Thanks for making updates. There are definitely some great tips in there.
What? CloudFront bandwidth costs are, at best, the same as S3 outbound costs, and at worst much more expensive.
S3 outbound costs are 12 cents per GB worldwide. [1]
CloudFront outbound costs are 12-25 cents per GB, depending on the region. [2]
Not only that, but your cost-per-request on CloudFront is way higher than S3's ($0.004 per 10,000 requests on S3 vs $0.0075-$0.0160 per 10,000 requests on CloudFront).
[1] http://aws.amazon.com/s3/pricing/ [2] http://aws.amazon.com/cloudfront/pricing/
For low bandwidth, you're absolutely right, the costs are at best the same. For high bandwidth however (once you get above 10TB), CloudFront works out cheaper (by about $0.010/GB, depending on region). But that wasn't taking into account the request cost, which as you point out, is more expensive on CloudFront, which can negate the savings from above depending on your usage pattern.
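To make the request-cost point concrete, here's a quick back-of-the-envelope using the per-GB and per-request figures quoted in this thread (prices change over time, so treat these numbers as assumptions, not current rates):

```python
def s3_cost(gb, requests, per_gb=0.12, per_10k_req=0.004):
    """Monthly outbound cost using the S3 figures quoted above."""
    return gb * per_gb + (requests / 10_000) * per_10k_req

def cloudfront_cost(gb, requests, per_gb=0.12, per_10k_req=0.0075):
    """Monthly cost using the cheapest CloudFront figures quoted above."""
    return gb * per_gb + (requests / 10_000) * per_10k_req

# When the per-GB rates match, the higher request price is the whole gap:
gb, reqs = 1_000, 5_000_000
print(s3_cost(gb, reqs))          # 120.0 + 2.00 = 122.0
print(cloudfront_cost(gb, reqs))  # 120.0 + 3.75 = 123.75
```

With many small objects the request term dominates, which is exactly the "depending on your usage pattern" caveat above.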
I'll update my post accordingly, thanks for pointing this error out!
Also, S3 buckets cannot scale infinitely; key names have to be managed appropriately for that. http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tip...
Finally :) I like SSH. But I'm the founder of Userify! http://userify.com
There's one that I think could be improved on a little:
Uploads should go direct to S3 (don't store on local filesystem and have another process move to S3 for example).
You could even use a temporary URL [0][1] and have the user upload directly to S3!
[0]: http://stackoverflow.com/questions/10044151/how-to-generate-...
[1]: http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...
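For the curious, here's a rough sketch of what such a temporary URL looks like under S3's old query-string authentication scheme (SigV2). In practice you'd let the SDK generate it for you; the bucket, key, and credential values below are made up:

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

def presigned_put_url(bucket, key, access_key, secret_key, expires_in=3600):
    """Sketch of an S3 query-string-authenticated PUT URL (SigV2 style).
    A client holding this URL can upload the object directly until it
    expires, without ever seeing the secret key."""
    expires = int(time.time()) + expires_in
    # SigV2 string-to-sign for a PUT with no Content-MD5/Content-Type headers
    string_to_sign = "PUT\n\n\n%d\n/%s/%s" % (expires, bucket, key)
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(),
                 hashlib.sha1).digest()
    ).decode()
    return ("https://%s.s3.amazonaws.com/%s?AWSAccessKeyId=%s"
            "&Expires=%d&Signature=%s"
            % (bucket, key, access_key, expires, quote(signature, safe="")))
```

The upload never touches your servers, which is exactly the point of tip [1] at the top of the thread.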
Getting your application server up and running is the easiest part of operations, whether you do it by hand via SSH or automate and autoscale everything with ansible/chef/puppet/salt/whatever. Persistence is the hard part.
"EBS volumes are not recommended for Cassandra data volumes."
http://www.datastax.com/docs/1.1/cluster_architecture/cluste...
I'll update the article soon to add in the new information.
The server-level monitoring is free, and it's super simple to install. (The code we use to roll it out via ansible: https://gist.github.com/drob/8790246)
You get 24 hours of historical data and a nice web UI. Totally worth the effort.
> Use random strings at the start of your keys.
> This seems like a strange idea, but one of the implementation details
> of S3 is that Amazon use the object key to determine where a file is physically
> placed in S3. So files with the same prefix might end up on the same hard disk
> for example. By randomising your key prefixes, you end up with a better distribution
> of your object files. (Source: S3 Performance Tips & Tricks)
This is great advice, but one small conceptual correction: the prefix doesn't control where the file contents are stored, it just controls where the index entry for that file is stored.
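A common way to get that distribution (a sketch, not anything from the article) is to prepend a short hash of the key itself, so the prefix looks random but stays deterministic:

```python
import hashlib

def distributed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex digest of the key so lexically-similar keys
    (e.g. date-based names like 2014/02/04/...) land in different parts
    of S3's key index instead of hammering one partition."""
    digest = hashlib.md5(key.encode()).hexdigest()[:prefix_len]
    return "%s/%s" % (digest, key)
```

The trade-off is that listing objects in natural key order gets harder, since the ordering is now by hash prefix.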
If you add @import url(http://fonts.googleapis.com/css?family=Droid+Sans:400,700); to your stylesheet, you should notice an improvement in the boldface font rendering. Great article, btw.
Yes! Centralized logging is an absolute must: don't depend on being able to log in and look at logs on each box. That grows wearisome fast.
It's just a way to stop yourself from cheating and SSHing in just to fix that one thing, instead of automating it.
I don't want to learn complex stuff like Chef/Puppet, btw... anything SIMPLE?
For logging, try logstash? http://logstash.net/
Monitoring... well that's a large and complicated topic!
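If you do try logstash, a minimal file-to-Elasticsearch pipeline looks roughly like this (paths and hosts are placeholders, and option names vary between logstash versions, so check the docs for yours):

```
input {
  file {
    path => "/var/log/myapp/*.log"   # placeholder path
  }
}
output {
  elasticsearch {
    host => "logs.internal.example"  # placeholder host
  }
}
```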
Does anybody else here agree with this mentality? This seems like major malpractice to me. I've worked at companies with as few as two people and as many as 50,000 people. None of them have had production systems that are entirely self-maintaining. Most startups are better off being pragmatic than investing man-years of time handling rare error cases like what to do if you get an S3 upload error while nearly out of disk space. There's a good reason why even highly automated companies like Facebook have dozens of sysadmins working around the clock.
I thought all of his other points were spot-on but this one rings very dissonant to my experience.
When developing an application, for example, it's often necessary to SSH in to play with some things. But once you're ready to go to production, you want as much automation as possible. Forcing yourself not to use SSH will quickly show you where you aren't automated.
The idea is that if a user can't SSH in (at least not without modifying the firewall rules to allow it again), it will force them to try and automate what they were going to do instead. It worked well for me, but it's probably not for everyone.
And make it Wiki-ized.
Just go with a PaaS, like Heroku or AppEngine, and forget about this sysadmin crap.