Improving HTTPS Performance with Early SSL Termination

(blog.filepicker.io)

97 points · tagx · 13y ago · 59 comments

buro9 · 13y ago · 6 in thread

I'm doing something very similar to this. The setup I'm using is:

DNSMadeEasy has a global traffic redirector ( http://www.dnsmadeeasy.com/services/global-traffic-director/ )

That then sends a request to the closest Linode data center.

Linode instances run nginx, which redirects to Varnish, and the Varnish backend is connected via VPN to the main app servers (based in the London datacenter, as the vast majority of my users are in London).

I use Varnish behind nginx to additionally place a fast cache close to the edge to prevent unnecessary traffic over the VPN.

Example: USA to London traffic passes over the VPN running within Linode, and the SSL connection for an East Coast user just goes to Newark. If the request was for a static file that some other user had recently requested, the file would come from Varnish and the request would not even leave the Newark data center.

buro9 · 13y ago

I can't edit my post but I should note that there is an edge case I'm aware of where this kind of solution might not be the fastest solution for the end user, and this would likely affect what filepicker.io are doing too.

The edge case is that some DNS providers (Google, OpenDNS) already pick what they feel is the closest end point.

I read about that stuff over here a while ago: http://tech.slashdot.org/story/11/08/30/1635232/google-and-o...

And this comment explains it best: http://tech.slashdot.org/comments.pl?sid=2404950&cid=372...

I haven't fully investigated this, and I don't know whether it is affecting some users. But when I implemented my solution I was aware that, for some small subset of users, this might not result in a faster connection than if I'd done nothing at all (the closest resolver to Google may actually be further from the customer than the local server I run).

I'm just betting that for the vast majority of users this does bring about a noticeable increase in speed.

tedunangst · 13y ago

Google runs lots of DNS servers. You (your ISP) pick the 8.8.8.8 closest to you. That will in turn do the lookup and get the Linode closest to Google DNS, which should also be close to you.

If you're using a North American Google DNS server, you'll get answers that say NA. If you use the DNS server in Europe, you'll get answers that say EU.

I'm assuming Google doesn't try to sync and cache between 8.8.8.8 instances, and I don't see why they would. That's a lot of work for no benefit.

jsatok · 13y ago

That's a great idea. We're hosting with SoftLayer and have been considering doing something similar. They offer free bandwidth on the private network between data centers (and have pretty good pings between them). With a cloud server in each data center, you could achieve a similar thing while avoiding the need for VPN and not paying for extra bandwidth.

donavanm · 13y ago

Is this roll-your-own CDN significantly cheaper than another provider? Or is there some other advantage?

buro9 · 13y ago

I was already on Linode and I'm only serving a few hundred GB of static files per day (with the Linodes I have this is well within my free quota).

In my instance (forums with current discussions) most static file requests are for image attachments in the very latest discussions, the hot topics. So Varnish fits this scenario really well. I didn't need a long-term storage of images in the CDN, I just needed to store the most recently requested items in the CDN.

Linodes are cheap, I was already using them in a distributed fashion to reduce SSL roundtrips, and introducing Varnish was a small configuration change.

I have tried a few other providers (most recently CloudFlare). But I was generally not happy with them, usually due to a lack of visibility.

I proxy http:// images within the user generated content over https:// when the sites are accessed over https:// . Occasionally I found that images would not load when I used a CDN provider for that, but I never had enough data and transparency with the CDN to know why. Users notice this stuff though, so I'd have isolated users complaining of images not loading and no way to debug or reproduce it.

So I found that as my scenario made Varnish a good fit, and the bandwidth was within my allowance, and it was easy to do... well, I just did it.

I still experiment with CDNs every now and then, but largely I get more reliability and transparency from my own solution. I've also found this to be cost effective, though I would be OK with paying a premium if I found the reliability and transparency rivalled my home-rolled solution.
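
The "store only the most recently requested items" behaviour described in this comment can be sketched as a small least-recently-used cache. This is an illustration in Python, not Varnish configuration, and Varnish's real eviction and TTL handling is more involved. `RecentCache` and `fetch_from_origin` are hypothetical names introduced for the example.

```python
from collections import OrderedDict


class RecentCache:
    """Keep only the most recently requested items (hypothetical sketch)."""

    def __init__(self, capacity, fetch_from_origin):
        self._items = OrderedDict()
        self._capacity = capacity
        # Assumed callable that performs the slow fetch over the VPN.
        self._fetch = fetch_from_origin

    def get(self, path):
        if path in self._items:
            # Cache hit: mark the item as most recently used.
            self._items.move_to_end(path)
            return self._items[path]
        # Cache miss: go back to the origin and remember the result.
        body = self._fetch(path)
        self._items[path] = body
        if len(self._items) > self._capacity:
            # Evict the least recently used item.
            self._items.popitem(last=False)
        return body
```

With a capacity sized to the hot topics, repeated requests for the same attachment stay at the edge and only cold items cross the VPN.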

tagx (OP) · 13y ago

Static files are still served by a normal CDN. This helps with dynamic HTTPS requests that change each time.

stevencorona · 13y ago · 5 in thread

So, the way I understand it, the connection between the load balancer <-> web server is over the private network, right? And with VPC, your private network is isolated and can't be snooped by other Amazon customers?

Sounds cool, but this would only work on Amazon or datacenters w/ cross-data center private networks (SoftLayer has this, for example).

tagx (OP) · 13y ago

No, the way it works is that there is a load balancer that terminates SSL and forwards requests to nginx instances, all in a private network. The nginx instances then have secure HTTPS connections over the public internet to the main load balancer, which terminates SSL and forwards requests over a private network to the application servers. So this would be possible with any network, since the cross-country connections are encrypted.

kenshiro_o · 13y ago

That's a nice technique and the explanation is good while remaining concise. We do something similar at work (I work in finance) where our clients connect to a secure gateway using HTTPS but all communication with our other services is made using an unsecured protocol. If it lives in your house then it's likely to be harmless!

stevencorona · 13y ago

Oh, I guess I misunderstood. The load balancer <-> web server connection is over HTTPS, not HTTP.

sp332 · 13y ago

You can edit (or delete) comments here for up to 2 hours after you make them.

rbanffy · 13y ago

You can have the endpoint servers participating in a VPN with the backend servers. They don't have to be on EC2. This way you wouldn't need to make the front-back requests via https.

aaronpk · 13y ago · 3 in thread

Doesn't this mean the traffic is being sent un-encrypted across the ocean?

TallGuyShort · 13y ago

The impression I got from the article was that the warm keep-alive connections were encrypted - the SSL handshake takes place ahead of time and then tunnels multiple requests from multiple users - hence the lower latency.

Amazon's ELB (the EC2 load balancer) used to send HTTPS traffic to your back-end unencrypted, but I believe they have since fixed this.

ghotli · 13y ago

Not sure what you mean by your ELB/HTTPS comment. ELB can be used as an HTTPS terminator. It will then proxy traffic to your backend as HTTP. It can also be used as a straight TCP proxy, but in that case it's just shoving along the HTTPS request to an HTTPS terminator that you maintain.

tagx (OP) · 13y ago

No, the keep-alive connections in the pool are all encrypted as well.

jbyers · 13y ago · 2 in thread

How do you manage the keepalive connection pool? Are you managing this in nginx (via HTTP 1.1 backend support?) or using a different service?

We ran a test of this approach using a similar stack in 2010. We had Ireland, Singapore, Sydney backhauling to Dallas, TX for a reasonably large population of users. Managing the backend pool was a bit of a challenge without custom code. nginx didn't yet support HTTP 1.1 backend connections. The two best options I could find at that time were Apache TrafficServer and perlbal. perlbal won and was pretty easy to set up with a stable warm connection pool.

Despite good performance gains we didn't put the system into production. The monitoring and maintenance burden was high and we lacked at that time a homogeneous network -- I tested Singapore and Australia using VPS providers as Amazon and SoftLayer (our vendors of choice) weren't there yet.

As a side-effect of using the VPS vendors we did and trying to keep costs in control, we had to ratchet the TTL for this service down uncomfortably low to allow for cross-region failover. In Australia the additional DNS hit nearly wiped out the gains in SSL negotiation.

With today's increased geographical coverage and rich set of services from Amazon, this is a much less daunting project if you can stomach the operational overhead.

Note that the lack of sanely-priced bandwidth and hosting providers in Australia is a huge problem. When Amazon lands EC2 there, it's going to really shake up that market.

tagx (OP) · 13y ago

We are using nginx. Newer versions support HTTP 1.1 backends. (There is also a patch for older versions of nginx.)

jbyers · 13y ago

How do you get nginx to preconnect and maintain an appropriate-sized backend pool?

EvanAnderson · 13y ago · 2 in thread

The "pool of warm keep-alive connections to the main web servers" is still sending the traffic over HTTPS, then?

Edit: I'm clear that latency is reduced and how that's accomplished. I just wanted to get clarification that the connections between the early SSL termination and the web servers were also encrypted.

pjscott · 13y ago

Yes, but SSL connections are fine once they get going -- the nasty part is how many round-trips are needed to complete the handshake. Any latency between the client and the server is going to be multiplied several times over as they do the initial ritual of verifying public keys and establishing a session key.

The trick here is to cut down on the latency of establishing the session.
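
A rough way to see the effect described in this comment is to count round trips. The sketch below assumes one round trip for the TCP handshake, two for a full SSL/TLS handshake, and one for the request and response, and it ignores server processing time. The function names are hypothetical, and the example figures (96ms, 15ms, 86ms) are the ones tagx quotes elsewhere in this thread.

```python
def direct_ms(client_origin_rtt, setup_round_trips=3):
    """Time to first response when the client handshakes with the far origin.

    setup_round_trips is an assumption: 1 for TCP plus 2 for a full handshake.
    """
    # Every setup round trip, plus the request itself, pays the long RTT.
    return client_origin_rtt * (setup_round_trips + 1)


def via_edge_ms(client_edge_rtt, edge_origin_rtt, setup_round_trips=3):
    """Time to first response when SSL is terminated at a nearby edge.

    The edge is assumed to already hold a warm encrypted connection to the
    origin, so the long link is crossed only once, for the request itself.
    """
    return client_edge_rtt * (setup_round_trips + 1) + edge_origin_rtt


if __name__ == "__main__":
    print(direct_ms(96))        # 384
    print(via_edge_ms(15, 86))  # 146
```

Under these assumptions the first request drops from 384ms to 146ms, which is in the same range as the roughly 200ms saving discussed in the thread.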

tagx (OP) · 13y ago

Yes, but the SSL handshake has already been completed ahead of time so all the overhead is reduced.

hythloday · 13y ago · 2 in thread

I'm sorry, I don't understand. How is this different from geographically distributed reverse proxies?

lancefisher · 13y ago

These proxies are doing SSL between themselves and the app server and using a pool of warm keep-alive connections to avoid multiple high-latency calls. That's a little more than just a reverse proxy.

WALoeIII · 13y ago

That's what this is.

alexchamberlain · 13y ago · 2 in thread

Would it be more effective to forward plain HTTP over a VPN instead? For example, you set up your servers in London, East Coast and West Coast and configure a VPN. People connect to their local servers via HTTPS and that server forwards it to London via HTTP; the request would be encrypted by the VPN. The advantage is that your proxy - Nginx is good for this - can bring up additional connections quicker.

TheOnly92 · 13y ago

I suppose that you usually want to protect the part from client->server rather than just receiving encrypted things from server side.

alexchamberlain · 13y ago

Sorry, I don't understand?

saurik · 13y ago · 2 in thread

Standard CDNs will also accomplish this goal, and their bandwidth is normally cheaper than EC2 instances.

tagx (OP) · 13y ago

Normal CDNs don't do this with dynamic content that changes on every user request. Each API request we serve is different and saving 200ms almost doubles the performance.

saurik · 13y ago

This is at least incorrect for Akamai and CDNetworks (examples of large CDNs; if you are talking about something silly like CloudFlare, then all bets are off). I run my entire website, most of the content of which is dynamic, through CDNetworks; they definitely maintain hot connections from their systems through to my server, and use it for uncached origin fetches. For more information on related performance improvements, see one of my earlier comments.

http://news.ycombinator.com/item?id=2823268

dawolf · 13y ago · 2 in thread

I think maintaining a pool of 'warm' https sessions between the nginx and the app server is not a very flexible approach. What happens when all of those are occupied? Wouldn't it be nicer to have an IPsec tunnel between the nginx and the app server and open http sessions on demand?

1SaltwaterC · 13y ago

> What happens when all of those are occupied?

Backlog. That increases the latency until a new connection can be accepted. However, the number of pooled connections can be increased to a fairly large number at the expense of more memory consumption. This isn't an issue when using nginx as an HTTPS proxy.
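
The backlog behaviour described in this comment can be illustrated with a fixed-size pool in which a request waits when every pooled connection is busy. This is a generic Python sketch, not how nginx manages upstream connections. `WarmPool` and the `connect` callable are hypothetical names introduced for the example.

```python
import queue


class WarmPool:
    """Fixed-size pool of pre-opened connections (hypothetical sketch)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the connection setup cost up front, before any request arrives.
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Blocks while every connection is in use; this wait is the backlog.
        # Raises queue.Empty if nothing frees up within the timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so a waiting request can use it.
        self._idle.put(conn)
```

A larger `size` shortens the wait under load at the cost of holding more idle connections open, which is the memory trade-off the comment mentions.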

dawolf · 13y ago

Was more a rhetorical question. ;)

ammmir · 13y ago · 1 in thread

Maybe I don't understand the problem correctly, but why not just preflight an HTTPS request when your widget loads?

In the time it takes the user to pick their file(s) to upload, the initial SSL negotiation will most likely have finished. And if you upload multiple files serially, the browser should even reuse the current SSL context, so it wouldn't be ~300ms per file.

tagx (OP) · 13y ago

We don't make any connections until the website calls us. At that point we load a personalized dialog for the user and we want that request to be as fast as possible.

steve8918 · 13y ago · 1 in thread

So, if my understanding is correct, they are trading SSL handshake latency (which occurs once per connection) for the potential latency incurred by having traffic redirected from multiple servers around the world to a single set of application servers?

It seems like in the diagram, the West Coast Client, instead of making a direct connection to the APP servers on the right, is instead making a connection to the ELB on the left, which then forwards the traffic to the nginx server, which forwards it to another ELB, which forwards it to the App servers.

If the client connected directly to the ELB in front of the App Servers, they would incur the SSL handshake latency, but would avoid the four extra hops (two per send and two per receive) on the ELB and nginx.

Over the lifetime of the connection, is it possible that this latency could be longer than 200 ms?

tagx (OP) · 13y ago

It is a possibility. However, I've measured 86ms between east and west EC2 instances, 96ms between my client on the west coast and an east EC2 instance, and 15ms between my client and a west EC2 instance. Thus the additional latency per connection is only about 5ms.

For the total latency to be longer than 200ms, about 20 requests would need to be made on the same connection, which will not happen given the number of requests we do at a time.
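
The arithmetic behind the 5ms figure can be written out as below. This is a simple model that uses only the three measurements quoted in this comment and treats the detour through the nearby instance as the sole extra cost. The names are hypothetical.

```python
# Round-trip times quoted in the comment above, in milliseconds.
EAST_WEST_RTT_MS = 86     # between east and west EC2 instances
CLIENT_EAST_RTT_MS = 96   # west coast client straight to an east instance
CLIENT_WEST_RTT_MS = 15   # west coast client to a nearby west instance


def extra_ms_per_request():
    # Detour (client -> west -> east) minus the direct path (client -> east).
    return CLIENT_WEST_RTT_MS + EAST_WEST_RTT_MS - CLIENT_EAST_RTT_MS


def total_extra_ms(requests):
    # Extra latency accumulated over several requests on one connection.
    return requests * extra_ms_per_request()


def break_even_requests(handshake_saving_ms=200):
    # Number of requests at which the detour cancels out the handshake saving.
    return handshake_saving_ms / extra_ms_per_request()
```

Under this model the detour costs 5ms per request, so 20 requests add 100ms and the break-even point is 40 requests. The comment's estimate of about 20 is more conservative, possibly because it allows for proxy processing time that these measurements do not capture.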

yalogin · 13y ago · 1 in thread

I wonder what server certificate checking they are doing. Taking 200ms for it seems like a lot.

slpsys · 13y ago

The 200ms is pretty well spelled out in the beginning of the post. It's not that cert checking is taking 200ms by itself, it's that sending any packet cross-country takes 80-100ms, round trip, and so if you have to go cross-country two extra trips...there's your 200ms.

pandemicsyn · 13y ago · 1 in thread

Wait. "The actual HTTP request would then be sent to the intermediate instance which then forwards it on": are you forwarding this on in plain text? Is the traffic at least traversing a VPN between the two locations?

pjscott · 13y ago

As the article says, the intermediate traffic happens over long-lived HTTPS connections.

WALoeIII · 13y ago

Isn't this a "poor man's version" of what CloudFlare offers?

They even have an optimized version called railgun (https://www.cloudflare.com/railgun) that only ships the diff across country.

crazygringo · 13y ago

Wow, this is actually really clever. Kudos to the engineer who thought of this.

donavanm · 13y ago

You can get this from a CDN like AWS CloudFront as well. CloudFront will keep a pool of persistent connections to the origin, whether it's S3 or a custom origin. You can also do HTTP or HTTPS over the port of your choice on the backend, enabling "mullet routing". The minimum TTL is 0, allowing you to vary content for each request.

One issue with CloudFront is that the POST, PUT, and DELETE verbs aren't currently supported, which is a kink for modifying data. You could use Route 53's LBR feature to route requests to nearby EC2 instances, then proxy back to your origin.

iamrekcah · 13y ago

And... how is this different than SSLStrip? Except maybe that SSLStrip also prints out the HTTP form values as the data passes through.

MIT_Hacker · 13y ago

This post shows off how engineers aren't the best at showing off their work. I think if the author abstracted this post and didn't dive so far into the technical aspects of the problem, it could appeal to a much wider audience.

For example, the discussion of nginx could be abstracted into a discussion of graph theory, where a handshake has to occur with a secure cluster of nodes.

This is all just IMHO. Great post though!
