Google and Microsoft Cheat on Slow-Start. Should You?

(blog.benstrong.com)

437 points · bstrong · 15y ago · 67 comments


52 comments · 21 top-level

Pahalial · 15y ago · 10 in thread

This is interesting, but the article and I differ greatly at this point: "Being non-standards-compliant in a way that privileges their flows relative to others seems more than a little hypocritical from a company that's making such a fuss about network neutrality."

No, no it's not. This has nothing to do with network neutrality; it's a purely server-side change/fix. Not only that, they're benefiting users without requiring anyone else to change while they wait for standards bodies to catch up. This is a similar scenario to HTML5 video, and distinctly more clear-cut than e.g. '802.11n draft' wireless routers in my opinion.

shadowmatter · 15y ago

"Benefiting _their_ users," yes. But they're not benefiting whoever else is sharing the smallest-capacity link -- what he's arguing is that Google is crowding them out.

Net neutrality is not the only way to privilege your flows on the Internet: there's nothing to stop me from writing a crude application-layer protocol atop UDP that implements reliability but not congestion control. (You may have had to implement something like this for your networking class in school; otherwise you could start with a protocol like DCCP.) If I were to use that to send data as fast as I could to some remote computer, I could be sending more data than the smallest-capacity link could handle. Other TCP/IP connections sharing that link would detect data loss and thus reduce the amount of data they put in transit, but my protocol wouldn't have to. I can monopolize that link.

So assuming your physical layer is tin cans and string, what he's arguing is that if you have a link with a capacity of 12 segments, then data from Google will use 10 of them and a client will never expand its outstanding data beyond 2 segments. If both used vanilla TCP/IP, they should share the link evenly.

Of course, speed is a critical factor for Google. Android by default uses TCP Westwood+.

It's been six years since I tinkered with TCP/IP and really focused on networking, so someone please correct me if I'm wrong >_<
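For context on the Westwood+ remark: on Linux the congestion-control algorithm (CUBIC, Reno, Westwood+, ...) is selectable per socket through the TCP_CONGESTION option, which Python exposes as `socket.TCP_CONGESTION` on Linux only. The sketch below assumes a Linux host and that the `westwood` algorithm is built and permitted for unprivileged use (see `net.ipv4.tcp_allowed_congestion_control`); if it isn't, the helper simply reports failure. Note that this changes how the window grows and shrinks, not the initial window, which isn't exposed to applications this way.

```python
import socket
from typing import Optional


def congestion_control_of(sock: socket.socket) -> Optional[str]:
    """Name of the TCP congestion-control algorithm used by sock, or None
    where TCP_CONGESTION is unavailable (it is Linux-specific)."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name such as b"cubic\x00\x00...".
    return raw.split(b"\x00", 1)[0].decode("ascii")


def try_set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Ask the kernel to use algorithm `name` for sock; False if refused."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
        return True
    except OSError:  # e.g. algorithm not loaded or not allowed for this user
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default:", congestion_control_of(s))
        if try_set_congestion_control(s, "westwood"):
            print("now:", congestion_control_of(s))
        else:
            print("westwood not available to this process")
```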

kragen · 15y ago

> So assuming your physical layer is tin cans and string, what he's arguing is that if you have a link with a capacity of 12 segments, then data from Google will use 10 of them and a client will never expand its outstanding data beyond 2 segments. If both used vanilla TCP/IP, they should share the link evenly.

He's not showing any evidence that Google wouldn't back off to a smaller window in the face of packet loss; he's just saying their initial window is 9 segments. Once one of those 10 segments in your tin-can router falls out of its receive buffer, Google will be down to 9 and the other guys get up to 3, and packet loss will random-walk toward fairness.

Right? I've never tinkered with this stuff, so please correct me if I'm wrong.
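kragen's intuition can be illustrated with a toy model. The sketch below assumes idealized AIMD (each flow adds one segment per round trip and halves its window on loss) and that both flows see loss at the same moment whenever their combined window overflows shadowmatter's 12-segment tin-can link. Real loss is neither synchronized nor this tidy, so treat it as a cartoon of the convergence argument, not a model of Google's stack.

```python
from typing import List, Tuple


def aimd_two_flows(w1: float, w2: float, capacity: float,
                   rounds: int) -> List[Tuple[float, float]]:
    """Toy AIMD model with synchronized loss; returns the window history."""
    history = [(w1, w2)]
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Multiplicative decrease: both halve, which halves the gap.
            w1, w2 = w1 / 2, w2 / 2
        else:
            # Additive increase: both grow by one, the gap is unchanged.
            w1, w2 = w1 + 1, w2 + 1
        history.append((w1, w2))
    return history


if __name__ == "__main__":
    # One flow starts with 10 segments in flight, the other with 2.
    for i, (a, b) in enumerate(aimd_two_flows(10, 2, 12, 40)):
        if i % 5 == 0:
            print(f"round {i:2d}: {a:5.2f} {b:5.2f}")
```

Because additive increase never changes the difference between the two windows and every halving cuts it in two, the gap shrinks toward zero no matter which flow started larger.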


sh1mmer · 15y ago

While it's true that Google is starting aggressively, they are still using the slow-start algorithm, not ignoring it. If your physical layer is tin cans with string, sure, you'll get crowded out, but then the connection will degrade in the same way it would if they were using the default window size.

Microsoft, on the other hand, should really use slow-start.

I think it's difficult to argue that the profile of the underlying network hasn't changed since the last time an RFC was standardized on this issue. The problem is revving the magic numbers in the standards periodically to reflect changes in topology.

While you can say that Google should stick to the standard, unlike other net neutrality issues this isn't a change available only to a few large companies. Anyone with control of their stack can make this configuration.

The issue in net neutrality is to ensure that changes which are economically feasible for only a small group of companies are not enacted, so that those companies cannot form a de facto monopoly.

VladRussian · 15y ago

That sums up the "trick" I used in '97-'98 over the "tin cans and string" Russian links of the time to make sure my large data would make it through. It was really hard on my "neighbors" at the time.

ankimal · 15y ago

You're spot on. Whether the protocol needs to change with changing times and network speeds is an entirely different question.

bstrong (OP) · 15y ago

I'm not sure how the fact that it's a server-side change makes it ok. I think that everyone would agree that turning off congestion control entirely on the server side would be bad and would negatively impact other flows.

The question, then, is whether this change is significant enough to increase internet congestion (and therefore packet loss for others). This is a subject of heated debate at the moment.

dedward · 15y ago

It doesn't make it okay, it just makes it not a "net neutrality" issue. It's more of a "good neighbour" issue.


jamesaguilar · 15y ago

Also, as a technical point, the amount of total internet utilization caused by the fetching of http web pages is so small that I doubt this practice could significantly harm any other traffic.

chollida1 · 15y ago

> that I doubt this practice could significantly harm any other traffic.

I agree with you, but whenever I see something like this the back of my mind always chimes in with "famous last words".


dedward · 15y ago

It has nothing to do with net neutrality, but it does have to do with the stability and reliability of the Internet at large. If everyone, for example, tweaked TCP in different, incompatible ways, we'd have contention all over and things just wouldn't work well.

ergo98 · 15y ago · 7 in thread

Very interesting. Is such a thing configurable in Apache or nginx? It seems to be a rather rude behavior, but I'm curious how accessible it is.

riffraff · 15y ago

I don't think any web server gets this level of control over TCP/IP; this is something that should be addressed in the OS's network stack.

eru · 15y ago

Unless you have an exokernel operating system.


bstrong (OP) · 15y ago

It's not tunable at the app level. On Linux it requires a kernel patch to change. I'm not sure about other OSes.

pquerna · 15y ago

I wish there were a patch that let you enable it on Linux via an ioctl/setsockopt. It would be very useful to those of us who aren't Google.

Edit:

No need to wish: Google already sent a patch for this in May:

http://www.amailbox.org/mailarchive/linux-netdev/2010/5/26/6...

Nice!


8plot · 15y ago

Something like:

    ip route change default via 0.0.0.0 dev eth0 initcwnd 10

self · 15y ago

Perhaps "ip route change default via <your gateway address> dev eth0 initcwnd 10" -- your gateway address, not 0.0.0.0.
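Since the route has to name the real gateway, one way to avoid typing it wrong is to derive the command from the existing default route. A minimal sketch, assuming `ip route show default` prints a line shaped like `default via 192.168.1.1 dev eth0 ...`; the helper name `initcwnd_command` is hypothetical. It only builds the argument list rather than running it, because changing routes needs root and an older kernel may not accept `initcwnd` at all.

```python
import shlex
from typing import List


def initcwnd_command(default_route: str, initcwnd: int) -> List[str]:
    """Turn 'default via <gw> dev <dev> ...' into the argv for
    'ip route change default via <gw> dev <dev> initcwnd N'."""
    fields = default_route.split()
    if not fields or fields[0] != "default":
        raise ValueError("expected a default route line")
    try:
        gateway = fields[fields.index("via") + 1]
        device = fields[fields.index("dev") + 1]
    except (ValueError, IndexError):
        raise ValueError("route line lacks 'via <gateway>' or 'dev <device>'") from None
    return ["ip", "route", "change", "default", "via", gateway,
            "dev", device, "initcwnd", str(initcwnd)]


if __name__ == "__main__":
    line = "default via 192.168.1.1 dev eth0 proto static"
    print(shlex.join(initcwnd_command(line, 10)))
```

Once the printed command matches your routing table, the list can be passed to `subprocess.run(..., check=True)` as root.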

kragen · 15y ago

That doesn't seem to work for me:

    RTNETLINK answers: No such file or directory

Apparently I need to patch my kernel?


arturadib · 15y ago · 5 in thread

Really interesting research, but man, if you really, really have to worry about premature optimization for your web app, I'd start with the usual bottlenecks first - i.e. anything that involves disk IO and/or processor work, such as databases and mathematical calculations.

Unless you are serving static content only (in which case you are hardly creating an "app"), the milliseconds you might save with TCP-level optimizations are peanuts in comparison to the multiple seconds your database and computations will be requiring.

kragen · 15y ago

This is exactly backwards. My network latency to North America is >200ms (RTT). Three round-trip times is about 750ms. You can do 75 disk accesses and three billion mathematical calculations in that time.

If your database and computations are requiring multiple seconds on a normal web page, you have serious user experience problems. When you're under 140ms, it feels like the response is happening at the same time as the request (Dabrowski and Munson weren't able to reproduce the old 50- or 100-millisecond rule of thumb in what sounds to me like a poorly-controlled experiment; http://books.google.com/books?id=aU0MR-MA-BMC&pg=PA292&#...). Increasing Google search page render time from 400ms to 900ms dropped traffic by 20%, according to Marissa Mayer (http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20....). Traditional OLTP systems tried to keep response times under one second; beyond a second, people start to get frustrated and wonder if something is broken.

So, for a normal application, the milliseconds you might save by optimizing your database and computations are peanuts in comparison to the second or more that TCP-level optimizations could save you.
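To put numbers on how many round trips slow-start costs, here is a rough model that assumes no loss, a 1460-byte MSS, a receive window that never binds, and a congestion window that doubles every round trip. It counts only the data transfer, not the TCP handshake or the request itself. These are simplifying assumptions, not a description of any particular stack.

```python
import math


def slow_start_round_trips(response_bytes: int, initcwnd: int, mss: int = 1460) -> int:
    """Round trips to deliver a response under idealized slow start:
    send cwnd segments, wait one RTT for ACKs, double cwnd, repeat."""
    segments_left = math.ceil(response_bytes / mss)
    cwnd, rounds = initcwnd, 0
    while segments_left > 0:
        segments_left -= cwnd
        cwnd *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    rtt_ms = 250  # roughly the RTT kragen describes to North America
    for iw in (3, 10):
        n = slow_start_round_trips(14600, iw)
        print(f"initcwnd={iw}: {n} round trip(s), ~{n * rtt_ms} ms of transfer time")
```

For a 14,600-byte response (10 segments), a 3-segment initial window needs three round trips (3 + 6 + 12 segments of capacity) while a 10-segment window needs one, which at 250ms per round trip is the difference between about 750ms and 250ms.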

ssp · 15y ago

> Dabrowski and Munson weren't able to reproduce the old 50- or 100-millisecond rule of thumb in what sounds to me like a poorly-controlled experiment

It sounds rather poorly-controlled to me too.

They mention that they didn't account for the time from mouse-down to mouse-release. Seriously? Here is a program that can measure that difference: http://www.daimi.au.dk/~sandmann/click.py. It uses GTK+ so you'll probably need Linux to run it. For me, the delay seems to be around 30 ms. They also don't mention the framerate of the screen or whether they controlled for that. On a 60Hz monitor there is a delay of 17ms between frames.

Both 17 and 30 ms are huge numbers if you are measuring intervals on the order of 100ms.

Then there is the question of what you consciously perceive vs. what you subconsciously perceive. It would surprise me if you couldn't measure a difference even in cases where the subjects didn't notice anything themselves.

Finally, we can definitely reject the idea that latencies below 100 ms never matter: there is an obvious difference between a 10 fps animation and a 60 fps one.

_delirium · 15y ago

> If your database and computations are requiring multiple seconds on a normal web page, you have serious user experience problems.

This is almost always the case when I think "this website is slow", though. When HN is slow, it's not because of some added network latency, but because something is making HN take 3 seconds to serve up my "threads" page, or 2 seconds to successfully post my comment. Same with "reddit is slow", or "this Wordpress blog is taking forever to load" or "Twitter spins the 'working' icon for 2 seconds when I click on that 'retweets' thing in the sidebar before returning anything". Those things are really common in my experience, and those rather than network round-trip times are by far the biggest and most annoying slowdowns, at least in my browsing.


arturadib · 15y ago

Clearly you have never dealt with a large database.

bstrong (OP) · 15y ago

I agree fully. I focused on the front-end because squeezing milliseconds out of the backend is my day job, and I'm pretty confident I can generate pages in < 50ms. Given that, I thought it would be interesting to see just how much I could squeeze out of the delivery time.

epi0Bauqu · 15y ago · 3 in thread

Does anyone know what you would do to easily tune this for FreeBSD?

SageRaven · 15y ago

My guess is setting the sysctl "net.inet.tcp.slowstart_flightsize" from the default value of "1" to something else.

epi0Bauqu · 15y ago

Thx. Seems to work! Looks like the defaults are:

    net.inet.tcp.local_slowstart_flightsize: 4
    net.inet.tcp.slowstart_flightsize: 1

Also useful: http://spatula.net/blog/2007/04/freebsd-network-performance-...

StavrosK · 15y ago

Any idea if that would work on Linux?


jhrobert · 15y ago · 2 in thread

I believe the current limits for slow-start are no longer suited to today's Internet.

According to my own observations, the first 30 KB of my pages seem to be transferred faster than the next 30 KB. It is not until much more has been sent that the average throughput eventually gets back up to what it was during the first 30 KB.

This is definitely weird.

Note: I am using Ubuntu on EC2 hosted VMs.

As a result, as much as I can, I try to keep the size of my content below 30 KB, using multiple concurrent HTTP requests.

I believe this is related to "slow-start" being pessimistic.

Unfortunately, "slow-start" is not configurable on Linux and I don't feel confident enough to go with some kernel level patch...

Any clue?
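One way to check this observation more carefully is to log when data arrives during a download and compute throughput per 30 KB chunk. A minimal sketch, assuming you already have (seconds since the request, cumulative bytes received) samples from a client or a packet capture; the 30 KB chunk size comes from the comment, and the helper name `chunk_throughputs` and the trace in the usage block are hypothetical.

```python
from typing import List, Sequence, Tuple


def chunk_throughputs(samples: Sequence[Tuple[float, int]],
                      chunk_bytes: int = 30_000,
                      start_time: float = 0.0) -> List[float]:
    """Given (time_s, cumulative_bytes) samples in arrival order, return the
    throughput in bytes/s of each complete chunk_bytes-sized chunk."""
    rates: List[float] = []
    chunk_start = start_time
    next_boundary = chunk_bytes
    for t, total in samples:
        # A single sample may complete several chunks; the later ones then
        # have zero elapsed time and are reported as infinite throughput.
        while total >= next_boundary:
            elapsed = t - chunk_start
            rates.append(chunk_bytes / elapsed if elapsed > 0 else float("inf"))
            chunk_start = t
            next_boundary += chunk_bytes
    return rates


if __name__ == "__main__":
    # Synthetic trace: the first 30 KB arrives quickly, the second more slowly.
    trace = [(0.1, 15_000), (0.2, 30_000), (0.5, 45_000), (0.8, 60_000)]
    for i, rate in enumerate(chunk_throughputs(trace)):
        print(f"chunk {i}: {rate / 1000:.0f} KB/s")
```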

danudey · 15y ago

You can't use custom kernels on Amazon EC2 anyway, so kernel patches aren't really an option (unless you had some kind of kernel module you can load that would change the value in memory, which seems dangerous).

spullara · 15y ago

You can use custom kernels on EC2 now.

http://ec2-downloads.s3.amazonaws.com/user_specified_kernels...

ajb · 15y ago · 1 in thread

Google is proposing that this be allowed as a modification to RFC 3390. Their draft is http://tools.ietf.org/html/draft-hkchu-tcpm-initcwnd-01. Active discussion of the issue may be found at http://www.ietf.org/mail-archive/web/tcpm/current/maillist.h...

benblack · 15y ago

Minor correction: the current draft is http://tools.ietf.org/html/draft-ietf-tcpm-initcwnd-00
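For reference, RFC 3390 caps the initial window at min(4*MSS, max(2*MSS, 4380 bytes)). The proposal raises that to roughly ten segments; the second function below uses min(10*MSS, max(2*MSS, 14600 bytes)), which is the form eventually published as RFC 6928, so treat it as an approximation of what the draft said at this point rather than a quote from it.

```python
def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial congestion window in bytes (RFC 3390)."""
    return min(4 * mss, max(2 * mss, 4380))


def ten_segment_initial_window(mss: int) -> int:
    """Upper bound in bytes under the ~10-segment rule (as in RFC 6928)."""
    return min(10 * mss, max(2 * mss, 14600))


if __name__ == "__main__":
    for mss in (536, 1460):
        old, new = rfc3390_initial_window(mss), ten_segment_initial_window(mss)
        print(f"MSS {mss}: {old} bytes ({old // mss} segs) -> "
              f"{new} bytes ({new // mss} segs)")
```

With a typical 1460-byte Ethernet MSS this is the jump from 3 to 10 segments that the thread is arguing about.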

sdizdar · 15y ago · 1 in thread

It seems Linux does not have an option to skip slow start and just use the receiver's advertised window. Does anybody know where in net/ipv4/tcp.c this should be set?

wmf · 15y ago

Note that we're not talking about skipping slow start completely; we're talking about changing slow start parameters.

bemmu · 15y ago · 1 in thread

Do App Engine apps also serve like this?

jws · 15y ago

I don't think so. I'm pulling a 27k URL over a 100ms latency and I'm seeing roughly 2, 4, 8, 8... for the send bursts.

d0m · 15y ago · 1 in thread

Interesting, but there are much more important things to consider before worrying about load time. (E.g., having 0 users is far worse than having users wait an extra 30 ms.)

patio11 · 15y ago

Half agree: this level of optimization is less useful when you're not starting from Google's performance baseline. That said, I can't agree with the point as applied to load times generally: optimizing them made a difference even at BCC scale back in 2008 or so. Implementing half of the YSlow recommendations takes under an hour in modern web frameworks.

sh1mmer · 15y ago

This isn't much of a secret. As it says in the article Google are lobbying to change the initial window size in the RFC. A lot of people here at Yahoo! want to see that too, and personally I think we should be more aggressive with our initial window, RFC be damned.

This topic was covered really well by Amazon's John Rauser at Velocity Conf: http://velocityconf.com/velocity2010/public/schedule/detail/...

To address the points in the conclusion:

1. Fast is good. Fast is also profit.

2. The net-neutrality argument here is totally bogus: anyone who knows how can up their slow-start window today if they choose to. It doesn't really have anything to do with traffic shaping.

3. Google have been using their usual data-driven approach to support their proposal to the IETF. We need a lot more of that. It's great. The only way we can really find out how the Internet in general will react to changes like this is to test them in some real-world environment.

4. I agree, slow-start is a good algorithm with a very valid purpose. The real problem here is that the magic numbers powering it aren't being kept in line with changes to connectivity technology and increases in consumer/commercial bandwidth.

ig1 · 15y ago

There are all sorts of latency problems caused by the congestion window size (and how it gets reset). Because of how the algorithm works, unless you're sending a continuous stream of data (which allows the congestion window to grow), the window gets reset to its initial size, which can mean waiting for an ACK round trip before you get the whole message.

While it's not that big a deal if your users are local to you, if they're on a different continent each extra roundtrip can easily add 100ms.

I used to do TCP/IP tuning for low latency trading applications (sometimes you need to use a third party data protocol so can't just use UDP), this sort of stuff used to bite us all the time.

If latency is important, it is worth sitting down with tcpdump and seeing how your website loads (i.e. how many packets, how many ACKs, etc.), as often there are ways of tweaking connection settings (either via socket options or kernel settings) that can result in higher performance.

(Try setting tcp_slow_start_after_idle to 0 if you're using a recent Linux kernel; this won't give you a bigger initial window, but it means that once your window size has grown it won't get reset straight away if you have a gap between data sends.)
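On Linux this knob lives at /proc/sys/net/ipv4/tcp_slow_start_after_idle (sysctl `net.ipv4.tcp_slow_start_after_idle`); it defaults to 1, and 0 stops the kernel from collapsing an idle connection's congestion window. A small read-only sketch, assuming a Linux procfs; it returns None where the file can't be read (other OSes, restricted containers). Writing the value needs root and is deliberately left out.

```python
from pathlib import Path
from typing import Optional

SLOW_START_AFTER_IDLE = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")


def slow_start_after_idle_enabled(path: Path = SLOW_START_AFTER_IDLE) -> Optional[bool]:
    """True if cwnd is reset after idle (value 1), False if disabled (0),
    None if the setting can't be read on this system."""
    try:
        return path.read_text().strip() != "0"
    except OSError:
        return None


if __name__ == "__main__":
    state = slow_start_after_idle_enabled()
    if state is None:
        print("tcp_slow_start_after_idle not available here")
    else:
        print("cwnd reset after idle:", "on" if state else "off")
```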

necro · 15y ago

There was a large discussion earlier about the subject. I posted detailed comments in that thread so I won't repost but just link. http://news.ycombinator.com/item?id=1143317

matthiasl · 15y ago

Can anyone else repeat his experiment?

I tried repeating the experiment. I'm in Sweden, so, annoyingly, a request to google.com redirects to google.se. If I send my request directly to google.se, I get a 9 KB response in 130ms and the initial window looks like 4 to me, i.e. I can't see anything unexpected happening.

I then tried repeating it on Amazon EC2. I can't see anything unexpected there either, but the RTT from EC2 to Google is only about 3ms, which means I can't assume that the ACKs don't get there in time.

(The original article's author looks at how long the initial three-way handshake takes and then assumes that all packets take that long, or, probably, half as long, i.e. he assumes that ACKs sent up to one RTT before a packet from Google can't have arrived at Google in time to affect that packet.)

Can anyone else reproduce the experiment?

Other ideas: repeat from Sweden, but send a cookie so that I really get google.com. Repeat from EC2, but make sure I never send any ACKs after the three-way handshake. I'm not curious enough to do the latter, it's a fair bit of work.
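The method described above amounts to grouping the server's data packets into flights: packets arriving closer together than some fraction of the RTT are treated as one burst sent before any new ACK could have arrived, so the size of the first flight approximates the initial window (provided the response is larger than it). A rough sketch, assuming you've already extracted arrival times of the server's data segments, e.g. from a tcpdump capture, and measured the RTT from the handshake; the half-RTT gap threshold is an arbitrary choice and, as noted above, the approach breaks down when the RTT is only a few milliseconds.

```python
from typing import List, Sequence


def flight_sizes(arrivals: Sequence[float], rtt: float,
                 gap_fraction: float = 0.5) -> List[int]:
    """Split sorted packet arrival times (seconds) into flights wherever the
    gap between consecutive packets exceeds gap_fraction * rtt."""
    if not arrivals:
        return []
    sizes = [1]
    for prev, cur in zip(arrivals, arrivals[1:]):
        if cur - prev > gap_fraction * rtt:
            sizes.append(1)   # long gap: the sender was waiting for ACKs
        else:
            sizes[-1] += 1    # same flight
    return sizes


if __name__ == "__main__":
    # Synthetic trace with a 100 ms RTT and flights of 2, 4 and 8 packets,
    # the kind of pattern jws describes for App Engine.
    rtt = 0.100
    times = ([0.000, 0.001]
             + [0.100 + 0.001 * i for i in range(4)]
             + [0.200 + 0.001 * i for i in range(8)])
    print(flight_sizes(times, rtt))
```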

vinutheraj · 15y ago

"It is better to ask for forgiveness than permission" - Rear Admiral Grace Hopper

bbuffone · 15y ago

We have been measuring Google's "reachability" performance and it is quite amazing. The result of their tuning is that they can deliver their initial HTML in under ~250 milliseconds, and from many locations in under 100 ms. The data also shows that the standard deviation of the download times is very small, so the site consistently loads fast.

http://www.yottaa.com/url/4be004065df8ca5a730001fb/reachabil...

tlrobinson · 15y ago

"They actually managed to deliver the whole response in just 70ms, 30ms of which was spent generating the response"

Isn't part of that just the network latency? Based on the timestamps for the SYN and SYN-ACK it looks like an RTT of about 16ms.

EDIT: Never mind.

Request was sent by the client at 00.017437

Request ACK was received by the client at 00.037139

RTT of about 20ms, so the request was received by the server around 00.027

First packet of the response was received by the client at 00.067151

67-27=40. Assuming a latency of 10ms, it took 30ms to generate the response.
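The same arithmetic, written out: processing time is the time to the first response packet minus one full round trip (half to carry the request to the server, half to bring the first response packet back). This assumes symmetric one-way latency and that the server ACKed the request immediately, so the request-to-ACK interval is a fair RTT estimate.

```python
def server_processing_time(t_request_sent: float, t_request_acked: float,
                           t_first_response: float) -> float:
    """Estimate server-side processing time from client-side timestamps."""
    rtt = t_request_acked - t_request_sent
    return t_first_response - t_request_sent - rtt


if __name__ == "__main__":
    proc = server_processing_time(0.017437, 0.037139, 0.067151)
    print(f"processing time ~ {proc * 1000:.1f} ms")
```

With the timestamps from the trace this gives about 30.0 ms, matching the estimate above.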

fleitz · 15y ago

One should also note that when IE is talking to IIS, the request will be sent in the first packet and the initial response will be sent in the first ACK. You can actually complete a request and response (if small enough) in 3 packets. Also, when tearing down the connection, it's left half-open.

http://osdir.com/ml/mozilla.devel.netlib/2003-01/msg00018.ht...

samueladam · 15y ago

Mike Belshe - An Argument For Changing TCP Slow Start (Jan 11, 2010):

http://sites.google.com/a/chromium.org/dev/spdy/An_Argument_...

bengtan · 15y ago

On Ubuntu 8.04 (at least), you can set this per route via something like:

    ip route change default via x.x.x.x dev eth0 initcwnd 6

but please test thoroughly if trying this.

iepaul · 15y ago

Very interesting post.


phillijw · 15y ago

Interesting. But it really annoys me when people use "begs the question" incorrectly. Look it up!
