Hi again rdtsc,
Yes, WebSocket is a different test and we aim to include a WebSocket test in the future.
I understand what you're saying about the "Slashdot Effect," but I think you may be misunderstanding me.
Viewed through the lens of preparing for a Slashdot effect, the 256-concurrency test we run against high-performance frameworks on our i7 hardware plays out like the world's worst case of Slashdotting. Think about it for a moment: Finagle is processing 232,000 JSON requests per second. It would be even higher if our gigabit Ethernet weren't limiting the test.
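To make the gigabit ceiling concrete, here is a rough back-of-envelope calculation. The per-request byte budget is my arithmetic from the numbers above, not a measured packet size:

```python
# Rough budget: how many bytes of gigabit Ethernet bandwidth are
# available per request at the observed 232,000 requests per second?
# (The ~540-byte result is arithmetic, not a measured value.)
GIGABIT_BYTES_PER_SEC = 1_000_000_000 // 8   # 1 Gbps = 125,000,000 B/s
REQUESTS_PER_SEC = 232_000                   # Finagle's JSON result

bytes_per_request = GIGABIT_BYTES_PER_SEC / REQUESTS_PER_SEC
print(f"~{bytes_per_request:.0f} bytes of wire budget per request")
```

Roughly 540 bytes has to cover the request line, response headers, the JSON body, and TCP/IP overhead, which is why the wire, rather than the framework, becomes the limiter at these rates.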
With requests being pulled off the web server's inbound queue and processed so quickly, do you think it would be easy to simulate and maintain 1,000, 5,000, or 10,000+ concurrency?
Conceptually, the load tool's goal is the opposite of the web server's. From the load tool's point of view, an ideal request is one that takes infinitely long. If the request takes a long time for the server to fulfill, the load tool can just keep the request's connection open and satisfy the user's concurrency requirement. Easy peasy. But as soon as the server fulfills the request, the load tool must snap to it and create another request ASAP to keep up its agreed-to concurrency level. Asking a load tool to maintain 1,000 (or higher) concurrency against a web server completing requests at a rate of 232,000 per second is asking a lot. Wrk is up to the challenge, but gigabit Ethernet holds everything back. The Ethernet saturation means that even if you crank up concurrency against a high-performance web server, the results look basically the same. The web server simply doesn't perceive the concurrency target because gigabit Ethernet can't meet the demand.
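The closed-loop behavior described above can be sketched in a few lines. This is a toy illustration of the principle, not wrk itself; the function names and timings are invented for the demo:

```python
import threading
import time

def run_load(concurrency, duration_s, server):
    """server: a callable standing in for one HTTP request/response."""
    completed = [0] * concurrency
    deadline = time.monotonic() + duration_s

    def worker(i):
        # Keep exactly one request in flight; the instant it completes,
        # issue the next one to hold the agreed-to concurrency level.
        while time.monotonic() < deadline:
            server()
            completed[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(completed)

# At the same concurrency of 8, a fast "server" forces the tool to
# create requests at a much higher rate than a slow one does.
fast = run_load(8, 0.2, lambda: time.sleep(0.001))
slow = run_load(8, 0.2, lambda: time.sleep(0.05))
```

The faster the server fulfills requests, the harder the tool must work to keep its side of the bargain, which is exactly why very high concurrency against a very fast server is so demanding on the load generator.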
As I wrote in the blog entry I cited earlier, if you start thinking about the idealized goal of a web server--to reduce all HTTP requests to zero milliseconds--it should become clearer why increasing concurrency beyond the CPU's saturation level doesn't actually do much except show the depth of the web server's inbound request queue. In other words, once we've saturated the web server's CPUs with busy worker threads, we can increase concurrency for only one goal: to determine at what rate we can get the server to reject requests with 500-series HTTP responses. For the JSON test on gigabit Ethernet, we find it's impossible to cause high-performance frameworks to return 500-series responses because the load tool simply cannot transmit requests fast enough to keep the server's request queue full.
A slightly less performant framework--let's use Unfiltered as an example--is not hitting the gigabit Ethernet wall but is still processing 165,000 JSON requests per second. Since the network is not limiting the test, the CPU cores are completely saturated: 100% utilization.
165,000 requests per second is way worse than being "slashdotted." Slashdot has many readers, but they couldn't generate that kind of request rate in their wildest dreams. Hacker News also has a great many readers, but no site with an audience that narrow could generate 165,000 requests per second from readers clicking a news link. Not even an article about Tesla, Google, hackathons, lean startups, girl coders, 3D printers, web frameworks, and classic computing all wrapped into one could generate that kind of request rate from Hacker News readers. Holding the #1 spot on Hacker News yields a few dozen requests per second or so.
To recap how this works:

* Web servers maintain an inbound queue to hold requests that are to be handed off to worker threads or processes.
* If worker threads are available, the web server will assign the request to a worker thread immediately, without queuing it.
* If there are no worker threads available, the web server will put the request into its queue.
* If a worker thread becomes available and a request is in the queue, it will be assigned as above.
* If no worker thread is available, and the queue is full, the server will reject the request with a 500-series HTTP response.
* Worker threads are made available very quickly if requests are fulfilled very quickly.
* The server becomes starved for worker threads if requests are not fulfilled quickly enough to keep the inbound queue routinely flushed.
* Actual usage does not come in as 1,000 requests in a nanosecond followed by nothing, and then another burst of 1,000 requests in a nanosecond. Even if it did, because gigabit Ethernet is slow, the server's perception of that traffic would be 1,000 requests spread over several milliseconds.
* For (nearly) all platforms and frameworks below roughly 200,000 JSON responses per second, 256 concurrency causes the worker threads to completely saturate the server's CPU cores, keeping them busy fulfilling requests. In fact, for many frameworks, even 128 and 256 concurrency yield nearly identical results--check the data tables view on the results page.
* Since the CPU cores are saturated, increasing concurrency can only demonstrate the limits of the server's inbound request queue. Doing so does not show anything of interest from a performance (completed requests per second; speed of computation) perspective. Once your CPU cores are saturated, your server is at dire risk of filling up its request queue. A hot fix is to quickly increase the queue size in the configuration and hope that's enough to survive the traffic; the real fix is simply fulfilling requests faster.
* In practice, if your server can fulfill 200,000+ requests per second, your server will essentially never actually perceive concurrency over 256 anyway. Gigabit Ethernet simply can't transmit the requests rapidly enough.
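The queue mechanics in the list above can be sketched as a toy model. The class and parameter names here are invented for illustration, and real servers differ in many details, but the accept/queue/reject decision is the same:

```python
import queue
import threading
import time

class ToyServer:
    """Toy model of a web server's inbound queue (illustrative only)."""

    def __init__(self, workers=2, queue_depth=4, handler=lambda req: None):
        self.inbound = queue.Queue(maxsize=queue_depth)  # bounded queue
        self.handler = handler
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            req = self.inbound.get()   # take the next queued request
            self.handler(req)          # worker is busy while this runs
            self.inbound.task_done()   # worker becomes available again

    def accept(self, req):
        try:
            self.inbound.put_nowait(req)  # hand off, or queue if no worker
            return "202 queued"
        except queue.Full:
            return "503 rejected"         # queue full: 500-series response

# A slow handler starves the worker pool: the queue fills and the
# server starts rejecting. (1-second handler, 2 workers, depth-4 queue.)
slow = ToyServer(workers=2, queue_depth=4, handler=lambda req: time.sleep(1))
responses = [slow.accept(n) for n in range(20)]
```

With requests arriving faster than the two busy workers can drain the depth-4 queue, nearly all of the 20 attempts bounce with a 503. Fulfill requests faster, and the same queue never fills.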
I find that questions about very high concurrency (where the questioner is not asking about a WebSocket scenario, in which connections are held open but mostly idle) usually conflate high concurrency with a simpler matter: the inability to fulfill requests rapidly enough to keep the web server's inbound queue flushed. That is a performance problem, plain and simple, and not a high-concurrency problem.
In other words, one may see a large number of live connections and think, "this is a high-concurrency situation," but what one is actually contending with is a side effect of being slow.