Hi again rdtsc,
Yes, WebSocket is a different test and we aim to include a WebSocket test in the future.
I understand what you're saying about the "Slashdot Effect," but I think you may be misunderstanding me.
Viewed through the lens of preparing for a Slashdot effect, the 256-concurrency test we run against high-performance frameworks on our i7 hardware plays out like the world's worst case of Slashdotting. Think about it for a moment: Finagle is processing 232,000 JSON requests per second. It would be even higher if our gigabit Ethernet weren't limiting the test.
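To make the gigabit ceiling concrete, here is a rough back-of-envelope calculation. The per-request byte budget is my arithmetic from the numbers above, not a measured packet size:

```python
# Rough budget: how many bytes of gigabit Ethernet bandwidth are
# available per request at the observed 232,000 requests per second?
# (The ~540-byte result is arithmetic, not a measured value.)
GIGABIT_BYTES_PER_SEC = 1_000_000_000 // 8   # 1 Gbps = 125,000,000 B/s
REQUESTS_PER_SEC = 232_000                   # Finagle's JSON result

bytes_per_request = GIGABIT_BYTES_PER_SEC / REQUESTS_PER_SEC
print(f"~{bytes_per_request:.0f} bytes of wire budget per request")
```

Roughly 540 bytes has to cover the request line, response headers, the JSON body, and TCP/IP overhead, which is why the wire, rather than the framework, becomes the limiter at these rates.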
With requests being pulled off the web server's inbound queue and processed so quickly, do you think it would be easy to simulate and maintain 1,000, 5,000, or 10,000+ concurrency?
Conceptually, the load tool's goal is the opposite of the web server's. From the load tool's point of view, an ideal request is one that takes infinitely long. If the request takes a long time for the server to fulfill, the load tool can just keep the request's connection open and satisfy the user's concurrency requirement. Easy peasy. But as soon as the server fulfills the request, the load tool must snap to it and create another request ASAP to keep up its agreed-to concurrency level. Asking a load tool to maintain 1,000 (or higher) concurrency against a web server completing requests at a rate of 232,000 per second is asking a lot. Wrk is up to the challenge, but gigabit Ethernet holds everything back. The Ethernet saturation means that even if you crank up concurrency against a high-performance web server, the results look basically the same. The web server simply doesn't perceive the concurrency target because gigabit Ethernet can't meet the demand.
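The closed-loop behavior described above can be sketched in a few lines. This is a toy illustration of the principle, not wrk itself; the function names and timings are invented for the demo:

```python
import threading
import time

def run_load(concurrency, duration_s, server):
    """server: a callable standing in for one HTTP request/response."""
    completed = [0] * concurrency
    deadline = time.monotonic() + duration_s

    def worker(i):
        # Keep exactly one request in flight; the instant it completes,
        # issue the next one to hold the agreed-to concurrency level.
        while time.monotonic() < deadline:
            server()
            completed[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(completed)

# At the same concurrency of 8, a fast "server" forces the tool to
# create requests at a much higher rate than a slow one does.
fast = run_load(8, 0.2, lambda: time.sleep(0.001))
slow = run_load(8, 0.2, lambda: time.sleep(0.05))
```

The faster the server fulfills requests, the harder the tool must work to keep its side of the bargain, which is exactly why very high concurrency against a very fast server is so demanding on the load generator.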
As I wrote in the blog entry I cited earlier, if you start thinking about the idealized goal of a web server--to reduce all HTTP requests to zero milliseconds--it should become clearer why increasing concurrency beyond the CPU's saturation level doesn't actually do much except show the depth of the web server's inbound request queue. In other words, once we've saturated the web server's CPUs with busy worker threads, we can increase concurrency for only one goal: to determine at what rate we can get the server to reject requests with 500-series HTTP responses. For the JSON test on gigabit Ethernet, we find it's impossible to cause high-performance frameworks to return 500-series responses because the load tool simply cannot transmit requests fast enough to keep the server's request queue full.
A slightly less performant framework--let's use Unfiltered as an example--is not hitting the gigabit Ethernet wall but is still processing 165,000 JSON requests per second. Since the network is not limiting the test, the CPU cores are completely saturated: 100% utilization.
165,000 requests per second is way worse than being "slashdotted." Slashdot has many readers, but they couldn't generate that kind of request rate in their wildest dreams. Hacker News also has a great many readers, but no site with an audience that narrow could generate 165,000 requests per second from readers clicking a news link. Not even an article about Tesla, Google, hackathons, lean startups, girl coders, 3D printers, web frameworks, and classic computing all wrapped into one could generate that kind of request rate from Hacker News readers. Holding the #1 spot on Hacker News yields a few dozen requests per second or so.
To recap how this works:

* Web servers maintain an inbound queue to hold requests that are to be handed off to worker threads or processes.
* If worker threads are available, the web server will assign the request to a worker thread immediately, without queuing it.
* If there are no worker threads available, the web server will put the request into its queue.
* If a worker thread becomes available and a request is in the queue, it will be assigned as above.
* If no worker thread is available, and the queue is full, the server will reject the request with a 500-series HTTP response.
* Worker threads are made available very quickly if requests are fulfilled very quickly.
* The server becomes starved for worker threads if requests are not fulfilled quickly enough to keep the inbound queue routinely flushed.
* Actual usage does not come in as 1,000 requests in a nanosecond followed by nothing, and then another burst of 1,000 requests in a nanosecond. Even if it did, because gigabit Ethernet is slow, the server's perception of that traffic would be 1,000 requests spread over several milliseconds.
* For (nearly) all platforms and frameworks below roughly 200,000 JSON responses per second, 256 concurrency causes the worker threads to completely saturate the server's CPU cores, keeping them busy fulfilling requests. In fact, for many frameworks, even 128 and 256 concurrency yield nearly identical results--check the data tables view on the results page.
* Since the CPU cores are saturated, increasing concurrency can only demonstrate the limits of the server's inbound request queue. Doing so does not show anything of interest from a performance (completed requests per second; speed of computation) perspective. Once your CPU cores are saturated, your server is at dire risk of filling up its request queue. A hot fix is to quickly increase the queue size in the configuration and hope that's enough to survive the traffic; the real fix is simply fulfilling requests faster.
* In practice, if your server can fulfill 200,000+ requests per second, your server will essentially never actually perceive concurrency over 256 anyway. Gigabit Ethernet simply can't transmit the requests rapidly enough.
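The queue mechanics in the list above can be sketched as a toy model. The class and parameter names here are invented for illustration, and real servers differ in many details, but the accept/queue/reject decision is the same:

```python
import queue
import threading
import time

class ToyServer:
    """Toy model of a web server's inbound queue (illustrative only)."""

    def __init__(self, workers=2, queue_depth=4, handler=lambda req: None):
        self.inbound = queue.Queue(maxsize=queue_depth)  # bounded queue
        self.handler = handler
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            req = self.inbound.get()   # take the next queued request
            self.handler(req)          # worker is busy while this runs
            self.inbound.task_done()   # worker becomes available again

    def accept(self, req):
        try:
            self.inbound.put_nowait(req)  # hand off, or queue if no worker
            return "202 queued"
        except queue.Full:
            return "503 rejected"         # queue full: 500-series response

# A slow handler starves the worker pool: the queue fills and the
# server starts rejecting. (1-second handler, 2 workers, depth-4 queue.)
slow = ToyServer(workers=2, queue_depth=4, handler=lambda req: time.sleep(1))
responses = [slow.accept(n) for n in range(20)]
```

With requests arriving faster than the two busy workers can drain the depth-4 queue, nearly all of the 20 attempts bounce with a 503. Fulfill requests faster, and the same queue never fills.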
I find that questions about very high concurrency (where the questioner is not asking about a WebSocket scenario, in which connections are held open but mostly idle) usually conflate high concurrency with a simpler matter: the inability to fulfill requests rapidly enough to keep the web server's inbound queue flushed. That is a performance problem, plain and simple, and not a high-concurrency problem.
In other words, one may see a large number of live connections and think, "this is a high-concurrency situation," but what one is actually contending with is a side effect of being slow.