Second, though, if we're going to slice and dice that way, which is valid, I think you need to go even further and point out that there are two cases. The first is when you are hammering requests through as quickly as possible, and the second is when you are not.
The latency numbers are highly specific to your load, because as load increases, things like scheduling algorithms start to matter more, especially the fundamental tradeoff between latency and throughput. Knowing the distribution of these numbers under load is important... though I'd suggest that distribution is still fairly likely to be dominated by the user code rather than the framework code. But the hello-world benchmark is still a crucial one, because it serves as an upper bound on performance: if you can show that some webserver can't do what you need even with that, you can eliminate it.
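To make that concrete, here's a minimal sketch of that kind of hello-world measurement using only the Python standard library. The handler, the request count, and the sequential (unloaded) client are all arbitrary assumptions for illustration; the point is that you want the latency distribution (p50/p90/p99), not just a mean, and that this handler does essentially nothing, so what you measure is the framework's floor:

```python
import http.server
import statistics
import threading
import time
import urllib.request

class Hello(http.server.BaseHTTPRequestHandler):
    """Hello-world handler: no user code, so latency ~= framework overhead."""
    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging so it doesn't pollute the timing.
        pass

# Bind to an ephemeral port and serve in a background thread.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

# Record per-request wall-clock latency. 200 sequential requests is an
# arbitrary choice; an uncontested client like this measures the "request
# overhead" case, not behavior under load.
latencies = []
for _ in range(200):
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    latencies.append(time.perf_counter() - start)

server.shutdown()

# Report the distribution, since tails are what degrade under load.
q = statistics.quantiles(latencies, n=100)
p50, p90, p99 = q[49], q[89], q[98]
print(f"p50={p50*1e6:.0f}us p90={p90*1e6:.0f}us p99={p99*1e6:.0f}us")
```

If even these numbers are worse than your budget allows, the framework is eliminated; no amount of optimizing user code gets you under its floor.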
There is also the "request overhead in seconds" you get on a relatively uncontested system, where the system would have to be fairly pathologically broken to show high variance in the results. (You'll get some variance from GC, but with a hello-world handler I wouldn't call it high.) This number is important because, while it is in a lot of ways more boring, it is also, I suspect, the relevant number for the modal web server. I suspect this is another one of those cases where a very vivid image leaps to mind: the web server for Google or Facebook that is constantly getting hammered at 90% of capacity (and carefully kept there by design, since systems get increasingly pathological as you approach 100%), serving highly optimized requests where every microsecond matters... but those are actually the rare web servers in the world. Most webservers are either twiddling their thumbs for long stretches of time, waiting for user code to do what it's going to do over milliseconds... or seconds... or minutes..., or both.
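That "increasingly pathological as you approach 100%" claim isn't just folklore; it falls out of basic queueing theory. A sketch using the standard M/M/1 result (mean time in system W = 1/(mu - lambda)), with a made-up service rate of 1000 requests/sec purely for illustration:

```python
# M/M/1 queue: mean time in system W = 1 / (mu - lambda),
# where mu is the service rate and lambda the arrival rate.
mu = 1000.0  # requests/sec the server can process (assumed for illustration)

latency_ms = {}
for utilization in (0.5, 0.9, 0.99):
    lam = utilization * mu           # arrival rate at this utilization
    w = 1.0 / (mu - lam)             # mean latency in seconds, queueing included
    latency_ms[utilization] = w * 1000

# Mean latency roughly ~2 ms at 50% load, ~10 ms at 90%, ~100 ms at 99%:
# each step toward full utilization multiplies the wait.
print(latency_ms)
```

The exact model doesn't matter much here; any queueing discipline shows the same hockey stick, which is exactly why the big operators deliberately leave headroom below 100%.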