Second, though, if we're going to slice and dice that way, which is valid, I think you need to go even further and point out that there are two cases. The first is when you are hammering requests through as quickly as possible, and the second is when you are not.
The latency numbers are highly specific to your load, because as load increases, things like scheduling algorithms start to matter more, especially the fundamental tradeoff between latency and throughput. Knowing the distribution of these numbers under load is important... though I'd suggest that distribution is still fairly likely to be dominated by the user code rather than the framework code. But the hello-world benchmark is still a crucial one, because it serves as an upper bound on performance: if you can show that some webserver can't do what you need even with that, you can eliminate it.
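To make that concrete, here's a minimal sketch of that kind of hello-world measurement using only the Python standard library. The handler, the request count, and the sequential (unloaded) client are all arbitrary assumptions for illustration; the point is that you want the latency distribution (p50/p90/p99), not just a mean, and that this handler does essentially nothing, so what you measure is the framework's floor:

```python
import http.server
import statistics
import threading
import time
import urllib.request

class Hello(http.server.BaseHTTPRequestHandler):
    """Hello-world handler: no user code, so latency ~= framework overhead."""
    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging so it doesn't pollute the timing.
        pass

# Bind to an ephemeral port and serve in a background thread.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

# Record per-request wall-clock latency. 200 sequential requests is an
# arbitrary choice; an uncontested client like this measures the "request
# overhead" case, not behavior under load.
latencies = []
for _ in range(200):
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    latencies.append(time.perf_counter() - start)

server.shutdown()

# Report the distribution, since tails are what degrade under load.
q = statistics.quantiles(latencies, n=100)
p50, p90, p99 = q[49], q[89], q[98]
print(f"p50={p50*1e6:.0f}us p90={p90*1e6:.0f}us p99={p99*1e6:.0f}us")
```

If even these numbers are worse than your budget allows, the framework is eliminated; no amount of optimizing user code gets you under its floor.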
There is also the "request overhead in seconds" you get on a relatively uncontested system, where the system would have to be fairly pathologically broken to show high variance in the results. (You'll get some variance from GC, but with a hello-world handler I wouldn't call it high.) This number is important because, while it is in a lot of ways more boring, it is also, I suspect, the relevant number for the modal web server. I suspect this is another one of those cases where a very vivid image leaps to mind: the web server for Google or Facebook that is constantly getting hammered at 90% of capacity (and carefully kept there by design, since systems get increasingly pathological as you approach 100%), serving highly optimized requests where every microsecond matters... but those are actually the rare web servers in the world. Most webservers are either twiddling their thumbs for long stretches of time, waiting for user code to do what it's going to do over milliseconds... or seconds... or minutes..., or both.
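That "increasingly pathological as you approach 100%" claim isn't just folklore; it falls out of basic queueing theory. A sketch using the standard M/M/1 result (mean time in system W = 1/(mu - lambda)), with a made-up service rate of 1000 requests/sec purely for illustration:

```python
# M/M/1 queue: mean time in system W = 1 / (mu - lambda),
# where mu is the service rate and lambda the arrival rate.
mu = 1000.0  # requests/sec the server can process (assumed for illustration)

latency_ms = {}
for utilization in (0.5, 0.9, 0.99):
    lam = utilization * mu           # arrival rate at this utilization
    w = 1.0 / (mu - lam)             # mean latency in seconds, queueing included
    latency_ms[utilization] = w * 1000

# Mean latency roughly ~2 ms at 50% load, ~10 ms at 90%, ~100 ms at 99%:
# each step toward full utilization multiplies the wait.
print(latency_ms)
```

The exact model doesn't matter much here; any queueing discipline shows the same hockey stick, which is exactly why the big operators deliberately leave headroom below 100%.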