This is not meant as an attack on or a defense of Go. The facts are what the facts are. The point here is to suggest that people use terminology that is more informative and easier to understand. There are people for whom 20us per request extra is a sufficiently nasty issue that they will not upgrade. There are also a lot of people who are literally multiple orders of magnitude away from that even remotely mattering, because their requests tend to take 120ms anyhow. Using "seconds per request overhead" makes it easier both to understand the real performance impact in real times, and to see that we're talking about just the base overhead per request rather than the speed of the entire request.
It might also discourage some of our, ah, more junior developers from being too focused on this metric. Why would I want to use a webserver that can only do 100,000 requests per second when I can use this one over here that can do 1,000,000 requests per second? If you look at it from the point of view that we're speaking about the difference between 10 microseconds and 1 microsecond, it becomes easier to see that if my requests are going to take 10 milliseconds on average, this is not a relevant stat to be worried about when choosing my webserver, and I should examine just the other differences instead, which may be a great deal more relevant to my use cases.
Edit: Literally while I was typing this up I saw at least three comments already complaining about this regression. My question to you, my honest question to you (because some of you may well be able to answer "yes", especially with some of the tasks Go gets used for), is: Are you really going to have a problem with this? Does the rest of your request really run in microseconds? It's actually pretty challenging in the web world to run in microseconds. It can be done, but a lot of the basic things you want to do, like "hit a database", generally end up involving milliseconds, i.e., "thousands of microseconds".
E.g. not implementing read/write timeouts lets you omit lots of extra code (timer management, synchronization, cancellations), which improves performance. But it might bring a whole system to a stop if there are a few non-responsive clients. Or: not implementing flow control through the whole chain and simply buffering at each stage can give a huge boost on the throughput metric. But sooner or later the system might go out of memory.
I personally now see reliability as the number one thing you should achieve in a protocol implementation. Performance is of course also important, but should only be compared once all the other parts are also comparable.
I think you read a lot into the parent posts that wasn't there.
Let me restate what I believe to be the parent's meaning:
Many junior developers care too much about the "how quick is my normal execution path" form of performance. This is a bad measure of actual performance because the rare, error-related executions can have cascading effects, effectively blocking the entire network.
Allowing applications to wait indefinitely for a response, even if asynchronous, is something like a "thread" leak: you start accumulating dead threads, eventually leading to slowdown. This would be one example.
Another would be weird broadcast storms that happen when a component fails.
Basically, consider cascading effects of errors when optimizing performance.
Projects where 'performance' is taken to be "how quick is my usual case execution path".
The following is tough love:
>I have worked in the network programing domain for the last few years and I also found that especially outsiders and newbies get too obsessed on pure performance figures.
no need for the introduction, your attitude shows it all. It's why we all wait for 35 seconds while we watch a timer animation instead of getting a response instantly (200 milliseconds) and one time out of ten thousand having to resubmit a page and you having to deal with it. But by all means, 10000 * 35 seconds is only 97 hours. I'm happy to wait 97 hours if it means I won't have a 1/10,000 chance of having to click Submit a second time - wouldn't you? Or even a one in fifty chance? I mean wouldn't you rather wait for 35 seconds, versus either getting an instant response (98% chance) or a 98% chance of an instant response the second time you try and a 98% chance of a response the third time you try? No brainer. Who wouldn't love to wait, wait, wait, wait. It's my favorite part of using a computer! Waiting! I can anticipate how great it will be when stuff works. It reminds me of downloading over a 14.4 KBps modem (which due to the lack of web apps at the time was actually much faster in many cases, but thankfully you've fixed that.) On your end you won't have to code up what happens when I do resubmit or not get your response, which takes logic and math or a hand-coded edge case, that civilization probably will never discover and could not possibly code. I mean how can a database possibly be set right if it ever gets a transaction twice or fails to get a transaction the user really did request. It doesn't make any sense! Would you ever tell a friend the same thing twice? Or would you just tell them once, and even if it takes them 3 weeks to get your invitation for Friday, at least you won't accidentally send it twice, embarrassing yourself and your friend, or, worse, having them show up twice. The real world shows that the tradeoffs you network engineers make every day to give me 35 second web page experiences are the correct trade-offs. After all, it's my time, not yours.
/s
You people make the worst trade-offs ever. Your decisions suck. Your work sucks. The web sucks, because of you.
Change everything radically. Figure it out. Don't boast about newbies/outsiders not understanding - you don't understand the correct trade-offs.
Plus the Two Generals' Problem[1] shows that you can never write correct code at the theoretical level, so on top of every single thing you do being practically broken, it's theoretically broken too. Everything you guys do is broken and sucks, theoretically as well as practically. Wake up already.
[1] https://en.wikipedia.org/wiki/Two_Generals'_Problem
----
Note: I took a very aggressive tone to counteract the complacency I quoted. My goal is to have the parent poster rethink their whole life (in the network programming domain.) Please don't flag/downvote it if you want a better web tomorrow than we have today, because the parent and others like them are the ones responsible for this. Only they can wake up and start making the correct trade-offs. It gets so bad that I manually open a new tab, slowly type in google, slowly re-authenticate, and go through the same action a second time, then close the (still loading) first tab, just because people like this person have made trade-offs that are so bad I have to work around them myself. Their decisions are wrong.
Reliability, the way network engineers have been moving toward coding for it for the past decade, is a false god. The approach is not correct. It must change if you want a better web tomorrow (or at least reply to this), or you are complicit in the thinking the parent comment very explicitly shows. I have edited this comment considerably to be really clear, and gave multiple examples. As you can see I have 2546 karma and have been using HN for 1386 days. I stand by my criticism.
The requests per second statistic is measuring throughput, and the results from such a test can be easily represented as a single value. The seconds per request statistic is a measure of latency. Latency can't be represented with a single value in a meaningful way. It is a curve of values, so you'd need to know what percentage of requests fell under a threshold.
Where those thresholds are is extremely use case specific. Some people only care about 95% of requests, others have to care about much higher levels of resolution.
So if anyone gave me a single data point about their system latency, I'd be skeptical they knew what they were talking about. Even in this case we don't know if the latencies changed across the board, only on a few outliers, or on just the middle of the latency curve.
That said, I agree that this is a bit of a tempest in a teapot. In real world usage, if this regression really matters to you, you've probably already moved off of the standard library for a variety of other reasons.
Second, though, if we're going to slice and dice that way, which is valid, I think you need to go even farther and point out that there are two cases. The first is when you are hammering requests through as quickly as possible, and the second is when you are not.
The latency numbers are highly specific to your load, because as load increases, things like scheduling algorithms start mattering more, especially the fundamental tradeoffs between latency and throughput. Knowing the distribution of these numbers under load is important... though I'd suggest that said distribution is still fairly likely to be dominated by the user code rather than the framework code. But the hello world benchmark is still a crucial one, because it serves as the limit of performance, so if you can show that some webserver can't even do what you need with that, you can eliminate it.
There is also the "request overhead in seconds" you get for a relatively uncontested system, where the system would have to be fairly pathologically broken to see high variance in results. (You'll get some from GC, but in this case I wouldn't call that variance high in the patterns you'll see from a hello-world handler.) This number is important because, while it is in a lot of ways more boring, it is also, I suspect, the relevant number for the modal web server. I suspect this is another one of those cases where some very visual image leaps to mind: the web server for Google or Facebook that is constantly getting hammered at 90% of capacity (and kept there carefully by design, since systems get increasingly pathological as you approach 100%), serving highly optimized requests where every microsecond matters... but those are actually the rare web servers in the world. Most webservers are either twiddling their thumbs for long stretches of time or waiting for user code to do what it's going to do in the milliseconds... or seconds... or minutes....
What good does a 100 microsecond average latency (calculated as the inverse of the throughput) do for you when simply loading a website issues 200 requests and your 99th percentile is closer to 500ms for whatever reason? Suddenly your per-load average looks a lot different than your per-request average.
Pure throughput is what you want for batch processing without those pesky, impatient humans in the loop.
I'll probably still be upgrading. go1.8 has some nice performance improvements overall. Specifically the codegen improvements help in HTML parsing and image resizing.
If I'm still upgrading, I have to wonder how many people out there are pushing Go's net/http harder than I am?
I think what I'm more offended by is how releases are being handled.
There was a regression and it's bothering people, there really is no getting around that. I think having a comment like yours in that ticket thread will really help calm the waters. However, personally, I feel that there should be a point release to revert the regression instead of waiting for a major release.
The reason people are bothered, though, is a synthetic benchmark that blows this out of proportion. Someone's already pointed that out and things have calmed down.
> However, personally, I feel that there should be a point release to revert the regression instead of waiting for a major release.
And if there was a significant regression I'm sure that would happen. However, Go has set forward the way they do releases: https://github.com/golang/go/wiki/Go-Release-Cycle
More specifically:
> A minor release is issued to address one or more critical problem for which there is no workaround (typically related to stability or security). The only code changes included in the release are the fixes for the specific critical problems. Important documentation-only changes may also be included as well, but nothing more.
If this regression can be properly quantified to fall into those categories, then a point release will be issued to fix it. But an at-worst half-microsecond overhead on a synthetic hello-world benchmark really doesn't fall into either of those categories.
From that issue thread:
> So from @OneOfOne's test, go tip made 5110713 requests in 30 seconds, that's 5.87us per request. Go 1.7.5 did 5631803 requests in 30 seconds, 5.33us per request. So when you compare those to each other, that's like an 11% performance decrease. But if you look at it from an absolute perspective, that's a performance hit of just a half microsecond per request. I can't even imagine an HTTP service where this would be relevant.
There are many people in the Go community that do canary deployments of their services on new Go versions throughout the whole cycle. If anything major really was related to this I'm fairly certain it would've been surfaced already.
All that aside, this kind of benchmarking should have been done during the beta phase. It's even explicitly asked of the community to do so. No changes related to this were merged during the RC-cycle either.
I can't find a single compelling reason why they should break the normal release cycle over this regression.
Honestly it blew me away that we hit that number, but now that we have, 20us could, though generally will not, affect our overall numbers (there are other components in the system that are not this fast).
While I agree with you that this is generally not an issue, in some circumstances, it will be noticeable.
IMO I'd vote for stability over raw performance numbers every single time since the raw performance numbers depend on "best circumstances" and stability accounts for "worst circumstances". You'll see the latter in reality a lot while the former doesn't exist outside of benchmarks.
There's a similar issue with fuel efficiency. Here in the U.S. we tend to measure it in miles per gallon; the problem is that this isn't (typically) what we care about: we care about cost to drive a distance, not distance per dollar. I understand that in Europe fuel consumption is measured in litres per 100 kilometres, which makes more sense. If we measured efficiency here in fluid ounces per mile, we'd see that: a 10 mpg car uses 12.8 ounces per mile; a 12 mpg car uses 10.7 ounces per mile; a 24 mpg car uses 5.33 ounces per mile; and a 36 mpg car uses 3.56 ounces per mile.
What about alternative facts?
I'd take a 20 us performance regression in one specific slice of code in exchange for a 50% performance increase overall any day. A small regression is fine if the overall speed is much better.
Sorry, what? It's not like the http server of the stdlib is here only for doing hello world code samples... You would imagine those benchmarks to be part of some CI process along with the unit tests.
Not ideal but it's better than pure X vs Y.
> Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix.
Why could the people making an issue about the 0.5 us slowdown per request not have tested or run a benchmark sooner?
           avg       std dev   max
Latency    195.30us  470.12us  16.30ms   -- go tip
Latency    192.49us  451.74us  15.14ms   -- go 1.8rc3
Latency    210.16us  528.53us  14.78ms   -- go 1.7.5
That is a seriously fat distribution. Has anyone ever benched for percentiles?

So basically this is a communication issue with a community that does not understand what to make of its own benchmarks.
What I believe is more serious is that this wasn't caught during development. It could well be a worthwhile trade-off, but we should be aware of it...
That probably shouldn't be the response for a major performance regression in a release candidate.
Looks like I'm sticking to Go 1.7 for however long it'll take before 1.9 is released.
Yes, too many people do stupid "hello world" tests indeed.
Maybe this is a problem with running "hello world" tests and not that much of a real-world problem. Let's see.
If not, could someone please educate me?
https://github.com/golang/go/issues/18964#issuecomment-27830...
I remember reading about the release cycle here: https://github.com/golang/go/wiki/Go-Release-Cycle
> Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix. One of the criteria for issuing a release candidate is that Google be using that version of the code for new production builds by default: if we at Google are not willing to run it for production use, we shouldn't be asking others to.
A 20% performance regression in a minimal http server (i.e. one that doesn't have any business logic) does not sound like a big problem to me; that kind of overhead would normally be dwarfed by database calls, and a 20% increase in the overhead doesn't sound like it's a large increase in what I'd expect to already be a very small number.
So a similar situation in node.js-land would be if require('http') would get a worst-case scenario of a 20% performance hit, right?
If this is the case, even I, who only run single instances of node, would think it a fairly big impact that I'd try to fix if I were the maintainer and still had the possibility to fix it.
You're absolutely right about asking if it can be fixed now rather than later (I was very surprised they wanted to wait until 1.9!), and thanks for asking that on there. If this did make the official release, 1.8 would be known for this bug in the static-site-hosting case, since that use case sees far more requests per second.
It should be noted that it was tested against a hello world benchmark and it won't matter in higher payload cases when the limiting factor isn't the extra routine but the payload itself by a long shot.
I have never used Go, so it might be a bit rude of me to ask. But when I looked at the commit, it seemed to be a fairly small set of changes, which I, maybe stupidly, assumed meant it would be quick to fix. :)