Knowing where performance issues with certain techniques might arise is not premature optimization. Implement with an appropriate level of care, including performance concerns. Not every kind of poor performance appears as a clear spike in a call graph, and even fewer can be fixed without changing any external API.
And I never said anything that even remotely contradicts this statement.
> Knowing where performance issues with certain techniques might arise is not premature optimization.
It is:
- Python: should I use a for loop, a list comprehension or the map function?
- C++: should I use a std::list, std::vector, ...?
- Go: should I use interface{} or generics?
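Here is a minimal sketch of the two options in the Go bullet. It assumes Go 1.18 or later for generics. `SumAny`, `SumGeneric` and the `Number` constraint are hypothetical names that do not come from the thread. Both functions return the same result. The first pays for a run-time type assertion on every element, and its callers must box their values into `[]interface{}`. The second is type-checked at compile time. Whether that difference matters in a given program is something only a benchmark of that program can show.

```go
package main

// SumAny takes its inputs as interface{}: each element is type-asserted
// at run time, and non-int values are silently skipped in this sketch.
func SumAny(xs []interface{}) int {
	total := 0
	for _, x := range xs {
		if v, ok := x.(int); ok {
			total += v
		}
	}
	return total
}

// Number is a hypothetical constraint used only for this example.
type Number interface {
	~int | ~int64 | ~float64
}

// SumGeneric is type-checked at compile time, so the loop needs no
// per-element type assertion.
func SumGeneric[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}
```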
The difference between those options is subtle and completely unrelated to the problem you want to solve.
> Implement with an appropriate level of care, including performance concerns.
Step 1: solve your problem naively, aka: make it work
Step 2: add tests, separate business logic from implementation details, aka: make it right
Step 3: profile / benchmark to see where the chokepoints are and optimize them, aka: make it fast
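The three steps can be sketched in Go on a toy problem: detecting duplicates in a slice. Everything here is hypothetical and only illustrates the workflow. It starts with a naive version, then adds an agreement check, and ends with a map-based rewrite timed with the standard library's `testing.Benchmark`.

```go
package main

import "testing"

// Step 1, "make it work": a naive O(n²) duplicate check.
func HasDupNaive(xs []int) bool {
	for i := range xs {
		for j := i + 1; j < len(xs); j++ {
			if xs[i] == xs[j] {
				return true
			}
		}
	}
	return false
}

// Step 3, "make it fast": an O(n) rewrite using a map, written only after
// measurement showed the naive version mattered (hypothetical scenario).
func HasDupMap(xs []int) bool {
	seen := make(map[int]struct{}, len(xs))
	for _, x := range xs {
		if _, ok := seen[x]; ok {
			return true
		}
		seen[x] = struct{}{}
	}
	return false
}

// Step 2, "make it right": both versions must agree on every input, so the
// fast one can replace the naive one behind the same signature.
func Agree(inputs [][]int) bool {
	for _, in := range inputs {
		if HasDupNaive(in) != HasDupMap(in) {
			return false
		}
	}
	return true
}

// NsPerOp times one implementation with testing.Benchmark, which reruns
// the closure with a growing b.N until the measurement is long enough.
func NsPerOp(f func([]int) bool, input []int) int64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			f(input)
		}
	})
	return r.NsPerOp()
}
```

In a real project, the agreement check would normally live in a `_test.go` file, and the timing would go in a `func BenchmarkXxx(b *testing.B)` run by `go test -bench`. This sketch calls `testing.Benchmark` directly only so that it fits in one file.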
Chances are that if you have deeply nested loops, generics vs interface{} will be the last of your problems.
To take the C++ example again, until you have implemented your algorithm, you don't know what kind of operations (and how often) you will do with your container. So you can't know whether std::list or std::vector fits best.
In Go, until you have implemented your algorithm, you don't know how often you will have to use generics / reflection, so you can't know what the true impact on your code will be.
The "I know X is almost always faster, so I'll use it instead of Y" approach will bite you more often than you can count.
> Not every kind of poor performance appears as a clear spike in a call graph
CPU usage, memory consumption, idle/wait times, etc. Those are the kinds of metrics you care about when benchmarking your code. No one said you only look at spikes in a call graph.
But still, to gather that information, you need at least a first implementation of your problem's solution. Doing it before then is a waste of time and energy, because 80% of the time your assumptions are wrong.
> and even fewer can be fixed without changing any external API.
This is why you "make it work" and "make it right" before you "make it fast".
This way you have a clear separation between your API and your implementation details.
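Here is a minimal sketch of that separation, using hypothetical names. Callers depend only on the `IntSet` interface and the `NewIntSet` constructor. That lets the slice-backed first draft be swapped later for a map-backed version without touching any caller. This works only because the interface does not dictate the data layout, and that is exactly the point the next reply contests.

```go
package main

// IntSet is the public API; callers only ever see these two methods.
type IntSet interface {
	Add(x int)
	Has(x int) bool
}

// sliceSet is a first "make it work" implementation with O(n) lookups.
type sliceSet struct{ xs []int }

func (s *sliceSet) Add(x int) {
	if !s.Has(x) {
		s.xs = append(s.xs, x)
	}
}

func (s *sliceSet) Has(x int) bool {
	for _, y := range s.xs {
		if y == x {
			return true
		}
	}
	return false
}

// mapSet is a later "make it fast" implementation with O(1) average lookups.
type mapSet struct{ m map[int]struct{} }

func (s *mapSet) Add(x int) {
	s.m[x] = struct{}{}
}

func (s *mapSet) Has(x int) bool {
	_, ok := s.m[x]
	return ok
}

// NewIntSet is the only constructor callers use, so switching the
// implementation is a one-line change that leaves the API untouched.
func NewIntSet() IntSet {
	return &mapSet{m: make(map[int]struct{})}
}
```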
> until you have implemented your algorithm, you don't know what kind of operations (and how often) you will do with your container... until you have implemented your algorithm, you don't know how often you will have to use generics / reflection, so you can't know what will be the true impact on your code.
I don't mean to brag, but I guess I'm a lot better at planning ahead than you. I don't usually have the whole program written in my head before I start, but I also can't remember any time I had to reach for a hammer as big as reflect and didn't expect to very early on, and most of the time I know what I intend to do to my data!
> This is why you "make it work" and "make it right" before you "make it fast"... This way you have a clear separation between your API and your implementation details.
This is not possible. APIs impose performance constraints. Maybe wait until your API works before micro-optimizing it. But also maybe think about how many pointers you're going to have to chase, and how many methods your users will need to implement, in the first place. You probably don't get to "optimize" those later without breaking the API. You write about "the bottleneck", but there's not always a single bottleneck distinct from "the API". Sometimes a program is slow because a single part takes 10 seconds and could take 1 second. But sometimes it's slow because every different bit of it is taking 2ns where it could take 1ns.
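Here is a hypothetical sketch of the "methods your users will need to implement" point. Suppose a library's API is a per-element interface like `Series` below. Then every read goes through a dynamic method call, and moving to a bulk `[]float64` parameter later would break every type that implemented `Series`. The names are illustrative only. How large the per-call cost actually is depends on the compiler and would need measuring.

```go
package main

// Series is a hypothetical per-element API: users implement two methods,
// and every element read is an interface method call.
type Series interface {
	Len() int
	At(i int) float64
}

// SumSeries can only see the data one At call at a time.
func SumSeries(s Series) float64 {
	total := 0.0
	for i := 0; i < s.Len(); i++ {
		total += s.At(i)
	}
	return total
}

// SumSlice receives the data in bulk: a plain loop over contiguous memory
// with no per-element method call.
func SumSlice(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

// floats adapts a slice to Series, for comparison.
type floats []float64

func (f floats) Len() int {
	return len(f)
}

func (f floats) At(i int) float64 {
	return f[i]
}
```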
Consider the basic read-some-bytes API in Go vs. Python (translated into Go, so the difference is obvious):
type GoReader interface { Read([]byte) (int, error) }
type PyReader interface { Read(int) ([]byte, error) }
You're never going to make an API like PyReader anywhere near as fast as GoReader, no matter how much optimization you do!
https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vecto...
> some of us have been around the block a few times though, and now need to make sure those spaces are delineated for others in a way that won't force them into a performance corner.
This, just like the rest of your comment, is patronizing and condescending.
> I don't mean to brag, but I guess I'm a lot better at planning ahead than you
See previous point...
> I also can't remember any time I had to reach for a hammer as big as reflect and didn't expect to very early on
This is not what I said at all. Let's say you know early on, before any code is written, that you will need reflection. Can you tell me beforehand how many calls to the reflection API will happen? Is it `n`? `n log(n)`? `n²`? Will you use reflection at every corner, or just at the boundaries of your task? Once implemented, could it be refactored in a simpler way? You don't know until you've written the code.
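Here is a hypothetical sketch that makes the "every corner vs. boundaries" question concrete, using Go's `reflect` package. The first function does a by-name field lookup for every element, which adds up to n lookups. The second resolves the field once up front. The third drops reflection entirely once the concrete type is known. `Order` and the function names are made up for the example. Which variant a real program ends up needing is, as argued above, hard to predict before writing it.

```go
package main

import "reflect"

// Order is a hypothetical record type for this example.
type Order struct {
	ID  int
	Qty int
}

// SumFieldEverywhere reflects on every element and looks the field up by
// name each time: one FieldByName call per element.
// Assumes items is a slice of structs whose named field is a signed int.
func SumFieldEverywhere(items interface{}, field string) int64 {
	v := reflect.ValueOf(items)
	var total int64
	for i := 0; i < v.Len(); i++ {
		total += v.Index(i).FieldByName(field).Int()
	}
	return total
}

// SumFieldAtBoundary resolves the field once, then reuses its index inside
// the loop: a single name lookup regardless of n. Same input assumptions.
func SumFieldAtBoundary(items interface{}, field string) int64 {
	v := reflect.ValueOf(items)
	sf, ok := v.Type().Elem().FieldByName(field)
	if !ok {
		panic("no field named " + field)
	}
	var total int64
	for i := 0; i < v.Len(); i++ {
		total += v.Index(i).FieldByIndex(sf.Index).Int()
	}
	return total
}

// SumQty needs no reflection once the concrete type is known.
func SumQty(orders []Order) int64 {
	var total int64
	for _, o := range orders {
		total += int64(o.Qty)
	}
	return total
}
```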
> most of the time I know what I intend to do to my data
"What" is the spec, "how" is the code, and there are multiple answers to the "how". Until you write them and benchmark them, you can't know for sure which one is best; you can only make assumptions and hypotheses. Unless you're always doing exactly the same thing.
> but also maybe think about how many pointers you're going to have to chase and methods your users will need to implement in the first place because you probably don't get to "optimize" those later without breaking the API.
Basically, "write the spec before jumping into code". That is the basis of "make it work, make it right, make it fast": if you don't even know what problem you're solving, there is no way you can do anything relevant.
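Here is a small illustration of one spec (the "what") with several implementations (the "how"). The function names are hypothetical, and all three join words with single spaces to satisfy the same spec. The sketch deliberately makes no claim about which one is fastest for a given workload; a benchmark would have to settle that.

```go
package main

import "strings"

// Spec (the "what"): join the words with single spaces.

// JoinPlus builds the result with repeated string concatenation.
func JoinPlus(words []string) string {
	out := ""
	for i, w := range words {
		if i > 0 {
			out += " "
		}
		out += w
	}
	return out
}

// JoinBuilder builds the same result with a strings.Builder.
func JoinBuilder(words []string) string {
	var b strings.Builder
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(w)
	}
	return b.String()
}

// JoinStd delegates to the standard library.
func JoinStd(words []string) string {
	return strings.Join(words, " ")
}
```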
> You write about "the bottleneck", but there's not always a single bottleneck distinct from "the API".
I never implied there is a single bottleneck. But if you separate the implementation details from the high-level API, they are certainly distinct. For example, you can solve the N+1 problem in a GraphQL API without changing its schema.
If your implementation details leak into your API, it just means they are poorly separated.
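Here is a minimal sketch of the N+1 example. It assumes a fake in-memory data source rather than any real GraphQL library. Both resolver functions have the same signature. The first issues one query per post, and the second batches all author lookups into a single query, which is the idea behind DataLoader-style batching. `Post`, `DB` and the function names are hypothetical.

```go
package main

// Post is a hypothetical record; AuthorID points at an author row.
type Post struct {
	ID       int
	AuthorID int
}

// DB is a fake data source that counts the queries it receives.
type DB struct {
	authors map[int]string
	Queries int
}

// AuthorByID simulates one query per call.
func (db *DB) AuthorByID(id int) string {
	db.Queries++
	return db.authors[id]
}

// AuthorsByIDs simulates a single batched query (e.g. WHERE id IN (...)).
func (db *DB) AuthorsByIDs(ids []int) map[int]string {
	db.Queries++
	out := make(map[int]string, len(ids))
	for _, id := range ids {
		out[id] = db.authors[id]
	}
	return out
}

// AuthorNamesNaive resolves each post's author separately: one query per
// post, the "N" in N+1.
func AuthorNamesNaive(db *DB, posts []Post) []string {
	names := make([]string, len(posts))
	for i, p := range posts {
		names[i] = db.AuthorByID(p.AuthorID)
	}
	return names
}

// AuthorNamesBatched has the same signature but issues a single query, so
// callers (the "schema") never see the change.
func AuthorNamesBatched(db *DB, posts []Post) []string {
	ids := make([]int, len(posts))
	for i, p := range posts {
		ids[i] = p.AuthorID
	}
	byID := db.AuthorsByIDs(ids)
	names := make([]string, len(posts))
	for i, p := range posts {
		names[i] = byID[p.AuthorID]
	}
	return names
}
```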
> You're never going to make an API like PyReader anywhere near as fast as GoReader, no matter how much optimization you do!
Because Python is interpreted and Go is compiled. Under the hood, the OS uses `ssize_t read(int fd, void *dest, size_t count)`, and there is an upper limit on the `count` parameter (specific to the OS/kernel).
Python's IO API knows this and allocates a buffer only once under the hood. It would be equivalent to a PyReader implementation using a GoReader interface plus a preallocated []byte slice.
The difference is subtle enough that I can't tell you which one is faster without a benchmark, so I won't.
If I'm using low torque, I don't need to know the yield strength of my wrench