https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vecto...
> some of us have been around the block a few times though, and now need to make sure those spaces are delineated for others in a way that won't force them into a performance corner.
This, just like the rest of your comment, is just patronizing and condescending.
> I don't mean to brag, but I guess I'm a lot better at planning ahead than you
See previous point...
> I also can't remember any time I had to reach for a hammer as big as reflect and didn't expect to very early on
This is not what I said at all. Let's say you know early on, before any code is written, that you will need reflection. Can you tell me beforehand how many calls to the reflection API will happen? Is it `n`? `n log(n)`? `n²`? Will you use reflection at every corner, or just at the boundaries of your task? Once implemented, could it be refactored in a simpler way? You don't know until you have written the code.
> most of the time I know what I intend to do to my data
The "what" is the spec, the "how" is the code, and there are multiple answers to the "how". Until you write them and benchmark them, you can't know for sure which one is best; you can only have assumptions and hypotheses. Unless you're constantly doing exactly the same thing.
> but also maybe think about how many pointers you're going to have to chase and methods your users will need to implement in the first place because you probably don't get to "optimize" those later without breaking the API.
Basically, "write the spec before jumping into code". Which is the basis of "make it work, make it right, make it fast" because if you don't even know what problem you're solving, there is no way you can do anything relevant.
> You write about "the bottleneck", but there's not always a single bottleneck distinct from "the API".
I never implied there is a single bottleneck. But if you separate the implementation details from the high-level API, they sure are distinct. For example, you can solve the N+1 problem in a GraphQL API without changing its schema.
If your implementation details leak into your API, it just means they are poorly separated.
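To make the N+1 point concrete, here is a minimal sketch in plain Python. It does not use a real GraphQL library: `FakeDB`, `resolve_posts_naive` and `resolve_posts_batched` are hypothetical names standing in for a data source and two resolver implementations. Both resolvers return exactly the same shape, so a caller (or a schema) could not tell them apart; only the number of round trips changes.

```python
class FakeDB:
    """In-memory stand-in for a database that counts round trips (hypothetical)."""

    def __init__(self, authors):
        self._authors = authors
        self.queries = 0

    def get_author(self, author_id):
        # One round trip per author.
        self.queries += 1
        return self._authors[author_id]

    def get_authors(self, author_ids):
        # One round trip for a whole batch of authors.
        self.queries += 1
        return {author_id: self._authors[author_id] for author_id in author_ids}


def resolve_posts_naive(db, posts):
    """One author lookup per post: the "N" part of the N+1 pattern."""
    return [
        {"title": post["title"], "author": db.get_author(post["author_id"])}
        for post in posts
    ]


def resolve_posts_batched(db, posts):
    """Same output shape, but a single batched lookup for all authors."""
    authors = db.get_authors({post["author_id"] for post in posts})
    return [
        {"title": post["title"], "author": authors[post["author_id"]]}
        for post in posts
    ]
```

Swapping one resolver for the other is purely an implementation change, which is the sense in which the bottleneck is distinct from the API here.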
> You're never going to make an API like PyReader anywhere near as fast as GoReader, no matter how much optimization you do!
Because Python is interpreted and Go is compiled. Under the hood, the OS uses `ssize_t read(int fd, void *dest, size_t count)`, and there is an upper limit to the `count` parameter (specific to the OS/kernel). Python's IO API knows this and allocates a buffer only once under the hood; it would be equivalent to a PyReader implementation built on a GoReader interface plus a preallocated `[]byte` slice.
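As a rough illustration of that equivalence, here is a sketch of a PyReader-style `read(n)` built on top of a GoReader-style "fill my buffer" call. `PyStyleReader` is a hypothetical name, and I am using Python's standard `readinto` as the stand-in for the GoReader interface; this is not how CPython implements its IO stack internally, just the shape of the idea.

```python
import io


class PyStyleReader:
    """Exposes read(n) -> bytes on top of a readinto(buffer) source (hypothetical)."""

    def __init__(self, source, buffer_size=4096):
        self._source = source
        # The working buffer is allocated once and reused for every read.
        self._buffer = bytearray(buffer_size)
        self._view = memoryview(self._buffer)

    def read(self, n):
        # Never ask for more than the preallocated buffer can hold.
        n = min(n, len(self._buffer))
        # Assumes a blocking source, where readinto returns an int (0 at EOF).
        filled = self._source.readinto(self._view[:n])
        # The returned bytes object is still a fresh copy; only the
        # intermediate buffer is reused across calls.
        return bytes(self._view[:filled])
```

Usage would look like `PyStyleReader(io.BytesIO(b"hello world"), buffer_size=4).read(4)`, with later calls continuing from where the previous one stopped.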
I can't tell you which one is faster without a benchmark because the difference is so subtle, so I won't.
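For anyone who wants to measure it rather than guess, a benchmark along these lines could be a starting point. It is only a sketch: it compares Python's own `read` (a new `bytes` per call) against `readinto` (one reused buffer) on an in-memory `io.BytesIO`, so it never reaches the `read` syscall and says nothing about Go. `drain_with_read`, `drain_with_readinto` and `best_time` are names I made up for the example.

```python
import io
import timeit

PAYLOAD = bytes(1_000_000)  # 1 MB of zero bytes as stand-in input
CHUNK = 4096


def drain_with_read(data=PAYLOAD, chunk=CHUNK):
    """PyReader style: each read() returns a freshly allocated bytes object."""
    stream = io.BytesIO(data)
    total = 0
    while True:
        piece = stream.read(chunk)
        if not piece:
            return total
        total += len(piece)


def drain_with_readinto(data=PAYLOAD, chunk=CHUNK):
    """GoReader style: the caller owns one buffer and reuses it for every call."""
    stream = io.BytesIO(data)
    buffer = bytearray(chunk)
    total = 0
    while True:
        filled = stream.readinto(buffer)
        if not filled:
            return total
        total += filled


def best_time(func, repeat=5, number=10):
    """Best wall-clock time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


if __name__ == "__main__":
    for func in (drain_with_read, drain_with_readinto):
        print(f"{func.__name__}: {best_time(func):.4f}s")
```

Whatever numbers this prints on your machine only hold for that payload, chunk size and interpreter version; a real comparison would need real files or sockets.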