For any realizable n (anything that could actually fit in memory), ln(n) is less than 40.
So when amortizing over a large window (say, a block size of 512), for any realizable n the logarithmic amortized costs can be completely dominated by the constant non-amortized costs (here, adding an element to a block)... making the operation effectively constant-time.
where log* is the iterated log function; i.e. the number of times you have to apply log repeatedly until you reach a value of 1 or smaller. For reference, log_2*(10^10000) = 5.
Instead, the author mentioned that:
> The tests that I did run, initially the libstdc++ and the libc++ versions have roughly the same performance, but the performance diverges after a number of insertions somewhere between 6,400 and 12,800. After that the runtime of the libc++ version is roughly 2-3 times that of the libstdc++ version. (...) the outcomes of which depend on the compiler version anyway.
This does not seem right.
It didn't look quite right to me so I didn't post it.
Here's another benchmark that I did where you can clearly see that something has gone wrong: https://imgur.com/a/s2rA8qE
Anyone who programs low-latency C++ knows that the libstdc++ implementation (which is what 99.9% of people use) is great, while the others tend to be less stellar.
It's just a segmented vector. The libstdc++ implementation always allocates one segment even if empty, and while I've seen low-latency guidelines arguing that empty containers shouldn't allocate, my personal guideline is to always allocate a reasonable capacity on construction rather than on first insertion.
A ring buffer is a completely different data structure; it only works for fixed-capacity queues.
Maybe we have different ideas about what constitutes "low-latency", but in HFT std::deque is rarely used. Much like std::unordered_map, which allocates on every insert, potentially costing up to a microsecond each time.
>It's just a segmented vector.
https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USER... we can look at the source. This "segmented vector" is `_Tp** _M_map`, essentially a vector of vectors. That means potentially twice the pointer-chasing, since every access needs two lookups. It also means an allocation/deallocation each time a segment is added/destroyed, which adds up to more allocations than a plain vector (you first need to allocate the segment, then potentially allocate the entries within it).
>A ring buffer is a completely different data structure, it only works for fixed-capacity queues.
Where possible it's better to use a fixed-capacity queue, especially one where the capacity is a power of 2 so the wrap-around calculation can be done with a bitmask instead of a modulo. But the same kind of thing can be done for a dynamic-capacity queue, by resizing the whole backing array when capacity is reached.
Anyway I have to echo his point that I've never found a situation where deque performs better than vector, even when you really might expect it to. I guess the extra allocations are too slow. Maybe it would be more useful with arenas or something.
That works until, probabilistically, there is a decent chance the needed capacity will always be zero.
So does this mean:

- We're talking about different complexities (amortized vs. something else)
- The libc++ implementation isn't standards-conformant
- The analysis in the C++ standard is incorrect
- Something else
There is never a way to guarantee that the physical amount of time is bounded by O(1) in the worst case. You could always have pathological scenarios or data structures written so that performing a move or copy of a single T could require O(n^n) time for all anyone knows.
Citation? Are you sure it isn't 64 size-independent elements?
https://github.com/microsoft/STL/blob/main/stl/inc/deque#L56...
They have made clear that this won't be changed, for ABI stability reasons.
That makes std::deque basically unusable in a portable context. In virtually any situation, "allocate each element separately" and "allocate elements in 4k chunks" are on opposite ends of the performance spectrum.
Small comment: Ideally, big-O notation is for upper bounds. If you are doing lower bounds, you should ideally use big-Omega notation. But Omegas are harder to format in plain text, so it may be better to abuse notation and use big-O...
The libc++ implementation is bad. The libstdc++ implementation is fine. The issue with the former is that it doesn't have enough spare capacity so it has to shift elements around too often.
Actually I think push_front is even worse than stated: O(n). Consider an array with capacity N+1 that contains N elements. If you now alternate push_front and pop_back, every push_front will cause a memmove of N elements.
Oh and to make like a 4th addition to this comment: It's kind of funny that the code is so unreadable that the author found it more helpful to look at the generated assembler code.
Maybe for a good reason I dunno. But it would be nice if the code was clearer so you could make sense of it when gdb or callgrind jumps into an inlined chunk ...
They choose names like _Capitalized and __lowercase because those identifiers are reserved for the implementation. It's a consequence of the preprocessor's lack of hygiene.
So where you might see a convention of naming members like m_thingish_foo, in the implementation's library headers they would be named _M_thingish_foo or __m_thingish_foo.
It reminds me of the hash maps that are said to be amortized O(1) but can be O(n) for some sequences of operations in various languages like Python and JS: https://blog.toit.io/hash-maps-that-dont-hate-you-1a96150b49...