I thought I could beat the PQ implementation in .NET with something like this but I never even got close.
I think my use case breaks the assumptions in this paper due to the volatility of the distribution over time.
Edit: For reference, this is the approach taken by .NET - https://en.m.wikipedia.org/wiki/D-ary_heap
The calendar queues have an interesting failure mode. If you repeatedly insert elements with priorities less than "now" and then pop some number of elements, the front of the calendar empties out. The random insertions leave a single value at the front of the calendar, followed by many days that are empty, so subsequent pops always have to search through a lot of days. So these calendar queues are only fast if you keep track of "now" and only insert events after it.
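To make the failure mode concrete, here's a toy day-bucket calendar queue (my own sketch, not the gist below) with a scan counter. A single event inserted before "now" drags the cursor back, and the pop after it has to re-walk all the empty days:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal calendar queue: days[d] holds priorities in [d, d+1),
// pop() scans forward from the earliest day that might be non-empty.
struct CalendarQueue {
    std::vector<std::vector<double>> days;
    std::size_t cursor = 0;  // first day that might be non-empty ("now")
    std::size_t scans = 0;   // empty buckets examined, to expose the failure mode

    explicit CalendarQueue(std::size_t ndays) : days(ndays) {}

    void push(double p) {
        std::size_t d = static_cast<std::size_t>(p);
        days[d].push_back(p);
        cursor = std::min(cursor, d);  // inserting before "now" drags the cursor back
    }
    double pop() {
        while (days[cursor].empty()) { ++cursor; ++scans; }  // walk over empty days
        auto& bucket = days[cursor];
        auto it = std::min_element(bucket.begin(), bucket.end());
        double p = *it;
        bucket.erase(it);
        return p;
    }
};
```

After popping the stray front event, the cursor sits just past day 1 while the remaining events live hundreds of days later, so the very next pop pays a linear scan over the empty run. Keeping "now" monotone (never inserting before it) avoids this entirely.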
https://gist.github.com/nbingham1/611d37fce31334a1520213ce5d...
seed 1725545662 priority_queue 0.644354 calendar_queue 0.215860 calendar_queue_vector 0.405788
seed 1725545667 priority_queue 0.572672 calendar_queue 0.196812 calendar_queue_vector 0.392303
seed 1725545672 priority_queue 0.622041 calendar_queue 0.241419 calendar_queue_vector 0.413713
seed 1725545676 priority_queue 0.590372 calendar_queue 0.204428 calendar_queue_vector 0.386992
Then, one day, my team hired this ancient soviet engineer who looked like he could have been Lenin's drinking buddy. He was not impressed that I was using std::priority_queue, and he sat down and wrote a calendar queue.
I'll be damned if that thing wasn't 7 to 9 times faster. I thought I was an engineer, but next to this guy, I was just a monkey poking at the typewriter.
It is possible to make a calendar queue which will absolutely mop the floor with any other queue, but the algorithms given in these papers are just a starting point. Going from the published algorithm to an actually performant, production-ready product is always the hardest part.
I remember writing a heap queue in C++ myself because std::priority_queue had some kind of shortcoming whose specifics I don't remember. Maybe I can find that program and check what it wanted. It wasn't a performance issue, but rather, something I needed was missing from the stdlib API and I remember thinking that it was silly that they omitted it.
I called the algorithm "appointment calendar". Threads were scheduled in a calendar, with the lower priority threads (higher value) getting appointments farther in the future. The scheduler just marched through the calendar in order, taking the appointments.
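If I'm reading that right, the core of the appointment calendar fits in a few lines. A hypothetical reconstruction (not the original code; names are mine):

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Sketch of the "appointment calendar" scheduler described above: each
// thread gets an appointment `now + priority` slots ahead (higher numeric
// priority = lower urgency = farther in the future), and the scheduler
// marches through the calendar in slot order.
struct AppointmentScheduler {
    std::map<long, std::deque<std::string>> calendar;  // slot -> threads due then
    long now = 0;

    void schedule(const std::string& thread, long priority) {
        calendar[now + priority].push_back(thread);
    }
    std::string next() {
        auto it = calendar.begin();   // earliest appointment
        now = it->first;              // march time forward to that slot
        std::string t = it->second.front();
        it->second.pop_front();
        if (it->second.empty()) calendar.erase(it);
        return t;
    }
};
```

Because appointments are made relative to the moving `now`, a low-priority thread can't be starved forever: time marches up to its slot eventually.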
There, instead of dates, the “priority index” reflects the frequencies of pairs seen in the input string. This leads to guaranteed O(1) runtime amortized over the input string, since the largest frequency count is bounded by the size of the input and can only decrease as merges happen.
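Under those assumptions (counts bounded by the input length, and a maximum that never increases) extract-max can be a bucket array with a cursor that only moves down, which is where the amortized O(1) comes from. A hypothetical sketch, names mine:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Frequency-indexed bucket queue: buckets[f] holds the pairs whose current
// count is f. The cursor starts at the maximum possible count and only ever
// descends, so the total scanning work across all pops is O(max_count).
struct FrequencyBuckets {
    std::vector<std::vector<std::string>> buckets;
    std::size_t cursor;  // highest possibly non-empty count

    explicit FrequencyBuckets(std::size_t max_count)
        : buckets(max_count + 1), cursor(max_count) {}

    void push(const std::string& pair, std::size_t count) {
        // Valid only while counts never grow past the cursor, which holds
        // for BPE-style merging where counts only decrease.
        buckets[count].push_back(pair);
    }
    std::string pop_max() {
        while (buckets[cursor].empty()) --cursor;  // cursor only descends
        std::string p = buckets[cursor].back();
        buckets[cursor].pop_back();
        return p;
    }
};
```

The monotonicity assumption is doing all the work here; with arbitrary priority updates the cursor would have to move back up and the bound would be lost.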
The implementor has far less freedom than your answer seems to be implying. The standard specifies, for example:
https://eel.is/c++draft/priority.queue#priqueue.cons-4
> The constructor calls make_heap(c.begin(), c.end(), comp).
https://eel.is/c++draft/priority.queue#priqueue.members-5
> emplace calls push_heap(c.begin(), c.end(), comp).
And so on. In fact, if I weren't trying to "yes and" you, I'd say there is essentially no implementation freedom. In particular, the user-programmer is allowed, at any point, to extract the protected data member `c` and verify that it is in fact heapified in the same way that `std::push_heap` would have heapified it.
That said, std::priority_queue::pop is specified to behave "as if by pop_heap followed by pop_back," and in fact the vendor can do better there, by using Floyd's "bottom-up" algorithm. LLVM's libc++ switched from the naïve implementation to Floyd's version back in early 2022, thus closing a feature request that had been open for 11 years at the time:
https://github.com/llvm/llvm-project/commit/79d08e398c17e83b...
I think the other two major vendors had already switched by then, although I'm not sure.
The implementation definitely does not have the freedom to switch from the mandated heap-based PQ to any alternative kind of PQ, including but not limited to (TAOCP §5.2.3) "leftist or balanced trees, stratified trees, binomial queues, pagodas, pairing heaps, skew heaps, Fibonacci heaps, calendar queues, relaxed heaps, fishspear, hot queues, etc."
I once wrote an STL-style implementation of Fishspear, with some analysis of its pros and cons. https://quuxplusone.github.io/blog/2021/05/23/fishspear/
- Look up an arbitrary element by its value, and then change that element's priority. This is often needed in real life, but is fundamentally incompatible with std::priority_queue's highly restricted design. There is no public API at all for dealing with "arbitrary elements" of a std::priority_queue; you interact only with the .top() element.
- Change the top element's priority, i.e. handle it and then throw it back down to be dealt with again sometime later. This operation is used in e.g. the Sieve of Eratosthenes.
I'm not sure which operation you're thinking of w.r.t. Dijkstra's algorithm; I'd wildly guess it's the first operation, not the second.
Changing the top element's priority is easy to graft onto the STL priority_queue's API. I've done it myself here: https://quuxplusone.github.io/blog/2018/04/27/pq-replace-top... The proper name of this operation is `pq.replace_top(value)`, and for the perfect-forwarding version, `pq.reemplace_top(args...)`.
Search `reemplace_top` in this Sieve of Eratosthenes code: https://godbolt.org/z/bvY4Mr1GE
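The graft itself is small, since the protected container `c` (and comparator `comp`) are reachable from a derived class. Something like this sketch of the idea (a paraphrase, not the blog's exact code; `ReplaceablePQ` is my name for it):

```cpp
#include <algorithm>
#include <cassert>
#include <queue>

// Graft a replace-top operation onto std::priority_queue by deriving from
// it to reach the protected data members `c` and `comp`.
template <class T>
struct ReplaceablePQ : std::priority_queue<T> {
    void replace_top(const T& value) {
        // pop_heap moves the old top to the back and re-heapifies the rest;
        // overwrite it in place and push_heap restores the invariant, all in
        // O(log n) with no reallocation.
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        this->c.back() = value;
        std::push_heap(this->c.begin(), this->c.end(), this->comp);
    }
};
```

In the sieve use case, this is "pop the smallest composite and push its successor" fused into one operation.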
Sorting integer elements that can only be represented by 64-bit ints and smaller? Radix sort for the win (*). Sorting strings which may be of any length? Well, saying you can do that in linear time is a bit disingenuous.
It is always important to recognize the properties of the problem domain when stating complexity bounds.
(*) of course, any vanilla comparison-based sort is a better first-implementation than radix sort, but we're talking about linear time algorithms here
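For the 64-bit case the whole trick is small: a byte-at-a-time LSD counting sort does 8 passes regardless of n, i.e. O(n) with a constant tied to the fixed key width, which is exactly the domain restriction being pointed at. A sketch (not tuned):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for uint64_t, one byte per pass (8 stable counting passes).
void radix_sort(std::vector<std::uint64_t>& v) {
    std::vector<std::uint64_t> buf(v.size());
    for (int shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 256> count{};
        for (std::uint64_t x : v) ++count[(x >> shift) & 0xFF];
        std::size_t pos = 0;
        for (std::size_t& c : count) { std::size_t n = c; c = pos; pos += n; }  // prefix sums
        for (std::uint64_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;        // stable scatter
        v.swap(buf);
    }
    // 8 passes = an even number of swaps, so the result ends up back in v.
}
```

Stability of each pass is what lets the later (more significant) bytes build on the earlier ones; replace a pass with an unstable scatter and the whole thing breaks.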
The definition of the machine for which the O(N log N) bound is proved is very delicate: you have to allow O(1) operations on an arbitrarily large set of values but not encoding tricks allowing multiple values to be packed into one and then manipulated unrealistically cheaply using those operations. In particular, the machine must not be able to do arbitrary arithmetic.
But of course, that is cheating.