I thought I could beat the PQ implementation in .NET with something like this but I never even got close.
I think my use case breaks the assumptions in this paper due to the volatility of the distribution over time.
Edit: For reference, this is the approach taken by .NET - https://en.m.wikipedia.org/wiki/D-ary_heap
The calendar queues have an interesting failure mode. If you repeatedly insert elements with priorities less than "now" and then pop some number of elements, the front of the calendar empties out. The random insertions leave a single value at the front of the calendar, followed by many days that are empty, so subsequent pops always have to search through a lot of days. So these calendar queues are only fast if you keep track of "now" and only insert events after it.
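To make the failure mode concrete, here's a toy day-bucket calendar queue (my own sketch, not the gist below) with a scan counter. A single event inserted before "now" drags the cursor back, and the pop after it has to re-walk all the empty days:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal calendar queue: days[d] holds priorities in [d, d+1),
// pop() scans forward from the earliest day that might be non-empty.
struct CalendarQueue {
    std::vector<std::vector<double>> days;
    std::size_t cursor = 0;  // first day that might be non-empty ("now")
    std::size_t scans = 0;   // empty buckets examined, to expose the failure mode

    explicit CalendarQueue(std::size_t ndays) : days(ndays) {}

    void push(double p) {
        std::size_t d = static_cast<std::size_t>(p);
        days[d].push_back(p);
        cursor = std::min(cursor, d);  // inserting before "now" drags the cursor back
    }
    double pop() {
        while (days[cursor].empty()) { ++cursor; ++scans; }  // walk over empty days
        auto& bucket = days[cursor];
        auto it = std::min_element(bucket.begin(), bucket.end());
        double p = *it;
        bucket.erase(it);
        return p;
    }
};
```

After popping the stray front event, the cursor sits just past day 1 while the remaining events live hundreds of days later, so the very next pop pays a linear scan over the empty run. Keeping "now" monotone (never inserting before it) avoids this entirely.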
https://gist.github.com/nbingham1/611d37fce31334a1520213ce5d...
seed 1725545662 priority_queue 0.644354 calendar_queue 0.215860 calendar_queue_vector 0.405788
seed 1725545667 priority_queue 0.572672 calendar_queue 0.196812 calendar_queue_vector 0.392303
seed 1725545672 priority_queue 0.622041 calendar_queue 0.241419 calendar_queue_vector 0.413713
seed 1725545676 priority_queue 0.590372 calendar_queue 0.204428 calendar_queue_vector 0.386992
Then, one day, my team hired this ancient soviet engineer who looked like he could have been Lenin's drinking buddy. He was not impressed that I was using std::priority_queue, and he sat down and wrote a calendar queue.
I'll be damned if that thing wasn't 7 to 9 times faster. I thought I was an engineer, but next to this guy, I was just a monkey poking at the typewriter.
It is possible to make a calendar queue which will absolutely mop the floor with any other queue, but the algorithms given in these papers are just a starting point. Going from the published algorithm to an actually performant, production-ready product is always the hardest part.
I remember writing a heap queue in C++ myself because std::priority_queue had some kind of shortcoming whose specifics I don't remember. Maybe I can find that program and check what it wanted. It wasn't a performance issue, but rather, something I needed was missing from the stdlib API and I remember thinking that it was silly that they omitted it.
I called the algorithm "appointment calendar". Threads were scheduled in a calendar, with the lower priority threads (higher value) getting appointments farther in the future. The scheduler just marched through the calendar in order, taking the appointments.
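If I'm reading that right, the core of the appointment calendar fits in a few lines. A hypothetical reconstruction (not the original code; names are mine):

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Sketch of the "appointment calendar" scheduler described above: each
// thread gets an appointment `now + priority` slots ahead (higher numeric
// priority = lower urgency = farther in the future), and the scheduler
// marches through the calendar in slot order.
struct AppointmentScheduler {
    std::map<long, std::deque<std::string>> calendar;  // slot -> threads due then
    long now = 0;

    void schedule(const std::string& thread, long priority) {
        calendar[now + priority].push_back(thread);
    }
    std::string next() {
        auto it = calendar.begin();   // earliest appointment
        now = it->first;              // march time forward to that slot
        std::string t = it->second.front();
        it->second.pop_front();
        if (it->second.empty()) calendar.erase(it);
        return t;
    }
};
```

Because appointments are made relative to the moving `now`, a low-priority thread can't be starved forever: time marches up to its slot eventually.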
There, instead of dates, the “priority index” reflects the frequencies of pairs seen in the input string. This leads to guaranteed O(1) runtime amortized over the input string, since the largest frequency count is bounded by the size of the input and can only decrease as merges happen.
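Under those assumptions (counts bounded by the input length, and a maximum that never increases) extract-max can be a bucket array with a cursor that only moves down, which is where the amortized O(1) comes from. A hypothetical sketch, names mine:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Frequency-indexed bucket queue: buckets[f] holds the pairs whose current
// count is f. The cursor starts at the maximum possible count and only ever
// descends, so the total scanning work across all pops is O(max_count).
struct FrequencyBuckets {
    std::vector<std::vector<std::string>> buckets;
    std::size_t cursor;  // highest possibly non-empty count

    explicit FrequencyBuckets(std::size_t max_count)
        : buckets(max_count + 1), cursor(max_count) {}

    void push(const std::string& pair, std::size_t count) {
        // Valid only while counts never grow past the cursor, which holds
        // for BPE-style merging where counts only decrease.
        buckets[count].push_back(pair);
    }
    std::string pop_max() {
        while (buckets[cursor].empty()) --cursor;  // cursor only descends
        std::string p = buckets[cursor].back();
        buckets[cursor].pop_back();
        return p;
    }
};
```

The monotonicity assumption is doing all the work here; with arbitrary priority updates the cursor would have to move back up and the bound would be lost.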
The implementor has far less freedom than your answer seems to be implying. The standard specifies, for example:
https://eel.is/c++draft/priority.queue#priqueue.cons-4
> The constructor calls make_heap(c.begin(), c.end(), comp).
https://eel.is/c++draft/priority.queue#priqueue.members-5
> emplace calls push_heap(c.begin(), c.end(), comp).
And so on. In fact, if I weren't trying to "yes and" you, I'd say there is essentially no implementation freedom. In particular, the user-programmer is allowed, at any point, to extract the protected data member `c` and verify that it is in fact heapified in the same way that `std::push_heap` would have heapified it.
That said, std::priority_queue::pop is specified to behave "as if by pop_heap followed by pop_back," and in fact the vendor can do better there, by using Floyd's "bottom-up" algorithm. LLVM's libc++ switched from the naïve implementation to Floyd's version back in early 2022, thus closing a feature request that had been open for 11 years at the time:
https://github.com/llvm/llvm-project/commit/79d08e398c17e83b...
I think the other two major vendors had already switched by then, although I'm not sure.
The implementation definitely does not have the freedom to switch from the mandated heap-based PQ to any alternative kind of PQ, including but not limited to (TAOCP §5.2.3) "leftist or balanced trees, stratified trees, binomial queues, pagodas, pairing heaps, skew heaps, Fibonacci heaps, calendar queues, relaxed heaps, fishspear, hot queues, etc."
I once wrote an STL-style implementation of Fishspear, with some analysis of its pros and cons. https://quuxplusone.github.io/blog/2021/05/23/fishspear/
- Look up an arbitrary element by its value, and then change that element's priority. This is often needed in real life, but is fundamentally incompatible with std::priority_queue's highly restricted design. There is no public API at all for dealing with "arbitrary elements" of a std::priority_queue; you interact only with the .top() element.
- Change the top element's priority, i.e. handle it and then throw it back down to be dealt with again sometime later. This operation is used in e.g. the Sieve of Eratosthenes.
I'm not sure which operation you're thinking of w.r.t. Dijkstra's algorithm; I'd wildly guess it's the first operation, not the second.
Changing the top element's priority is easy to graft onto the STL priority_queue's API. I've done it myself here: https://quuxplusone.github.io/blog/2018/04/27/pq-replace-top... The proper name of this operation is `pq.replace_top(value)`, and for the perfect-forwarding version, `pq.reemplace_top(args...)`.
Search `reemplace_top` in this Sieve of Eratosthenes code: https://godbolt.org/z/bvY4Mr1GE
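The graft itself is small, since the protected container `c` (and comparator `comp`) are reachable from a derived class. Something like this sketch of the idea (a paraphrase, not the blog's exact code; `ReplaceablePQ` is my name for it):

```cpp
#include <algorithm>
#include <cassert>
#include <queue>

// Graft a replace-top operation onto std::priority_queue by deriving from
// it to reach the protected data members `c` and `comp`.
template <class T>
struct ReplaceablePQ : std::priority_queue<T> {
    void replace_top(const T& value) {
        // pop_heap moves the old top to the back and re-heapifies the rest;
        // overwrite it in place and push_heap restores the invariant, all in
        // O(log n) with no reallocation.
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        this->c.back() = value;
        std::push_heap(this->c.begin(), this->c.end(), this->comp);
    }
};
```

In the sieve use case, this is "pop the smallest composite and push its successor" fused into one operation.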
Sorting integer elements that can only be represented by 64-bit ints and smaller? Radix sort for the win (*). Sorting strings which may be of any length? Well, saying you can do that in linear time is a bit disingenuous.
It is always important to recognize the properties of the problem domain when stating complexity bounds.
(*) of course, any vanilla comparison-based sort is a better first-implementation than radix sort, but we're talking about linear time algorithms here
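For the 64-bit case the whole trick is small: a byte-at-a-time LSD counting sort does 8 passes regardless of n, i.e. O(n) with a constant tied to the fixed key width, which is exactly the domain restriction being pointed at. A sketch (not tuned):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for uint64_t, one byte per pass (8 stable counting passes).
void radix_sort(std::vector<std::uint64_t>& v) {
    std::vector<std::uint64_t> buf(v.size());
    for (int shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 256> count{};
        for (std::uint64_t x : v) ++count[(x >> shift) & 0xFF];
        std::size_t pos = 0;
        for (std::size_t& c : count) { std::size_t n = c; c = pos; pos += n; }  // prefix sums
        for (std::uint64_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;        // stable scatter
        v.swap(buf);
    }
    // 8 passes = an even number of swaps, so the result ends up back in v.
}
```

Stability of each pass is what lets the later (more significant) bytes build on the earlier ones; replace a pass with an unstable scatter and the whole thing breaks.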
The definition of the machine for which the O(N log N) bound is proved is very delicate: you have to allow O(1) operations on an arbitrarily large set of values but not encoding tricks allowing multiple values to be packed into one and then manipulated unrealistically cheaply using those operations. In particular, the machine must not be able to do arbitrary arithmetic.
But of course, that is cheating.