Pattern-defeating quicksort

(github.com)

280 points · pettou · 9y ago · 72 comments

54 comments · 14 top-level

dpcx · 9y ago · 13 in thread

Question as a non low-level developer, and please forgive my ignorance:

How is it that we're essentially 50 years into writing sorting algorithms, and we still find improvements? Shouldn't sorting items be a "solved" problem by now?

_hrfd · 9y ago

Basically all comparison-based sort algorithms we use today stem from two basic algorithms: mergesort (stable sort, from 1945) and quicksort (unstable sort, from 1959).

Mergesort was improved by Tim Peters in 2002 and that became timsort. He invented a way to take advantage of pre-sorted intervals in arrays to speed up sorting. It's basically an additional layer over mergesort with a few other low-level tricks to minimize the amount of memcpying.

Quicksort was improved by David Musser in 1997 when he developed introsort. He set a strict worst-case bound of O(n log n) on the algorithm, as well as improved the pivot selection strategy. And people are inventing new ways of pivot selection all the time. E.g. Andrei Alexandrescu published a new method in 2017[1].

In 2016 Edelkamp and Weiß found a way to eliminate branch mispredictions during the partitioning phase in quicksort/introsort. This is a vast improvement. The same year Orson Peters adopted this technique and developed pattern-defeating quicksort. He also figured out multiple ways to take advantage of partially sorted arrays.

Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction) and then we figure out new tricks to take advantage of modern hardware. And finally, multicore became a thing fairly recently so there's a push to explore sorting in yet another direction...

[1] http://erdani.com/research/sea2017.pdf

xenadu02 · 8y ago

It's always good to remember that while Big-O is useful, it isn't the be-all end-all. The canonical example on modern hardware is a linked list. In theory it has many great properties. In reality chasing pointers can be death due to cache misses.

Often a linear search of a "dumb" array can be the fastest way to accomplish something because it is very amenable to pre-fetching (it is obvious to the pre-fetcher what address will be needed next). Even a large array may fit entirely in L2 or L3. For small data structures arrays are almost always a win; in some cases even hashing is slower than a brute-force search of an array!

A good middle ground can be a binary tree with a bit less than an L1's worth of entries in an array stored at each node. The binary tree lets you skip around the array quickly while the CPU can zip through the elements at each node.

It is more important than ever to test your assumptions. Once you've done the Big-O analysis to eliminate exponential algorithms and other basic optimizations you need to analyze the actual on-chip performance, including cache behavior and branch prediction.

lorenzhs · 8y ago

Don't count out sample-sort just yet, it lends itself to parallelisation very well and is blazingly fast. See https://arxiv.org/abs/1705.02257 (to be presented at ESA in September) for an in-place parallel implementation that overcomes the most important downsides of previous sample-sort implementations (linear additional space).

beagle3 · 8y ago

There's actually 3:

Quicksort (unstable, n^2 worst case, in place), heapsort (unstable, n log n worst case, in place) and merge sort (stable, n log n worst case, not in place).

There are variants of each that trade one thing for another (in-placeness for stability, constants for worst case), but these are the three efficient comparison sort archetypes.

Of these, quicksort and heap sort can do top-k which is often useful; and heapsort alone can do streaming top-k.
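
A minimal sketch of the streaming top-k idea, assuming Python's standard `heapq` module. The function name `streaming_top_k` is made up for illustration: a min-heap of size k holds the best items seen so far, so the stream is never stored in full.

```python
import heapq


def streaming_top_k(stream, k):
    """Return the k largest items of an iterable, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the current top k
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the weakest member of the current top k: swap it in.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(streaming_top_k(iter([5, 1, 9, 3, 7, 8, 2]), 3))  # [9, 8, 7]
```

Each item costs at most O(log k), and memory stays at k items no matter how long the stream runs.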

xoroshiro · 8y ago

Interesting history!

>Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction)

This makes me wonder what other hardware tricks might be used for other popular algorithms such as ones used in graphs. I'm sure shortest path is also one of those algorithms that have been "solved" in theory but have a huge amount of research, but personally, what would be more interesting to hear about is something that isn't quite as easy. Something like linear programming with integer constraints or even something like vehicle routing or scheduling. To anyone studying those areas, is there anything you find particularly interesting?

agumonkey · 9y ago

Thanks for the alexandrescu paper

yoran · 8y ago

Thanks for this answer!

DiThi · 9y ago

One of the problems is that hardware changes. A long time ago memory was very limited and there was virtually no cost to branching. Now we have very complex pipelined architectures with branch prediction, many levels of cache, microcode, etc. And memory is plentiful.

contravariant · 9y ago

What makes things tricky is that there are a couple of common cases that can be sorted in O(n), and that more complicated algorithms might have better asymptotic behaviour, while being worse for small or even moderately large lists.

To make matters worse there are also more specific sorting algorithms like radix sort, which can be even faster in cases where they can be used.

wiz21c · 8y ago

There are as many sorting algorithms as there are data distributions. So some algorithms are better suited to some problems. Therefore, each requires specific research.

Moreover, if you study algorithms and try to understand/formalize their behaviour, especially their memory/speed tradeoffs, then you'll see that they're actually quite complex. See: https://math.stackexchange.com/questions/1313540/a-lower-bou...

Finally, implementing a sort algorithm requires a hell of a lot of care. They are super tricky. See for example: http://cs.fit.edu/~pkc/classes/writing/samples/bentley93engi...

lz400 · 8y ago

I don't know about improvements but new methods are still being found, like sleep sort

https://www.reddit.com/r/ProgrammerHumor/comments/5vpdw5/sle...

paulddraper · 9y ago

This is an improvement in practical cases, not theoretical ones.

jorgemf · 9y ago

As far as I understood, the base algorithm is the same, with an average case of n log(n). The new algorithms only try to improve the pivot selection to avoid the worst cases and try to be better than the average case for most practical cases. But in the end there are no new algorithms improving on the limit of n log(n).

_hrfd · 9y ago · 8 in thread

I think it's fair to say that pdqsort (pattern-defeating quicksort) is overall the best unstable sort and timsort is overall the best stable sort in 2017, at least if you're implementing one for a standard library.

The standard sort algorithm in Rust is timsort[1] (slice::sort), but soon we'll have pdqsort as well[2] (slice::sort_unstable), which shows great benchmark numbers.[3] Actually, I should mention that both implementations are not 100% equivalent to what is typically considered as timsort and pdqsort, but they're pretty close.

It is notable that Rust is the first programming language to adopt pdqsort, and I believe its adoption will only grow in the future.

Here's a fun fact: Typical quicksorts (and introsorts) in standard libraries spend most of the time doing literally nothing - just waiting for the next instruction because of failed branch prediction! If you manage to eliminate branch misprediction, you can easily make sorting twice as fast! At least that is the case if you're sorting items by an integer key, or a tuple of integers, or something primitive like that (i.e. when comparison is rather cheap).

Pdqsort efficiently eliminates branch mispredictions and brings some other improvements over introsort as well - for example, the complexity becomes O(nk) if the input array is of length n and consists of only k different values. Of course, worst-case complexity is always O(n log n).
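
To see where a bound like O(nk) can come from, here is a minimal sketch using a three-way ("Dutch national flag") partition. This is a simpler stand-in technique, not pdqsort's own partitioning code, and the name `quicksort_3way` is hypothetical. Every pass removes all copies of the pivot value from further work, so with k distinct values no element is partitioned more than k times.

```python
def quicksort_3way(a, lo=0, hi=None):
    """Sort list `a` in place using quicksort with a three-way partition."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[lo + (hi - lo) // 2]
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Now a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot.
        # The middle block is finished. Recurse into the smaller side and
        # loop on the larger one to keep the recursion shallow.
        if lt - lo < hi - gt:
            quicksort_3way(a, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort_3way(a, gt + 1, hi)
            hi = lt - 1


data = [3, 1, 2, 3, 1, 2, 2, 3, 1] * 100  # n = 900, only k = 3 distinct values
quicksort_3way(data)
```

With three distinct values the input above is fully sorted after at most three levels of partitioning, regardless of n.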

Finally, last week I implemented parallel sorts for Rayon (Rust's data parallelism library) based on timsort and pdqsort[4].

Check out the links for more information and benchmarks. And before you start criticizing the benchmarks, please keep in mind that they're rather simplistic, so please take them with a grain of salt.

I'd be happy to elaborate further and answer any questions. :)

[1] https://github.com/rust-lang/rust/pull/38192

[2] https://github.com/rust-lang/rust/issues/40585

[3] https://github.com/rust-lang/rust/pull/40601

[4] https://github.com/nikomatsakis/rayon/pull/379

lorenzhs · 8y ago

To illustrate the point about branch misprediction, I implemented quicksort with intentionally skewed pivot selection. You can see the results here: https://github.com/lorenzhs/quicksort-pivot-imbalance

For this demo I'm sorting a permutation of the range 1 to n, so obtaining perfect or arbitrarily skewed pivots is free, which isn't realistic, but that's not the point. My experiments show that quicksort is fastest on my machine when 15% of the elements are on one side and 85% on the other. This is a tradeoff between branch prediction (if it always guesses "right side", it's right 85% of the time) and recursion depth (which grows as the pivot becomes more and more imbalanced).

Of course, skewing the pivot is not what you want to do. You should rather look at sorting algorithms which don't suffer from branch mispredictions (as much), such as pdqsort, super-scalar sample sort [1,2] or block quicksort [3]

[1] excellent paper on a parallel in-place adaptation by some of my colleagues: https://arxiv.org/abs/1705.02257 - code will become available soon

[2] my less sophisticated implementation: https://github.com/lorenzhs/ssssort/, I also have a version using pdqsort as base case sorter, which is faster: https://github.com/lorenzhs/ssssort/blob/pdq/speed.pdf (in that plot, ssssort = super scalar sample sort with pdqsort as base case)

[3] https://arxiv.org/abs/1604.06697, implementation at https://github.com/weissan/BlockQuicksort - pdqsort uses the same technique

sgift · 8y ago

Thanks for making it clear if a sort is stable or unstable in the function name. One of my longer debug sessions was tracing a porting error from Java to C# where the default sort functions are stable vs unstable.

CJefferson · 8y ago

Super picky reply: the GAP programming language (www.gap-system.org) used pdqsort before Rust :) (I know, as I implemented it).

Retric · 9y ago

Comparing sorting algos often says more about your benchmark than the algos themselves. Random and pathological are obvious, but often you're dealing with something in between. Radix vs n log n is another issue.

So, what were your benchmarks like?

_hrfd · 9y ago

That is true - the benchmarks mostly focus on random cases, although there are a few benchmarks with "mostly sorted" arrays (sorted arrays with sqrt(n) random swaps).
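
For reference, a "mostly sorted" input of the kind described above could be generated roughly like this. It is a sketch under assumptions: the function name `mostly_sorted`, the fixed seed, and the choice of swapping two uniformly random positions are illustrative, not taken from the actual benchmark code.

```python
import math
import random


def mostly_sorted(n, seed=0):
    """Return the sorted list 0..n-1 with about sqrt(n) random swaps applied."""
    rng = random.Random(seed)  # fixed seed keeps benchmark input reproducible
    a = list(range(n))
    for _ in range(math.isqrt(n)):
        i = rng.randrange(n)
        j = rng.randrange(n)
        a[i], a[j] = a[j], a[i]
    return a


sample = mostly_sorted(10_000)
```

Each swap displaces at most two elements, so at most 2 * sqrt(n) positions differ from the fully sorted order.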

If the input array consists of several concatenated ascending or descending sequences, then timsort is the best. After all, timsort was specifically designed to take advantage of that particular case. Pdqsort performs respectably, too, and if you have more than a dozen of these sequences or if the sequences are interspersed, then it starts winning over timsort.

Anyways, both pdqsort and timsort perform well when the input is not quite random. In particular, pdqsort blows introsort (e.g. typical C++ std::sort implementations) out of the water when the input is not random[1]. It's pretty much a strict improvement over introsort. Likewise, timsort (at least the variant implemented in Rust's standard library) is pretty much a strict improvement over merge sort (e.g. typical C++ std::stable_sort implementations).

Regarding radix sort, pdqsort can't quite match its performance (it's O(n log n) after all), but can perform fairly respectably. E.g. ska_sort[2] (a famous radix sort implementation) and Rust's pdqsort perform equally well on my machine when sorting 10 million random 64-bit integers. However, on larger arrays radix sort starts winning easily, which shouldn't be surprising.

I'm aware that benchmarks are tricky to get right, can be biased, and are always controversial. If you have any further questions, feel free to ask.

[1]: https://github.com/orlp/pdqsort

[2]: https://github.com/skarupke/ska_sort

etep · 9y ago

Hi stjepang,

If the following criteria are met, then perhaps the branch mispredict penalty is less of a problem:

1. you are sorting a large amount of data, much bigger than the CPU LLC

2. you can effectively utilize all cores, i.e. your sort algorithm can parallelize

Perhaps in this case you are memory bandwidth limited. If so, you are probably spending more time waiting on data than waiting on pipe flushes (i.e. the consequence of mispredicts).

stjepang · 9y ago

Absolutely - there are cases when branch misprediction is not the bottleneck. It depends on a lot of factors.

Another such case is when sorting strings because every comparison causes a potential cache miss and introduces even more branching, and all that would dwarf that one misprediction.

VHRanger · 9y ago

I have no doubt that PDQSort is faster than the other sorting algorithms, but before I swap std::sort for it in serious commercial apps I'd like some proofs and battle testing.

That said, if it really is an in place swap for std::sort I'll try it out ASAP

atilimcetin · 9y ago · 4 in thread

Rust's stable sort is based on timsort (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...) and unstable sort is based on pattern-defeating quicksort (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...). The documentation says that 'It [unstable sorting] is generally faster than stable sorting, except in a few special cases, e.g. when the slice consists of several concatenated sorted sequences.'

gnarbarian · 8y ago

I like merge sort. Average time may be worse, but its upper bound is better and it is conceptually cleaner and easier to understand (IMO).

badminton1 · 8y ago

Just that it takes extra space and that's sometimes a constraint.

k__ · 9y ago

how much faster?

sgift · 9y ago

Here are a few benchmark results from a recent rayon pull request:

https://github.com/nikomatsakis/rayon/pull/379

So, for a dual-core with HT 8.26s vs 4.55s

graycat · 9y ago · 3 in thread

I always wondered if there would be a way to have quicksort run slower than O(n ln(n)).

Due to that possibility, when I code up a sort routine, I use heap sort. It is guaranteed O(n ln(n)) worst case and achieves the Gleason bound for sorting by comparing keys which means that on average and worst case, on the number of key comparisons, it is impossible to do better than heap sort's O(n ln(n)) forever.

For a stable sort, sure, just extend the sort keys with a sequence number, do the sort, and remove the key extensions.
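
The sequence-number trick can be sketched as follows. Heapsort (via Python's `heapq`) plays the role of the unstable sort here; the function name `stable_heapsort` and the sample records are made up for illustration.

```python
import heapq


def stable_heapsort(items, key):
    """Heapsort made stable by extending each key with a sequence number."""
    # Decorate: (key, sequence number, item). Ties on the key are broken by
    # the original position, so the item itself is never compared.
    heap = [(key(x), seq, x) for seq, x in enumerate(items)]
    heapq.heapify(heap)
    # Undecorate: pop in sorted order and drop the key extension.
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]
print(stable_heapsort(records, key=lambda r: r[1]))
# [('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```

The cost is one extra integer per element, which is the usual price for getting stability out of an unstable algorithm.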

Quicksort has good main memory locality of reference and a possibility of some use of multiple threads, and heap sort seems to have neither. But there is a version of heap sort modified for doing better on locality of reference when the array being sorted is really large.

But, if you are not too concerned about memory space, then you don't have to care about the sort routine being in place. In that case, you get O(n ln(n)), a stable sort, no problems with locality of reference, and the ability to sort huge arrays with just the old merge sort.

I long suspected that much of the interest in in-place, O(n ln(n)), stable sorting was due to some unspoken but strong goal of finding some fundamental conservation law of a trade off of processor time and memory space. Well, that didn't really happen. But heap sort is darned clever; I like it.

Franciscouzo · 8y ago

> I long suspected that much of the interest in in-place, O(n ln(n)), stable sorting was due to some unspoken but strong goal of finding some fundamental conservation law of a trade off of processor time and memory space. Well, that didn't really happen. But heap sort is darned clever; I like it.

It did happen, it's called block sort, but it's not used because it is complex to implement, has large constants in its runtime, is not easy to parallelize, and its best-case complexity is still O(n ln n).

torrent-of-ions · 8y ago

It's cool to play with a pack of cards and run sorting algorithms on them. To see the worst case of quicksort, use the first element as the pivot and give it an already sorted list. It will take quadratic time to give back the same list.
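
The same experiment works without a deck of cards. The sketch below counts element-vs-pivot comparisons for a textbook quicksort that always picks the first element as pivot. The function name is hypothetical, and an explicit stack is used only to stay clear of Python's recursion limit.

```python
def count_comparisons_first_pivot(values):
    """Quicksort a copy of `values` with the first element as pivot and
    return the number of element-vs-pivot comparisons performed."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # inclusive (lo, hi) ranges still to sort
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        store = lo  # a[lo+1 : store+1] holds the elements smaller than pivot
        for i in range(lo + 1, hi + 1):
            comparisons += 1
            if a[i] < pivot:
                store += 1
                a[store], a[i] = a[i], a[store]
        a[lo], a[store] = a[store], a[lo]  # move pivot to its final position
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return comparisons


print(count_comparisons_first_pivot(range(1000)))  # 499500
```

On already sorted input of length n every partition peels off just the pivot, so the count is n(n-1)/2, which is 499500 for n = 1000.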

graycat · 8y ago

Right. So, for the first "pivot" value, people commonly use the median of three -- take three keys and use as the pivot the median of those three, that is, the middle value. Okay. But then the question remains: While in practice the median of three sounds better, maybe there is a goofy, pathological array of keys that still makes quicksort run in quadratic time. Indeed, maybe for any way of selecting the first pivot, there is an array that makes quicksort quadratic.

Rather than think about that, I noticed that heap sort meets the Gleason bound, which means that heap sort's O(n ln(n)) performance, both worst case and average case, can never be beaten by a sort routine that depends on comparing keys two at a time.

Then, sure, can beat O(n ln(n)). How? Use radix sort -- that was how the old punched card sorting machines worked. So, for an array of length n and a key of length k, the thing always runs in O(nk) which for sufficiently large n is less than O(n ln(n)). In practice? Nope: I don't use radix sort!
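
A minimal least-significant-digit radix sort, for illustration only. It assumes non-negative integer keys that fit in `key_bytes` bytes; that parameter name and the choice of one byte per pass are assumptions of this sketch.

```python
def radix_sort(nums, key_bytes=8):
    """LSD radix sort for non-negative integers below 256 ** key_bytes."""
    out = list(nums)
    for shift in range(0, 8 * key_bytes, 8):
        buckets = [[] for _ in range(256)]
        for x in out:
            # Distribute by the current byte. Appending keeps the order from
            # earlier passes, which is what makes the whole sort correct.
            buckets[(x >> shift) & 0xFF].append(x)
        out = [x for bucket in buckets for x in bucket]
    return out


data = [(i * 2654435761) % (2 ** 64) for i in range(1000)]
result = radix_sort(data)
```

There are `key_bytes` passes of O(n) work each and no key comparisons at all, which is how the O(nk) bound arises.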

jkabrg · 9y ago · 2 in thread

[Post-edit] I made several edits to the post below. First, to make an argument. Second, to add paragraphs. [/Post-edit]

Tl;dr version: It seems to me you should either use heapsort or plain quicksort; the latter with the sort of optimisations described in the linked article, but not including the fallback to heapsort.

Long version:

Here's my reasoning for the above:

You're either working with lists that are reasonably likely to trigger the worst case of randomised quicksort, or you're not working with such lists. By likely, I mean the probability is not extremely small.

Consider the case when the worst case is very unlikely: you're so unlikely to have a worst case that you're gaining almost nothing for accounting for it except extra complexity. So you might as well only use quicksort with optimisations that are likely to actually help.

Next is the case that a worst case might actually happen. Again, this is not by chance; it has to be because someone can predict your "random" pivot and screw with your algorithm; in that case, I propose just using heapsort. Why? This might be long, so I apologise. It's because usually when you design something, you design it to a high tolerance; a high tolerance in this case ought to be the worst case of your sorting algorithm. In which case, when designing and testing your system, you'll have to do extra work to tease out the worst case. To avoid doing that, you might as well use an algorithm that takes the same amount of time every time, which I think means heapsort.

orlp · 9y ago

The overhead of including the fallback to heapsort takes a negligible, non-measurable amount of processing time that guarantees a worst case runtime of O(n log n), and to be more precise, a worst case that is 2 - 4 times as slow as the best case.

Your logic also would mean that any sorting function that is publicly facing (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), as otherwise DoS attacks are simply done by constructing worst case inputs.

There are no real disadvantages to the hybrid approach.

jkabrg · 9y ago

Thanks for your reply.

> Your logic also would mean that any sorting function that is publicly facing (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), as otherwise DoS attacks are simply done by constructing worst case inputs.

Why is that a wrong conclusion? It might be, I'm not a dev. But if I found myself caring about that sort of minutiae, I would reach exactly that conclusion.

Reasons:

* the paranoid possibility that enough users can trigger enough DoS attacks that your system can fall over. If this is likely enough, maybe you should design for the 2-4x worst case, and make your testing and provisioning of resources easier.

* a desire for simplicity when predicting performance, which you're losing by going your route because you're adding the possibility of a 2-4x performance drop depending on the content of the list. Ideally, you want the performance to solely be a function of n, where n is the size of your list; not n and the time-varying distribution of evilness over your users.

Finally, adding a fallback doesn't seem free to me, because it might fool you into not addressing the points I just made. That O(n^2) for Quicksort might be a good way to get people to think; your O(n log n) is hiding factors which don't just depend on n.

j_s · 9y ago · 2 in thread

Has HN ever discussed the possibilities when purposely crafting worst-case input to amplify a denial-of-service attack?

nightcracker · 9y ago

If whoever you're targeting uses libc++, I already did the analysis: https://bugs.llvm.org/show_bug.cgi?id=20837

To my knowledge it's still not fixed.

beagle3 · 8y ago

Doug McIlroy did, some 20 years ago, for quicksort:

http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

beagle3 · 9y ago · 2 in thread

Does anyone know how this compares to Timsort in practice?

A quick google turns up nothing

mastax · 9y ago

https://github.com/rust-lang/rust/pull/40601

"stable" is a simplified Timsort: https://github.com/rust-lang/rust/pull/38192

"unstable" is a pdqsort

stjepang · 9y ago

To summarize:

If comparison is cheap (e.g. when sorting integers), pdqsort wins because it copies less data around and the instructions are less data-dependency-heavy.

If comparison is expensive (e.g. when sorting strings), timsort is usually a tiny bit faster (around 5% or less) because it performs a slightly smaller total number of comparisons.

jorgemf · 9y ago · 2 in thread

Where is a high-level description of the algorithm? How is it different from quicksort? It seems quite similar based on a quick look at the code.

klodolph · 9y ago

The readme file actually contains a fairly thorough description of how it differs from quicksort. Start with the section titled "the best case".

jorgemf · 9y ago

> On average case data where no patterns are detected pdqsort is effectively a quicksort that uses median-of-3 pivot selection

So basically it is quicksort with a slightly cleverer pivot selection, but only for some cases.

wiz21c · 9y ago · 2 in thread

Is there an analysis of its complexity? The algorithm looks very nice!

nightcracker · 9y ago

Hey, author of pdqsort here, the draft paper contains complexity proofs of the O(n log n) worst case and O(nk) best case with k distinct keys: https://drive.google.com/open?id=0B1-vl-dPgKm_T0Fxeno1a0lGT0...

ouid · 9y ago

Best case? Give worst and average case when describing complexities.

nneonneo · 9y ago · 1 in thread

I would love to see the benchmark results against Timsort, the Python sorting algorithm that also implements a bunch of pragmatic heuristics for pattern sorting. Timsort has a slight advantage over pdqsort in that Timsort is stable, whereas pdqsort is not.

I see that timsort.h is in the benchmark directory, so it seems odd to me that the README doesn't mention the benchmark results.

orlp · 9y ago

There are multiple reasons I don't include Timsort in my README benchmark graph:

1. There is no authoritative implementation of Timsort in C++. In the bench directory I included https://github.com/gfx/cpp-TimSort, but I don't know the quality of that implementation.

2. pdqsort intends to be the algorithm of choice for a system's unstable sort. In other words, a direct replacement for introsort in std::sort. So std::sort is my main comparison vehicle, and anything else is more or less a distraction. The only reason I included std::stable_sort in the benchmark is to show, for those unaware, that unstable sorting has a speed advantage.

But, since you're curious, here's the benchmark result with Timsort included on my machine: http://i.imgur.com/tSdS3Y0.png

This is for sorting integers, however. I expect Timsort to become substantially better as the cost of a comparison increases.

torrent-of-ions · 8y ago · 1 in thread

I can see why one might blindly call qsort on already sorted data (when using user input), but why sorted data with one out of place element? Presumably that element has been appended to a sorted array, so you would place it properly in linear time using insertion and not call a sort function at all. Why does such a pattern arise in practice?

nightcracker · 8y ago

You would be surprised how often people just use a (repeatedly) sorted vector instead of a proper data structure or proper insertion calls. It's a lot simpler to just append and sort again. Or the appending happens somewhere else in the code entirely.

As a real-world example, consider a sprite list in a game engine. Yes, you could keep it properly sorted as you add/remove sprites, but it's a lot simpler to just append/delete sprites throughout the code, and just sort it a single time each frame, even if it only adds a single sprite.

So yes, technically this pattern is not needed if everyone always recognized and used the optimal data structure and algorithm at the right time. But that doesn't happen, and it isn't always the simplest solution for the programmer.
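
The two approaches being contrasted can be put side by side in a few lines. This sketch uses Python's standard `bisect` module, and the sprite depth values are invented for illustration.

```python
import bisect

# Option A: keep the list sorted at every insertion (binary search for the
# position, then a linear-time shift of the tail).
sprites_by_depth = [1, 3, 4, 8, 10]
bisect.insort(sprites_by_depth, 5)

# Option B: append wherever is convenient and sort once per frame. The input
# to the sort is then "sorted data with one out-of-place element".
appended = [1, 3, 4, 8, 10]
appended.append(5)
appended.sort()

print(sprites_by_depth, appended)
```

Both end up with the same list. Option B relies on the sort handling nearly sorted input cheaply, which is exactly the pattern an adaptive sort is designed to exploit.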

ComputerGuru · 8y ago

Just because I was confused: this is by Orson Peters who first invented pdq. It's not brand new (as in yesterday), but is a very, very recent innovation (2016).

kleiba · 8y ago

This book is one of the most general treatments of parameterized Quicksort available: http://wild-inter.net/publications/wild-2016.pdf

unruledboy · 8y ago

It's interesting that the .NET built-in quicksort is actually doing the same thing, with introsort behind the scenes.
