Blitsort: A fast, in-place stable hybrid merge/quick sort

(github.com)

202 points · scandum · 3y ago · 88 comments

35 comments · 9 top-level

throwaway12245 · 3y ago · 7 in thread

Faster than radix sort?

smingo · 3y ago

Not for big arrays.

Radix sort is O(n) (or O(n * number of bytes in an Int, or in whatever key is being compared)).

It's omitted from the comparisons, I see. Radix sort can also have predictable memory overhead.
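
To make the O(n * number of bytes) bound concrete, here is a minimal sketch of a least-significant-digit radix sort that processes one byte per pass. The function name `radix_sort_u32` and the 32-bit key width are assumptions for illustration; nothing here is taken from the blitsort benchmarks.

```python
def radix_sort_u32(values):
    """LSD radix sort for non-negative integers below 2**32.

    Makes four stable counting passes, one per byte, so the work is
    proportional to n * 4 regardless of the input order.
    """
    items = list(values)
    for shift in (0, 8, 16, 24):
        # Count how many keys fall into each of the 256 buckets.
        counts = [0] * 256
        for v in items:
            counts[(v >> shift) & 0xFF] += 1
        # Turn counts into starting offsets (exclusive prefix sum).
        total = 0
        for b in range(256):
            counts[b], total = total, total + counts[b]
        # Scatter into an auxiliary buffer of size n, preserving order
        # within a bucket so earlier passes are not undone.
        out = [0] * len(items)
        for v in items:
            b = (v >> shift) & 0xFF
            out[counts[b]] = v
            counts[b] += 1
        items = out
    return items
```

The auxiliary buffer is always exactly n elements plus a 256-entry count table, which is the sense in which the memory overhead is predictable.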

naasking · 3y ago

Radix sort is theoretically O(N), but memory access is logarithmic, so in reality you can't do better than O(N log N) no matter what algorithm you use. Only constant factors matter at that point.

Edit: I misremembered, memory access is actually O(sqrt(N)):

https://github.com/emilk/ram_bench

bugfix-66 · 3y ago

Radix sort is also very simple, e.g., https://bugfix-66.com/834f0677c85b23c0bf1047d3654ab7c27ff054...

And djb's vectorized sorting networks are pretty great: https://sorting.cr.yp.to/

EGreg · 3y ago

Faster than quicksort?

hajile · 3y ago

Beating quicksort alone is almost always as easy as switching to insertion sort once you get down to around 14 elements. This is used by C#'s quicksort (which also switches to heapsort after a depth of 32, IIRC) and also in Timsort in JS, Java, Python, etc.
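
A minimal sketch of that cutoff idea: a plain quicksort that hands small ranges to insertion sort. The threshold of 14 comes from the comment above; the function names and the median-of-three pivot are illustrative assumptions, and this is not how C# or blitsort actually implement it.

```python
CUTOFF = 14  # threshold from the comment; real libraries tune this value


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi+1] in place."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that switches to insertion sort on small ranges."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        # Median of three as the pivot value.
        mid = (lo + hi) // 2
        pivot = sorted((a[lo], a[mid], a[hi]))[1]
        # Hoare-style partition around the pivot value.
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth at O(log n).
        if j - lo < hi - i:
            hybrid_quicksort(a, lo, j)
            lo = i
        else:
            hybrid_quicksort(a, i, hi)
            hi = j
    insertion_sort(a, lo, hi)
```

Like any plain quicksort, this sketch is not stable and still has a quadratic worst case; the heapsort fallback mentioned above is what removes the latter.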

scandum (OP) · 3y ago

Blitsort is a hybrid quicksort, see title.

It is slower than its unstable brother, aptly named crumsort. https://github.com/scandum/crumsort

repiret · 3y ago

Quicksort isn’t stable.

JaneLovesDotNet · 3y ago · 6 in thread

Dumb questions from a programmer who is weak at math.

1) Is there a theoretical minimum asymptotic speed for sorting?

2) Some of these newer sorts seem to have arbitrary parameters in them (e.g. do X if size of set is smaller than Y). Are we likely to see more and more complex rule sets like that lead to increased sorting speeds?

magnio · 3y ago

1) All comparison sorting algorithms, i.e. those that sort by actually comparing elements, have a hard lower bound of Ω(log n!) = Ω(n log n) worst-case complexity. For non-comparison sorting, the limit depends on the algorithm.

2) Hybrid algorithms, i.e. those that change the underlying strategy based on some parameters, are already quite common in real-world implementations. Examples include timsort (stable, mix of merge sort and insertion sort), used in Python, Java, and Rust, and pdqsort (unstable, mix of quicksort and heap sort), used in Rust and Go.

m12k · 3y ago

1) IIRC the best case is O(n) (you just verify that it is already sorted, can't get much faster than that) while the worst case is O(nlog(n)). But some of the best algorithms in real world usage have worse worst case asymptotic speed.

2) I think things like cache and machine word size have huge impact on real world speed, so it makes sense to have knobs to tweak to fit within those limits, even though a theoretical analysis does away with constants like that

brap · 3y ago

> the worst case is O(nlog(n))

I think it's worth clarifying that this is the best worst case possible, i.e. for every sorting algorithm you could construct an input on which it won't be able to beat O(nlog(n)). In other words, O(nlog(n)) is a hard lower limit for worst-case speed: no algorithm can do better than O(nlog(n)) on all possible inputs (but it can do better on some inputs, O(n) being the hard limit there).

I don't really remember the theory behind this, but hopefully someone here can answer: is it theoretically possible for a sorting algorithm to achieve sub-O(nlog(n)) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?

Paedor · 3y ago

For 1) The asymptotic best for sorting is O(n log n), if you're only allowed to compare the size of elements. If you allow operations like array indexing with elements, which is possible for integers, it gets as good as O(n) with radix sort.

hakuseki · 3y ago

Assuming all values are unequal, there are n factorial (abbreviated n!) possible cases that we need to distinguish.

If we are sorting by comparison, then each comparison will eliminate at most half of the possible cases. So we need at least log_2(n!) comparisons in the worst case.
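
To connect this count to the usual n log n statement, the following standard estimate (not part of the original comment) bounds log_2(n!) from both sides:

```latex
\log_2(n!) = \sum_{k=1}^{n} \log_2 k \le n \log_2 n,
\qquad
\log_2(n!) \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2}
\quad\Longrightarrow\quad
\log_2(n!) = \Theta(n \log n).
```

As a small worked case, n = 4 gives 4! = 24 orderings and log_2(24) ≈ 4.58, so any comparison sort needs at least 5 comparisons on some input of four distinct elements.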

scandum (OP) · 3y ago

Hard to answer. I think the main thing is that there are a lot of stumbling blocks, things that most people overlook, yet logical once you have it explained.

Occasionally one is discovered and solved, and it can lead to a brief domino effect, but I have no idea how many are left.

dleslie · 3y ago · 4 in thread

Woah, I like the solid memory guarantees. Could be useful for embedded projects.

mmoskal · 3y ago

Seems to use quite a bit of stack for embedded use. An interesting thing to note: microcontrollers are typically memory-bound when using common algorithms - while they are maybe 100x slower than a desktop computer, they have, say, 1000000x less RAM. So, for example, a GC cycle would often be in the range of 1ms.

odo1242 · 3y ago

Wouldn’t a GC cycle be short if the microcontroller had less RAM? (as a genuine question)

dleslie · 3y ago

Oof, heavy stack usage is ungood. But that's probably correctable.

FWIW, I'm thinking of devices that don't have an MMU - like the Sega 32X. It's a hobby of mine. Having any GC time at all is too much suffering, even 1ms.

girvo · 3y ago

The bigger problem with GC and standard allocators on embedded systems is heap fragmentation, in my experience.

that · 3y ago · 3 in thread

Heads up: No license given in the repo, be careful if you are thinking of using this for a project.

EDIT: Retracted -- I did a search for "license", but apparently the search results omit variants like "sublicense", which would have caught the MIT license at the beginning of the source file: https://github.com/scandum/blitsort/search?q=license vs. https://github.com/scandum/blitsort/search?q=sublicense

HillRat · 3y ago

Standard MIT licensing's attached to the source code itself.

sergiotapia · 3y ago

https://github.com/scandum/blitsort/blob/main/src/blitsort.c...

that · 3y ago

Ah, "sublicense" seems to be the keyword to search for there; otherwise a search comes up empty: https://github.com/scandum/blitsort/search?q=license

g0xA52A2A · 3y ago · 2 in thread

On mobile at the moment but it will be interesting to see how this compares to Glidesort [1]. Though I don’t think it’s been released yet.

[1] https://m.youtube.com/watch?v=2y3IK1l6PI4

orlp · 3y ago

Hello, author of glidesort (and for context, pdqsort) here.

No it's not released yet but it's getting real close. The code is virtually done, with some minor cleanup required. Besides some personal stuff, a large source of my delay has been the headache that is panic safety in Rust. Non-trivial sorting code becomes extra difficult if a forced stack unwinding can occur every time you compare two elements, where you are then required to fully restore the input array to a valid state. Doubly so if you can't even assume the comparison operator is valid and obeys the strict order semantics.

I am currently in the process of writing a paper I want to publish alongside glidesort, which is also mostly done. Since the linked talk I've also had some more performance wins, making it ~4.5 times faster than std::stable_sort for uniform random integers and much more than that for low cardinality data and input patterns.

Glidesort can use arbitrary amounts of auxiliary memory if necessary, but is fastest when given a fraction of the original input array worth of memory - I'm currently planning on releasing it with n / 8 as the default. I haven't looked at blitsort in detail but I am skeptical of the O(n log n) claim, I believe it is O(n (log n)^2) like glidesort is when given a constant amount of memory, as blitsort's partition and merge functions are recursive. Not that this really matters in practice - ultimately the real runtime is what matters. But I wouldn't be surprised to see blitsort slow down more relative to the competition for larger inputs.

scandum (OP) · 3y ago

It should be O(n log n) comparisons and technically O(n (log n)^2) moves. The moves are reduced by a relatively large constant however, and blitsort might qualify as O(n log n) moves when given sqrt(n) aux.

In your YouTube presentation you do seem to skip mentioning that many of the performance innovations in glidesort were derived from quadsort and fluxsort. Some credit in your upcoming paper would be much appreciated. Feel free to email me if you have any questions; some things, like my first publication of a "branchless" binary search in Aug 2014, may be hard to find, though there might be prior claim.

~4.5 times faster than std::stable_sort for uniform random integers is pretty impressive. Is this primarily from increasing the memory regions from 2 to 4 for parity merges / partitions? I'm benching on somewhat dated hardware and had mixed results (including slowdowns), so I never went further down that rabbit hole.

blondin · 3y ago · 1 in thread

thanks for sharing this with us. i admire this approach that builds on small improvements here and there. and these improvements are interesting in their own way. it reminds me of micro-optimizations with encoding routines. i didn't see SIMD in the code at first glance. you might gain significant speed with SIMD.

janwas · 3y ago

:) Indeed, we're seeing 10x speedups from SIMD for some distributions. It's useful both for sorting networks and quicksort partitioning. Code here: https://github.com/google/highway/tree/master/hwy/contrib/so... (disclosure: I'm one of the co-authors).

jansan · 3y ago · 1 in thread

Since the sorting scene seems to be fully assembled in this thread, I'll take the opportunity to ask a quick question about a use case that does not seem uncommon: Is there an algorithm that is especially efficient if the array contains many sorted sequences, and still works alright for fully shuffled arrays? And are there algorithms that should be avoided in these cases?

scandum (OP) · 3y ago

Quadsort, fluxsort, blitsort, and crumsort all qualify depending on your needs.

skasort_cpy is pretty good on 32 bit integers if you give it n auxiliary memory.

rhsort is very good and likely the best for 31 bit integers, but a bit rough around the edges still, and doesn't work well on arrays above 1M elements. Avoid radix sorts for 64 bit integers, they're ideal for 16 bit.

glidesort is promising, though I haven't seen it benched against the latest fluxsort / blitsort.

Timsort's main problem is that it's slow on shuffled arrays.

pdqsort and other introsorts aren't good on semi-ordered data.
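
As a rough illustration of why run-aware sorts handle the "many sorted sequences" case well, here is a minimal natural merge sort sketch. It splits the input into already-ascending runs and merges neighbouring runs pairwise, so an array made of k sorted sequences needs about log2(k) merge passes rather than log2(n). This is a textbook technique with hypothetical function names, not code taken from any of the libraries named above.

```python
def find_runs(a):
    """Split a into maximal non-decreasing runs."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs


def merge(left, right):
    """Stable merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Taking from the left on ties keeps equal elements in order.
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_merge_sort(values):
    """Return a sorted copy; a fully sorted input is detected in one pass."""
    runs = find_runs(list(values))
    while len(runs) > 1:
        # Merge neighbouring runs pairwise; an odd run out is carried over.
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0]
```

On fully shuffled input the runs are very short, so this sketch degrades to an ordinary bottom-up merge sort with n auxiliary memory.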

k2xl · 3y ago · 1 in thread

Question: are any of these novel sorting algorithms being used in modern databases or tech stacks?

ismailmaj · 3y ago

pdqsort by Orson Peters is used in Rust std for `sort_unstable`.

https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...

ZhongDongLong · 3y ago · 1 in thread

Is there a TrollSort? I'm thinking of an algorithm which initially seems to be fast and efficient but takes exponentially longer time with larger arrays and exponentially longer towards the end of sorting.

justansite · 3y ago

Not sure if this qualifies, but the first thing that came to mind for me was bogosort sometimes called bozosort. https://en.m.wikipedia.org/wiki/Bogosort
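
For reference, bogosort is short enough to sketch in full. The expected number of shuffles grows on the order of n!, so it is only usable on tiny inputs; the function name and the fixed seed are illustrative assumptions.

```python
import random


def bogosort(values, seed=0):
    """Shuffle until sorted; expected shuffles grow on the order of n!."""
    rng = random.Random(seed)
    items = list(values)
    # Keep shuffling while any neighbouring pair is out of order.
    while any(items[i] > items[i + 1] for i in range(len(items) - 1)):
        rng.shuffle(items)
    return items
```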
