Radix sort is O[n] . (or [n * number of bytes in Int] or whatever is being compared).
It's omitted from the comparisons, I see. Radix sort can also have predictable memory overhead.
Edit: I misremembered, memory access is actually O(sqrt(N)):
And djb's vectorized sorting networks are pretty great: https://sorting.cr.yp.to/
It is slower than it's unstable brother, aptly named crumsort. https://github.com/scandum/crumsort
1) Is there a theoretical minimum assymptotic speed for sorting?
2) Some of these newer sorts seem to have arbitrary parameters in them (e.g. do X if size of set is smaller than Y). Are we likely to see more and more complex rule sets like that lead to increased sorting speeds?
2) Hybrid algorithms, i.e. those that change the underlying strategy based on some parameters, are already quite common in real-world implementaions. Examples include timsort (stable, mix of merge sort and insertion sort), used in Python, Java, and Rust, and pdqsort (unstable, mix of quicksort and heap sort), used in Rust and Go.
2) I think things like cache and machine word size have huge impact on real world speed, so it makes sense to have knobs to tweak to fit within those limits, even enough a theoretical analysis does away with constants like that
I think it's worth clarifying that this is the best worst case possible, i.e for every sorting algorithm you could create a certain input in a way that it won't be able to beat O(nlog(n)). In other words, O(nlog(n)) is a minimum hard limit for worst case speed, no algorithm can do better than O(nlog(n)) on all possible inputs (but it can do better on some inputs, o(n) being the hard limit there).
I don't really remember the theory behind this, but hopefully someone here can answer: is it theoretically possible for a sorting algorithm to achieve sub-O(nlog(n)) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?
If we are sorting by comparison, then each comparison will eliminate at most half of the possible cases. So we need at least log_2(n!) comparisons in the worst case.
Occasionally one is discovered and solved, and it can lead to a brief domino effect, but I have no idea how many are left.
FWIW, I'm thinking of devices that don't have an MMU - like the Sega 32X. It's a hobby of mine. Having any GC time at all is too much suffering, even 1ms.
EDIT: Retracted -- did a search like for "license", but apparently the search results omits variants like "sublicense" which would have caught the MIT license at the beginning of the source file: https://github.com/scandum/blitsort/search?q=license vs. https://github.com/scandum/blitsort/search?q=sublicense
No it's not released yet but it's getting real close. The code is virtually done, with some minor cleanup required. Besides some personal stuff, a large source of my delay has been the headache that is panic safety in Rust. Non-trivial sorting code becomes extra difficult if a forced stack unwinding can occur every time you compare two elements, where you are then required to fully restore the input array to a valid state. Doubly so if you can't even assume the comparison operator is valid and obeys the strict order semantics.
I am currently in the process of writing a paper I want to publish along glidesort which is also mostly done. Since the linked talk I've also had some more performance wins making it ~4.5 times faster than std::stable_sort for uniform random integers and much more than that for low cardinality data and input patterns.
Glidesort can use arbitrary amounts of auxiliary memory if necessary, but is fastest when given a fraction of the original input array worth of memory - I'm currently planning on releasing it with n / 8 as the default. I haven't looked at blitsort in detail but I am skeptical of the O(n log n) claim, I believe it is O(n (log n)^2) like glidesort is when given a constant amount of memory, as blitsort's partition and merge functions are recursive. Not that this really matters in practice - ultimately the real runtime is what matters. But I wouldn't be surprised to see blitsort slow down more relative to the competition for larger inputs.
In your Youtube presentation you do seem to skip mentioning that many of the performance innovations in glidesort were derived from quadsort and fluxsort. Some credit in your upcoming paper would be much appreciated. Feel free to email me if you have any questions, some things like my first publication of a "branchless" binary search in Aug 2014 may be hard to find, though there might be prior claim.
~4.5 times faster than std::stable_sort for uniform random integers is pretty impressive. Is this primarily from increasing the memory regions from 2 to 4 for parity merges / partitions? I'm benching on somewhat dated hardware and had mixed results (including slowdowns), so I never went further down that rabbit hole.
skasort_cpy is pretty good on 32 bit integers if you give it n auxiliary memory.
rhsort is very good and likely the best for 31 bit integers, but a bit rough around the edges still, and doesn't work well on arrays above 1M elements. Avoid radix sorts for 64 bit integers, they're ideal for 16 bit.
glidesort is promising, though I haven't seen it benched against the latest fluxsort / blitsort.
Timsort's main problem is that it's slow on shuffled arrays.
pdqsort and other introsorts aren't good on semi-ordered data.
https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...