Edit: I misremembered, memory access is actually O(sqrt(N)):
Nothing theoretical about it: Sorting a list of all IP addresses can absolutely and trivially be done in O(N)
> in reality you can't do better than O(log N)
You can't traverse the list once in <N so the complexity of any sort must be ≥N.
> but memory access is logarithmic
No it's not, but it's also irrelevant: A radix sort doesn't need any reads if the values are unique and dense (such as the case IP addresses, permutation arrays, and so on).
> Edit: I misremembered, memory access is actually O(sqrt(N)): https://github.com/emilk/ram_bench
It's not that either.
The author ran out of memory; They ran a program that needs 10GB of ram on a machine with only 8GB of ram in it. If you give that program enough memory (I have around 105gb free) it produces a silly graph that looks nothing like O(√N): https://imgur.com/QjegDVI
The latency of accessing memory is not a function of N.
This is not generally relevant for PCs because the distance between cells in the DIMM does not affect timing; ie. the memory is timed based on worst-case delay.
How could it not be, given that any physical medium has finite information density, and information cannot propagate faster than the speed of light?
And on a practical computer the size of N will determine if you can do all your lookups from registers, L1-L3, or main RAM (plus SSD, unless you disable paging).
You could have a tape of infinite length, and if you only ever request "the next one" then clearly the latency is constant.
> And on a practical computer the size of N ...
Don't be silly. N is the size of the input set, not the size of the universe.
No, you can clearly see the O(sqrt(N)) trend at every level of the memory hierarchy.
Lines goes up, then goes down. Very much unlike sqrt.
I’m glad other people are having this conversation now and not just me.
All the time.
IP addresses
permutation arrays (⍋⍋)
Sometimes I pretend characters are numbers; short fixed-length strings (like currencies or country codes or even stock tickers) can be numbers.
If I can get away with it, a radix sort is better than anything else.
More often than not. Sorting by morton code, sorting by index, etc.
Stop faffing about with college text book solutions to problems like you’re filling in a bingo card and use a real database. Otherwise known as Architecture.
I haven’t touched a radix sort for almost twenty years. In that time, hardware has sprouted an entire other layer of caching. I bet you’ll find that on real hardware, with complex objects, a production-ready Timsort is competitive with your hand written radix sort.
> and use a real database.
I'm not going to use a database to store tens of billions of 3D vertices. And I'm not going to use a database to sort them, because it's multiple times, probably orders of magnitude faster to sort them yourself.
It's weird to impose completely out-of-place suggestions onto someone who does something completely different to what you're thinking of.
In high performance systems, constant time random access is just not constant.