My favourite example: multiplication of very large square matrices. The algorithms with the best time complexity are never used in the real world. Under no realistic circumstances would they offer the best performance.
In that case, it's not cache behaviour or branch prediction that dooms the algorithm, but enormously expensive steps that are nonetheless constant-time, and so don't affect its complexity-theoretic properties.
Do we describe such algorithms as the 'fastest'? I wouldn't. They have the lowest time complexity. Not the same thing.
Models are there to enable us to reason about things that are too complex to reason about directly. They're always going to lose information. Sometimes using a different model is the right answer, and sometimes you need practical experiments to determine what actually works.
Sounds similar to the 'Brodal queue'.
For example, for very small sets of numbers you could pre-sort every possible combination and store the results in a very large lookup table, much like a rainbow table for passwords. Your cost per sort is then a binary-search lookup, O(log n), or even O(1) if you can index the table directly. However, the upfront cost is huge, the storage cost is huge, and the lookup itself would probably be expensive too. Hmm, now that I think about it, this could be an interesting thing to mess with. Also, at this point anything past 4 bytes of input would be unusable. Hmm, maybe later. Like you point out, it's a 'galactic alg', though portions of that thinking can be pulled out and used for other things.
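A minimal sketch of the direct-indexing variant, for the smallest interesting case: pairs of single bytes. The names and the 2^16-entry table are mine, purely illustrative:

    # Precompute once: for every pair of bytes, store the sorted pair.
    # 2^16 entries; each 'sort' afterwards is a single table index, O(1).
    TABLE = [None] * (1 << 16)
    for a in range(256):
        for b in range(256):
            TABLE[(a << 8) | b] = (a, b) if a <= b else (b, a)

    def sorted_pair(a, b):
        return TABLE[(a << 8) | b]

    assert sorted_pair(200, 3) == (3, 200)

The table grows as 256^k for k input bytes, which is exactly why anything past a few bytes stops being usable.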
With sorting, the comparison count is a good proxy for whether an algorithm might perform well. But it is just that, a proxy. Take the weird sort I just made up: there are probably a few shifts and muls in there, and those are decently expensive and can blow your perf right out of the water.
Generally I'd say that's true, but even that depends on context. For sorting very small arrays, on typical hardware, you can't beat bubble-sort and insertion-sort.
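For reference, a minimal insertion sort, the kind of loop that wins at small n despite its O(n^2) worst case (a sketch, not a tuned implementation):

    def insertion_sort(a):
        # O(n^2) comparisons in the worst case, but tiny constant factors
        # and friendly cache/branch behaviour on short arrays.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]  # shift larger elements right
                j -= 1
            a[j + 1] = key

    xs = [5, 2, 4, 6, 1, 3]
    insertion_sort(xs)
    assert xs == [1, 2, 3, 4, 5, 6]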
https://www.geeksforgeeks.org/strassens-matrix-multiplicatio... , see also https://youtube.com/watch?v=ORrM-aSNZUs
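For the curious, a minimal sketch of Strassen's scheme from those links: seven recursive products instead of eight gives O(n^log2(7)) ≈ O(n^2.807), and the cutoff fallback is exactly the constant-factor issue discussed upthread. The value 64 is a placeholder I picked; real crossover points vary by machine:

    import numpy as np

    def strassen(A, B, cutoff=64):
        # Multiply n x n matrices, n a power of two. Below `cutoff`,
        # fall back to the ordinary product: the recursion's overhead
        # dominates on small blocks.
        n = A.shape[0]
        if n <= cutoff:
            return A @ B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                         [M2 + M4, M1 - M2 + M3 + M6]])

    A = np.random.rand(256, 256)
    B = np.random.rand(256, 256)
    assert np.allclose(strassen(A, B), A @ B)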
> To achieve agreement when discussing the speed of a sorting algorithm, it’s necessary to establish some common ground. How do we measure speed? There is a well-established model – the von Neumann computer, or unit-cost RAM model – widely used by computer scientists to evaluate algorithms without introducing all the complexities of real-world computers. This model doesn’t account for some important things, such as the performance gap between RAM and CPU, but typically gives good approximations of real-life performance.
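A toy illustration of that accounting (the counter class and the selection sort are mine, just to make the model concrete): charge one unit per element read or write, and ignore everything a real memory hierarchy does.

    class UnitCostList(list):
        # Unit-cost RAM accounting: every element read or write costs 1,
        # regardless of caches, prefetching, or memory latency.
        units = 0
        def __getitem__(self, i):
            UnitCostList.units += 1
            return super().__getitem__(i)
        def __setitem__(self, i, v):
            UnitCostList.units += 1
            return super().__setitem__(i, v)

    a = UnitCostList([5, 2, 4, 6, 1, 3])
    for i in range(len(a)):           # selection sort, for concreteness
        m = i
        for j in range(i + 1, len(a)):
            if a[j] < a[m]:           # two charged reads per comparison
                m = j
        a[i], a[m] = a[m], a[i]       # two reads + two writes
    print(list(a), UnitCostList.units)  # cost in model 'steps', not seconds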