You don't even need to have specific knowledge of the hardware as long as you can identify cross-over points where one algorithm starts significantly outperforming others. An old HPC trick is to write software that thoroughly measures several algorithm strategies on your specific hardware environment and then code-gens an algorithm that has an optimal set of strategies and strategy-switching thresholds. The meta-algorithm is mostly focused on cheaply detecting these thresholds at runtime.
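A minimal sketch of that trick in Python (the toy strategies and names here are my own illustration, not any particular HPC codebase): measure each strategy at several sizes, record the first cross-over, and dispatch on it cheaply at runtime.

```python
import random
import timeit

def insertion_sort(a):
    # Simple quadratic sort: low constant overhead, fast for tiny inputs.
    a = list(a)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def find_crossover(sizes=(4, 8, 16, 32, 64, 128, 256)):
    # Benchmark both strategies on this machine; return the first size
    # where the general-purpose sort (Timsort) wins.
    for n in sizes:
        data = [random.random() for _ in range(n)]
        t_simple = timeit.timeit(lambda: insertion_sort(data), number=200)
        t_general = timeit.timeit(lambda: sorted(data), number=200)
        if t_general < t_simple:
            return n
    return sizes[-1]

THRESHOLD = find_crossover()

def adaptive_sort(a):
    # The "meta-algorithm": one cheap size check picks the strategy.
    return insertion_sort(a) if len(a) < THRESHOLD else sorted(a)
```

A real code-gen version would emit the threshold as a compile-time constant instead of a global, but the measurement loop is the same idea.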
Where possible, I always liked arrays of function pointers for each next power of 2 in the input sizes. A few popcounts and a jump later, and you're executing the right routine. It's not great if you're throwing a general-purpose tool against tiny problems, but it's not a huge amount of overhead either, and it's dead simple to code.
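A rough Python analogue of that dispatch table (the routines are placeholders; in C you'd index an array of function pointers with a bit-scan or popcount of the input size):

```python
# Hypothetical size-specialized routines -- stand-ins for real kernels.
def tiny_kernel(xs):
    return sorted(xs)

def small_kernel(xs):
    return sorted(xs)

def large_kernel(xs):
    return sorted(xs)

# One slot per power of two of the input size; index 0..32 covers
# every length up to 2**32.
DISPATCH = [tiny_kernel] * 5 + [small_kernel] * 5 + [large_kernel] * 23

def run(xs):
    # (n - 1).bit_length() is the Python analogue of the popcount/bit-scan
    # trick: it maps n to the index of the next power of two.
    idx = max(len(xs) - 1, 0).bit_length()
    return DISPATCH[idx](xs)
```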
There's a massive performance benefit to doing this, at the cost of implementation complexity. I haven't studied the implementations or tried my hand at one, but I get the impression that these are tough to implement correctly in a way that takes full advantage of the hardware.
(In that sense, it's awesome that the researchers also did the legwork to implement and maintain a library!)
The core here, the bitmap storage plus the basic optimization, is simple but solid and general. Mathematically, it takes less space to store data as positions in a bitmap rather than as fully spelled-out associations, and it takes even less space to segregate the bitmaps into chunks.
This will hold, and be smaller and faster, on any computer. So it's not a special case built on heuristics, tied either to the frequencies or sample characteristics of some particular dataset, or to specific CPU peculiarities.
More exotic optimizations piled on top, sure. But "compressed bitmaps" themselves as a concept, not.
Given a large number of 32-bit integers, radix sort is indeed significantly faster.
Whereas I would reach for a comparison sort if I had a large number of arbitrary-length Unicode strings that I wanted to sort in a case-insensitive order.
Also, I found timsort faster than radix sort when there was a small number (as I recall, <100 or so) of elements.
While Roaring Bitmap saves space and performs well, my gut feeling is that there are better ways to achieve the same goals, especially since it involves sorting on insertion and binary search on lookup, both on the sorted top-level list and on the sorted lower-bit integers.
Also, it works well on 32-bit integers and probably 64-bit integers, but it's not clear it can scale to bigger integers.
Nevertheless it's an excellent algorithm that works well in practice.
> Bitmaps are arrays of bits used to store sets of integers. They work by setting the Nth bit when an integer N is in the set
They waste space. But if you were to compress them, they'd compress very well. As in, for example, if you were to compress them using roaring bitmaps.
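As a toy illustration of the "set the Nth bit" idea (my own sketch, not the Roaring layout):

```python
class Bitmap:
    # A plain, uncompressed bitmap: bit N is set iff integer N is in the set.
    def __init__(self):
        self.bits = 0

    def add(self, n):
        self.bits |= 1 << n

    def __contains__(self, n):
        return (self.bits >> n) & 1 == 1
```

The waste is easy to see: storing just {0, 1_000_000} still needs a million bits of address space, which is exactly what chunked/compressed schemes like Roaring avoid.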
I wouldn't use the word exotic but rather specialized and/or optimized.
In fact, later versions of Roaring use run-length-encoding (RLE) containers in addition to array and bitmap containers, a technique shared with more traditional/general compression algorithms.
In effect rather than achieving "very good" compression, Roaring is very close to "optimal" compression for sparse bitmaps.
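The run container idea can be sketched in a few lines (a simplified illustration of RLE over a sorted container, not Roaring's actual container code):

```python
def to_runs(sorted_vals):
    # Encode a sorted list of integers as (start, length) runs.
    # Dense, clustered data collapses into very few runs.
    runs = []
    for v in sorted_vals:
        if runs and v == runs[-1][0] + runs[-1][1]:
            # Extends the current run by one.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # Starts a new run at v.
            runs.append((v, 1))
    return runs
```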
sorted-list/bitmap/runs, in a two level tree. cool.
it's technically missing a sorted complement-list, i.e. storing only the items that are absent. But worst-case runs only use twice as much space, so that's more complexity without much savings, especially considering real workloads.
performs better with sequential ids than random uuids because you use fewer pages
further research
investigating the effect of adding more layers to the tree
using additional set representations e.g. balanced tree
https://github.com/apache/lucene/blob/84cae4f27cfd3feb3bb42d...
the Judy project seems inactive to me:
https://sourceforge.net/p/judy/bugs/
this short HN thread resonates with the intuition of judy1 being stalled; its last optimistic comment points to an orphaned comment in the big list above
On the surface it's just a very optimized 256-ary radix trie. I think it would take some software archeology to determine whether there was more to it than that, and whether its assumptions still hold on today's processors.
It's not written in a highly unstable language. The safety of every bridge you drive over was probably partially validated using Fortran numerical code from the 1980s.
Today it'd make much more sense to implement ART, which is dramatically simpler.
Today we open sourced about $25m in R&D (more to come)! They take a bit of harnessing, but once you do the results are amazing. We have released an OLAP database called FeatureBase, built entirely on bitmaps (well bitmaps in a b-tree)...
Here is a bit (pun intended) more about our big bet on bitmaps: https://www.featurebase.com/blog/bitmaps-making-real-time-an...
What would an array of pointers to X containers which are N size each look like on disk?
To deserialize, read each container's prefix, its length and its container id, and then read the rest of the container according to that length. The top-level list is sorted by container id; it can be reconstructed by inserting each container id, along with a pointer to the container's memory, into the sorted list.
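A hedged sketch of that on-disk layout (the field order and widths here are assumptions for illustration, not Roaring's portable serialization format):

```python
import struct

def serialize(containers):
    # containers: sorted list of (container_id, payload_bytes) pairs.
    # Each record: u16 container id, u32 payload length, then the payload.
    out = bytearray()
    for cid, payload in containers:
        out += struct.pack("<HI", cid, len(payload))
        out += payload
    return bytes(out)

def deserialize(buf):
    # Walk the records; the resulting list is already sorted by id
    # because it was written in sorted order.
    containers, off = [], 0
    while off < len(buf):
        cid, length = struct.unpack_from("<HI", buf, off)
        off += struct.calcsize("<HI")
        containers.append((cid, buf[off:off + length]))
        off += length
    return containers
```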
It's funny that in the end Bitmaps essentially are a modified data structure and process.
The way things are going, I imagine someone will at some point take all the different tricks we have of inserting, sorting, finding, deleting data, take millions of datasets and run some machine learning type process to create libraries that can perform these operations optimally.
Most developers do not have a good intuition for the efficiency and scalability of sequential brute-force on modern architectures. There is a cross-over threshold but I think many people would be surprised at how high it is in practice.
I wonder if it would be worth the trouble to code-gen a bunch of variants with things like 8-bit entries, and benchmark to death to determine the optimal cutover points...
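As a small illustration of that kind of cutover benchmark, here's a throwaway comparison of brute-force scan against binary search (Python overheads dominate, so the actual numbers differ wildly from native code, but the shape of the experiment is the same):

```python
import bisect
import random
import timeit

def linear_contains(xs, v):
    # Sequential brute force: trivially predictable memory access.
    for x in xs:
        if x == v:
            return True
    return False

def binary_contains(xs, v):
    # O(log n), but branchy and cache-unfriendly at small n.
    i = bisect.bisect_left(xs, v)
    return i < len(xs) and xs[i] == v

# Benchmark both at growing sizes to locate the cross-over empirically.
for n in (16, 64, 256, 1024):
    xs = sorted(random.sample(range(10 * n), n))
    v = xs[n // 2]
    t_lin = timeit.timeit(lambda: linear_contains(xs, v), number=2000)
    t_bin = timeit.timeit(lambda: binary_contains(xs, v), number=2000)
    print(n, round(t_lin / t_bin, 2))
```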
Having said that, I applaud that this type of data structure is being investigated. An efficient bitset should be present in every collections library.
E.g. a thing that legacy can do but the new www (Solr) currently can't is allow streaming downloads of more than 5 million documents, since maintaining a roaring bitmap costs very little memory but depends on a docid-to-key/value-store mapping. Our extensions allowed this and gave us a very easy-to-use REST API.