undefined | Better HN

0 pointsssivark11d ago0 comments

Daniel Lemire's points about low-level hardware optimization notwithstanding, it's worth pointing out that binary search (or low-level implementation variants) is the best only if you know nothing about the data beyond the fact that it is sorted / monotonic.

If you have priors about the data distribution, then it's possible to design algorithms which use that extra information to perform MUCH better. eg: a human searching a physical paper dictionary can zoom into the right bunch of pages faster than pure idealized binary search; it's a separate matter it's hard for humans to continue binary search till the very end and we might default to scanning linearly for the last few iterations (cognitive convenience / affordances of human wetware / etc).

In mathematical language, searching a sorted list is basically inverting a monotonic function, by using a closed-loop control algorithm. Often, we could very well construct a suitable cost function and use gradient descent or its accelerated cousins.

More generally, the best bet to solving a problem more efficiently is always to use more information about the specific problem you want to solve, instead of pulling up the solution for an overly abstract representations. That can offer scalable orders of magnitude speedup compared to constant factor speedups from just using hardware better.

0 comments

charleslmunger10d ago

I've spent some brainpower on binary search and have not been able to beat this:

https://github.com/protocolbuffers/protobuf/blob/44025909eb7...

1. Check for dense list O(1) 2. Check upper bound 3. Constant trip count binary search

The constant trip count is great for the branch predictor, and the core loop is pretty tightly optimized for the target hardware, avoiding multiplies. Every attempt to get more clever made the loop worse and did not pay for itself. It's hard because it's an array-of-structs format with a size of 12, and mostly pretty small N.

ben-schaaf10d ago

For a pretty small N I've found that less clever can be quite a bit faster. I'd try a linear search - possibly SIMD if you can change the data format to struct-of-arrays. An adaptive approach that uses linear search up to a certain N can also yield some benefit.

charleslmunger10d ago

The first implementation I encountered was a linear search, starting at the last-found field. Empirically it performed better to do a binary search with early exit and branchless bounds selection, I think due to branch predictor pressure. The data representation could be changed but it's tricky, as there are other traversals that want to go in sorted order, and there are lots of places that pass just one pointer for fields. But I agree any further improvement will probably have to come from that.

SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.

1 more reply

ChadNauseam10d ago

If you control the layout, eytzinger layout typically will give you the best of both worlds. As fast as a linear scan for small N, much faster than binary search over a sorted array for large N.

echelon10d ago

I know protobuf code is extremely high quality, but I really can't stand the c-style naming conventions.

I know people train themselves into grokking this and reading and emitting this way, but it sounds like writing "bork bork bork bork" runes to me.

I'm glad Rust feels more like Ruby and Python and that method and field names are legible.

My eyes just glaze over:

    UPB_API_INLINE
    const struct upb_MiniTableField* upb_MiniTable_FindFieldByNumber(
        const struct upb_MiniTable* m, uint32_t number) {
      const uint32_t i = number - 1;  // 0 wraps to UINT32_MAX
    
      // Ideal case: index into dense fields
      if (i < m->UPB_PRIVATE(dense_below)) {
        UPB_ASSERT(m->UPB_ONLYBITS(fields)[i].UPB_ONLYBITS(number) == number);
        return &m->UPB_ONLYBITS(fields)[i];
      }
    
      // Early exit if the field number is out of range.
      uint32_t hi = m->UPB_ONLYBITS(field_count);
      uint32_t lo = m->UPB_PRIVATE(dense_below);
      UPB_ASSERT(hi >= lo);
      uint32_t search_len = hi - lo;
      if (search_len == 0 ||
          number > m->UPB_ONLYBITS(fields)[hi - 1].UPB_ONLYBITS(number)) {
        return NULL;
      }
    
      // Slow case: binary search
      const struct upb_MiniTableField* candidate;
    #ifndef NDEBUG
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
      UPB_ASSERT(candidate ==
                 UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number));
    #elif UPB_ARM64_ASM
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
    #else
      candidate = UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number);
    #endif
    
      return candidate->UPB_ONLYBITS(number) == number ? candidate : NULL;
    }

teo_zero10d ago

> I really can't stand the c-style naming conventions.

Honestly I don't see much difference between

  upb_MiniTable_FindFieldByNumber

and

  upb::MiniTable::FindFieldByNumber

1 more reply

charleslmunger10d ago

Yeah namespaces and public/private would be quite nice, but C doesn't have them, so they get hacked on via macros and prefixing. The syntax was not the hard part of working or analyzing this code, though.

ahartmetz10d ago

I think this needs way more "upb" and "UPB" to make it clear that it is, in fact, dealing with UPBs. Whatever these are.

1 more reply

unholiness10d ago

More accurately, binary search is optimal only if you cannot determine distance between the data points (you can compute `<` but not `-`). It's inaccurate to say it's optimal if you know "nothing" about the distribution. If you encounter a point that's much closer to your high pivot vs much closer to your low pivot, there is no possible prior knowledge state uninformed enough to conclude that the best place to search in both cases is in the middle.

hinkley10d ago

I swear I read an article about treaps but instead of being used to balance the tree, they used the weights to Huffman encode the search depth to reduce the average access time for heterogenous fetch frequencies.

I did not bookmark it and about twice a year I go searching for it again. Some say he’s still searching to this day.

ssivarkOP10d ago

Huffman coding assumes your corpus is a string of discrete elements (symbol strings) without any continuous structure (eg. topology/geometry). With that fairly mild assumption, it gives a recipe to reorganize (transform/encode) your data as a prefix-tree, to minimize the bits of information needed to communicate the contents of your corpus i.e. reducing (on average) the bits of information you need to identify a specific item. Eg. To go back to the analogy from my previous comment above... if the function you are inverting via search has long plateaus then you could simply front-load those as guesses; that's roughly the spirit of Huffman coding, except it eschews monotonicity.

mvelbaum10d ago

https://arxiv.org/abs/2206.12110 ?

hinkley7d ago

'22 feels a bit to recent but maybe. This looks like a good spot though. I'll check the bibliography to see if they acknowledge prior art.

crazygringo10d ago

Sure, but the whole point is that you often don't know anything further about the data.

That's why b-trees are the standard in databases. The data could be anything, and its characteristics could massively change at any time, as you suddenly import a whole bunch of new rows at once.

And while you can certainly design algorithms around e.g. gradient descent to try to accelerate lookup, b-trees are already incredibly fast, and have lots of other benefits like predictable worse-case performance and I/O requirements, supporting range scans, ordered traversal, prefix conditions, etc.

So yes, you can certainly design lookup algorithms that are more efficient for particular data distributions, but they will also often lack other important properties. And b-trees are already so fast, improvements are often negligible -- like even if another algorithm produces a closer initial guess, it may be slower to locate the final item, or it may be faster on average but have horrible worst-case performance that makes it unusable.

Even with a paper dictionary, I've always used pretty much a binary search beyond the first initial guess, which only saves you a couple of hops. And actually, once I get to the right handful of pages I'm probably more linear than I should be, and I'd probably be faster if I tried to do a rigorous binary search, but I have to balance that with how long it takes to flip pages.

skybrian10d ago

Databases often use table statistics to try to do better at generating query plans. I wonder if they use them to make indexes faster as well?

10000truths10d ago

The cost plan is a crude approximation of the actual query cost. Sometimes, the query planner makes a terrible guess. Your resident DBA won't appreciate being sometimes paged at 3 AM on a Sunday. A good strategy is to freeze the query plan once you have sufficient sample size of data in the involved tables.

1 more reply

_jackdk_10d ago

Fritz Henglein has done some interesting work on fast sorting/grouping. I think Generic Discrimination Sorting and Partitioning Unshared Data in Linear Time[1] is the main paper. Ed Kmett took those ideas and refined them into the discrimination[2] library for Haskell, and gave a very interesting tech talk about it[3].

[1]: https://dl.acm.org/doi/epdf/10.1145/1411203.1411220

[2]: https://hackage.haskell.org/package/discrimination

[3]: https://www.youtube.com/watch?v=cB8DapKQz-I

rixed10d ago

> it's worth pointing out that binary search (or low-level implementation variants) is the best only if you know nothing about the data beyond the fact that it is sorted / monotonic

Also if you do not learn anything about the data while performing the binary search, no? Like, if you are constantly below the estimate, you could gess that the distribution is biases toward large values and adjust your guess based on this prediction.

ssivarkOP10d ago

> Also [IFF] you do not learn anything about the data while performing the binary search, no?

Yes, absolutely!

I forgot to share this general perspective above, and it's too late to edit, so I'll add it here...

Since binary search assumes only monotonicity; splitting your interval into two equal parts extracts one bit of information per step, and any other choice would extract less information on average. One bit of information per step is how you end up needing log(n) steps to find the answer.

To accelerate your search, you basically need to extract those log(n) bits as fast as you can. You can think of that as leveraging both the prior, and everything you learn along the way -- to adaptively design each step to be the optimal experiment to extract maximum amount of information. And adaptive local models of your search space (gradient / hessian / etc) allow you to extract many more bits of information from each query / experiment, provided the function you are inverting has some local structure.

PS: That is why we leverage these ideas to "search" for the optimum, among a space of solutions.

Nevermark10d ago

For a list of sorted values with no other knowledge, the binary search is optimal. Provably, it is simple information theory on binary information.

You can do better if the list is stable by reusing information.

But gathering that information during searches is going to require great complexity to leverage, as searches are an irregular information gathering scheme.

So create RAM for speedup optimizations up front.

1) Create a table that maps the first 8 bits to upper and lower indexes in the list. Then binary search over the last 8 bits. That reduces the search time in half.

2) Go all the way, and create an array of 32,768 indexes, with all 1's for misses. Either way, search returns O(1).

Stable lists allow for sliding parametric trade offs between RAM-lookup vs. binary search. From full lookup, to full binary.

molf10d ago

It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right. And if we guess incorrectly, in the worst case, the algorithm degrades to a linear scan.

Unless we have prior knowledge. For example: if there is a particular distribution, or if we know we're dealing with integers without any repetition (i.e. each element is strictly greater than the previous one), etc.

kryptiskt10d ago

> It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

You have another piece of information, you don't only know if the element was before or after the compared element. You can also know the delta between what you looked at and what you're looking for. And you also have the delta from the previous item you looked at.

2 more replies

travisjungroth10d ago

> If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right.

This is true for abstract and random data. I don't think it's true for real world data.

For example, python's sort function "knows nothing" about the data you're passing in. But, it does look for some shortcuts and these end up saving time, on average.

IIAOPSW10d ago

>More generally, the best bet to solving a problem more efficiently is always to use more information about the specific problem you want to solve

It is both obvious and profound, the more information you already have, the more information you already have.

painted-now10d ago

> In mathematical language, searching a sorted list is basically inverting a monotonic function, by using a closed-loop control algorithm.

Never thought about it this way. Brilliant!

dnnddidiej10d ago

For humans binary searching a dict is slower because it requires a different physical action vs. scanning and we have advanced OCR and flipping through capabilites. Especially recognizing when something hasnt changed e.g. still on Es keep flipping is maybe 10ms.

theptip10d ago

No the point is you know exactly where T is just by looking at the dictionary (or at least, you learn this if you use a dictionary a lot).

IOW your prior on the data distribution lets you skip the first 4-5 binary chops.

mycall11d ago

Furthermore, with the vast and immediate knowledge that LLMs have, we could see a proliferation of domain-specific sorting algorithms designed for all types of purposes.

tantalor11d ago

> use that extra information to perform MUCH better

Do you mean using a better estimator for the median value? Or something else?

giovannibonetti10d ago

Say, if you know the function is a polynomial of degree N, with N+1 datapoints you can find it – e.g. with Lagrange's polynomial, although the finite precision of computer numbers might make that more complex.

locknitpicker11d ago

> If you have priors about the data distribution, then it's possible to design algorithms which use that extra information to perform MUCH better.

You don't even need priors. See interpolation search, where knowing the position and value of two elements in a sorted list already allows the search to make an educated guess about where the element it's searching for is by estimating the likely place it would be by interpolating the elements.

rv64imafdc11d ago

> knowing the position and value of two elements in a sorted list

That's a prior about the distribution, if a relatively weak one (in some sense, at least).

esafak10d ago

https://en.wikipedia.org/wiki/Empirical_Bayes_method

darknoon10d ago

This relies on knowledge of the distribution, just querying in the middle of A = [1, 2, 4, 8, 16, ..., 2^(n-1)] is slower than binary search

locknitpicker10d ago

> just querying in the middle

It's an interpolation search. You interpolate the values you evaluated by whatever method you'd like. No one forces you to do linear interpolation. You can very easily fit a quadratic polynomial with the last 3 points, for example.

Interpolation search seems to have a convergence rate of log log n. That's pretty efficient.

j / k navigate · click thread line to collapse

0 comments

charleslmunger10d ago

I've spent some brainpower on binary search and have not been able to beat this:

https://github.com/protocolbuffers/protobuf/blob/44025909eb7...

1. Check for dense list O(1) 2. Check upper bound 3. Constant trip count binary search

ben-schaaf10d ago

charleslmunger10d ago

1 more reply

ChadNauseam10d ago

If you control the layout, eytzinger layout typically will give you the best of both worlds. As fast as a linear scan for small N, much faster than binary search over a sorted array for large N.

echelon10d ago

I know protobuf code is extremely high quality, but I really can't stand the c-style naming conventions.

I know people train themselves into grokking this and reading and emitting this way, but it sounds like writing "bork bork bork bork" runes to me.

I'm glad Rust feels more like Ruby and Python and that method and field names are legible.

My eyes just glaze over:

    UPB_API_INLINE
    const struct upb_MiniTableField* upb_MiniTable_FindFieldByNumber(
        const struct upb_MiniTable* m, uint32_t number) {
      const uint32_t i = number - 1;  // 0 wraps to UINT32_MAX
    
      // Ideal case: index into dense fields
      if (i < m->UPB_PRIVATE(dense_below)) {
        UPB_ASSERT(m->UPB_ONLYBITS(fields)[i].UPB_ONLYBITS(number) == number);
        return &m->UPB_ONLYBITS(fields)[i];
      }
    
      // Early exit if the field number is out of range.
      uint32_t hi = m->UPB_ONLYBITS(field_count);
      uint32_t lo = m->UPB_PRIVATE(dense_below);
      UPB_ASSERT(hi >= lo);
      uint32_t search_len = hi - lo;
      if (search_len == 0 ||
          number > m->UPB_ONLYBITS(fields)[hi - 1].UPB_ONLYBITS(number)) {
        return NULL;
      }
    
      // Slow case: binary search
      const struct upb_MiniTableField* candidate;
    #ifndef NDEBUG
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
      UPB_ASSERT(candidate ==
                 UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number));
    #elif UPB_ARM64_ASM
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
    #else
      candidate = UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number);
    #endif
    
      return candidate->UPB_ONLYBITS(number) == number ? candidate : NULL;
    }

teo_zero10d ago

> I really can't stand the c-style naming conventions.

Honestly I don't see much difference between

  upb_MiniTable_FindFieldByNumber

and

  upb::MiniTable::FindFieldByNumber

1 more reply

charleslmunger10d ago

ahartmetz10d ago

I think this needs way more "upb" and "UPB" to make it clear that it is, in fact, dealing with UPBs. Whatever these are.

1 more reply

unholiness10d ago

hinkley10d ago

I did not bookmark it and about twice a year I go searching for it again. Some say he’s still searching to this day.

ssivarkOP10d ago

mvelbaum10d ago

https://arxiv.org/abs/2206.12110 ?

hinkley7d ago

'22 feels a bit to recent but maybe. This looks like a good spot though. I'll check the bibliography to see if they acknowledge prior art.

crazygringo10d ago

Sure, but the whole point is that you often don't know anything further about the data.

That's why b-trees are the standard in databases. The data could be anything, and its characteristics could massively change at any time, as you suddenly import a whole bunch of new rows at once.

skybrian10d ago

Databases often use table statistics to try to do better at generating query plans. I wonder if they use them to make indexes faster as well?

10000truths10d ago

1 more reply

_jackdk_10d ago

[1]: https://dl.acm.org/doi/epdf/10.1145/1411203.1411220

[2]: https://hackage.haskell.org/package/discrimination

[3]: https://www.youtube.com/watch?v=cB8DapKQz-I

rixed10d ago

> it's worth pointing out that binary search (or low-level implementation variants) is the best only if you know nothing about the data beyond the fact that it is sorted / monotonic

ssivarkOP10d ago

> Also [IFF] you do not learn anything about the data while performing the binary search, no?

Yes, absolutely!

I forgot to share this general perspective above, and it's too late to edit, so I'll add it here...

PS: That is why we leverage these ideas to "search" for the optimum, among a space of solutions.

Nevermark10d ago

For a list of sorted values with no other knowledge, the binary search is optimal. Provably, it is simple information theory on binary information.

You can do better if the list is stable by reusing information.

But gathering that information during searches is going to require great complexity to leverage, as searches are an irregular information gathering scheme.

So create RAM for speedup optimizations up front.

1) Create a table that maps the first 8 bits to upper and lower indexes in the list. Then binary search over the last 8 bits. That reduces the search time in half.

2) Go all the way, and create an array of 32,768 indexes, with all 1's for misses. Either way, search returns O(1).

Stable lists allow for sliding parametric trade offs between RAM-lookup vs. binary search. From full lookup, to full binary.

molf10d ago

It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

kryptiskt10d ago

> It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

2 more replies

travisjungroth10d ago

> If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right.

This is true for abstract and random data. I don't think it's true for real world data.

For example, python's sort function "knows nothing" about the data you're passing in. But, it does look for some shortcuts and these end up saving time, on average.

IIAOPSW10d ago

>More generally, the best bet to solving a problem more efficiently is always to use more information about the specific problem you want to solve

It is both obvious and profound, the more information you already have, the more information you already have.

painted-now10d ago

> In mathematical language, searching a sorted list is basically inverting a monotonic function, by using a closed-loop control algorithm.

Never thought about it this way. Brilliant!

dnnddidiej10d ago

theptip10d ago

No the point is you know exactly where T is just by looking at the dictionary (or at least, you learn this if you use a dictionary a lot).

IOW your prior on the data distribution lets you skip the first 4-5 binary chops.

mycall11d ago

Furthermore, with the vast and immediate knowledge that LLMs have, we could see a proliferation of domain-specific sorting algorithms designed for all types of purposes.

tantalor11d ago

> use that extra information to perform MUCH better

Do you mean using a better estimator for the median value? Or something else?

giovannibonetti10d ago

locknitpicker11d ago

> If you have priors about the data distribution, then it's possible to design algorithms which use that extra information to perform MUCH better.

rv64imafdc11d ago

> knowing the position and value of two elements in a sorted list

That's a prior about the distribution, if a relatively weak one (in some sense, at least).

esafak10d ago

https://en.wikipedia.org/wiki/Empirical_Bayes_method

darknoon10d ago

This relies on knowledge of the distribution, just querying in the middle of A = [1, 2, 4, 8, 16, ..., 2^(n-1)] is slower than binary search

locknitpicker10d ago

> just querying in the middle

Interpolation search seems to have a convergence rate of log log n. That's pretty efficient.

j / k navigate · click thread line to collapse