The surprisingly difficult problem of user-defined order in SQL

(begriffs.com)

136 points · ccmcarey · 5y ago · 93 comments

73 comments · 19 top-level

teddyh · 5y ago · 9 in thread

Why not (assuming that every entry has a unique ID) add a “next_id” field, and treat it like a linked list?

egeozcan · 5y ago

I implemented this. Some notes:

1) Make the front-end calculate the next_id for each element after a sort, and call the back-end to update only the required records.

2) next_id is a unique key, so if the data was stale, the worst case is a transaction error: inform the user that the sort failed and undo the optimistic update. This only happens if multiple users are sorting the same list like crazy; generally, it just works. It can be problematic if users are doing mass updates on a giant list. In such a case, instead of allowing free sorting, adding a user-adjustable "priority" column that allows equal values would make much more sense.

3) Deletes and inserts are more costly, because they now also require an extra read and update, but we never have the case of updating the whole list, and updating a record through a unique indexed key is an insignificant cost (in most cases; noted because otherwise someone would surely nerd-snipe me with an uncommon case, hehe).

4) If, for whatever reason, you want sorted results from the back-end rather than sorting on the front-end, and the keys are not already in sorted order, that means a recursive CTE, which isn't the end of the world but could be slower and adds complexity.

5) You can change next_id to prev_id and spare the updates for the common case of inserting at the bottom (you still need the read, and a retry mechanism on transaction failure, though).

rileymat2 · 5y ago

That was my thought, but the query is not simple to get the list in order. Plus inserts also require an update.

zzzeek · 5y ago

If your application only needs to deal with the list as a whole, then this doesn't matter: you load it in and build the linked list in memory using a hash lookup. Or, as someone else mentioned, recursive CTEs can do it too, but for something small I'd just load it in and be done with it. This article is really overthinking things for the typical real-world case.
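
A minimal sketch of that in-memory assembly, assuming the rows arrive as (id, next_id) pairs in arbitrary table order and the last row has a NULL next_id. The function name and the row shape are illustrative, not taken from any particular schema:

```python
from typing import Optional


def order_linked_rows(rows: list[tuple[int, Optional[int]]]) -> list[int]:
    """Return ids in list order, given (id, next_id) rows in any order."""
    next_of = dict(rows)  # hash lookup: id -> next_id
    pointed_to = {n for n in next_of.values() if n is not None}
    # The head is the only row that no other row points at.
    heads = [i for i in next_of if i not in pointed_to]
    if len(heads) != 1:
        raise ValueError("expected exactly one head row")
    ordered = []
    current = heads[0]
    while current is not None:
        ordered.append(current)
        if len(ordered) > len(next_of):
            raise ValueError("cycle detected")
        current = next_of[current]
    return ordered


rows = [(3, None), (1, 2), (2, 3)]  # table order, not list order
print(order_linked_rows(rows))  # [1, 2, 3]
```

The dictionary gives constant-time hops from one row to the next, so the whole walk is linear in the number of rows.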

johnnyRose · 5y ago

I used this approach as part of a personal learning project. It ended up working very well because I was using Mongo, so I could just use $graphLookup with an index on the next ID.

It was definitely fast enough for my purposes, sorting about 30k items in 3 different linked lists in just a second or two. The function allows you to limit your depth or modify your starting location in the linked list (graph) which opens a lot of possibilities and/or performance improvements when only loading the first N records.

coding123 · 5y ago

How do you get the list back in order if there are thousands of entries?

oconnor663 · 5y ago

It seems like you're in one of two situations: (1) you want the entire list, or (2) you want a small "page" of entries in the list, starting from a known point. For (1), you can fetch all of the entries in table order, whatever that may be, and then figure out the list order after you have the entries. For (2), you make the database do seeks for you in a loop, but it's not such a big deal because it's a small number of them.

I guess the worst case is where you want to iterate over a large dataset in list order, but either you can't fit it in memory, or you don't want it all to go over the wire. In that case...yeah I don't know...what's the linked list equivalent of a B-tree? :)

seedless-sensat · 5y ago

Retrieving the list will involve a lot of random seeks, which a database handles far less well than scans.

spiffytech · 5y ago

Fortunately, modern SQL dialects that support recursive CTEs make the syntax for doing this in a single query approachable. I haven't found good data on how performant recursive CTEs are at scale, but it's surely better than doing the loop with round trips to the database inside your application code.
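
For illustration, here is roughly what such a query can look like, run through Python's built-in sqlite3 module so the sketch is self-contained. The todos table, its columns, and the sample rows are assumptions made up for this example; the head of the list is taken to be the one row that no next_id points at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE todos (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        next_id INTEGER UNIQUE  -- NULL marks the tail of the list
    );
    INSERT INTO todos VALUES (1, 'write', 3), (2, 'ship', NULL), (3, 'test', 2);
""")

WALK = """
WITH RECURSIVE walk(id, title, next_id, pos) AS (
    -- Anchor: the head is the row nobody points at.
    SELECT id, title, next_id, 0 FROM todos
    WHERE id NOT IN (SELECT next_id FROM todos WHERE next_id IS NOT NULL)
    UNION ALL
    -- Step: follow next_id one hop at a time.
    SELECT t.id, t.title, t.next_id, w.pos + 1
    FROM todos AS t JOIN walk AS w ON t.id = w.next_id
)
SELECT title FROM walk ORDER BY pos
"""

titles = [row[0] for row in conn.execute(WALK)]
print(titles)  # ['write', 'test', 'ship']
```

This says nothing about performance at scale; it only shows that the traversal fits in one statement.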

slaymaker1907 · 5y ago

I agree, that seems like a better approach, though there isn't really a way to do the sorting in plain SQL with that method.

_3u10 · 5y ago · 8 in thread

There are two problems here:

1. The position of an element in an array is not a property of the element but of the array.

2. Don't use sorted sets when you need the properties of an array / linked list.

You can store the todo list as an array of todos (use a user-defined datatype), or you can create another table called todo_list that contains an array of references to the todos. You can also create a linked list. Note that you'll have to use modern SQL to retrieve the linked list 'efficiently'.

The problem only becomes difficult when we dislike the obvious solutions because they violate some property of 'elegance', which apparently means restricting yourself to the 1980s versions of SQL implementations.

If the language we were writing this in was called JS, and the code had the order of the array as a property of the object, and we were continually sorting it every time we accessed the array instead of just reordering the array, we'd recognize this as the anti-pattern it is. Similarly if for some reason using JS functions was considered 'inelegant', or if we insisted on using only the JS features of 1996 instead of 2020.

Modern SQL has CTEs, functions, and arrays. They are very elegant when solving the problem of arrays in SQL.

earthboundkid · 5y ago

Is there a way in PG to keep the foreign key constraints on an array of ids, or do you just have to give them up for practical reasons?

saltcured · 5y ago

A foreign key constraint in postgres requires that the entire column value in the foreign key match a corresponding column value of a key in the referenced domain table. Unless there have been quite recent extensions to the DDL in postgres, there is no foreign key constraint syntax to express a REFERENCES constraint over parts of a structured column. This is not supported for even the simple case of constraining one field in a UDT/composite type, much less the harder problem of constraining a variable cardinality set of elements in an array, or keys/values within a json/jsonb document.

You can try to emulate it with triggers, and start to appreciate the subtle complexities of the problem. For example, would we want a new class of element-wise actions that act analogously to ON DELETE/UPDATE CASCADE/SET NULL? Rather than pruning the referenced row or the referencing column value, you'd want to mutate just an element within the structured type, right? How would you expose these many choices if trying to design a new constraint syntax with reusable machinery?

tda · 5y ago

You could also use the array to store only the order, and regular FK relationships for membership. That way the referential integrity of the items is guaranteed, but the order is more of a "best effort" approach, in that there may be members that have no defined order (default to putting them last or first), or there may be items in the order array that have been deleted (and can then be ignored).
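
A small sketch of that best-effort merge, assuming the membership comes from ordinary FK rows and the order from a separate array of ids. The function name is hypothetical, and putting unranked members last is just one of the two defaults mentioned:

```python
def apply_order(member_ids: set[int], order: list[int]) -> list[int]:
    """Order the FK-backed members by a best-effort array of ids."""
    ranked = []
    seen = set()
    for item_id in order:
        # Skip ids of deleted items and any accidental repeats.
        if item_id in member_ids and item_id not in seen:
            ranked.append(item_id)
            seen.add(item_id)
    # Members missing from the array default to the end, in id order.
    return ranked + sorted(member_ids - seen)


print(apply_order({1, 2, 3, 4}, [3, 9, 1]))  # [3, 1, 2, 4]
```

Here id 9 stands for an item that was deleted after the order array was written, and ids 2 and 4 for items that were added without ever being positioned.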

_3u10 · 5y ago

As far as I know, FKs don't work with arrays, but you can emulate them via triggers.

mythrwy · 5y ago

This is a much more practical answer. Point #1 seems exactly right.

Even without modern SQL arrays, one could keep references to place, per user, in a simple comma-separated text field and munge it in code. Elegant? Probably not. But perhaps more practical than some of the approaches presented.

But modern SQL does have arrays.

tda · 5y ago

Surprised to see the obvious solution at the bottom of the comments

jjgreen · 5y ago

This is interesting. When you say arrays, do you mean an array column type? More details (or links) would be appreciated.

_3u10 · 5y ago

Array column type. Use unnest to expand the array into rows, which can then be joined (right-joined?) against the table. https://www.postgresql.org/docs/13/functions-array.html

spiffytech · 5y ago · 7 in thread

I've been reviewing this problem lately for a project, and I've settled on a solution that feels elegant and versatile which this article doesn't cover: lexicographical sort.

I'm using Mudder.js to do it, but the algorithm is straightforward. Define an alphabet (e.g., A-Z). Every time an item is inserted/repositioned, its sort order is given the midpoint between the sort values of its two neighbors. So the first item is given the sort value M (midpoint between A and Z). Inserting before that item gets G (between A and M). Inserting between these two items gets J (between G and M).

Once you run out of letters between two sort values, the algorithm tacks a digit onto the sort value of the item ahead of it. So inserting between M and N yields MM. If you do this enough times in a pathological pattern and wind up with some long string of characters you can reflow your list and rebalance everything out evenly (though that's strictly an optimization for storage space/bandwidth, and not a requirement for the algorithm to function).

This all sorts perfectly with ORDER BY etc., supports a number of repositions bounded only by your storage space, and doesn't require arbitrary-precision decimal datatypes or fraction handling.
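
A rough sketch of the midpoint idea follows. It is not Mudder.js's algorithm, and the exact letters it picks can differ from the walkthrough above; it reads each key as a base-26 fraction and returns the shortest key that falls strictly between its neighbours. key_between and its empty-string convention for "no neighbour" are names invented here, and keys are assumed never to end in "A" (which holds for every key this function produces):

```python
import math
from fractions import Fraction

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)


def _value(key: str) -> Fraction:
    """Read a key as a base-26 fraction, e.g. 'N' -> 13/26."""
    total = Fraction(0)
    for i, char in enumerate(key):
        total += Fraction(ALPHABET.index(char), BASE ** (i + 1))
    return total


def key_between(before: str = "", after: str = "") -> str:
    """Shortest key k with before < k < after; '' means no neighbour."""
    lo = _value(before)
    hi = _value(after) if after else Fraction(1)
    if lo >= hi:
        raise ValueError("before must sort ahead of after")
    length = 1
    while True:
        scale = BASE ** length
        k_min = math.floor(lo * scale) + 1  # smallest value above lo
        k_max = math.ceil(hi * scale) - 1   # largest value below hi
        if k_min <= k_max:
            k = (k_min + k_max) // 2        # middle of the free range
            digits = []
            for _ in range(length):
                k, d = divmod(k, BASE)
                digits.append(ALPHABET[d])
            return "".join(reversed(digits))
        length += 1                         # no room: use one more letter


first = key_between()            # 'N'
second = key_between("", first)  # 'G'
print(first, second, key_between(second, first), key_between("M", "N"))
```

Exact fractions are used only to decide where a key may go; what gets stored is still a plain string that sorts with ORDER BY.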

saurik · 5y ago

Isn't this just a less efficient version of the "arbitrary precision" variant of approach 2 from the article?

spiffytech · 5y ago

It applies the same philosophy but doesn't require special data types. Not all data stores support decimal numbers, and every programming language I've used uses floating point numbers by default. JS doesn't seem to even have a native decimal implementation at all.

Strings work everywhere, and are first-class data types in most systems/languages. They take more bits per digit than numbers (though you can choose a wider alphabet to mitigate that, such as ASCII 33 '!' through 126 '~'), but I'm happy to trade the storage away for using a first-class data type.

jameshart · 5y ago

What makes you say it's less efficient? It's exactly analogous to storing increasing decimal values between 0 and 1, just using base 26 digits (encoded A-Z) rather than base 10 or base 2.

Start with .M (encoded as "M"); add in .F, .T to insert before or after. Once you find yourself trying to insert between .M and .N, do it by adding .MM.

But in a database, char types can be arbitrarily long, and database indexes are well suited to indexing them. That gives them a lot of advantages for this kind of use case.

avmich · 5y ago

A string used for indexing can also be used for sorting not only a flat list, but whole trees of nodes. A similar technique was used before (at least in 1997) to answer queries about the order at a particular time in the past.

spiffytech · 5y ago

Can you share any keywords I can research for your technique? I've looked into string-based tree sorting before, but the only answers I can come up with involve recursive queries, or encoding the entire tree path into each item, which means updating many records if an item with many descendants gets repositioned.

infogulch · 5y ago

Could you also use something like varbinary / bit varying to do this same process in binary? It would avoid the waste of limiting to a particular alphabet and might make the logic even simpler.

dunham · 5y ago

There are CRDT algorithms that are similar to this. Some pick from the middle, some from one end or the other and some randomize the choice. It's the solution I'd probably go with for this specific problem (maybe with a bigger alphabet, although some DBMSs are case-insensitive).

dmurray · 5y ago · 5 in thread

There's a no-free-lunch theorem here that's being ignored.

If you have a space of 64 bits to represent elements of some ordered field, and you insert items one at a time, there is some ordering of the items that allows you to make only 64 inserts.

Claiming that numbering the items with integers only allows you 16 insertions, but "It would take virtually forever to run out of space through repeated list reordering" using rational numbers just can't be correct. The author cherry-picked a bad implementation for the integers and a favourable insertion order for his pet approach.

Where exactly did he go wrong? By picking the first number in the sequence of integers to be 65536, one of the smallest of the unsigned 64-bit integers, and then deciding all the inserts would happen in descending order. If he had picked a number like 2^63 instead, he would have got at least 63 inserts no matter what the insertion order. Using this integer strategy also lends itself to easy reindexing if you do need to: the row of rank N gets reindexed to 2^64 / N. Or if you want to increase the size of the index field to 128 bits, bitshift everything left.
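
To make the arithmetic concrete, here is a small sketch that counts how many times in a row a new item can be placed before the current first item, with each new key taking the integer midpoint of the remaining gap. It only exercises that one descending pattern, not every insertion order, and it treats 0 as an exclusive lower bound, which is an assumption of this sketch:

```python
def front_inserts(first_key: int) -> int:
    """Count inserts before the first item until the gap has no integer left."""
    lo, hi, count = 0, first_key, 0  # 0 acts as an exclusive lower bound
    while hi - lo > 1:
        hi = (lo + hi) // 2          # the new item takes the midpoint
        count += 1
    return count


print(front_inserts(65536))    # 16
print(front_inserts(2 ** 63))  # 63
```

Starting at 65536 reproduces the sixteen insertions quoted from the article, while starting in the middle of the 64-bit range leaves room for 63.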

Note, just because we have this NFL theorem doesn't mean one approach can't outperform another in typical use patterns, but I don't see evidence for that here. The most logical integer-index strategy isn't considered and the insertion orders that exhaust the rational strategy are not particularly pathological.

dan-robertson · 5y ago

I don’t understand what the statement of this theorem is. I can’t come up with a simple argument in my head for why there can never be enough information, but I’m not really sure how to think about it and I didn’t try very hard.

barrkel · 5y ago

Think about bisection of ordering state space for every insert.

If 64 bits represent an ordering, and you split it in two for each insert - necessarily, to permit future insertion on either side of the new insert - then at least one side has no more than 63 bits left to store its ordering information for all future inserts on its side. You can't avoid this even with tricks that represent fractional bits.

The problem is that in order to avoid renumbering (which redistributes ordering state), the bisection needs to pre-allocate state space for future ordering information. The best it can do (without having more information about future insert order) is to split the state space perfectly evenly.

In the article, there's a big blue Update box which mentions a sequence of L/R path flips which make the fraction into a Fibonacci sequence, where you can go to 46 in sequence before exceeding a 32-bit integer. You'll observe that 46 is less than 64.
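
A sketch of that zigzag, assuming the rational scheme places each new item at the mediant of its two neighbours and that the list starts between 0/1 and 1/1 (both are assumptions of this example, and the starting bounds shift the exact count by a step or two). Alternating sides makes the denominators run 2, 3, 5, 8, ... until a term no longer fits the limit:

```python
INT32_MAX = 2 ** 31 - 1


def zigzag_inserts(limit: int = INT32_MAX) -> int:
    """Count alternating left/right mediant inserts before a term exceeds limit."""
    lo, hi = (0, 1), (1, 1)  # fractions as (numerator, denominator)
    count = 0
    go_right = True
    while True:
        mid = (lo[0] + hi[0], lo[1] + hi[1])  # mediant of the neighbours
        if max(mid) > limit:
            return count
        count += 1
        if go_right:
            lo = mid   # next insert lands between mid and hi
        else:
            hi = mid   # next insert lands between lo and mid
        go_right = not go_right


print(zigzag_inserts(8))  # 4 (denominators 2, 3, 5, 8)
print(zigzag_inserts())   # number of inserts that fit in signed 32-bit terms
```

With the default limit the count should land in the mid-forties, which is the point being made: fewer than the 64 that even bisection of a 64-bit space allows.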

Someone · 5y ago

I wouldn’t have picked 65,536 either, but there’s a trade-off between the increment and the number of items you can append without running into problems.

If he had picked 2^63, it wouldn’t be possible to create an initial list of even 2 items.

Depending on the use case, I probably would have picked 2^54 or so (allows for appending about a thousand items)

If your lists are small it doesn’t matter much what you do, but for large lists, I don’t think there’s a clean solution for this, if one keeps the key size constant.

dmurray · 5y ago

If you have a known large number N of items to insert up front, and future inserts will be random, you can do the reindexing to 2^64 / N in advance. (Perhaps there's a further optimization in rounding N up to the next power of 2).

N = 2^48 is what gives you his strategy of numbering the initial entries in increments of 65536.

raverbashing · 5y ago

Pretty much. Also, how long do you expect the list to be? 10 items? 100 items? If the user is ordering them manually, then at 100 items there are other issues to consider beyond just the order.

Keeping a space between the items and reordering/"defragging" periodically might be easier.

exabrial · 5y ago · 5 in thread

Seems like the easiest way would be to emulate a linked list. Have a pointer to the next surrogate id.

names_are_hard · 5y ago

The linked list solution is easiest to update, not easiest to retrieve. So it depends what scenario you're optimizing for.

jerf · 5y ago

For pretty much any list that a human is ordering, it is the optimal solution. Humans do not, in general, manually re-order lists with more than a couple hundred elements in them, and I'd submit that by 1,000, if you're supporting extensive manual reordering, you shouldn't be. (For example, the JIRA board we use nominally keeps thousands or tens of thousands of bugs in the backlog in a total order, but the order is honestly meaningless past about 50 or so, because we humans simply do not have that detailed a total order in mind! So it may initially look like an example, but it really isn't... it's just an accident of implementation.)

A DB schema with an ID for the whole collection and a linked list is basically optimal here. Various obvious extensions present themselves at that point, such as multiple orders by moving the linked list to an external table, or adding reverse links. You don't need a complicated CTE; in 2021 you just query the whole list by the collection ID and assemble it in memory.

By the time this solution doesn't scale, you've almost certainly got other problems where your data structure no longer matches the true nature of the data. Humans do not generally have total orders on 10,000s of elements. In those rare cases where for some reason you do, which I have encountered 0 times in ~25 years, by all means use some other solution. But don't expect it to come up often.

andreareina · 5y ago

Is there a way to sort on that without a recursive query?

lisper · 5y ago

Yes: retrieve all the records and assemble the list in memory.

RHSeeger · 5y ago

My understanding is that it's possible, but not fast. A quick google search turned up this page, but I'm sure there are more https://stackoverflow.com/questions/515749/how-do-i-sort-a-l...

I seem to recall seeing _something_ more elegant than a linked list for this type of thing, but I don't recall what it was, offhand.

swagasaurus-rex · 5y ago · 5 in thread

I find similar issues with z-index in CSS.

Early in a project, people stick to some heuristic, like using small z-index values like 0, 1, 2, 3... etc. But this has drawbacks because you can't add new z-index items in between without also shifting everything with a higher z-index

Using z-index values with a gap is a better bet. 0, 10, 20, 30, etc. Of course, you are limited to only a few items in between the original. You've also lost an expectation of ordering in the values.

Larger gaps make more sense, as long as you don't exceed the size of signed integers ±2147483647.

The author of the post mentions how floating point has a similar problem in the amount of floating point precision you can wring out before you end up with rounding errors.

The true crux of the problem is one of items related to one another, in other words it's a graph problem. I want these items to show up underneath these other items.

recursive · 5y ago

If I need to fit something between z-index 2 and 3, I'd just put it at 2, and subsequent to all the other z-index 2 elements.

rubatuga · 5y ago

Then you have projects like bootstrap using z vals of 3000 etc.

names_are_hard · 5y ago

Then someone gets fed up because they just want their item to be on top, dammit, so they can finish their jira task and move on. So they just type 9999 because that ought to beat all those other pesky z-indexes...

kflzufkrbzi · 5y ago

Wrap whatever they provide with position: relative, z-index: 1 and their 3000 index will not go over your 2 index.

earthboundkid · 5y ago

The IAB has standard z-indexes in the millions: https://cravencode.com/post/essentials/iab-z-index-guideline...

tzs · 5y ago · 4 in thread

There was a discussion of this before on HN almost three years ago [1], with 134 comments.

This seems to have exposed a bug in HN. Clicking on the "past" link for the present submission does not turn up that past submission, although the links are identical.

However, clicking on the "past" link for that three year old submission does turn up the present submission.

[1] https://news.ycombinator.com/item?id=16635440

Arnavion · 5y ago

>This seems to have exposed a bug in HN. Clicking on the "past" link for the present submission does not turn up that past submission, although the links are identical.

>However, clicking on the "past" link for that three year old submission does turn up the present submission.

The "past" feature is implemented by a third-party site, not HN.

Also, you can see from the URL of the page and the highlighted text on the page that it's searching by the title text as well as the link. The previous submission's title did not have all the words of the current one, hence it does not show in the search results.

jaredsohn · 5y ago

Shows for https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

The current 'past' link isn't searching for the url at all. I think if it looked at the url instead of the title then it would miss when there are articles from other sites on the same topic so there is a bit of a tradeoff. If it looked at both, Algolia treats it as an 'and' so it would also not return the older submission. An 'or' would probably work well for this use case but not sure if it is supported.

spullara · 5y ago

Might have to call Jepsen.

earleybird · 5y ago

Maybe

ramraj07 · 5y ago · 3 in thread

I actually solved this exact problem recently (who is NOT trying to recreate Roam Research, amirite?) and I'm surprised no one else suggested the solution I came up with: just use a text column. When a new list is created, the items get incrementing chars as text. If a user reorders an item, the item's new sort order value is the value of the item above it concatenated with an "a". This is infinitely extensible, it's only a slight drag on storage, and it's trivial to clean up with some periodic code if storage actually becomes a problem. Works like a charm, and you don't need to modify Postgres itself for this!

yorwba · 5y ago

> the item's new sort order value is the value of the item above it concatenated with an "a"

Say you insert three items. a: 1, b: 2, c: 3.

Then you reorder 3 after 1. The new order is a: 1, aa: 3, b: 2.

Then you reorder 2 after 1. The new order is a: 1, aa: 2, aa: 3.

But since there are now two values with the same sort key, it might as well be a: 1, aa: 3, aa: 2.

Did I misunderstand your solution or is that a bug?
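
The collision is easy to reproduce. This sketch assumes the rule is literally "copy the key of the item above and append an a"; the dictionary and function name are illustrative only:

```python
def reorder_after(keys: dict[str, str], item: str, above: str) -> None:
    """Move item directly below `above` by extending that item's sort key."""
    keys[item] = keys[above] + "a"


keys = {"1": "a", "2": "b", "3": "c"}
reorder_after(keys, "3", "1")  # 3 now sorts as 'aa'
reorder_after(keys, "2", "1")  # 2 also becomes 'aa'
print(keys)  # {'1': 'a', '2': 'aa', '3': 'aa'}
```

Both moved items end up with the same key, so ORDER BY is free to return them in either order.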

ramraj07 · 5y ago

I ran into this problem, and instead of solving it the proper way I took the lazy route: I added a unique constraint on the column, and when the insert threw an error I just added one more letter!

lukeramsden · 5y ago

Here is a StackOverflow Q&A about something just like this: https://stackoverflow.com/questions/38923376/return-a-new-st...

Jira uses a similar system called LexoRank.

smolder · 5y ago · 3 in thread

Any of these methods can be combined with a process that re-indexes the order column once things get too pathological, or even does it on a schedule.

ivansavz · 5y ago

Yeah float sort order field + some "DB maintenance" script seems like it would solve this.

To be fair though, the scenario of inserting 38 items in a row in the same location (the fail mode of Approach 2) does sound like something that could happen in one user session, so might need another check of some sort in addition to the weekly check.

Come to think of it, 38 inserts in the same location might be a good test case to add for any sortable data model.

recursive · 5y ago

Just make sure there's no competing edits going on at the time that might cause a race.

tehjoker · 5y ago

That's what a transaction is for.

putzdown · 5y ago · 2 in thread

A linked list approach would also be worth considering. Give each item a permanent, unique id. Make each item remember the id of the next and/or previous item in the list. Reordering touches just two or three items, and there is no problem with running out of space or needing to re-index anything, ever. Of course, basic ordered query is murder from an efficiency standpoint, and the approach might be firmly rejected on that basis. But if we’re looking at approaches with a variety of tradeoffs, the linked list is worth mentioning.

willj · 5y ago

Is there a way to do this in SQL? I don’t think there’s a way to deal with low-level data structures like that within SQL, especially since SQL, as a declarative language, is meant to hide these details. I think what makes this article interesting is that it is about how to do this in SQL.

_pastel · 5y ago

Sure. Add columns "previous" and "next"; both can also be foreign keys into this table.

Querying is a bit less straightforward, though; you need a recursive query to traverse the pointers. (Or traverse in application logic, at the cost of a bunch of unnecessary round-trips.)

asddubs · 5y ago · 2 in thread

IMO the first approach is the best, unless there's a very high number of items (which in these scenarios there usually isn't; even a couple thousand wouldn't be catastrophically bad to update if it's not a common operation). They all kind of suck, but everything else seems worse.

spiffytech · 5y ago

A basic ordered list of integers can cause a lot more problems than I would have expected at first glance.

Updating dozens/hundreds/thousands of records at once is obviously unnecessarily heavy to process. But depending on your storage layer's locking model, it could cause high lock contention if concurrent actors are updating the same list and fight over large fractions of the dataset with every update.

If you're working in a networked scenario, naïve implementations can turn into transmitting a lot of data across the wire for every single reposition.

It's also algorithmically challenging in concurrent scenarios. If I update my local state to rearrange items, and you update your local state to rearrange items in an overlapping range, then when they have to sync up this quickly turns into the developer having to learn to implement CRDTs or Operational Transforms. You could implement optimistic or pessimistic locking, but you have to lock such a large fraction of the dataset that it's easy to limit throughput.

Distributed/eventual-consistency problems are particularly challenging with ordering solutions of this variety, but the difference between one operation updating a wide swath of values vs just the single updated value can make a difference in how primitive a reconciliation algorithm the developer can bring to bear.

asddubs · 5y ago

That's true. I forgot that when I last faced this problem, it was specifically a situation where there would not be concurrent updates.

tester756 · 5y ago · 1 in thread

yolo hack - this way we only update 1 row, but when obtaining items, then we have to join/load entry from this table too

ItemsOrderConfiguration Table

Id | Json (varchar(MAX) or proper JSON type in Relational DBs)

1  | [ { "ItemId": 1, "Order": 2 }, { "ItemId": 2, "Order": 1 } ]

2  | [ { "ItemId": 15, "Order": 1 }, { "ItemId": 32, "Order": 2 } ]

The bad thing about this is that you lose constraints, e.g. that ItemId actually exists, but it shouldn't be a problem.

earthboundkid · 5y ago

Postgres has an array type, so you can just make an array of IDs.

Animats · 5y ago

This is the same problem seen in algorithms for concurrent remote document editing, and the author seems to have re-invented some of the same solutions. See [1]

This belongs to the class of problems involving trying to represent a tree in a relational database. There are many solutions, all with some problem.

[1] https://news.ycombinator.com/item?id=24617542

dizzystar · 5y ago

I would suggest picking up Bill Karwin's book on SQL Antipatterns. He goes through a few ideas.

I found this slide show from him. It covers the same ideas:

https://www.slideshare.net/billkarwin/models-for-hierarchica...

conistonwater · 5y ago

> However with our choice of 2^16 blanks between each item, we can support no more than sixteen consecutive insertions between the first and next item. After reaching this limit we would have to revert to the previous approach of shifting items forward.

I do not see why this is necessary; you could also reindex the positions, because only the relative order of the integers matters. How hard can it be to recreate a bunch of integers once every 16 insertions in the same place in the list?

geekpowa · 5y ago

Rationals are to decimals as arbitrary precision decimals are to binary floats.

Drill the junior devs: use decimals if you want accuracy.

What is unsaid with this: if you want accuracy with monetary representation.

Lots of problems out there where decimals are simply inappropriate.

One day, rational as a core type in postgres would be a nice addition. Rationals should belong as a core type in most every platform where general purpose compute can happen.

coding123 · 5y ago

Every time I do this I get frustrated thinking about the fractions getting smaller and smaller. I don't think I ever solved it perfectly but I was always doing this with tiny pet projects where the order didn't matter much. Next time I'm in this situation and using postgres, I'll try this.

throwaway189262 · 5y ago

You can emulate a tree easily by just putting new items halfway between adjacent ones. If two items are already right next to each other, have a fallback that regenerates the "tree" by re-spacing all the items. Regenerating the spacing is expensive but rare if you pick a big enough range.
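
A minimal sketch of that halfway-plus-fallback scheme over a sorted list of integer positions. The 2^16 spacing, the function name, and the choice to return a fresh list are all assumptions made for illustration:

```python
GAP = 2 ** 16  # spacing used for fresh or re-spaced lists (illustrative)


def insert_at(positions: list[int], index: int) -> list[int]:
    """Return the positions with a new one at `index`, re-spacing if needed."""
    lo = positions[index - 1] if index > 0 else 0
    hi = positions[index] if index < len(positions) else lo + 2 * GAP
    if hi - lo > 1:
        # Common case: take the halfway point, touching no other item.
        return positions[:index] + [(lo + hi) // 2] + positions[index:]
    # Neighbours are adjacent: regenerate evenly spaced positions for all items.
    return [GAP * (i + 1) for i in range(len(positions) + 1)]


print(insert_at([65536], 0))  # [32768, 65536]
print(insert_at([1, 2], 1))   # [65536, 131072, 196608]
```

In a database the fallback branch corresponds to an UPDATE over every row of the list, which is the expensive but rare step described above.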

renatovico · 5y ago

For me the final solution is the left, right and depth (nested set) approach, e.g. https://github.com/collectiveidea/awesome_nested_set

j / k navigate · click thread line to collapse

93 comments

73 comments · 19 top-level

teddyh5y ago· 9 in thread

Why not (assuming that every entry has a unique ID) add a “next_id” field, and treat it like a linked list?

egeozcan5y ago

I implemented this. Some notes:

1) Make front-end calculate the next-id for each element after a sort, and call the back-end to update only required records.

5) you can change next_id to prev_id and spare the updates for the common case of inserting at the bottom (you still need the read, and a retry mechanism on transaction fail though)

rileymat25y ago

That was my thought, but the query is not simple to get the list in order. Plus inserts also require an update.

zzzeek5y ago

2 more replies

johnnyRose5y ago

I used this approach as part of a personal learning project. It ended up working very well because I was using Mongo, so I could just use $graphLookup with an index on the next ID.

coding1235y ago

How do you get the list back in order if there are thousands of entries?

oconnor6635y ago

2 more replies

seedless-sensat5y ago

Retrieving the list will involve a lot of random seeks. A database is not that great at it versus scans.

spiffytech5y ago

1 more reply

slaymaker19075y ago

I agree, that seems like a better approach, though there isn't really a way to do the sorting in plain SQL with that method.

_3u105y ago· 8 in thread

There are two problems here:

1. The position of an element in an array is not a property of the element but of the array.

2. Don't use sorted sets when you need the properties of an array / linked list.

Modern SQL has CTEs, functions, and arrays. They are very elegant when solving the problem of arrays in SQL.

earthboundkid5y ago

Is there a way in PG to keep the foreign key constraints on an array of ids, or do you just have to give them up for practical reasons?

saltcured5y ago

tda5y ago

_3u105y ago

As far as I know the FKs dont work with arrays but you can emulate them via triggers.

mythrwy5y ago

This is a much more practical answer. Point #1 seems exactly right.

But modern SQL does have arrays.

tda5y ago

Surprised to see the obvious solution at the bottom of the comments

jjgreen5y ago

This is interesting, when you say arrays, do you mean an array column-type? More details (or links) would be appeciated.

_3u105y ago

Array column type, use unnest to create a table that can be right? joined to the un-nested array. https://www.postgresql.org/docs/13/functions-array.html

spiffytech5y ago· 7 in thread

I've been reviewing this problem lately for a project, and I've settled on a solution that feels elegant and versatile which this article doesn't cover: lexicographical sort.

This all sorts perfectly with ORDER BY etc., supports a number of repositions bounded only by your storage space, and doesn't require arbitrary-precision decimal datatypes or fraction handling.

saurik5y ago

Isn't this just a less efficient version of the "arbitrary precision" variant of approach 2 from the article?

spiffytech5y ago

jameshart5y ago

What makes you say it's less efficient? It's exactly analogous to storing increasing decimal values between 0 and 1, just using base 26 digits (encoded A-Z) rather than base 10 or base 2.

Start with .M (encoded as "M"); add in .F, .T to insert before or after. Once you find yourself trying to insert between .M and .N, do it by adding .MM.

But in a database, char types can be arbitrarily long, and database indexes are well suited to indexing them. That gives them a lot of advantages for this kind of use case.
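A rough sketch of that midpoint rule in Python, for anyone who wants to play with it. The function name `key_between` and the convention that an empty string means "no neighbour on this side" are my own assumptions, not something from the article. It also picks a slightly different midpoint than the example above (it yields "MN" rather than "MM" between "M" and "N"), but any key strictly between the two neighbours works.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def key_between(lo: str, hi: str) -> str:
    """Return a key k such that lo < k < hi under plain string comparison.

    An empty lo means "before everything"; an empty hi means "after everything".
    Keys must not end in "A": nothing can sort between "M" and "MA".
    """
    for key in (lo, hi):
        if key.endswith("A"):
            raise ValueError("keys must not end in 'A'")
    if hi and lo >= hi:
        raise ValueError("lo must sort before hi")

    prefix = []
    hi_bounded = bool(hi)
    i = 0
    while True:
        # Treat an exhausted lo as digit 0 and an absent hi as one past 'Z'.
        a = ALPHABET.index(lo[i]) if i < len(lo) else 0
        b = ALPHABET.index(hi[i]) if hi_bounded and i < len(hi) else 26
        if b - a > 1:
            # Room at this position: take the middle digit and stop.
            prefix.append(ALPHABET[(a + b) // 2])
            return "".join(prefix)
        # No room here: copy lo's digit and look one position deeper.
        prefix.append(ALPHABET[a])
        if b > a:
            # The prefix is already below hi, so hi no longer constrains us.
            hi_bounded = False
        i += 1
```

Inserting repeatedly at the same spot just makes the keys longer, which is the "bounded only by storage" property: the key grows by roughly one character every four or five worst-case inserts.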

avmich5y ago

spiffytech5y ago

infogulch5y ago

Could you also use something like varbinary / bit varying to do this same process in binary? It would avoid the waste of limiting to a particular alphabet and might make the logic even simpler.

dunham5y ago

dmurray5y ago· 5 in thread

There's a no-free-lunch theorem here that's being ignored.

If you have a space of 64 bits to represent elements of some ordered field, and you insert items one at a time, there is some ordering of the items that allows you to make only 64 inserts.

dan-robertson5y ago

barrkel5y ago

Think about bisection of ordering state space for every insert.
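To make the bisection argument concrete, here is a small sketch (the function name is made up) that counts how many times you can insert at the same spot when keys are integers and each new key is the midpoint of its neighbours. It assumes the worst case where every insert lands directly after the lower neighbour.

```python
def inserts_until_exhausted(lo: int, hi: int) -> int:
    """Count worst-case midpoint inserts that fit strictly between lo and hi.

    Every new item goes directly after lo, so the free gap halves each time.
    """
    count = 0
    while hi - lo > 1:
        hi = (lo + hi) // 2  # the new item becomes the upper neighbour
        count += 1
    return count


if __name__ == "__main__":
    print(inserts_until_exhausted(0, 65536))    # one gap of 65536
    print(inserts_until_exhausted(0, 2 ** 64))  # the whole 64-bit space
```

A gap of 65536 survives 16 such inserts and the full 64-bit range survives 64, which is the bound mentioned above.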

Someone5y ago

I wouldn’t have picked 65,536 either, but there’s a trade-off between the increment and the number of items you can append without running into problems.

If he had picked 2^63, it wouldn’t be possible to create an initial list of even 2 items.

Depending on the use case, I probably would have picked 2^54 or so (allows for appending about a thousand items)

If your lists are small it doesn’t matter much what you do, but for large lists, I don’t think there’s a clean solution for this, if one keeps the key size constant.

dmurray5y ago

N = 2^48 is what gives you his strategy of numbering the initial entries in increments of 65536.

raverbashing5y ago

Keeping a gap between the items and reordering/"defragging" periodically might be easier.

exabrial5y ago· 5 in thread

Seems like the easiest way would be to emulate a linked list. Have a pointer to the next surrogate id.

names_are_hard5y ago

The linked list solution is easiest to update, not easiest to retrieve. So it depends what scenario you're optimizing for.

jerf5y ago

andreareina5y ago

Is there a way to sort on that without a recursive query?

lisper5y ago

Yes: retrieve all the records and assemble the list in memory.
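Something like the following, as a minimal sketch. It assumes each row comes back as an `(id, title, next_id)` tuple with `next_id` set to `None` on the last row; those column names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, Optional[int]]  # (id, title, next_id); hypothetical columns


def order_rows(rows: List[Row]) -> List[Row]:
    """Return rows in list order by following the next_id pointers."""
    if not rows:
        return []
    by_id: Dict[int, Row] = {row[0]: row for row in rows}
    # The head is the only row that no other row points to.
    pointed_to = {row[2] for row in rows if row[2] is not None}
    heads = [row for row in rows if row[0] not in pointed_to]
    if len(heads) != 1:
        raise ValueError("expected exactly one head row")

    ordered: List[Row] = []
    current: Optional[Row] = heads[0]
    while current is not None:
        ordered.append(current)
        if len(ordered) > len(rows):
            raise ValueError("cycle detected in next_id pointers")
        next_id = current[2]
        current = by_id[next_id] if next_id is not None else None
    return ordered
```

One dictionary lookup per row, so it is linear in the list size and does not care what order the database returned the rows in.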

RHSeeger5y ago

My understanding is that it's possible, but not fast. A quick google search turned up this page, but I'm sure there are more https://stackoverflow.com/questions/515749/how-do-i-sort-a-l...

I seem to recall seeing _something_ more elegant than a linked list for this type of thing, but I don't recall what it was, offhand.

swagasaurus-rex5y ago· 5 in thread

I find similar issues with z-index in CSS.

Using z-index values with a gap is a better bet. 0, 10, 20, 30, etc. Of course, you are limited to only a few items in between the original. You've also lost an expectation of ordering in the values.

Larger gaps make more sense, as long as you don't exceed the range of a 32-bit signed integer (±2147483647).

The author of the post mentions how floating point has a similar problem in the amount of floating point precision you can wring out before you end up with rounding errors.

The true crux of the problem is one of items related to one another, in other words it's a graph problem. I want these items to show up underneath these other items.

recursive5y ago

If I need to fit something between z-index 2 and 3, I'd just put it at 2, and subsequent to all the other z-index 2 elements.

rubatuga5y ago

Then you have projects like bootstrap using z vals of 3000 etc.

names_are_hard5y ago

kflzufkrbzi5y ago

Wrap whatever they provide with position: relative, z-index: 1 and their 3000 index will not go over your 2 index.

earthboundkid5y ago

The IAB has standard z-indexes in the millions: https://cravencode.com/post/essentials/iab-z-index-guideline...

tzs5y ago· 4 in thread

There was a discussion of this before on HN almost three years ago [1], with 134 comments.

This seems to have exposed a bug in HN. Clicking on the "past" link for the present submission does not turn up that past submission, although the links are identical.

However, clicking on the "past" link for that three year old submission does turn up the present submission.

[1] https://news.ycombinator.com/item?id=16635440

Arnavion5y ago

>This seems to have exposed a bug in HN. Clicking on the "past" link for the present submission does not turn up that past submission, although the links are identical.

>However, clicking on the "past" link for that three year old submission does turn up the present submission.

The "past" feature is implemented by a third-party site, not HN.

jaredsohn5y ago

Shows for https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

spullara5y ago

Might have to call Jepsen.

earleybird5y ago

Maybe

ramraj075y ago· 3 in thread

yorwba5y ago

> the items new sort order value is the value of the item above it concatenated to an "a"

Say you insert three items. a: 1, b: 2, c: 3.

Then you reorder 3 after 1. The new order is a: 1, aa: 3, b: 2.

Then you reorder 2 after 1. The new order is a: 1, aa: 2, aa: 3.

But since there are now two values with the same sort key, it might as well be a: 1, aa: 3, aa: 2.

Did I misunderstand your solution or is that a bug?

ramraj075y ago

lukeramsden5y ago

Here is a StackOverflow Q&A about something just like this: https://stackoverflow.com/questions/38923376/return-a-new-st...

A similar system is used in Jira called LexoRank

smolder5y ago· 3 in thread

Any of these methods can be combined with a process that re-indexes the order column once things get too pathological, or even does it on a schedule.

ivansavz5y ago

Yeah float sort order field + some "DB maintenance" script seems like it would solve this.

Come to think of it, 38 inserts in the same location might be a good test case to add for any sortable data model.

recursive5y ago

Just make sure there's no competing edits going on at the time that might cause a race.

tehjoker5y ago

That's what a transaction is for.
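A sketch of such a re-index pass using Python's built-in sqlite3 module. The table `items(id, list_id, position)` and the gap of 65536 are assumptions for illustration; the point is only that the read and the rewrite happen inside one write transaction.

```python
import sqlite3

GAP = 65536  # spacing between renumbered positions; an arbitrary choice


def renumber(conn: sqlite3.Connection, list_id: int) -> None:
    """Rewrite one list's positions as GAP, 2*GAP, ... keeping the current order."""
    with conn:  # commits on success, rolls back if anything raises
        # Take the write lock up front so no competing edit slips in
        # between reading the old order and writing the new positions.
        conn.execute("BEGIN IMMEDIATE")
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM items WHERE list_id = ? ORDER BY position, id",
                (list_id,),
            )
        ]
        conn.executemany(
            "UPDATE items SET position = ? WHERE id = ?",
            [((index + 1) * GAP, item_id) for index, item_id in enumerate(ids)],
        )
```

With a unique constraint on `position` the sequential updates could collide mid-way, so this sketch assumes there is none.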

putzdown5y ago· 2 in thread

willj5y ago

_pastel5y ago

Sure. Add columns "previous" and "next"; both can also be foreign keys into this table.

Querying is a bit less straightforward, though; you need a recursive query to traverse the pointers. (Or traverse in application logic, at the cost of a bunch of unnecessary round-trips.)
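For reference, a sketch of the recursive query, run through Python's sqlite3 module so it is easy to try. It assumes a table `items(id, title, next_id)` with only a "next" pointer (NULL on the last row); the names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

ORDERED_QUERY = """
WITH RECURSIVE chain(id, title, next_id, depth) AS (
    -- Anchor: the head is the row that no other row points to.
    SELECT id, title, next_id, 0
    FROM items
    WHERE id NOT IN (SELECT next_id FROM items WHERE next_id IS NOT NULL)
    UNION ALL
    -- Step: follow the next_id pointer one hop at a time.
    SELECT i.id, i.title, i.next_id, c.depth + 1
    FROM items AS i
    JOIN chain AS c ON i.id = c.next_id
)
SELECT id, title FROM chain ORDER BY depth
"""


def fetch_in_order(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Return (id, title) rows in linked-list order using a recursive CTE."""
    return conn.execute(ORDERED_QUERY).fetchall()
```

This is a single round-trip, but each hop is an index lookup, and a corrupted chain containing a cycle would make the query run forever unless a depth limit is added.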

asddubs5y ago· 2 in thread

spiffytech5y ago

A basic ordered list of integers can cause a lot more problems than I would have expected at first glance.

If you're working in a networked scenario, naïve implementations can turn into transmitting a lot of data across the wire for every single reposition.

asddubs5y ago

that's true, I forgot that when I last faced this problem it was specifically a situation where there would not be concurrent updates

tester7565y ago· 1 in thread

yolo hack - this way we only update 1 row, but when obtaining items, then we have to join/load entry from this table too

ItemsOrderConfiguration Table

Id | Json (varchar(MAX) or proper JSON type in Relational DBs)

1 | [
     { "ItemId": 1, "Order": 2 },
     { "ItemId": 2, "Order": 1 }
    ]

2 | [
     { "ItemId": 15, "Order": 1 },
     { "ItemId": 32, "Order": 2 }
    ]

the bad thing about this is that you lose constraints, e.g. that ItemId actually exists, but it shouldn't be a problem.
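On the application side, applying that JSON could look roughly like this. The helper name `apply_order` and the shape of `items` (a dict from id to whatever you display) are assumptions; because there is no foreign key, the sketch simply skips ids that no longer exist.

```python
import json
from typing import Dict, List


def apply_order(items: Dict[int, str], order_json: str) -> List[str]:
    """Return item values sorted by the "Order" field stored in the JSON blob."""
    entries = json.loads(order_json)
    ranked = sorted(entries, key=lambda entry: entry["Order"])
    # No FK protects ItemId, so silently drop entries for deleted items.
    return [items[entry["ItemId"]] for entry in ranked if entry["ItemId"] in items]


if __name__ == "__main__":
    stored = '[{"ItemId": 1, "Order": 2}, {"ItemId": 2, "Order": 1}]'
    print(apply_order({1: "first item", 2: "second item"}, stored))
```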

earthboundkid5y ago

Postgres has an array type, so you can just make an array of IDs.

Animats5y ago

This is the same problem seen in algorithms for concurrent remote document editing, and the author seems to have re-invented some of the same solutions. See [1]

This belongs to the class of problems involving trying to represent a tree in a relational database. There are many solutions, all with some problem.

[1] https://news.ycombinator.com/item?id=24617542

dizzystar5y ago

I would suggest picking up Bill Karwin's book on SQL Antipatterns. He goes through a few ideas.

I found this slide show from him. It covers the same ideas:

https://www.slideshare.net/billkarwin/models-for-hierarchica...

conistonwater5y ago

geekpowa5y ago

Rationals are to decimals as arbitrary precision decimals are to binary floats.

Drill the junior devs: use decimals if you want accuracy.

What is unsaid with this: if you want accuracy with monetary representation.

Lots of problems out there where decimals are simply inappropriate.

One day, rational as a core type in postgres would be a nice addition. Rationals should belong as a core type in most every platform where general purpose compute can happen.

coding1235y ago

throwaway1892625y ago

renatovico5y ago

For me the final solution is the left, right, and depth approach, e.g. https://github.com/collectiveidea/awesome_nested_set
