Really interesting, going to have a look at that paper -- thanks for sharing your insight
Let me ask this: if you have something like a GraphQL API, which often does sparse column selection from multiple different tables, would that also be a good fit for columnar database?
In most GraphQL queries, you're grabbing a portion of the fields from one or more tables, IE something like:
query JoesCompletedTodos {
users(where: { name: { _eq: "Joe" } }) {
id
name
todos(where: { completed: { _eq: true } }) {
text
}
}
}
Where this will get translated to something along the lines of
SELECT users.id, users.name, json_arrayagg(todos) FROM users
INNER JOIN todos ON todos.user_id = users.id
WHERE users.name = 'Joe' AND todos.completed = true
GROUP BY users.id, users.name
Many times you'll see queries spanning 3-4 relation levels deep, plucking something like 2-6 columns from each table.
Curious how well a columnar DB would do with something like this?
Also, on this point:
> "Most columnstores don't use compression schemes that are incremental. To grab a single row the columnstore likely decompresses many adjacent rows (could be millions of rows - depends on the particular columnstore)."
I think this is something that could be avoided if the data were in IE, Arrow, and you used Arrow Flight/FlightSQL as the transport mechanism, right? But this isn't my area of expertise, for sure.