[Recursion with PostgreSQL, Follup 1, Perfomances](https://github.com/vb-consulting/blog/discussions/4)
I managed to get some really good perfomances (757K records in 5 seconds).
Also, method 1 contained a nasty bug :(
That's why next time I'll try to focus on debugging and testing.
Just some possibilities for optimization of the recursive CTE. (I haven't tested it and don't know if it'd automatically improve things.)
WITH RECURSIVE _recursive_cte AS MATERIALIZED ( … )
https://www.postgresql.org/docs/current/queries-with.html#QU...Basically makes the temp table from your other implementation for you.
Then you could use built-in cycle detection to be able to simply the syntax while also allowing UNION ALL instead of UNION for some extra speed.
CYCLE id SET is_cycle USING path
https://www.postgresql.org/docs/current/queries-with.html#QU...> As far as I know, SQL was never even intended to do that in the first place.
49 years ago, certainly not. Postgres still can't beat a dedicated graph database for non-trivial hierarchies, but a lot of work has already been done in the last five years to get it a lot closer to that milestone.
It takes a strange situation for Postgres' recursion features to be a good idea. All 3 solutions would be red flags to me if I saw them in a real codebase - and accompanied by of a large number of other unmaintainable SQL functions. This is the fast way to turn a small for loop into a week of Senior Dev billable hours. Someday I'll see an exception to that rule of thumb, so far no luck.
I used to think so too, then I found a use case. TBF, there exists an alternative that doesn't use recursion, but you wouldn't like that either, because it'd require a stored procedure / SQL UDF.
Requirement: Split a large sequential key-space (of a SERIAL PK) into the smallest number of non-overlapping, exhaustive key ranges where no range contains more than N keys. Extracting a range is very fast, so communicating with the app layer for coordination between two successive ranges would introduce too much latency overhead. But the DB also has a small timeout (order of minutes) on how long any statement may execute.
Breaking it down: we need iteration to start extracting the next key range after we've extracted one key range. But this iterative process cannot synchronise with app layer at the end/beginning of each iteration.
Solution: Synchronise with app layer / transaction manager every M iterations. If M is too large for statement timeout, reduce M and try again.
This can be done by either procedural code or by iteration via recursion in a single pass over the data. There exist alternative methods that can do this, but require multiple passes. In fact, the first attempted solution used one of those alternatives: bucket based partitioning.
But runtime performance objectives demanded the lowest latency, and the redundancy of extra passes could not be optimised away by PG 12. The single pass iteration-via-recursion method was twice as fast, and the recursion skeleton code was simple enough — equivalent to a recursive function with one tail call and an accumulator — so it was shipped.
but wouldn't an abstraction (I think like 'connect by') that computes the fixed point or transitive closure be really useful in many of these cases and involve less cognitive overhead (even if its less general)
Their rolled out linear English-like syntax makes stuff like this so awkward, and this article is evidence of that, including the reach for the imperative style with the procedure and loops and so on.
I'd love for someone to rewrite the examples in here in Datalog (or RAI's "Rel") just to show how much clearer and brief they become.
[Recursion with PostgreSQL, Follup 1, Perfomances](https://github.com/vb-consulting/blog/discussions/4)
I managed to get some really good perfomances (757K records in 5 seconds).
Also, method 1 contained a nasty bug :(
That's why next time I'll try to focus on debugging and testing.
No please. Recursive CTEs are really easy to understand, and they are tail-recursive loops, so they are just loops, and they are easy to understand as such if loops is what you understand. Writing a loop in procedural SQL is not better, just a) wordier, b) not easily amenable to relational algebra. (b) is a big deal.