Yes, that benchmark result is quite old in Duck years! :-)
We actually run that benchmark as part of our test suite now, so I'm certain things have improved since that version.
The biggest DuckDB database I've used so far was about 400 GB, on a machine with ~250 GB of RAM.
There is ongoing work, which we are treating as a high priority, to handle larger-than-memory intermediate results within a query. That said, we can already handle larger-than-RAM data in many cases. The situations where you may still run into issues today are joining two larger-than-RAM tables (depending on the join type), or aggregating a larger-than-RAM table when one of the grouping columns has really high cardinality.
Would you be open to testing out your use case and letting us know how it goes? We always appreciate more test cases!