But I think this does show that, for specific data/queries, you sometimes have to denormalize so you can create the ideal compound index for a problem query. You should still default to normalized schemas and joins, but if problem queries like this pop up, sometimes taking 10, 20, 30 seconds when caches are cold, compromising a bit on clean schemas/code for performance makes sense.
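To make the pattern concrete, here's a minimal sketch of what "denormalize to enable the ideal compound index" can look like. All table and column names here are hypothetical, not taken from the linked benchmark:

```sql
-- Suppose the slow query is "recent orders for customers in a given country":
--
--   SELECT o.* FROM orders o
--   JOIN customers c ON c.id = o.customer_id
--   WHERE c.country = 'CA'
--   ORDER BY o.created_at DESC LIMIT 50;
--
-- No single index can serve both the filter (customers.country) and the
-- sort (orders.created_at), because they live on different tables.

-- Denormalize the filter column onto orders...
ALTER TABLE orders ADD COLUMN customer_country text;

-- ...so one compound index covers the filter AND the sort:
CREATE INDEX orders_country_created_at
    ON orders (customer_country, created_at DESC);

-- The query becomes a join-free, ordered index scan:
SELECT *
FROM orders
WHERE customer_country = 'CA'
ORDER BY created_at DESC
LIMIT 50;
```

The cost is keeping `customer_country` in sync (e.g. via application code or a trigger), which is exactly the clean-schema compromise being described.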
I also created a Postgres benchmark here: https://gist.github.com/yashap/6d7a34ef37c6b7d3e4fc11b0bece7...