The problem is exactly that 8-9TB range: running Spark on just two or three machines will be slower than on a laptop with an extra external drive. You need to scale out to potentially dozens of machines just to match the performance you were getting on the laptop. You were fine with a laptop; add more data and suddenly you have a not-insignificant AWS bill, unless you are content puttering along on a few machines far more slowly than the laptop.
There is no middle-ground solution, so everyone starts with an overkill solution that scales, out of fear of getting stuck on one machine when the dataset grows. But most of these systems never grow enough to need that kind of scale, so we end up wasting resources running toy clusters on problems that would fit on a laptop.
Maybe I am becoming a cranky old man who yells at clouds, but I miss MPI. It has no frills, but it runs with next to no overhead and scales from a single machine up to supercomputers, with no donut hole in between.