[Speaking from experience writing both in-memory and disk-based data analysis packages]
Disk-based work is fairly straightforward. It's not rocket science, and in most circumstances it also makes your code much, much faster when working in memory.
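To make the "disk-based" habit concrete, here is a minimal Python sketch. It assumes a single numeric column stored as raw native-endian float64 values in a file; the layout, block size, and function names are hypothetical and not taken from any particular package. The code streams the file in fixed-size blocks instead of loading it whole. That same sequential, block-at-a-time access pattern is also what CPU caches reward when the data already sits in RAM, which is one plausible reading of why disk-oriented code tends to run faster in memory too.

```python
import array
import math
import os
import tempfile

BLOCK_VALUES = 4096  # hypothetical block size, counted in float64 values


def write_column(path, values):
    """Write floats to a raw binary file of native-endian float64 values."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def chunked_mean(path, block_values=BLOCK_VALUES):
    """Mean of a float64 column, reading one fixed-size block at a time."""
    itemsize = array.array("d").itemsize  # 8 bytes per float64
    total = 0.0
    count = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(block_values * itemsize)
            if not raw:  # end of file
                break
            block = array.array("d")
            block.frombytes(raw)  # only this block is held in memory
            total += sum(block)
            count += len(block)
    return total / count if count else math.nan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "column.bin")
        write_column(path, [float(i) for i in range(10_000)])
        print(chunked_mean(path))  # 4999.5
```

Memory use is bounded by the block size, not the file size, so the same function works unchanged on a file far larger than RAM.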
Of course, this is about how the system could evolve; that possibility might not help an ordinary user right now.
Memory mapping won't help when you have a 100 GB+ file, and, as you say, it gets slow because it's definitely not optimal.
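For comparison, here is a minimal memory-mapped reader using Python's standard `mmap` module, assuming a flat file of little-endian float64 records (`MappedColumn` and the record layout are hypothetical). The code is simple because the OS does the paging, and that is also the catch: on a file much larger than RAM with scattered access, every touch of a non-resident page is a page fault, and the kernel's generic eviction policy, not your algorithm, decides what stays in memory.

```python
import mmap
import struct

RECORD = struct.Struct("<d")  # hypothetical layout: one little-endian float64 per record


class MappedColumn:
    """Random access to a flat float64 file through a read-only memory map."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file (the file must be non-empty).
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._mm) // RECORD.size

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # If this page is not resident, the access triggers a page fault and
        # the OS reads it from disk; eviction is left to the kernel's policy.
        return RECORD.unpack_from(self._mm, i * RECORD.size)[0]

    def close(self):
        self._mm.close()
        self._file.close()
```

Usage is just `col = MappedColumn(path)`, then `col[123]`, then `col.close()` when done.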
You also need custom indexing structures and data-caching strategies for any algorithm that isn't easily moved to disk, and unfortunately most aren't. The other issue is that you end up doing a lot of research yourself, because very few people have done this before. It's a time sink.
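A data-caching strategy can start as simply as an LRU cache of fixed-size file blocks. This sketch is one illustrative option, not the author's design; `BlockCache`, the 64 KiB block size, and the capacity are arbitrary assumptions. Unlike memory mapping, the eviction policy lives in your own code, so it can be tuned to the algorithm's access pattern.

```python
from collections import OrderedDict


class BlockCache:
    """Read fixed-size blocks of a file, keeping the most recently used ones in memory."""

    def __init__(self, path, block_size=64 * 1024, capacity=256):
        self._f = open(path, "rb")
        self.block_size = block_size
        self.capacity = capacity  # maximum number of cached blocks
        self._blocks = OrderedDict()  # block number -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def _block(self, block_no):
        if block_no in self._blocks:
            self._blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_no]
        self.misses += 1
        self._f.seek(block_no * self.block_size)
        data = self._f.read(self.block_size)  # may be short at end of file
        self._blocks[block_no] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`, served block by block."""
        out = bytearray()
        while length > 0:
            block_no, start = divmod(offset, self.block_size)
            chunk = self._block(block_no)[start:start + length]
            if not chunk:
                break  # past end of file
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

    def close(self):
        self._f.close()
```

Reads that straddle a block boundary are split across two cached blocks, and the `hits`/`misses` counters make it easy to check whether a given algorithm's access pattern actually benefits from the cache.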
I must say, it was awesome seeing our decision-tree system run on huge datasets (tested at over 100 GB) in roughly the same time (~30 seconds) as an in-memory database, once indexing was done.
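The post doesn't say what "indexing" means for that system. One well-known approach used by disk-based tree learners such as SLIQ and SPRINT is to pre-sort each feature once, so the best split for a numeric feature can be found in a single sequential pass. The sketch below assumes binary 0/1 labels and Gini impurity; `best_split` and `gini` are hypothetical names, and the `(value, label)` pairs could just as well be streamed from a sorted file on disk instead of a list.

```python
from typing import Iterable, Optional, Tuple


def gini(pos: int, n: int) -> float:
    """Gini impurity of a node with `pos` positives out of `n` rows (binary labels)."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(sorted_rows: Iterable[Tuple[float, int]], total_pos: int,
               total_n: int) -> Optional[Tuple[float, float]]:
    """One sequential pass over (value, label) pairs pre-sorted by value.

    Returns (threshold, weighted_gini) for the best split `value <= threshold`,
    or None if every row has the same value.
    """
    best = None
    left_pos = left_n = 0
    prev_value = None
    for value, label in sorted_rows:
        # A split is only possible between two distinct values.
        if prev_value is not None and value != prev_value:
            right_pos, right_n = total_pos - left_pos, total_n - left_n
            score = (left_n * gini(left_pos, left_n)
                     + right_n * gini(right_pos, right_n)) / total_n
            if best is None or score < best[1]:
                best = (prev_value, score)
        left_pos += label
        left_n += 1
        prev_value = value
    return best


if __name__ == "__main__":
    rows = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]  # toy data, already sorted
    print(best_split(rows, total_pos=2, total_n=4))  # (2.0, 0.0)
```

Because the scan only needs running counts, its memory footprint is constant no matter how many rows are streamed through it; the expensive sort happens once, up front.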