It would be super fun to implement an intelligent head() function that shows a representative sample rather than just the first X rows. Profile the data first, then identify a small set of rows that reflects the overall distribution.
You could develop some IP around efficient and effective ways to do this. It would probably require an ensemble of unsupervised methods.
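To make the idea concrete, here is a minimal sketch using only one unsupervised method (plain k-means), not the ensemble suggested above. The name `representative_head`, its parameters, and the input format (a list of equal-length numeric tuples) are all assumptions for illustration. It standardizes the columns, clusters the rows, and returns the real row nearest each cluster center.

```python
import math
import random
from statistics import mean, pstdev


def representative_head(rows, n=5, iters=20, seed=0):
    """Return up to n rows, in original order, that roughly cover the data.

    Hypothetical helper: `rows` is a list of equal-length numeric tuples.
    """
    if n <= 0:
        return []
    if len(rows) <= n:
        return list(rows)

    # Standardize each column so no single column dominates the distance.
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    scaled = [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

    # Plain k-means (Lloyd's algorithm) with a seeded random start.
    rng = random.Random(seed)
    centers = rng.sample(scaled, n)
    for _ in range(iters):
        groups = [[] for _ in range(n)]
        for p in scaled:
            nearest = min(range(n), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            tuple(mean(dim) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]

    # For each center, pick the closest real row (a medoid-like choice).
    # Two centers can map to the same row, so fewer than n rows may come back.
    picked = {
        min(range(len(rows)), key=lambda i: math.dist(scaled[i], c))
        for c in centers
    }
    return [rows[i] for i in sorted(picked)]
```

Calling `representative_head(rows, n=5)` on a table with a few distinct groups should return one row from each well-separated group instead of whatever happens to sit at the top. A fuller version would need to handle categorical columns, missing values, and rare outlier rows, which is where combining several methods would come in.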