0 points · pbronez · 3y ago · 0 comments

It would be super fun to implement an intelligent head() function that shows a representative sample rather than the first X rows: profile the data, then identify a collection of rows that represents the overall distribution.

You could develop some IP around efficient and effective ways to do this. Probably would require an ensemble of unsupervised methods.
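As a rough illustration of the simplest possible version, here is a minimal sketch that picks rows at evenly spaced quantiles of a single numeric column. It is nowhere near the ensemble of unsupervised methods suggested above. The name `representative_head`, the `key` parameter, and the list-of-dicts row format are all assumptions made for the example, not part of any existing library.

```python
def representative_head(rows, key, n=5):
    """Return up to n rows whose `key` values sit at evenly spaced ranks.

    Hypothetical helper: `rows` is assumed to be a list of dicts and
    `key` a column whose values can be sorted.
    """
    if n <= 0 or not rows:
        return []
    ordered = sorted(rows, key=lambda row: row[key])
    if n >= len(ordered):
        # Fewer rows than requested: everything is "representative".
        return ordered
    if n == 1:
        # A single row: take the median-ranked one.
        return [ordered[len(ordered) // 2]]
    last = len(ordered) - 1
    # Evenly spaced ranks from the minimum (0) to the maximum (last).
    positions = [(i * last) // (n - 1) for i in range(n)]
    return [ordered[p] for p in positions]


rows = [{"x": value} for value in range(100)]
sample = representative_head(rows, "x", n=5)
```

Unlike a plain head(), this always includes the minimum and maximum of the chosen column, so a skewed tail shows up in the preview. It only looks at one column, though; covering the joint distribution across columns would need something closer to clustering.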

aarondia · 3y ago

That's a cool idea! One helpful .head() variant could surface the rows whose values have the rarest data types. It could help you identify which columns have mixed dtypes: mostly numbers, plus some cells that are supposed to be numbers but are actually strings because of additional decimal points.
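A minimal sketch of that dtype check, assuming rows arrive as a list of dicts: for each column it counts the Python types present and records the first row index where each minority type appears. The name `mixed_type_examples` and the shape of the returned dict are made up for illustration.

```python
from collections import Counter


def mixed_type_examples(rows):
    """Map each mixed-type column to its dominant type and example rows.

    Hypothetical helper: returns {column: {"dominant": type_name,
    "examples": {minority_type_name: first_row_index}}}.
    """
    findings = {}
    # Collect column names in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    for col in columns:
        counts = Counter(type(row[col]).__name__ for row in rows if col in row)
        if len(counts) < 2:
            continue  # Only one type in this column: nothing to report.
        dominant = counts.most_common(1)[0][0]
        examples = {}
        for index, row in enumerate(rows):
            if col not in row:
                continue
            name = type(row[col]).__name__
            if name != dominant and name not in examples:
                examples[name] = index
        findings[col] = {"dominant": dominant, "examples": examples}
    return findings


rows = [
    {"price": 1.5, "name": "a"},
    {"price": 2.0, "name": "b"},
    {"price": "3.1.4", "name": "c"},  # String caused by an extra decimal point.
    {"price": 4.0, "name": "d"},
]
report = mixed_type_examples(rows)
```

A smarter head() could then pull the reported row indices into the preview, so the stray string in `price` is visible without scrolling.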
