In the end, I think I just took the data from the end of the file. If you're going with subsets, though, it's probably better to use a pseudo-randomly selected subset than a sequential one. It doesn't have to be a different pseudo-random subset for each file, but I imagine there's an ideal noise profile for the sampling (maybe white noise is best).
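For what it's worth, here's a rough sketch of what I mean by reusing one pseudo-random subset across files. It's not what I actually ran; the names `subset_indices` and `take_subset`, the seed, and the sizes are all made up for illustration, and it treats uniform sampling without replacement as a stand-in for "white noise" sampling, which is an assumption on my part.

```python
import random

def subset_indices(n_records, k, seed=0):
    """Pick k distinct indices uniformly at random from range(n_records).

    A fixed seed makes the selection reproducible, so the same
    indices can be reused for every file. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    k = min(k, n_records)
    # Sorting keeps the reads in file order without changing which
    # records are selected.
    return sorted(rng.sample(range(n_records), k))

def take_subset(records, indices):
    """Return the records at the given indices, skipping any index
    that falls past the end of a shorter file. (Hypothetical helper.)"""
    return [records[i] for i in indices if i < len(records)]

# Stand-ins for two files with the same number of records.
file_a = list(range(1000))
file_b = [x * 2 for x in range(1000)]

# One set of indices, reused for both files.
indices = subset_indices(n_records=1000, k=10, seed=42)
sample_a = take_subset(file_a, indices)
sample_b = take_subset(file_b, indices)
```

Compared with taking the last `k` records, this spreads the sample over the whole file, so anything that only shows up near the start or the end doesn't dominate.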