They did this back in their trading firm days, and...
Imagine you have a sequence of n numbers. You want to randomly select windows of, say, 1024 consecutive numbers as inputs to your model, and you want n/c windows in total, where c is a constant much smaller than 1024. How do you do a fixed shuffle, i.e. a deterministic, reproducible shuffle of those windows?
The key is that the windows overlap heavily: n/c windows of length 1024 contain 1024/c times as many items as the original sequence. So if we brute-force the fixed shuffle by materializing every window and writing them out in shuffled order, we need roughly 1024/c times the storage of the original data.
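One way around the blowup is to shuffle only the window start offsets with a fixed seed, and slice windows out of the original sequence lazily. This is a minimal sketch of that idea, not necessarily the scheme the trading firm used; the function name, the choice of c, and the NumPy-array layout are all illustrative assumptions.

```python
import numpy as np

def shuffled_windows(data, window=1024, c=4, seed=0):
    """Yield ~len(data)/c windows of `window` consecutive items in a
    fixed (seeded, reproducible) shuffled order, without materializing
    the expanded dataset. Illustrative sketch, not the original scheme."""
    n = len(data)
    num_windows = n // c
    rng = np.random.default_rng(seed)  # fixed seed -> fixed shuffle
    # Sample one start offset per window; only the offsets are stored,
    # so extra storage is O(n/c) integers, not O(1024 * n/c) items.
    starts = rng.integers(0, n - window + 1, size=num_windows)
    for s in starts:
        yield data[s : s + window]  # slice of the original array, no expanded copy

data = np.arange(100_000)
wins = list(shuffled_windows(data, window=1024, c=4, seed=42))
wins2 = list(shuffled_windows(data, window=1024, c=4, seed=42))
# Same seed reproduces the same windows in the same order: the "fixed" part.
assert all((a == b).all() for a, b in zip(wins, wins2))
```

The storage cost drops to the original sequence plus one integer per window; the price is random reads into the underlying data at training time instead of a sequential scan over a pre-expanded file.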
This isn't directly useful for LLMs, but hey, it's fun to wonder how it started.