Performance-oriented library with no benchmarking instructions, fun. I get 850ms to shuffle the 32-bit integers below 1e8 with this library, versus 400ms in BQN (•rand.Deal•_timed 1e8). However, BQN also has a large advantage at smaller sizes (425us for this library versus 120us for BQN at 1e5 elements), so a lot of the difference may come down to the random number generator. I haven't figured out how to get PCG working yet. BQN uses wyrand, which is faster but which I now know has quality issues (it can't even generate every possible output; I need to update the page I linked...).
It's substantially the same algorithm, so apart from multi-threading, which BQN doesn't do, any differences would just be down to implementation details. The usage is also a little different, as BQN generates the shuffled integers directly; generating the integers is 100ms of the 850ms for rip_shuffle, but I'm not sure whether it makes sense to subtract that or not.
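For anyone who wants to separate the two costs themselves, here is a rough sketch of how the generate and shuffle phases could be timed independently. It does not use rip_shuffle or its API: the `SplitMix64` generator, the plain single-threaded Fisher-Yates loop, and the helper names `iota` and `fisher_yates` are all stand-ins I'm assuming for illustration, so the absolute numbers won't match the ones above.

```rust
use std::time::Instant;

/// SplitMix64: a small stand-in generator (an assumption for this sketch,
/// not the generator that rip_shuffle or BQN uses).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n by multiply-shift. Slightly biased, which is
    /// acceptable for timing but not for a real shuffle.
    fn below(&mut self, n: usize) -> usize {
        ((self.next_u64() as u128 * n as u128) >> 64) as usize
    }
}

/// Generate the integers 0..n (the phase that may or may not be counted).
fn iota(n: u32) -> Vec<u32> {
    (0..n).collect()
}

/// Plain single-threaded Fisher-Yates shuffle, in place.
fn fisher_yates(v: &mut [u32], rng: &mut SplitMix64) {
    for i in (1..v.len()).rev() {
        let j = rng.below(i + 1);
        v.swap(i, j);
    }
}

fn main() {
    let n: u32 = 100_000;
    let mut rng = SplitMix64(42);

    // Phase 1: generation only.
    let t0 = Instant::now();
    let mut v = iota(n);
    let gen_time = t0.elapsed();

    // Phase 2: shuffle only.
    let t1 = Instant::now();
    fisher_yates(&mut v, &mut rng);
    let shuffle_time = t1.elapsed();

    println!(
        "generate: {:?}, shuffle: {:?}, total: {:?}",
        gen_time,
        shuffle_time,
        gen_time + shuffle_time
    );
}
```

Reporting both numbers sidesteps the question of whether to subtract: the total is the fair comparison against something that produces shuffled integers directly, and the shuffle-only figure is the fair comparison against an in-place shuffle of existing data. Build with optimizations on and repeat the run a few times before trusting a single measurement.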