Edit: I tried writing an iterator-based version to elide the bounds checks, but I had to resort to unsafe, and it's only about 50% faster than the original Rust version (still not as fast as C): https://gist.github.com/anisse/6b580628206293ef242faa7db6219...
Edit 2: updated, and my Rust iterator version is now roughly equivalent to C, with no unsafe.
Edit 3: too late, the repo has been updated with another iterator-based version that is just as fast.
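
For anyone wondering what "iterator-based to elide bounds checks" means in practice, here is a minimal sketch. It is not the code from the gist or the repo; the function names and the dot-product workload are made up purely for illustration, and whether the compiler actually removes the checks in the indexed version depends on what it can prove.

```rust
/// Indexed version: `a[i]` and `b[i]` are each bounds-checked unless the
/// optimizer can prove `i` is in range. Here it cannot know `b.len()`,
/// so the check on `b[i]` generally stays (and panics if `b` is shorter).
fn dot_indexed(a: &[u32], b: &[u32]) -> u32 {
    let mut sum = 0u32;
    for i in 0..a.len() {
        sum = sum.wrapping_add(a[i].wrapping_mul(b[i]));
    }
    sum
}

/// Iterator version: `zip` yields pairs until the shorter slice ends,
/// so there is no index that could be out of range and nothing to check.
fn dot_iter(a: &[u32], b: &[u32]) -> u32 {
    a.iter()
        .zip(b.iter())
        .fold(0u32, |acc, (&x, &y)| acc.wrapping_add(x.wrapping_mul(y)))
}
```

Note the two are not strictly equivalent on mismatched lengths: the indexed one panics when `b` is shorter than `a`, while the `zip` one silently stops at the shorter slice.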