Very cool to see a team use Jepsen for super early pre-release testing of the system.
I wonder whether you wish you had waited for the runtime to be a bit more stable, or whether you feel this was already well worth the effort, even with some of the identified failures being in "known incomplete" areas? (I could see either side of the argument - waiting longer might give you more valuable failures, but testing early gives you a chance to catch problems before they get baked into the foundation and become more difficult to fix...)
Another tool that feels like sci-fi to me any time I hear it mentioned is Antithesis [1], built by the people who built FoundationDB. It could be another interesting integration to investigate in the future to help bulletproof the language runtime.