In a single-threaded version, the Rust code beats C#, though not by as much as I would have expected. The essence is that I have to run the same calculation on a large array of doubles that produces another array of doubles, so I parallelize it with both SIMD and threads. In C# I top out at AVX2 for vectorization, but in Rust I use AVX-512. It is faster, but not even 2x faster. I'd expect more than 2x: an AVX-512 register holds twice as many doubles as an AVX2 register (8 vs. 4), and AVX-512 also has better instructions to work with. But when I combine this with doing the calculation in threaded parallel chunks of the array, it runs far slower than it should.
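For concreteness, here is a minimal Rust sketch of the kind of chunked, threaded setup described above. It makes several assumptions. The per-element function `compute` is a hypothetical placeholder, because the actual calculation isn't shown. SIMD is left to the compiler's auto-vectorization instead of explicit AVX-512 intrinsics. The names `compute_serial`, `compute_parallel`, and `n_threads` are illustrative and do not come from my real code.

```rust
use std::thread;

/// Hypothetical stand-in for the real per-element calculation.
#[inline]
fn compute(x: f64) -> f64 {
    x * x + 1.0
}

/// Applies `compute` to every element of `input`, writing into `output`.
/// The loop is simple enough that the compiler may auto-vectorize it
/// (e.g. when built with `-C target-cpu=native`); no intrinsics are used.
fn compute_serial(input: &[f64], output: &mut [f64]) {
    for (o, &x) in output.iter_mut().zip(input) {
        *o = compute(x);
    }
}

/// Splits the arrays into at most `n_threads` contiguous chunks and
/// processes each chunk on its own scoped thread.
fn compute_parallel(input: &[f64], output: &mut [f64], n_threads: usize) {
    assert_eq!(input.len(), output.len());
    let n_threads = n_threads.max(1);
    // Round up so every element lands in exactly one chunk; `.max(1)`
    // avoids `chunks(0)`, which panics, when the input is empty.
    let chunk_len = ((input.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        for (in_chunk, out_chunk) in input
            .chunks(chunk_len)
            .zip(output.chunks_mut(chunk_len))
        {
            // Each thread gets a disjoint mutable slice of the output.
            s.spawn(move || compute_serial(in_chunk, out_chunk));
        }
    });
}
```

Timing `compute_serial` on the whole array against `compute_parallel` with different `n_threads` values is one way to separate the SIMD speedup from the effect of threading.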