I just discovered that MSVC++ can automatically parallelize a loop, splitting its iterations across multiple threads. Add "#pragma loop(hint_parallel(8))" and "#pragma loop(ivdep)" immediately before the loop, and compile with the /Qpar option. Both pragmas are hints to the compiler, and they are only safe when the loop's iterations do not depend on each other. This simple change sped up my cryoablation simulation code by 4x.
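For anyone who wants to try it, here is a minimal sketch of what the pragmas look like in place. The function scale_and_offset and its parameters are made up for illustration and are not from my simulation code; the point is only that each iteration writes its own output element and reads nothing written by another iteration. The _MSC_VER guard is my own addition so that other compilers do not warn about an unknown pragma.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example kernel (not the actual simulation code): each output
// element depends only on the matching input element, so the iterations are
// independent and the loop is a reasonable candidate for auto-parallelization.
void scale_and_offset(const std::vector<double>& in, std::vector<double>& out,
                      double scale, double offset) {
    assert(in.size() == out.size());
    const int n = static_cast<int>(in.size());
    const double* src = in.data();
    double* dst = out.data();

#ifdef _MSC_VER
    // Hint: try to spread the iterations over up to 8 threads.
#pragma loop(hint_parallel(8))
    // Hint: ignore assumed dependencies between iterations. This is a promise
    // from the programmer; if iterations really do depend on each other,
    // the results can be wrong.
#pragma loop(ivdep)
#endif
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * scale + offset;
    }
}
```

As far as I can tell, a command line along the lines of "cl /O2 /Qpar /Qpar-report:1 file.cpp" is a reasonable way to build this, since the /Qpar-report option makes the compiler list which loops it parallelized. Because the pragmas are only hints, the compiler may still leave a loop serial, and a small loop like this one may not gain anything from the extra threads.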