It may be a good idea to use a framework built around explicitly stateless "tasks" and an orchestrator that schedules them (in parallel, distributed across machines, or both). This is the model used by Spark, TensorFlow, Beam and others. These frameworks also offer a "parallel for", but instead of being limited to threads on one machine, the same code can run on remote computers with only a configuration change.
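As a minimal sketch of the idea, assuming only Python's standard library (this is not the Spark, TensorFlow or Beam API): the task is a pure function, and an executor plays the role of the orchestrator. The names `square`, `parallel_for` and `make_executor` are hypothetical, chosen for illustration. Because the task holds no state, switching from threads to processes is just a different `backend` value.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def square(x: int) -> int:
    """A stateless task: its output depends only on its input."""
    return x * x


def parallel_for(task: Callable[[int], int], items: Iterable[int],
                 executor: Executor) -> List[int]:
    """Run `task` on every item; the executor decides where each call runs."""
    # Executor.map returns results in the same order as the inputs.
    return list(executor.map(task, items))


def make_executor(backend: str, workers: int = 4) -> Executor:
    """Hypothetical 'configuration switch' that picks the execution backend."""
    if backend == "threads":
        return ThreadPoolExecutor(max_workers=workers)
    if backend == "processes":
        # Tasks must be picklable, module-level functions for this backend.
        return ProcessPoolExecutor(max_workers=workers)
    raise ValueError(f"unknown backend: {backend}")


if __name__ == "__main__":
    backend = "threads"  # change to "processes" to use separate worker processes
    with make_executor(backend) as ex:
        print(parallel_for(square, range(5), ex))
```

A distributed framework takes the same step one level further: the "executor" becomes a cluster. In PySpark, for example, the analogous pattern is roughly `sc.parallelize(items).map(square).collect()`, where the cluster the `SparkContext` connects to is set by configuration rather than by the task code.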