How do you tell whether the races are a problem in the code or a problem in the tests? If we have tests that are sometimes red, we work hard to fix them. We've found issues before where we thought the tests were wrong when it was actually the code.
You can't easily; you have to debug them. Debugging intermittent test failures is hard, and there's no guarantee that the time you invest will produce any tangible improvement. It just sucks.
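
For what it's worth, a common first step is to rerun the suspect test in a loop to see how often it fails and with which errors. Here's a minimal sketch in Python. `stress` and its parameters are hypothetical names made up for illustration, not part of any test framework, and it assumes the test is a plain callable that raises an exception on failure.

```python
from typing import Callable, List, Tuple


def stress(test_fn: Callable[[], None], runs: int = 1000) -> List[Tuple[int, Exception]]:
    """Run test_fn `runs` times and collect (run index, exception) for each failure.

    Hypothetical helper for illustration; assumes test_fn raises on failure.
    """
    failures: List[Tuple[int, Exception]] = []
    for i in range(runs):
        try:
            test_fn()
        except Exception as exc:  # AssertionError is included here
            failures.append((i, exc))
    return failures
```

This won't tell you whether the bug is in the code or in the test, but a failure rate and a pile of tracebacks give you something concrete to start debugging from.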