Running the detector on just a few nodes sounds like a great way to offset the performance penalty a bit. The docs on the race detector say that, for a typical program, "memory usage may increase by 5-10x and execution time by 2-20x", which could be quite significant.
I also wonder about the effectiveness of randomly fuzzing your app with the race detector on as a form of testing.