Generally, you're on the hook for building the parallelization primitive (director/worker/reaper/aggregator) and the test data generator/stager.
The actual runner itself is generally pretty dumb: it's just there to take its chunk of the request load and dump its results.
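Here's a minimal Python sketch of that split. It assumes a thread pool is an acceptable parallelization primitive. `send_request`, `runner`, and `director` are hypothetical names, and `send_request` only sleeps to simulate latency, so swap in your real client call. There's no separate reaper here: leaving the `with ThreadPoolExecutor(...)` block waits for every worker to finish, which covers only the simplest reaping case.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_request(request_id: int) -> bool:
    """Hypothetical stand-in for a real request; replace with your HTTP/DB call."""
    time.sleep(0.001)  # simulate a little I/O latency
    return True


def runner(chunk: list[int]) -> list[dict]:
    """A deliberately dumb runner: fire its chunk of requests and record raw results."""
    results = []
    for request_id in chunk:
        start = time.perf_counter()
        ok = send_request(request_id)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append({"id": request_id, "ok": ok, "latency_ms": elapsed_ms})
    return results


def director(total_requests: int, workers: int) -> list[dict]:
    """Split the load into one chunk per worker, run them in parallel, collect everything."""
    request_ids = list(range(total_requests))
    # Round-robin split so every worker gets a near-equal share.
    chunks = [request_ids[i::workers] for i in range(workers)]
    all_results = []
    # Exiting the with-block waits for all workers (the "reaping" step in this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk_results in pool.map(runner, chunks):
            all_results.extend(chunk_results)  # each runner "dumps" its results here
    return all_results


if __name__ == "__main__":
    results = director(total_requests=100, workers=4)
    print(len(results))
```

Threads are fine for I/O-bound load. For CPU-heavy clients or load spread across multiple machines you'd swap in processes or remote agents, but the runner can stay just as dumb.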
Then you've got the results visualizer that consumes and aggregates everything.
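The aggregation half of that might look like the following sketch. It assumes each result is a dict with hypothetical `ok` and `latency_ms` fields and uses the nearest-rank percentile method. Other tools may interpolate, so their numbers can differ slightly. A real visualizer would also chart these numbers over time rather than just summarizing them.

```python
import math


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile on an already-sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer math first so the rank isn't skewed by float rounding.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate(results: list[dict]) -> dict:
    """Collapse raw per-request records into a summary a visualizer can display."""
    latencies = sorted(r["latency_ms"] for r in results)
    errors = sum(1 for r in results if not r["ok"])
    return {
        "requests": len(results),
        "error_rate": errors / len(results),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Made-up input for illustration: latencies 1..100 ms, one failed request.
    sample = [{"id": i, "ok": i != 3, "latency_ms": float(i + 1)} for i in range(100)]
    print(aggregate(sample))
```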
People treat performance testing/engineering as its own specialty because you really have to take in the whole system at once. You can totally make a living doing just that.