For a lot of these algorithms there are also pretty cheap ways to remove jitter from the measurements (in case you're benchmarking on top of a general-purpose OS or something). For many algorithms you expect that, absent external factors, the runtime is a convex, monotonically increasing multivariate function of the input sizes, and that external confounders only ever slow things down, never speed things up. Under those assumptions you can measure the runtime at many different input sizes and de-jitter each data point by pushing it down onto the convex monotone envelope of the measurements, even if you only gather one single noisy sample per input size.
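A minimal sketch of the one-dimensional case, assuming noise only ever adds time and the true runtime is convex in the input size: the greatest convex minorant of the noisy points, i.e. their lower convex hull, is then a de-jittered estimate. The function names here are invented for illustration:

```c
#include <stdio.h>

/* Cross product of (p1 - p0) and (p2 - p0): positive means a
   counterclockwise turn, which is what the lower hull keeps. */
static double cross(double x0, double y0, double x1, double y1,
                    double x2, double y2) {
    return (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0);
}

/* Lower convex hull (Andrew's monotone chain, lower half only).
   Points must be sorted by x (input size); y is the noisy runtime.
   Writes hull vertices into hx/hy and returns the hull size. */
static int lower_hull(const double *x, const double *y, int n,
                      double *hx, double *hy) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        /* Pop while the last two hull points and p[i] don't turn
           counterclockwise: the middle point sits above the envelope. */
        while (k >= 2 && cross(hx[k - 2], hy[k - 2],
                               hx[k - 1], hy[k - 1],
                               x[i], y[i]) <= 0.0)
            k--;
        hx[k] = x[i];
        hy[k] = y[i];
        k++;
    }
    return k;
}
```

Linearly interpolating the hull back at each original input size then gives one de-jittered runtime per data point; adding the monotonicity constraint on top is a similar one-pass cleanup.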
Where possible, I always liked arrays of function pointers for each next power of 2 in the input sizes. A few popcounts and a jump later, and you're executing the right routine. It's not great if you're throwing a general-purpose tool against tiny problems, but it's not a huge amount of overhead either, and it's dead simple to code.
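A rough sketch of that dispatch scheme, assuming GCC/Clang for `__builtin_popcountll` (the kernel names are invented placeholders): smearing the high bit rightward and popcounting gives the bit width of the size, which indexes straight into the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size-specialized routines; one per power-of-2 class. */
typedef void (*kernel_fn)(size_t n);

/* Classes 0..64 cover every 64-bit input size. */
static kernel_fn table[65];

/* Bit-smear + popcount = floor(log2(n)) + 1, i.e. the bit width of n.
   That's the "few popcounts and a jump". */
static unsigned size_class(uint64_t n) {
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16; n |= n >> 32;
    return (unsigned)__builtin_popcountll(n);
}

void dispatch(size_t n) {
    table[size_class(n)](n);  /* assumes every slot was filled at init */
}
```

On targets with a count-leading-zeros instruction, `64 - __builtin_clzll(n)` computes the same index, but the smear-and-popcount version works anywhere a popcount does.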