consider the task of computing a gradient at a point p0 = <x1, x2, x3, ..., xN> in an N-dimensional space.
the naive approach was for a long time: compute the value of the score at p0, then for each coordinate compute the score for the same point but shifted a delta in the direction of that coordinate, i.e. the i-th component is computed as:
component_i = (score( <x1, x2, ..., xi+delta, ..., xN> ) - score (p0))/delta
thats N+1 evaluations or trials of the score to compute the final gradient <component_1, ..., component_N>
notice how reminescent this is of natural selection: the average of the last generation p0 is used to generate ~N trials, which then result in the average of the next generation shifting somewhat.
compare reverse mode automatic differentiation to calculate a gradient: one forward pass of the computation with one backward pass...
I am not complaining about the complexity of the fitness or scoring function, I am complaining about trial and error approaches, when we have discovered a rocket for differentiation!