Most of the code generated by my genetic programming algos doesn't compile. Very occasionally the random conditions line up to let it jump past a "local maximum" and come up with a useful candidate source code. Sometimes the candidates compile, run, and produce correct results.
The time it takes to run varies vastly with the parameters (population size, how the mutation function works, how the fitness function weights and scores candidates, etc.).
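To make the parameter sensitivity concrete, here is a minimal sketch of a toy evolutionary loop. It is not my actual setup: it evolves a bit string instead of source code, and every name in it (`fitness`, `mutate`, `evolve`, `population_size`, `mutation_rate`) is a hypothetical stand-in chosen for illustration. The fitness function just counts 1 bits, and selection is a simple "keep the top half" scheme, both of which are assumptions.

```python
import random


def fitness(genome):
    # Toy score (an assumption): count of 1 bits, so the best genome is all ones.
    return sum(genome)


def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(length=20, population_size=30, mutation_rate=0.05,
           max_generations=200, seed=0):
    # Returns (best_genome, generations_used).
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for generation in range(1, max_generations + 1):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == length:
            return population[0], generation
        # Keep the top half unchanged, refill the rest with mutated copies.
        survivors = population[:max(1, population_size // 2)]
        children = [mutate(rng.choice(survivors), mutation_rate, rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    # Budget exhausted: report the best candidate found so far.
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations


if __name__ == "__main__":
    # Same problem, different knobs: compare how many generations each run takes.
    for size in (10, 30, 100):
        for rate in (0.01, 0.05, 0.3):
            best, gens = evolve(population_size=size, mutation_rate=rate)
            print(size, rate, gens, fitness(best))
```

Running the script prints one line per parameter combination, which is a cheap way to see how much the generation count moves when only the population size or mutation rate changes. A real genetic programming run would replace `fitness` with "compile it, run it, score the output", which is where most of the wall-clock time goes.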
Personally, I really like that these DeepMind announcements don't get lost in performance comparisons, because inevitably those would get bogged down in complaints like "the other thing wasn't tuned as well as this one was". Let third-party researchers who have access to both do that work independently.