The difficulty is that the performance of queries on the final graph is dependent on it's shape. As lorenzhs said, you want the shortcuts to be as long as possible.
The final shape of the graph is highly dependent on the order you contract the nodes in - small changes in contraction order have large effects on the final shape.
One of the very expensive parts of the pre-processing step is determining the best order to perform contraction. Sure, you could just iterate over all nodes, contracting as you go (and parallelize), but you'd end up with a contracted graph that's not a whole lot better for queries than the original. Order matters.
The original CH paper covers lots of the details:
http://algo2.iti.kit.edu/documents/routeplanning/geisberger_...
There is a general group of approaches that do what you're describing - partition the graph recursively, and produce optimized overlays in various forms. This can be done in parallel, and recursively:
http://www.dis.uniroma1.it/challenge9/papers/holzer.pdf
Query performance is generally not quite as fast as a well-optimizied CH graph, but the overlays can be generated much faster and that work can be highly parallelized. We hope one day to get a chance to implement this approach in OSRM.
The difficult problem with the second approach is partitioning the graph well :-)