I think that is the point Peter Cordes is trying to make (quite politely but firmly) over there on Stack Overflow. It's not only about autovectorization. The main point there is the loop-carried dependency, which prevents both the compiler (autovectorization)
and the processor (ILP) from doing their thing.
The loop-carried dependency is the big culprit here. I wish we had a culture of writing for loops whose index is constant inside the loop body, as in the Ada for statement, rather than clever C while loops or for loops with two running variables... Simpler loop syntax makes so many static analyses 'easier' and kind of forces the brain to think in bounded, independent steps, or to reach for higher-level constructs (e.g. reduce).
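
To make that concrete, here is a minimal sketch in C. It is my own illustration, not necessarily the code discussed in the Stack Overflow thread, and the function names sum_serial and sum_four_lanes are made up for the example. The first loop has a single accumulator, so every addition has to wait for the previous one. Without something like -ffast-math the compiler is generally not allowed to reorder floating-point additions, so it cannot break that chain for you. The second loop splits the work across four independent accumulators, which leaves room for the CPU to overlap the additions and may give the compiler an easier job vectorizing.

```c
#include <stddef.h>

/* Single accumulator: each iteration reads the acc written by the
 * previous one, so the additions form one long serial chain. */
float sum_serial(const float *a, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
    }
    return acc;
}

/* Four accumulators: the four chains do not depend on each other,
 * so their additions can be in flight at the same time.
 * The index i is only changed in the for header, never in the body. */
float sum_four_lanes(const float *a, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        acc0 += a[i];
    }
    /* Combine the partial sums once, at the end. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that the two versions add the elements in a different order, so for arbitrary floats the results can differ in the last bits. That is exactly why the compiler will not do this transformation on its own by default, and why a reduce-style construct that says up front "order does not matter" is nice to have.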