So, the overhead I'm referring to isn't so much stuff like function dispatch (which is almost immeasurable with the heavy lifting happening in C) as idiomatic overhead. Creating an anonymous function to map across a lazy sequence wrapping a persistent data structure doesn't have a chance in hell against a native for loop on a native Vim list. I actually did quite a bit of optimization in this area (that's where chunked seqs came from), and it's quite usable for many tasks, but it's still a potential bottleneck, so I never really found myself "trusting" it for anything significant.
Of course, I am in the rather unusual position of being able to bang out well-optimized VimL in my sleep, so, paradoxically, that biases me against my own creation.
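To make the idiomatic-overhead point concrete, here's a rough analogy in Python rather than VimL. It is only a sketch: `native_loop`, `lazy_map`, and `idiomatic` are hypothetical names made up for illustration, and a generator is standing in for a lazy sequence. Both versions produce the same list, but the second pays for an anonymous function call and a lazy wrapper on every element.

```python
from typing import Callable, Iterable, Iterator, List


def native_loop(items: List[int]) -> List[int]:
    """Plain loop over a plain list: no closure call, no lazy wrapper."""
    out = []
    for x in items:
        out.append(x + 1)
    return out


def lazy_map(f: Callable[[int], int], seq: Iterable[int]) -> Iterator[int]:
    """Hypothetical stand-in for mapping over a lazy sequence.

    Each element is produced on demand, so every step costs a generator
    resume plus a call to f.
    """
    for x in seq:
        yield f(x)


def idiomatic(items: List[int]) -> List[int]:
    """Same result as native_loop, written in the functional idiom."""
    return list(lazy_map(lambda x: x + 1, iter(items)))


if __name__ == "__main__":
    import timeit

    data = list(range(100_000))
    # Timings vary by machine and interpreter; run it to compare for yourself.
    print("native loop:", timeit.timeit(lambda: native_loop(data), number=20))
    print("lazy map:   ", timeit.timeit(lambda: idiomatic(data), number=20))
```

The absolute numbers don't matter much. The point is that the extra cost comes from the idiom itself (a closure call and a lazy step per element), not from dispatch.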