If you compile your own lame ls implementation and read directory entries with, say, a 1 MB buffer instead of 1024 bytes at a time, you can print out a directory with over 4 million entries in about a second, versus 20-ish hours with the 1024-byte buffer (yeah, I timed it).
I'm not saying ext4 doesn't have issues, just that it's not as simple as it seems on the surface.
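For anyone who wants to try the buffer-size experiment, here's a minimal Linux-only sketch. It is not the code I timed. It assumes the "lame ls" reads entries with the raw getdents64 syscall, so bufsize directly caps how many records each syscall can return. list_dir is a hypothetical helper name, and the record struct is declared by hand, following the getdents(2) man page.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64 (see getdents(2)). Declared here
   because older glibc versions have no wrapper or header for it. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: print every entry of `path` to `out`, reading at most
   `bufsize` bytes of directory records per getdents64 call. Returns the
   number of entries printed (excluding "." and ".."), or -1 on error. */
long list_dir(const char *path, size_t bufsize, FILE *out)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = 0;
    for (;;) {
        /* One syscall fills the buffer with as many records as fit, so a
           small bufsize means many more round trips into the kernel. */
        long nread = syscall(SYS_getdents64, fd, buf, bufsize);
        if (nread < 0) {
            count = -1;   /* e.g. EINVAL if bufsize can't hold one record */
            break;
        }
        if (nread == 0)
            break;        /* end of directory */

        /* Walk the variable-length records packed into the buffer. */
        for (long off = 0; off < nread;) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0) {
                fprintf(out, "%s\n", d->d_name);
                count++;
            }
            off += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return count;
}
```

Calling list_dir(dir, 1024, stdout) and then list_dir(dir, 1 << 20, stdout) on the same big directory gives the same listing. The only difference is how many getdents64 calls it takes to get there, which you can count with strace -c.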