Both effectively boiled down to a sendfile call. The differences came down to runtime weight and to the parallelism strategy and its implementation. It turns out that not having to pay for object headers, and getting stack allocation by default, helps a lot more than I anticipated. I did this to actually measure what the difference was.
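For anyone unfamiliar with what "a sendfile call" looks like, here is a minimal sketch in Python. It is not the code from either implementation; `send_file` and `roundtrip` are hypothetical names made up for illustration, and the sketch assumes a Linux-like system where `os.sendfile` can write to a socket.

```python
import os
import socket
import tempfile


def send_file(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `sock` via sendfile(2).

    Hypothetical helper for illustration only.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel moves the bytes from the file to the socket;
            # the data never passes through a user-space buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               sent_total, size - sent_total)
            if sent == 0:  # unexpected EOF, stop rather than spin
                break
            sent_total += sent
    return sent_total


def roundtrip(payload: bytes) -> bytes:
    """Write `payload` to a temp file, sendfile it over a socket pair,
    and return what the other end received (small payloads only)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    a, b = socket.socketpair()
    try:
        send_file(a, path)
        a.close()  # signal EOF to the reader
        chunks = []
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        b.close()
        os.unlink(path)


if __name__ == "__main__":
    print(roundtrip(b"hello world"))
```

Since the copy itself happens in the kernel either way, whatever differs between two servers built on this call has to come from the code around it.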