I'd be very surprised if well-written Julia ended up in the middle of the pack on speed here. If it's slower than LuaJIT, I suspect some known performance anti-patterns are being used.
It's most probably due to the startup and JIT penalty, but it was on par with Racket, which is quite fast. I'm sure it can be optimized, but the implementation is kept quite similar to the other languages in order to have a level playing field. It's basically two nested for loops (in Julia and Racket it's a single loop with two indices) adding values to a set, and another loop doing set lookups and pushing into an array.
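For anyone following along, the shape described above is roughly the following. This is only a minimal Python sketch and not the actual benchmark code: the function names, the `i * j` value being inserted, and the sizes are all made up for illustration.

```python
def build_set(n):
    """First phase: two nested loops adding values to a set."""
    seen = set()
    for i in range(n):
        for j in range(n):
            # Hypothetical value; the real benchmark's expression is not given.
            seen.add(i * j)
    return seen


def collect_hits(seen, limit):
    """Second phase: look each candidate up in the set, push hits into a list."""
    hits = []
    for k in range(limit):
        if k in seen:
            hits.append(k)
    return hits


if __name__ == "__main__":
    seen = build_set(4)
    print(collect_hits(seen, 10))
```

With a workload this small, process startup and compilation can easily dominate the measured wall time, which is why the timing method matters as much as the loop bodies.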
Ah, you're benchmarking wall time including startup and compilation?
If you're interested, I'd be happy to take a look anyway and see if there are any easy, idiomatic performance changes that can be made to the Julia code without changing the algorithm.