Then finally, the kicker is that Julia, especially on v0.7, doesn't optimize functions separately: it optimizes them together. It will inline small functions into their callers, perform interprocedural optimizations, exploit compile-time constants, and so on. So when the code is Julia all the way down, the compiler can and will optimize across function boundaries, which gives much larger performance gains than compiling each function separately. When you add the macros that turn off bounds checks (`@inbounds`) and enable explicit SIMD (`@simd`), you really do get C-level performance, and often beyond it, because your code is optimized both architecturally and vertically. It's as if you compiled with flags saying "compile code that only works on this machine with this codebase", and Julia can safely make that assumption because it is JIT compiling.
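To make the bounds-check and SIMD point concrete, here is a minimal sketch in Julia 1.x. The function names `mysum`, `square`, and `sumsq` are hypothetical, chosen only for illustration; `@inbounds` and `@simd` are the standard Base macros. Whether `square` is actually inlined and the loop actually vectorized is the compiler's decision, not a guarantee.

```julia
# Hypothetical example: sum a vector with bounds checks disabled and
# SIMD reordering allowed. Iterating over eachindex(x) keeps @inbounds safe.
function mysum(x::AbstractVector{T}) where {T<:Real}
    s = zero(T)
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# A tiny helper that the compiler is free to inline into its caller.
square(v) = v * v

# Because `square` is plain Julia, it can be inlined into this loop and
# optimized together with it, rather than called as a separate function.
function sumsq(x::AbstractVector{T}) where {T<:Real}
    s = zero(T)
    @inbounds @simd for i in eachindex(x)
        s += square(x[i])
    end
    return s
end

x = collect(1.0:4.0)
println(mysum(x))   # 10.0
println(sumsq(x))   # 30.0
```

In the REPL you can run `@code_llvm sumsq(x)` to see whether the call to `square` disappeared and vector instructions were emitted on your machine. Note that `@simd` permits floating-point reassociation, so results on general float data may differ in the last bits from a strictly sequential sum.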
Because of this, it goes much further than Cython, and this is also why the type system and multiple dispatch are central to the language. So at a surface level I would call it "fast Python" (or "more productive C", which is how I usually think of it). At a deeper level, however, the type system is so central that larger software architectures end up structured differently, around multiple dispatch rather than OOP.
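A minimal sketch of what the multiple dispatch style looks like in Julia 1.x. The `Shape` hierarchy and the functions `area`, `describe_pair`, and `total_area` are hypothetical examples, not from any library. The point is that methods belong to generic functions rather than to classes, and the method chosen depends on the types of all arguments, not just the first.

```julia
# Hypothetical shape hierarchy used only for illustration.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rect <: Shape
    w::Float64
    h::Float64
end

# Methods are attached to the generic function `area`, not to the types.
area(c::Circle) = pi * c.r^2
area(r::Rect) = r.w * r.h

# Dispatch on both arguments: the most specific matching method is chosen.
describe_pair(::Circle, ::Circle) = "two circles"
describe_pair(::Rect, ::Rect) = "two rectangles"
describe_pair(::Shape, ::Shape) = "mixed shapes"

# Generic code: works for any Shape subtype that has an `area` method.
total_area(shapes) = sum(area, shapes)

shapes = Shape[Rect(2.0, 3.0), Rect(1.0, 1.0)]
println(total_area(shapes))                          # 7.0
println(describe_pair(Circle(1.0), Rect(1.0, 2.0)))  # mixed shapes
```

A new shape type defined in another package can plug into `total_area` just by adding an `area` method for it, without touching the existing types or functions. Extension points of roughly this kind tend to take the place of class hierarchies in larger Julia codebases.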