The code is in the Benchmarks test in test/Benchmark-test.jl.
It's not surprising that the first benchmark is faster in Maple, since Maple can make use of a true sparse representation, possibly quite a bit of vectorisation on the processor you have, and possibly multiple cores. The benchmark here uses only a dense representation and a single core, with no explicit vectorisation (possibly none at all).
Piranha, for example, will do both of those first examples in a fraction of the time that Nemo takes. But again, it uses a sparse representation and can use multiple cores.
We'll add a sparse representation to Nemo later on, and perhaps even wrap Piranha.
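To make the dense versus sparse point concrete, here is a toy sketch in plain Julia. It is not how Maple, Piranha or Nemo actually store polynomials; the names `to_dense` and `sparse_mul` are made up for illustration, and coefficients are plain `Int`s.

```julia
# Hypothetical illustration only; not taken from any of the systems discussed.

# x^1000 + 1 written as exponent => coefficient pairs (the sparse form).
p_sparse = Dict(0 => 1, 1000 => 1)

# The dense form needs one slot per power of x up to the degree, zeros included.
function to_dense(terms::Dict{Int,Int})
    coeffs = zeros(Int, maximum(keys(terms)) + 1)
    for (e, c) in terms
        coeffs[e + 1] = c   # coefficient of x^e lives at index e + 1
    end
    return coeffs
end

# Sparse multiplication only visits pairs of stored (nonzero) terms.
function sparse_mul(a::Dict{Int,Int}, b::Dict{Int,Int})
    out = Dict{Int,Int}()
    for (ea, ca) in a, (eb, cb) in b
        out[ea + eb] = get(out, ea + eb, 0) + ca * cb
    end
    return out
end

p_dense = to_dense(p_sparse)
p_squared = sparse_mul(p_sparse, p_sparse)
```

Here the sparse form stores 2 terms and squaring it takes 4 coefficient products, while the dense form stores 1001 coefficients and a schoolbook dense product would loop over 1001 × 1001 pairs, nearly all of them zero. How much this matters in practice depends on how sparse the benchmark inputs really are.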
The main purpose of the benchmark is to show off what Julia's really fast generics do for us, not to run this particular benchmark as fast as is humanly possible. To keep the comparison as fair as possible, we deliberately use univariate polynomials over other univariate polynomials in all systems (except Pari, as noted), rather than dedicated multivariate polynomial rings.
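As a rough illustration of what "univariate polynomials over other univariate polynomials" means with generic code, here is a minimal sketch in plain Julia. It is not Nemo's implementation or API: `DensePoly` is a hypothetical type, the multiplication is naive schoolbook, trailing zeros are not stripped, and the coefficient vector is assumed to be non-empty.

```julia
# Hypothetical illustration only: this is not Nemo's implementation or API.
# A dense univariate polynomial; coeffs[i] is the coefficient of x^(i-1).
# Assumes coeffs is never empty.
struct DensePoly{T}
    coeffs::Vector{T}
end

# A zero with the same nesting shape as p, so the generic code below also
# works when the coefficients are themselves polynomials.
Base.zero(p::DensePoly{T}) where {T} = DensePoly{T}([zero(p.coeffs[1])])

function Base.:+(a::DensePoly{T}, b::DensePoly{T}) where {T}
    z = zero(a.coeffs[1])
    c = T[z for _ in 1:max(length(a.coeffs), length(b.coeffs))]
    for i in eachindex(a.coeffs)
        c[i] = c[i] + a.coeffs[i]
    end
    for i in eachindex(b.coeffs)
        c[i] = c[i] + b.coeffs[i]
    end
    return DensePoly{T}(c)
end

# Schoolbook multiplication, written once for any coefficient type T
# that supports zero, + and *.
function Base.:*(a::DensePoly{T}, b::DensePoly{T}) where {T}
    z = zero(a.coeffs[1])
    c = T[z for _ in 1:(length(a.coeffs) + length(b.coeffs) - 1)]
    for i in eachindex(a.coeffs), j in eachindex(b.coeffs)
        c[i + j - 1] = c[i + j - 1] + a.coeffs[i] * b.coeffs[j]
    end
    return DensePoly{T}(c)
end

# f = x + y, stored as a polynomial in y whose coefficients are polynomials in x:
# the coefficient of y^0 is x, the coefficient of y^1 is 1.
f = DensePoly([DensePoly([0, 1]), DensePoly([1])])
g = f * f   # (x + y)^2 = x^2 + 2xy + y^2
```

The same `*` method handles both `DensePoly{Int}` and `DensePoly{DensePoly{Int}}`; Julia compiles a specialised version for each concrete type, which is the kind of generic behaviour the benchmark is meant to exercise. In `g`, the coefficients of y^0, y^1 and y^2 come out as x^2, 2x and 1 respectively.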