Of course it can be pure, so long as the shared data is not mutated. Functional languages such as Haskell don't copy values all over the place when calling functions, they pass in pointers.
But that's an optimisation. Which can be switched off to get the benefits from parallelism. This is a tradeoff that programmers would traditionally have to consider, but with functional programming the compiler can figure it out for us.