Re grid representation: the convolution operator is translation equivariant. (Hand-wavingly, it means that a translation applied to the input signal is still detectable in the output feature maps: shift the input and the features shift by the same amount.)
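A minimal sketch of what "equivariant" means here. It assumes a 1D signal, a deep-learning-style convolution (which is really a cross-correlation), and circular padding so shifted values wrap around instead of falling off the edge. `conv1d_circular` and `shift` are hypothetical helpers for illustration, not from any library.

```python
def conv1d_circular(signal, kernel):
    """Cross-correlate `signal` with `kernel`, wrapping around at the borders."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(k)) for i in range(n)]

def shift(signal, s):
    """Circularly shift `signal` to the right by `s` positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

x = [0, 1, 3, 0, 0, 0, 0, 0]  # toy input signal
w = [1, -1, 2]                # toy kernel

# Equivariance: convolving a shifted input == shifting the convolved output.
lhs = conv1d_circular(shift(x, 3), w)
rhs = shift(conv1d_circular(x, w), 3)
print(lhs == rhs)  # True
```

With zero padding instead of wrap-around, the identity only holds away from the borders, which is why real CNNs are equivariant only approximately.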
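However, it has been shown many times that combining the convolution operator with a pooling layer yields (approximate) translation invariance by means of dimensionality reduction: the pooled output records that a feature is present rather than exactly where it is.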
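A sketch of the pooling argument under the same toy assumptions (1D signal, circular padding, hypothetical helper names). Global max pooling makes the output exactly invariant to circular shifts. Local, strided max pooling only reduces sensitivity: a shift that is not a multiple of the stride can still change the pooled map.

```python
def conv1d_circular(signal, kernel):
    """Cross-correlate `signal` with `kernel`, wrapping around at the borders."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(k)) for i in range(n)]

def shift(signal, s):
    """Circularly shift `signal` to the right by `s` positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

def max_pool(features, window):
    """Non-overlapping max pooling (stride == window)."""
    return [max(features[i:i + window]) for i in range(0, len(features), window)]

def global_max_pool(features):
    """Collapse the whole feature map to its strongest response."""
    return max(features)

x = [0, 1, 3, 0, 0, 0, 0, 0]
w = [1, -1, 2]

# Global pooling: the same value for every circular shift of the input.
base = global_max_pool(conv1d_circular(x, w))
invariant = all(global_max_pool(conv1d_circular(shift(x, s), w)) == base
                for s in range(len(x)))
print(invariant)  # True

# Local pooling (window 2): a 1-step shift changes the pooled map.
pooled = max_pool(conv1d_circular(x, w), 2)
pooled_shifted = max_pool(conv1d_circular(shift(x, 1), w), 2)
print(pooled)          # [5, 3, 0, 2]
print(pooled_shifted)  # [5, 3, 0, 0]
```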
Moreover, rotational equivariance (and, consequently, invariance) is an active area of research. There's an interesting talk (https://www.youtube.com/watch?v=-UKL3kOlOds&list=PLlMMtlgw6q...) by Boomsma/Frellsen about using spherical convolutions in deep learning applications on molecular structures.