They're NNs because you learn the representation using RNNs. Everything afterwards is trivial, since you're in a Hilbert space; getting the representations is the hard part.
word2vec does not use RNNs. The network is trained on a simple classification task: "neighborhood" -> "word" (CBOW), or "word" -> "neighborhood" (skip-gram). Each window in the corpus is an independent training example, so there's no sequential dependence.
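To make the point concrete, here is a minimal CBOW-style sketch in NumPy (illustrative only, not the original word2vec C implementation; corpus, dimensions, and learning rate are arbitrary assumptions). Each (context window, center word) pair is one independent classification example, with no recurrence anywhere:

```python
import numpy as np

# Toy corpus; in practice this would be a large text collection.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window = len(vocab), 8, 2  # vocab size, embedding dim, context radius

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, D))   # input embeddings (the vectors you keep)
W_out = rng.normal(0, 0.1, (D, V))  # output (softmax) weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Build independent ("neighborhood" -> "word") examples.
examples = []
for t in range(len(corpus)):
    ctx = [idx[corpus[j]]
           for j in range(max(0, t - window), min(len(corpus), t + window + 1))
           if j != t]
    examples.append((ctx, idx[corpus[t]]))

lr = 0.5
for epoch in range(200):
    for ctx, target in examples:
        h = W_in[ctx].mean(axis=0)           # average the context embeddings
        p = softmax(h @ W_out)               # predict the center word
        err = p.copy(); err[target] -= 1.0   # softmax cross-entropy gradient
        W_out -= lr * np.outer(h, err)
        W_in[ctx] -= lr * (W_out @ err) / len(ctx)

# After training, the rows of W_in are the learned word vectors.
```

Real implementations replace the full softmax with negative sampling or a hierarchical softmax for efficiency, but the structure is the same: a shallow classifier over independent examples.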
Or you could use a pre-trained set like the ones from Google [1]. If not, you've probably solved an open problem in the area, and publishing it would save the rest of us from trying to solve it again.