They're NNs because you learn the representation using RNNs. Everything afterwards is trivial since you're in a hilbert space. But getting the representations is the hard part.
Or you could use a pre-trained list like the ones from Google [1]. If not you probably solved an open problem in the area and publishing it would help us not to lose time trying to solve it again.
word2vec does not use RNNs, the network is trained on a simple classification task "neighborhood" -> "word". Each word in the corpus is an independent example, there's no sequential dependence.