If you add the additional constraint of the signal being "bandpass-limited", where X(w) = 0 for abs(w) < A and for abs(w) > B for some 0 < A < B, then yes, you can undersample.
And that's where the information-theory idea comes in: the amount of information contained in the band only "needs" a sampling rate of 2X the bandwidth, not 2X the highest frequency, to reconstruct perfectly.
You can think of aliasing as being somewhat orthogonal to that, in the sense that you need at least 2X the bandwidth so you don't corrupt the signal, but 2X the max frequency so you don't alias anything else into the signal. (I say this realizing that aliasing is what would cause the former signal corruption, hence "somewhat".)
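To make the "2X bandwidth vs. 2X max frequency" distinction concrete, here is a rough sketch using the standard bandpass-sampling condition: for a real signal occupying f_lo to f_hi, a rate fs avoids overlapping the band's own spectral images when 2*f_hi/n <= fs <= 2*f_lo/(n-1) for some integer n between 1 and floor(f_hi / (f_hi - f_lo)). The function names and the 20-25 kHz example band are my own assumptions for illustration, and the sketch assumes nothing exists outside the band (otherwise you need a bandpass filter ahead of the sampler).

```python
import math


def valid_bandpass_rates(f_lo, f_hi):
    """List (n, fs_min, fs_max) windows of alias-free sampling rates.

    Assumes a real signal whose spectrum is zero outside f_lo <= |f| <= f_hi.
    n = 1 is ordinary Nyquist sampling (fs >= 2 * f_hi, no upper limit).
    """
    bandwidth = f_hi - f_lo
    n_max = int(f_hi // bandwidth)  # largest usable n
    windows = []
    for n in range(1, n_max + 1):
        fs_min = 2 * f_hi / n
        fs_max = math.inf if n == 1 else 2 * f_lo / (n - 1)
        windows.append((n, fs_min, fs_max))
    return windows


def is_alias_free(fs, f_lo, f_hi):
    """True if fs falls inside one of the allowed windows."""
    return any(lo <= fs <= hi for _, lo, hi in valid_bandpass_rates(f_lo, f_hi))


if __name__ == "__main__":
    # Hypothetical band: 20 kHz to 25 kHz, so the bandwidth is 5 kHz.
    for n, lo, hi in valid_bandpass_rates(20, 25):
        print(f"n={n}: {lo:.3f} kHz <= fs <= {hi:.3f} kHz")
    print(is_alias_free(10, 20, 25))  # exactly 2X the bandwidth
    print(is_alias_free(45, 20, 25))  # above 2X bandwidth, below 2X max frequency
```

For this band, 10 kHz (exactly 2X the bandwidth) works, while 45 kHz does not even though it is much higher: it lands in the gap between the n=2 window (25 to 40 kHz) and ordinary Nyquist sampling at 50 kHz and up. So 2X the bandwidth is a lower bound, not a guarantee that every higher rate is safe.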