Maybe I have this wrong, but looking at Academic Torrents, the entire Pushshift archive from 2005 to 2022 is 2TB compressed. Are they really operating at the massive scale you are describing? Pushshift is supposed to have ingested all of Reddit's comment data, no?
(Of course, this assumes Pushshift has captured the majority of comments.)