The crawler itself is aware of 470 million URLs.
I've actually had it up to 50 million before, but that data was a lot noisier, with fewer keywords per document. The current 60 million is significantly "bigger" than the old 50 million. Index size is not actually a great metric for how comprehensive a search engine is. A small index with a good signal-to-noise ratio is much more useful than a large one where 95% is chaff.
100 million is my current goal. I think that's about what's doable on my current hardware. The data also gets increasingly unwieldy to deal with as it grows. I've already got processes that require several days of non-stop computation.