Assuming they'll need to index 820 billion pages (the number of pages preserved in the Internet Archive) at 100 KB each, and assuming they use a database that compresses text to 0.3x its original size, they'll need at least 24,600 TB to store that text. At $300 per 16 TB disk, that works out to about 1,538 disks, so they'll need to spend at least roughly $461,000 on disks alone. That is already a sizable sum for raw storage, and we haven't included stuff like replication and backup, indexing metadata overhead, etc.
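
For anyone who wants to check the arithmetic or plug in different assumptions, here is a rough sketch of the same estimate. It assumes decimal units (1 TB = 10^12 bytes) and rounds up to whole disks; the function and variable names are made up for illustration, and the inputs are just the guesses stated above. Note that the total capacity is divided by the 16 TB per-disk size before multiplying by the disk price.

```python
import math

TB = 10**12  # assumption: decimal terabytes (1 TB = 10^12 bytes)

def storage_estimate(pages, page_bytes, compression_ratio, disk_tb, disk_price_usd):
    """Return (compressed size in TB, whole disks needed, disk cost in USD)."""
    raw_tb = pages * page_bytes / TB
    compressed_tb = raw_tb * compression_ratio
    # Round up: you cannot buy a fraction of a disk.
    disks = math.ceil(compressed_tb / disk_tb)
    return compressed_tb, disks, disks * disk_price_usd

compressed_tb, disks, cost = storage_estimate(
    pages=820 * 10**9,      # 820 billion pages
    page_bytes=100 * 10**3, # 100 KB per page
    compression_ratio=0.3,  # compressed size is 0.3x the raw size
    disk_tb=16,
    disk_price_usd=300,
)

print(f"{compressed_tb:,.0f} TB")  # 24,600 TB
print(f"{disks:,} disks")          # 1,538 disks
print(f"${cost:,}")                # $461,400
```

Multiplying the result by a replication factor (say 3x) is an easy way to see how quickly the number grows once redundancy is included.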