why? The human would just talk to the AI agent. Why would they need to scroll through that many files?
I made a similar system with 232k files (1 file might be a slack message, gitlab comment, etc). it does a decent job at answering questions with only keyword search, but I think i can have better results with RAG+BM25.