You won't be able to simply upload a PDF and search its raw file data for text. The text inside a PDF is usually compressed and encoded, so it isn't readable as-is. You'll have to either use a tool to extract the embedded text, or run OCR on the document if it's image-only (a scan, for example). A really good tool that I've used before is Aspose. If you're letting users upload these PDFs, you'll also want some sort of distributed task queue, because the PDF processing is slow and not something you want the user to wait on during the upload request. I've used RabbitMQ as the message broker for this and haven't had many issues. Once you've extracted the text (via OCR if necessary), you can store it alongside the original document in a database like MongoDB. You'd probably also benefit from indexing the text in a full-text search engine like Elasticsearch.
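To make the flow above concrete (accept the upload, extract text in the background, store it, then search it), here is a minimal sketch that uses only the Python standard library. Everything in it is an assumption for illustration: `queue.Queue` stands in for a real broker such as RabbitMQ, the `documents` dict stands in for a database such as MongoDB, the `index` dict stands in for a search engine such as Elasticsearch, and `extract_text` is a hypothetical placeholder for whatever extraction or OCR tool you actually use. None of the names come from those products' APIs.

```python
import queue
import re
import threading
from collections import defaultdict

# Stand-ins (assumptions, not real service clients):
tasks = queue.Queue()               # plays the role of the task queue
documents = {}                      # doc_id -> {"raw": bytes, "text": str}
index = defaultdict(set)            # lowercase word -> set of doc_ids
lock = threading.Lock()             # guards documents and index


def extract_text(raw):
    """Hypothetical placeholder for the real extraction/OCR step.

    A real PDF is not plain text; the bytes are simply decoded here so the
    sketch runs without any third-party tool.
    """
    return raw.decode("utf-8", errors="ignore")


def worker():
    """Process uploads in the background until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        doc_id, raw = job
        text = extract_text(raw)
        with lock:
            # Keep the original bytes next to the extracted text.
            documents[doc_id] = {"raw": raw, "text": text}
            # Build a tiny inverted index: word -> documents containing it.
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(doc_id)
        tasks.task_done()


def upload(doc_id, raw):
    """Called by the request handler: enqueue the job and return at once."""
    tasks.put((doc_id, raw))


def search(term):
    """Return the sorted ids of documents containing the given word."""
    with lock:
        return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    upload("a", b"Invoice for March")
    upload("b", b"Meeting notes from March and April")
    tasks.join()                    # wait for background processing
    print(search("march"))          # prints ['a', 'b']
    tasks.put(None)                 # tell the worker to stop
    t.join()
```

The point of the design is that `upload` only enqueues work, so the user's request finishes immediately while the slow extraction happens in the worker. In a real deployment the worker would run in a separate process or on a separate machine, which is where a broker becomes necessary.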