Hi, I'm building a RAG system with OpenAI embeddings and the ChatGPT API.
I chunked all the documents into 400-800 character chunks, vectorized them all, and put them in a vector database.
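For concreteness, here's roughly the shape of my pipeline. This is a minimal offline sketch: `embed()` is a placeholder for the real OpenAI embeddings call (which I haven't inlined here), and the "index" is just an in-memory list with cosine similarity instead of a real vector database.

```python
import math

def chunk(text, size=600):
    # Naive fixed-size character chunking (400-800 chars in my setup).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    # Placeholder for the OpenAI embeddings endpoint; returns a toy
    # 3-dim vector from character statistics so the sketch runs offline.
    return [len(text) % 7, sum(map(ord, text)) % 11, len(text.split()) % 5]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, index, k=3):
    # index is a list of (chunk_text, embedding) pairs.
    qv = embed(query)
    scored = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]
```

The retrieved top-k chunks then get pasted into the ChatGPT prompt as context.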
The results are pretty bad--the retrieved chunks are only loosely related to the query.
I'm getting much better results from simple keyword search (using Meilisearch).
Am I doing something wrong? Do I need to use a fine-tuned model like BERT? Is this technology vastly overhyped?