If you’re at all interested in LLMs/LLM-apps, you’ve probably heard of RAG: Retrieval-Augmented Generation, i.e. retrieving relevant documents to give to your LLM as context to answer user queries.
Today, I’m releasing RAGatouille v0.0.1, which aims to make it as easy as possible to improve your RAG pipelines by leveraging state-of-the-art Information Retrieval research.
As of right now, there’s quite a big gap between common everyday practice and the IR literature, largely because there just aren’t good ways to quickly try out and leverage SotA IR techniques. RAGatouille aims to help close that gap! We do have a bit of a roadmap to support more IR papers, like UDAPDR [1], but for now we focus on integrating ColBERT [2]/ColBERTv2 [3], super strong retrieval models which are particularly good at generalising to new data (i.e. your dataset!)
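Part of what makes ColBERT generalise so well is its "late interaction" scoring: rather than squashing a whole passage into a single vector, it keeps one embedding per token, and each query token is matched against its most similar document token (the "MaxSim" operator), with the per-token maxima summed into a score. A minimal numpy sketch of that scoring, using toy 2-D vectors rather than real ColBERT embeddings:

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim).
    # For each query token, take the max similarity over all doc tokens,
    # then sum across query tokens -- ColBERT's late-interaction scoring.
    sims = query_emb @ doc_emb.T  # (num_query_tokens, num_doc_tokens)
    return sims.max(axis=1).sum()

# Toy embeddings: doc_a has a close match for both query tokens,
# doc_b matches neither.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b = np.array([[-1.0, 0.0], [0.0, -1.0]])

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 0.0
```

Because each query token only needs *some* good match in the document, the scoring degrades gracefully on out-of-domain vocabulary, which is where single-vector embeddings tend to struggle.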
RAGatouille can train and fine-tune ColBERT models, index documents, and search those indexes, all in just a few lines of code. We also include an example in the repo showing how to use GPT-4 to create fine-tuning data when you don’t have any annotated user queries, which works really well in practice.
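As a rough sketch of what the index-and-search workflow looks like (a hedged example based on the v0.0.1 API, which may change; it downloads a pretrained checkpoint, so treat it as illustrative rather than something to run as-is):

```python
from ragatouille import RAGPretrainedModel

# Load the pretrained ColBERTv2 checkpoint from the Hugging Face Hub.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Index a (toy) document collection.
RAG.index(
    collection=[
        "RAGatouille wraps ColBERT to make SotA retrieval easy to use.",
        "ColBERT scores queries against documents token-by-token.",
    ],
    index_name="my_index",
)

# Search the index; returns ranked passages with scores.
results = RAG.search(query="What does RAGatouille do?", k=2)
print(results)
```

Fine-tuning follows the same spirit: a trainer class takes (query, relevant passage) pairs, which is where the GPT-4 synthetic-query example in the repo comes in if you don’t have annotated queries of your own.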
Feel free to also check out the thread and discussion on Twitter/X[4] if you're interested!
I hope some of you find this useful. Please feel free to reach out and report any bugs: this is essentially a beta release, and any feedback would be much appreciated.
[1] https://arxiv.org/abs/2303.00807 [2] https://arxiv.org/abs/2004.12832 [3] https://arxiv.org/abs/2112.01488 [4] https://twitter.com/bclavie/status/1742950315278672040