We spent a lot of time building ML infrastructure and realized that most data warehouses and data pipelines are not designed for unstructured data (documents, PDFs, calls).
While vector databases and RAG are great at search tasks, they struggle with aggregation and SQL-style queries such as:
1. How many emails in the past 6 months contain complaints about the product?
2. What are the top 3 features from feature request tickets?
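
To illustrate why these are SQL-shaped questions: once each email or ticket has been turned into a row of extracted fields, both reduce to ordinary aggregations. The sketch below uses Python's built-in sqlite3 module with made-up rows. The table and column names (`is_complaint`, `requested_feature`) and the fixed reference date are hypothetical, chosen only for illustration, and are not Trellis's actual schema or API.

```python
import sqlite3

# Hypothetical rows, as if structured fields had already been extracted
# from raw emails and tickets. All names and values are illustrative.
emails = [
    ("2024-01-15", 1),
    ("2024-03-02", 0),
    ("2024-05-20", 1),
    ("2023-06-01", 1),  # more than six months before the reference date
]
tickets = [
    ("dark mode",), ("sso",), ("dark mode",), ("export to csv",),
    ("sso",), ("dark mode",), ("api keys",),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (sent_at TEXT, is_complaint INTEGER)")
conn.execute("CREATE TABLE tickets (requested_feature TEXT)")
conn.executemany("INSERT INTO emails VALUES (?, ?)", emails)
conn.executemany("INSERT INTO tickets VALUES (?)", tickets)

# Fixed "today" so the example is reproducible (an assumption for the demo).
REFERENCE_DATE = "2024-06-01"

# Question 1: how many emails in the past 6 months contain complaints?
complaint_count = conn.execute(
    "SELECT COUNT(*) FROM emails "
    "WHERE is_complaint = 1 AND sent_at >= date(?, '-6 months')",
    (REFERENCE_DATE,),
).fetchone()[0]

# Question 2: what are the top 3 requested features?
# Ties are broken alphabetically so the result is deterministic.
top_features = conn.execute(
    "SELECT requested_feature, COUNT(*) AS n FROM tickets "
    "GROUP BY requested_feature "
    "ORDER BY n DESC, requested_feature LIMIT 3"
).fetchall()

print(complaint_count)
print(top_features)
```

A similarity search over embeddings returns the nearest few documents, which is not the same as counting or grouping over every row. That gap is the reason for the extraction step.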
Check out the sandbox (no sign-in required) at https://demo.runtrellis.com/
Some interesting results from analyzing the Enron emails can be found at https://demo.runtrellis.com/showcase/enron-email-analysis
You can also run the transformation on a larger amount of data by signing up here: https://dashboard.runtrellis.com/
We would love to hear your feedback and the different use cases that you come up with.