Show HN: Trellis, ETL specifically for unstructured data

(demo.runtrellis.com)

3 points | macklinkachorn | 2y ago | 5 comments

Hey HN, we're excited to share Trellis, which lets you run SQL-like queries directly on top of your unstructured data. We've built an AI engine that turns unstructured data into a structured, SQL-compatible format based on a schema you define in natural language.

We spent a lot of time building ML infrastructure and realized that most data warehouses and data pipelines are not designed for unstructured data (documents, PDFs, calls).

While vector databases and RAG are great at search tasks, they really struggle with aggregation and SQL-type queries such as:

1. How many emails in the past 6 months contain complaints about the product?
2. What are the top 3 features from feature request tickets?
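To make the contrast concrete, here is a minimal sketch of how those two questions reduce to plain SQL once the documents have been turned into rows. The table names, column names, and sample rows are all hypothetical stand-ins for whatever schema you would define; this is not Trellis's actual output format or API, just SQLite from the Python standard library.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical extracted rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per email, one row per feature request ticket.
    conn.execute("CREATE TABLE emails (sent_date TEXT, is_complaint INTEGER)")
    conn.execute("CREATE TABLE tickets (requested_feature TEXT)")
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?)",
        [
            ("2023-11-20", 1),  # complaint, but older than 6 months
            ("2024-02-03", 1),
            ("2024-03-15", 0),
            ("2024-05-30", 1),
        ],
    )
    features = (
        ["dark mode"] * 4 + ["csv export"] * 3 + ["sso"] * 2 + ["webhooks"]
    )
    conn.executemany(
        "INSERT INTO tickets VALUES (?)", [(f,) for f in features]
    )
    return conn


def complaints_last_6_months(conn, as_of):
    """Count complaint emails sent in the 6 months up to `as_of` (ISO date)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM emails "
        "WHERE is_complaint = 1 "
        "AND sent_date >= date(?, '-6 months') AND sent_date <= ?",
        (as_of, as_of),
    ).fetchone()
    return row[0]


def top_features(conn, limit=3):
    """Return the most requested features as (feature, count) pairs."""
    return conn.execute(
        "SELECT requested_feature, COUNT(*) AS n FROM tickets "
        "GROUP BY requested_feature "
        "ORDER BY n DESC, requested_feature LIMIT ?",
        (limit,),
    ).fetchall()


conn = build_demo_db()
print(complaints_last_6_months(conn, "2024-07-01"))
print(top_features(conn))
```

The hard part, deciding whether an email is a complaint or which feature a ticket asks for, is assumed to have happened upstream; the point is only that counting and ranking become trivial once that structure exists.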

Check out the sandbox (no sign-in required) at https://demo.runtrellis.com/

Some interesting results from analyzing the Enron email corpus can be found at https://demo.runtrellis.com/showcase/enron-email-analysis

You can also run the transformation on a larger amount of data by signing up here: https://dashboard.runtrellis.com/

We would love to hear your feedback and the different use cases that you come up with.

5 comments · 3 top-level

noashavit | 2y ago · 1 in thread

Congrats on the launch!

Doesn't Airbyte handle semi-structured and unstructured data? How are you different?

macklinkachorn (OP) | 2y ago

Airbyte mostly focuses on data integration. We built an end-to-end pipeline for unstructured data, covering extraction with a vision model, transformation, and validation of the output schema.
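As a rough illustration of the last step only, here is a minimal sketch of what validating an extracted record against an output schema could look like. The schema, the column names, and the `validate_record` helper are all assumptions made for the example; nothing here describes how Trellis actually implements validation.

```python
from datetime import date

# Hypothetical target schema: column name -> expected Python type.
SCHEMA = {"sender": str, "sent_date": date, "is_complaint": bool}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Check every schema column is present and has the expected type.
    for column, expected in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            problems.append(
                f"{column}: expected {expected.__name__}, got {actual}"
            )
    # Flag columns the extractor produced that the schema does not define.
    for column in record:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


good = {"sender": "a@example.com", "sent_date": date(2001, 5, 14), "is_complaint": False}
bad = {"sender": "b@example.com", "is_complaint": "yes"}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue with a model-generated record before deciding whether to retry or discard it.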

skeptrune | 2y ago · 1 in thread

Wow, I have worked with the Enron corpus before and thought data extraction to this level of granularity was near impossible.

It's some truly gross data.

How long does it take to do something like ingest 1M emails with your system?

macklinkachorn (OP) | 2y ago

Thank you! We distribute the workload across many clusters, so it should take roughly the same amount of time with 1M emails.

doubloon | 2y ago

"Something went wrong - Sorry, something went wrong while completing your transformation. Please try again or contact us for assistance."
