Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data

234 points by macklinkachorn · 1y ago · 116 comments

Hey HN — We're Jacky and Mac from Trellis (https://runtrellis.com/). We’re building AI-powered ETL for unstructured data. Trellis transforms phone calls, PDFs, and chats into structured SQL format based on any schema you define in natural language. This helps data and ops teams automate manual data entry and run SQL queries on messy data.
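As a rough illustration of the ETL shape described above (unstructured text in, rows matching a schema out, then SQL on top), here is a minimal Python sketch using the standard-library `sqlite3` module. The schema, the sample call notes, and `extract_fields` are hypothetical. A real pipeline would replace the regex stub with a model call that returns JSON matching the schema, so treat this as a sketch of the data flow, not of Trellis's API.

```python
import re
import sqlite3

# Hypothetical target schema: column name -> SQLite type.
# (Trellis takes the schema in natural language; it is spelled out here
# so the sketch is self-contained.)
SCHEMA = {"caller": "TEXT", "account_id": "TEXT", "amount_usd": "REAL"}


def extract_fields(text: str) -> dict:
    """Hypothetical stand-in for the model call: pulls fields with regexes.

    A real pipeline would send `text` plus the schema to an LLM and parse
    the JSON it returns; this stub only lets the sketch run offline.
    """
    caller = re.search(r"(?:This is|From:)\s+([A-Z][a-z]+)", text)
    account = re.search(r"account\s+#?(\w+)", text, re.IGNORECASE)
    amount = re.search(r"\$([\d,]+(?:\.\d+)?)", text)
    return {
        "caller": caller.group(1) if caller else None,
        "account_id": account.group(1) if account else None,
        "amount_usd": float(amount.group(1).replace(",", "")) if amount else None,
    }


def load(docs: list[str]) -> sqlite3.Connection:
    """Extract one row per document and load the rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Column names come from the trusted SCHEMA constant, not user input.
    cols = ", ".join(f"{name} {typ}" for name, typ in SCHEMA.items())
    conn.execute(f"CREATE TABLE records ({cols})")
    placeholders = ", ".join("?" for _ in SCHEMA)
    for doc in docs:
        row = extract_fields(doc)
        conn.execute(
            f"INSERT INTO records VALUES ({placeholders})",
            [row[name] for name in SCHEMA],
        )
    return conn


docs = [
    "This is Alice, calling about account #A123. I was charged $1,250.00 twice.",
    "From: Bob\nPlease close account B77; the refund of $40 never arrived.",
]
conn = load(docs)
print(conn.execute(
    "SELECT caller, SUM(amount_usd) FROM records GROUP BY caller ORDER BY caller"
).fetchall())
```

Once the rows are in a table, the "messy" call notes can be queried with ordinary SQL; the extraction step is the only part that needs a model.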

There’s a demo video at https://www.youtube.com/watch?v=ib3mRh2tnSo and a sandbox to try out (no sign-in required!) at https://demo.runtrellis.com/. One historical archive of unstructured data we thought would be interesting to run Trellis on is the old Enron email corpus, which famously took months to review. We’ve created a showcase demo here: https://demo.runtrellis.com/showcase/enron-email-analysis, with some documentation here: https://docs.runtrellis.com/docs/example-email-analytics.

Why we built this: At the Stanford AI lab where we met, we collaborated with many F500 data teams (including Amazon, Meta, and Standard Chartered), and repeatedly saw the same problem: 80% of enterprise data is unstructured, and traditional platforms can’t handle it. For example, a major commercial bank I work with couldn’t improve credit risk models because critical data was stuck in PDFs and emails.

We realized that our research from the AI lab could be turned into a solution with an abstraction layer that works as well for financial underwriting as it does for analysis of call center transcripts: an AI-powered ETL that takes in any unstructured data source and turns it into a schematically correct table.

Some interesting technical challenges we had to tackle along the way: (1) Supporting complex documents out of the box: we use LLM-based map-reduce to handle long documents and vision models for table and layout extraction. (2) Model routing: we select the best model for each transformation to optimize cost and speed. For instance, in data extraction tasks we can route to simpler fine-tuned models that specialize in returning structured JSON for financial tables. (3) Data validation and schema guarantees: we ensure accuracy with reference links and anomaly detection.
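To make the map-reduce and validation ideas in (1) and (3) concrete, here is a minimal Python sketch under stated assumptions. The bond term-sheet schema, the line-based chunking, and the helper names (`chunk_lines`, `map_chunk`, `reduce_results`, `validate`) are all hypothetical, and `map_chunk` uses regexes as a stand-in for the LLM call so the example runs offline. The chunk index kept for each field is only a simple stand-in for a reference link; none of this is Trellis's actual implementation.

```python
import re

# Hypothetical schema for a bond term sheet: field -> expected Python type.
SCHEMA = {"issuer": str, "coupon_pct": float, "maturity_year": int}


def chunk_lines(text: str, max_lines: int) -> list[str]:
    """Split a long document into chunks small enough for one model call."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def map_chunk(chunk: str) -> dict:
    """Map step: hypothetical stand-in for an LLM call returning JSON for one chunk."""
    out = {}
    if m := re.search(r"Issuer:\s*(.+)", chunk):
        out["issuer"] = m.group(1).strip()
    if m := re.search(r"Coupon:\s*([\d.]+)%", chunk):
        out["coupon_pct"] = float(m.group(1))
    if m := re.search(r"Maturity:\s*(\d{4})", chunk):
        out["maturity_year"] = int(m.group(1))
    return out


def reduce_results(partials: list[dict]) -> tuple[dict, dict, list[str]]:
    """Reduce step: merge per-chunk fields, remember which chunk each value
    came from (a simple provenance link), and record conflicting values."""
    merged, sources, conflicts = {}, {}, []
    for idx, part in enumerate(partials):
        for key, value in part.items():
            if key not in merged:
                merged[key], sources[key] = value, idx
            elif merged[key] != value:
                conflicts.append(
                    f"{key}: {merged[key]!r} (chunk {sources[key]}) vs {value!r} (chunk {idx})"
                )
    return merged, sources, conflicts


def validate(record: dict) -> list[str]:
    """Schema check: every field present and of the expected type."""
    errors = []
    for field, typ in SCHEMA.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], typ):
            errors.append(f"{field} should be {typ.__name__}")
    return errors


doc = "\n".join([
    "TERM SHEET", "Issuer: Acme Corp", "Currency: USD",
    "Coupon: 4.25%", "Denomination: 1,000", "Maturity: 2031",
])
partials = [map_chunk(c) for c in chunk_lines(doc, max_lines=2)]
record, sources, conflicts = reduce_results(partials)
print(record, sources, conflicts, validate(record))
```

Conflicts and validation errors are the natural hooks for anomaly detection: a record with either can be flagged for human review instead of being written to the table.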

After launching Trellis, we’ve seen diverse use cases, especially in legacy industries where PDFs are treated as APIs. For example, financial services companies need to process complex documents like bonds and credit ratings into a structured format, speed up underwriting, and enable pass-through loan processing. Customer support and back-office operations need to accelerate onboarding by mapping documents across different schemas and ERP systems, and to ensure support agents follow SOPs (security questions, compliance disclosures, etc.). Many companies today also want data preprocessing in ETL pipelines and data ingestion for RAG.

We’d love your feedback! Try it out at https://demo.runtrellis.com/. To save and track your large data transformations, you can visit our dashboard and create an account at https://dashboard.runtrellis.com/. If you’re interested in integrating with our APIs, our quick start docs are here: https://docs.runtrellis.com/docs/getting-started. If you have any specific use cases in mind, we’d be happy to do a custom integration and onboarding—anything for HN. :)

Excited to hear about your experience wrangling with unstructured data in the past, workflows you want to automate, and what data integration you would like to see.


makk · 1y ago

> a major commercial bank I work with couldn’t improve credit risk models because critical data was stuck in PDFs and emails.

Great use case! Worked on exactly this a decade ago. It was Hard™ then. Could only make so much progress. Getting this right is a huge value unlock. Congrats!

harryf · 1y ago

Make sure you have an on-premise option for this type of customer. I've worked at two software companies in Europe with tangentially similar products related to document analysis. On premise is a key requirement.

Even though it's 2024, banks and other financial institutions such as insurance companies tend to be _very_ cautious with valuable documents involving customers. There are also regional regulations that prevent things like patient data being shared with _any_ 3rd parties. Even one of the big 4 oil companies I've dealt with as a prospective customer had very strict rules requiring on-premise solutions.

The good news is many are using things like Kubernetes and OpenShift internally, so it should be possible to port what you do on AWS to on-premise.

throw03172019 · 1y ago

On-premise will be a lot more difficult than just launching a few pods in Kubernetes. These AI tools (LLMs / vision models) will also require some high-powered GPUs.

intelVISA · 1y ago

On-prem is theater if the OS isn't libre.

ace32229 · 1y ago

I have just been working through the same problem (though just PDFs). Google DocAI helped enormously after a bit of initial input.

intelVISA · 1y ago

Who is liable when the ML model hallucinates™ while parsing some critical data?

Better still if it can then become a source of truth for further departures from reality.