Postgres vs. Pinecone (opens in new tab)

(lantern.dev)

119 pointsdiqi1y ago17 comments

17 comments

Vector dbs are quickly becoming a commodity. This was not true when pinecone was founded and received its first few rounds of funding.

At this point, Postgres has clearly caught up and the VCs are going to do everything it takes to hold on.

klysm1y ago

Vectors seem like kinda the easiest thing to store in a lot of ways

beoberha1y ago

The more advanced index algorithms don’t fit nicely into typical database indexes like B-Trees. That’s definitely why there was a lag between dbs like Pinecone and Postgres. However, that moat isn’t significant and being quickly filled in

colordrops1y ago

It was obvious from the beginning that vector databases were commodity. I heard about pgvector before pinecone.

beoberha1y ago

there was a long time that pgvector only had basic similarity algorithms and not HNSW but pinecone did. That plus being “fully managed” made it a super compelling product. Nowadays, much less so, but there is still a market if it is more “enterprise ready” than others like Lantern

ldjkfkdsjnv1y ago

Pinecone just put devops around facebooks FAISS library

cellis1y ago

Wow this should be getting a lot more love. I wonder if you could do a breakdown vs Qdrant?

Rapzid1y ago

I wonder how much index creation degrades if the storage were 4th gen NVMe drives vs other other, more typical storage technologies.

pwmtr1y ago

Author of Ubicloud's managed Postgres service is here. I'm not sure if you refer to SATA SSDs or typical cloud database setups when you said "other, more typical storage technologies". I'll share my perspective on both.

If you compare NVMe SSDs and SATA SSDs, NVMe SSDs are order of magnitude faster. Maximum theoretical limit of SATA III bus is ~6Gbit/s. This number is 32Gbit/s for Gen 3 NVMe, 64Gbit/s for Gen 4 NVMe and 128Gbit/s for Gen5 NVMe.

For typical database setups offered by cloud providers, the situation is different though. Most of the time, network attached storage devices are used in those setups such as EBS on AWS or Premium SSDs on Azure. These setups suffer a lot due to additional network hop. They are also subject to throughput limits (which can be increased in some cases by paying significantly more). No matter what type of SSDs are used at the backend, additional network hop significantly slows down the reads and writes.

At Ubicloud, we use local NVMe SSDs, which is why we are able to achieve high read/write performances. However, as ngalstyan4 suggested, benchmarking is required to make more definitive claims.

ngalstyan41y ago

Author here. We will benchmark this thoroughly in the future for our vector indexes.

But at least anecdotally, it made a ton of difference.

We met <200ms latency budget with Ubicloud NVMes but had to wait seconds to get an answer from the same query with GCP persistent disks or local SSDs

elijahbenizzy1y ago

The venn diagram of "problems that people have" and "problems that postgres solves" is closer to a circle than many would like to admit.

hdhshdhshdjd1y ago

Indexing in Postgres is legitimately painful, I don’t think “get moar ram” is a good response to that particular critique.

deepsquirrelnet1y ago

Neither is it a good demonstration of things that people who currently maintain postgres are experienced in doing. Companies should be reluctant to manage their own vector indexes until this becomes more a more mainstream skillset.

This excellent blog post[1] demonstrates the complexities of scaling HNSW indexes and shows that at a certain point, you need to switch to ivfpq with vastly different performance and accuracy characteristics.

https://aws.amazon.com/blogs/big-data/choose-the-k-nn-algori...

ngalstyan41y ago

Author here.

> I don’t think “get moar ram” is a good response to that particular critique.

I do not think the blog post suggested "get more ram" as a response, but happy to clarify if you could share more details!

> Indexing in Postgres is legitimately painful

Lantern is here to make the process seamless and remove most of the pain for people building LLM/AI applications. Examples:

1. We build tools to remove the guesswork of HNSW index sizing. E.g. https://lantern.dev/blog/calculator

2. We analyze typical patterns people use when building LLM apps and suggest better practices. E.g. https://lantern.dev/blog/async-embedding-tables

3. We build alerts and triggers into our cloud database that automate the discovery of many issues via heuristics.

darby_nine1y ago

Compared to what?

esafak1y ago

How does it compare with LanceDB, and what's interesting about lantern technically?

Labo3331y ago

PostgreSQL is Enough https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

j / k navigate · click thread line to collapse

17 comments

beoberha1y ago

Vector dbs are quickly becoming a commodity. This was not true when pinecone was founded and received its first few rounds of funding.

At this point, Postgres has clearly caught up and the VCs are going to do everything it takes to hold on.

klysm1y ago

Vectors seem like kinda the easiest thing to store in a lot of ways

beoberha1y ago

colordrops1y ago

It was obvious from the beginning that vector databases were commodity. I heard about pgvector before pinecone.

beoberha1y ago

ldjkfkdsjnv1y ago

Pinecone just put devops around facebooks FAISS library

cellis1y ago

Wow this should be getting a lot more love. I wonder if you could do a breakdown vs Qdrant?

Rapzid1y ago

I wonder how much index creation degrades if the storage were 4th gen NVMe drives vs other other, more typical storage technologies.

pwmtr1y ago

At Ubicloud, we use local NVMe SSDs, which is why we are able to achieve high read/write performances. However, as ngalstyan4 suggested, benchmarking is required to make more definitive claims.

ngalstyan41y ago

Author here. We will benchmark this thoroughly in the future for our vector indexes.

But at least anecdotally, it made a ton of difference.

We met <200ms latency budget with Ubicloud NVMes but had to wait seconds to get an answer from the same query with GCP persistent disks or local SSDs

elijahbenizzy1y ago

The venn diagram of "problems that people have" and "problems that postgres solves" is closer to a circle than many would like to admit.

hdhshdhshdjd1y ago

Indexing in Postgres is legitimately painful, I don’t think “get moar ram” is a good response to that particular critique.

deepsquirrelnet1y ago

https://aws.amazon.com/blogs/big-data/choose-the-k-nn-algori...

ngalstyan41y ago

Author here.

> I don’t think “get moar ram” is a good response to that particular critique.

I do not think the blog post suggested "get more ram" as a response, but happy to clarify if you could share more details!

> Indexing in Postgres is legitimately painful

Lantern is here to make the process seamless and remove most of the pain for people building LLM/AI applications. Examples:

1. We build tools to remove the guesswork of HNSW index sizing. E.g. https://lantern.dev/blog/calculator

2. We analyze typical patterns people use when building LLM apps and suggest better practices. E.g. https://lantern.dev/blog/async-embedding-tables

3. We build alerts and triggers into our cloud database that automate the discovery of many issues via heuristics.

darby_nine1y ago

Compared to what?

esafak1y ago

How does it compare with LanceDB, and what's interesting about lantern technically?

Labo3331y ago

PostgreSQL is Enough https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

j / k navigate · click thread line to collapse