I added an edit note to the bottom of the blog post.
The original post and its experiments predate the pgvector 0.5.1 release, and we had not realized that this release included significant work to optimize index creation time.
We reran the pgvector benchmarks with pgvector 0.5.1.
Now pgvector index creation is on par with, or about 10% faster than, Lantern on a single core. Lantern still allows up to 30x faster index creation by leveraging additional cores.
Wiki:
- pgvector: 36m
- Lantern: 43m
- Lantern external indexing (32 CPUs): 2m 15s

Sift:
- pgvector: 12m 30s
- Lantern: 7m
- Lantern external indexing (32 CPUs): 25s
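If you want to check the ratios yourself, you can compute the speedups directly from the timings listed above. Below is a small illustrative Python sketch. The dictionary layout and the function names (`speedup`, `summarize`) are hypothetical and exist only for this example. The timings are the numbers above converted to seconds.

```python
# Index build times reported above, converted to seconds (illustrative layout).
TIMINGS = {
    "Wiki": {"pgvector": 36 * 60, "lantern": 43 * 60, "lantern_external_32cpu": 2 * 60 + 15},
    "Sift": {"pgvector": 12 * 60 + 30, "lantern": 7 * 60, "lantern_external_32cpu": 25},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate build is than the baseline (>1 means faster)."""
    return baseline_s / candidate_s


def summarize(timings: dict) -> dict:
    """Compute per-dataset speedup ratios from the raw build times."""
    summary = {}
    for dataset, t in timings.items():
        summary[dataset] = {
            # Single-core builds: >1 means Lantern finished before pgvector.
            "lantern_vs_pgvector": round(speedup(t["pgvector"], t["lantern"]), 2),
            # 32-CPU external indexing compared with pgvector's build.
            "external_vs_pgvector": round(speedup(t["pgvector"], t["lantern_external_32cpu"]), 1),
            # 32-CPU external indexing compared with Lantern's own single-core build.
            "external_vs_lantern": round(speedup(t["lantern"], t["lantern_external_32cpu"]), 1),
        }
    return summary


if __name__ == "__main__":
    for dataset, ratios in summarize(TIMINGS).items():
        print(dataset, ratios)
```

On these numbers, the 32-CPU external build is 30x faster than pgvector on Sift and 16x faster on Wiki.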
The DB parameters for the above results (both Lantern and pgvector):
shared_buffers=12GB
maintenance_work_mem=5GB
work_mem=2GB
The results in the original post used the default DB parameters for both Lantern and pgvector.
Timings were measured with psql's \timing on a 32 CPU / 64 GB RAM machine (Linode Dedicated 64).
Feel free to reach out if you need anything for benchmarks.