The article suggests IVF for larger datasets - this is the direction I'd certainly explore, but I've not personally had to deal with it. HNSW sharding/partitioning might actually work even for a very large - sharded/partitioned - dataset, where each query is a parallelized map/reduce operation.