Isn’t the embedding step much slower than clustering? How many documents are you dealing with?
For a news aggregator I worked on, I ruled out k-means because you have to know the number of clusters in advance, and I think it assigns every document to some cluster, which is bad for the actual outliers in a dataset.
Agglomerative clustering yielded the best results for us. HDBSCAN was promising but did weird things with some docs.
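
To make the difference concrete, here is a rough sketch of agglomerative clustering with a distance cutoff instead of a fixed cluster count. This is not the code from our aggregator: the average linkage, the cosine distance, the `threshold=0.3` value, and the toy 3-dimensional "embeddings" are all assumptions picked for illustration. The point is only that merging stops once the closest pair of clusters is too far apart, so an outlier can end up alone rather than being forced into a group.

```python
import math


def cosine_distance(a, b):
    # Assumes neither vector is all zeros (true for typical embeddings).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering with a distance cutoff.

    Returns a list of clusters, each a list of indices into `vectors`.
    Naive implementation for illustration, far too slow for a real corpus.
    """
    # Start with every document in its own cluster.
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(
            cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Stop merging once even the closest pair is too far apart.
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy embeddings: two pairs of similar docs and one outlier.
docs = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],  # outlier
]

clusters = agglomerative(docs, threshold=0.3)
print(clusters)  # → [[0, 1], [2, 3], [4]]
```

The outlier stays in its own singleton cluster, and the number of clusters falls out of the threshold instead of being chosen up front. For real data you would want a library implementation rather than this cubic-time loop.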