I've found it cumbersome to use some of the new vector DBs (Chroma, FAISS, etc.) to build end-to-end systems, but with Marqo it doesn't seem too hard.
What parts are cumbersome?
Chroma, Pinecone, and I guess FAISS/HNSWlib/etc. only handle vector operations. What I'd really want, and what Marqo does, is something that handles everything end to end.
Edit: wording
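To make the "only handle vector operations" point concrete, here is a rough sketch of the glue you end up writing around a vector-only index. Everything in it is a hypothetical stand-in, not any library's real API: `embed` is a toy bag-of-words counter in place of a real embedding model, `TinyIndex` is a brute-force substitute for something like FAISS or HNSWlib, and `search_text` is the wrapper an end-to-end tool would presumably hide from you.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Hypothetical vector-only index: stores vectors, returns nearest ids."""

    def __init__(self) -> None:
        self.vectors: list[Counter] = []

    def add(self, vector: Counter) -> int:
        self.vectors.append(vector)
        return len(self.vectors) - 1  # id is the insertion position

    def search(self, query: Counter, k: int = 1) -> list[int]:
        # Brute-force scan; a real library would use an ANN structure here.
        scored = sorted(
            ((cosine(query, v), i) for i, v in enumerate(self.vectors)),
            reverse=True,
        )
        return [i for _, i in scored[:k]]


# The glue you write yourself: text -> vector on the way in,
# and id -> original document on the way out.
docs = [
    "Whisper transcribes speech to text",
    "Vector databases store embeddings",
    "Diarization labels who spoke when",
]
index = TinyIndex()
for doc in docs:
    index.add(embed(doc))


def search_text(query: str, k: int = 1) -> list[str]:
    """Embed the query, search the index, map ids back to documents."""
    return [docs[i] for i in index.search(embed(query), k)]


print(search_text("store embeddings"))
```

The index itself only ever sees vectors and ids; the embedding step and the mapping back to documents are the parts you have to bolt on and keep in sync.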
Producing the transcript?
Being able to classify and search data seems like a pretty big deal these days too.
Is there anything as good ready to use on-prem for the diarization (speaker recognition)?
I've heard good things about whisper(.cpp) for speech recognition and vosk used to be king of that hill...
The documentation is poor, and what you find on the web is sparse, outdated shit, so it's really hard to find help.
Pardon the dumb question; I only have an elementary understanding.