My company needs to identify records in Elasticsearch with high precision, but with more of a semantic match than existing Elasticsearch plugins support. Ideally, the best of Hugging Face on top of Elasticsearch.
Has anyone on here tried this out? Curious what your experiences are.
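A minimal sketch of the kind of semantic match being asked about, assuming candidate records have already been fetched by an ordinary keyword query and that each one has an embedding from some sentence-embedding model. The toy vectors, the `cosine` and `rerank` helpers, and the `min_score` cutoff are all made up for illustration; none of this is Haystack's or Elasticsearch's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, candidates, min_score=0.0):
    """Re-rank (doc_id, vector) pairs by similarity to the query vector.

    Candidates scoring below min_score are dropped, trading recall for precision.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates]
    kept = [item for item in scored if item[1] >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical 2-dimensional embeddings; real models produce hundreds of dimensions.
query = [1.0, 0.0]
candidates = [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])]
ranked = rerank(query, candidates, min_score=0.5)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Raising `min_score` is the simplest lever for precision here: fewer records come back, but the ones that do are closer to the query in embedding space.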
Disclaimer: I am one of the maintainers of Haystack:)
Haystack does a lot more besides just wrapping sentence transformers, and we weren't using the rest of it, so it was just a lot of extra dependencies sitting around taking up disk space and memory (I think we had to go up to a larger instance size). I remember feeling a bit frustrated that the dependencies weren't split up into "core" and "optional" in a more fine-grained way, but maybe most users don't mind and so it doesn't make sense for them to prioritize that?
[edit: looks like there's an open issue related to this: https://github.com/deepset-ai/haystack/issues/1070]
[edit 2: JPKab, happy to share more about using Hugging Face and Elasticsearch. My email is in my profile.]
Maybe there's a way to have something for a specific industry?
"What is the population of Italy?" ...gives the population of Rome as the first answer, at 78.32 relevance :)
I get similar results for some other countries.
"What is the population of Cambridge?" ...to be fair, this is an ambiguous place name, as there are several around the world. However, the answer it gives is quite far removed from any of them: "In 1788, Kingston had a population of 25,000", Relevance: 93.14
or like this: https://huggingface.co/spaces/Hellisotherpeople/Unsupervised...
I am trying to find anything better than these two for this task. I feel like Haystack could be an option - but I am not sure.
You are probably interested in extractive summarization for explainability reasons? To address this, the summarizer module will show you the passages that were used to create the abstractive summary. Hope this is a potential solution for your case!
Can Haystack be used to index structured data, or just text?
Is it required to use Elasticsearch as the backend, or can you use a simpler file-based or in-memory backend?
(Also, latest features highlights here https://www.deepset.ai/blog/new-features-in-haystack-v1.0)