I am looking for a search engine that can search 50,000 documents on an Intranet. Very low activity. Just need it to be lightweight, but it would be awesome if it was embedded.
I need googlish full-text search features like fuzzy text search, "did you mean" and maybe autofill search suggestions based on popular queries.
I can't find it now but I recently read a Stanford paper where they use Wikipedia relationships to do "query expansion," which seems to be along the lines of what you're looking for. I've heard neural nets called overkill but Wikipedia relationships + neural net should yield a pretty interesting product and would allow you to suggest options for queries you've never seen before.
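For what it's worth, here is a rough sketch of what query expansion could look like. This is not the method from the paper, just an illustration. The `RELATED` table and the `expand_query` helper are made-up names, and the table contents are placeholder data; I'm assuming a real table would be built offline from some relationship source.

```python
# Hypothetical relationship table (placeholder data, not from any real source).
# Assumption: a real table would be built offline and stored locally.
RELATED = {
    "car": ["automobile", "vehicle"],
    "laptop": ["notebook", "computer"],
}


def expand_query(query, related=RELATED, max_extra=2):
    """Return the query terms plus up to max_extra related terms per term."""
    expanded = []
    for term in query.lower().split():
        # Always keep the user's own term first.
        expanded.append(term)
        # Append related terms, skipping any already present.
        for extra in related.get(term, [])[:max_extra]:
            if extra not in expanded:
                expanded.append(extra)
    return expanded


if __name__ == "__main__":
    print(expand_query("car repair"))
```

The expanded term list would then be handed to whatever search engine you end up using, typically as optional (OR) terms so the original words still rank highest.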
That sounds really interesting! I'd be interested in the link if you find it. I need my stuff to run behind the firewall, though. If it has to go out to Wikipedia, will that compromise the privacy of the searches?
Have you looked into Solr/Lucene? They're run by the Apache Software Foundation -- Lucene is the actual engine, and Solr is kind of a wrapper around it. Pretty easy to set up, and it will parse numerous input document formats. solr.apache.org, I believe.
I'm interested in this one, but I'm just not sure if it has features like fuzzy text search. Do you know if it has all the Google-like features I listed in the description?
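To make the "did you mean" idea concrete, here is a minimal sketch using only Python's standard `difflib` module. It is not how Solr or Lucene do it internally; it just shows the general technique of suggesting the closest known term. `VOCABULARY` and `did_you_mean` are hypothetical names, and I'm assuming the vocabulary would come from the terms in your indexed documents.

```python
import difflib

# Hypothetical vocabulary; assumed to be extracted from the indexed documents.
VOCABULARY = ["invoice", "inventory", "intranet", "interview", "policy"]


def did_you_mean(term, vocabulary=VOCABULARY, cutoff=0.7):
    """Suggest the closest known term, or None if no suggestion is needed."""
    term = term.lower()
    if term in vocabulary:
        # Exact hit, nothing to suggest.
        return None
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops anything scoring below the cutoff.
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(did_you_mean("invoise"))
```

On the Lucene side, as far as I know the classic query syntax supports fuzzy terms with a trailing tilde (for example `roam~`), so the fuzzy matching part at least should be covered by the engine itself.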