- How is this better than Rewind, Needl, Mem, and all the other personal search engines over various knowledge bases that have been doing the rounds lately? Is the selling point that it's open source? Also if Apple improves spotlight, I wonder how useful this will be.
The way we see it, building in the open is going to be critical for creating an aligned, trustworthy AI assistant.
Note: while all LLM tools look fairly similar on the surface these days, our specific approaches are quite different. Give us a try and see what you think :-)
From a brief look at the GitHub repo, there seems to be a need to set up an OpenAI API key, so I'm not sure whether this can currently chat or search without sending data to, or needing access to, the OpenAI API?
Isn't this service just a very thin wrapper around ChatGPT? How on earth do you have any influence on alignment or trustworthiness? That's like saying your coffee cup makes your coffee fair trade.
This whole thread is very disingenuous; it's literally a simple interface to the OpenAI API, drenched in fake buzzwords and boosted to the top of HN to scam investors.
Curious: what informs your reservations about using OpenAI models? Their API terms explicitly state that they do not use customer data for training, and that they delete it after 30 days anyway.
> Also if Apple improves spotlight, I wonder how useful this will be.
There are 3x more Android phones and PCs than iPhones and Macs. Just sayin'
Three things. First, I have no reason to take them at their word that they aren’t saving data to train on. Second, OpenAI will shut down one day, and I would like any services I run to outlive it. Third and finally, I have hardware, and it’d be a waste not to use it. As a bonus, I find it hypocritical that a company that benefits so heavily from open source would keep its models closed source for fear of copycats.
I don't use X; I just keep it around "just in case" for 30 days.
Do you really not see the usefulness of a solution that caters to the remaining 88% (desktop/notebooks) of the market?
I haven't seen anything on the Spotlight roadmap about semantic search across my entire local drive. Maybe if they integrate Journal/Freeform/Notes into one thing, it would be deliberate and work with the things I explicitly want it to understand and help me work with, rather than the tools you've listed, which just help you find stuff.
While I would prefer to run the LLM locally, being able to see the code that calls the API is a clear second best. At this point, I am not going to trust any black box that can read my data and run "AI" on it, because I find the risk too big. If I can self-host something, I might be willing to try it out.
With that, you should be able to index Gmail via Maildir/POP/IMAP?
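For reference, if Gmail is synced to a local Maildir folder (for example with an IMAP sync tool), Python's standard `mailbox` module can read it for indexing. A minimal sketch, assuming that setup: `load_maildir` and `extract_text` are hypothetical helper names, not part of Khoj, and only `text/plain` parts are extracted.

```python
import mailbox
from email.header import decode_header, make_header


def extract_text(msg):
    """Return the decoded text/plain parts of an email message, joined by newlines."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and not part.is_multipart():
            payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def load_maildir(path):
    """Yield one dict per message in an existing Maildir folder, ready to be chunked and embedded."""
    box = mailbox.Maildir(path, factory=None, create=False)
    for key, msg in box.iteritems():
        # Decode RFC 2047 encoded subjects such as "=?utf-8?q?...?="
        subject = str(make_header(decode_header(msg.get("Subject", ""))))
        yield {
            "id": key,
            "subject": subject,
            "from": msg.get("From", ""),
            "body": extract_text(msg),
        }
```

Each yielded dict could then be split into chunks and embedded like any other document; attachments and HTML-only messages are ignored in this sketch.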
This sort of reeks of a growth mindset, where you use "open source" to look cool and gain users while actually trying to grow as quickly as possible to prove to investors that they should fund your next round.
I have no reason to believe that's the case for you in particular; just letting you know that some people may perceive things that way. Maybe you could make it clearer that it is a GPT-4 frontend of sorts?
This stuff is so incredibly tiring, because it's already all over social media, and HN should be a safe space for actual products.
If the devs are still around, I’d love to hear about your experiences with embeddings.
2. We don't use any vector datastores (yet). You can do a lot in memory: it's faster, and the matching is exact (no approximate KNN).
Feel free to ask if you were looking for something more specific.
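A minimal sketch of what exact, in-memory search can look like, assuming the embeddings are already computed and held in plain Python lists. This is an illustration, not Khoj's implementation; `InMemoryIndex` is a hypothetical name. Every query is compared against every stored vector, so the results are exact, at a cost linear in the number of entries.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class InMemoryIndex:
    """Exact (brute-force) cosine-similarity search over vectors kept in memory."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(normalize(vector))

    def search(self, query, k=3):
        """Return the k best (score, doc_id) pairs, highest score first."""
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in zip(self.ids, self.vectors)
        )
        return heapq.nlargest(k, scored)
```

For example, after adding `"a" -> [1, 0]`, `"b" -> [0, 1]` and `"c" -> [1, 1]`, the query `[1, 0.1]` with `k=2` ranks `"a"` first and `"c"` second. An approximate index only starts paying off once a full scan like this becomes too slow.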
1. content / question vector mismatch
2. what types of embeddings you experimented with storing per chunk (text only? Hypothetical questions? Metadata?)
3. choice of embedding model (e.g. OpenAI vs. instructorEmbeddings, or an alternative from the MTEB leaderboard)
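On point 1, one possible mitigation is to store several vectors per chunk (the chunk text plus, say, a hypothetical question it answers) and score the chunk by its best-matching vector. A minimal sketch, assuming that approach: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and none of this describes what Khoj actually does.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def index_chunk(chunk_text, hypothetical_questions):
    """Store several vectors per chunk: the text itself plus question-style views of it."""
    return [toy_embed(chunk_text)] + [toy_embed(q) for q in hypothetical_questions]


def score_chunk(query, chunk_vectors):
    """Score a chunk by its best match, so a question can hit a question-style vector."""
    q = toy_embed(query)
    return max(cosine(q, v) for v in chunk_vectors)
```

With a real model the idea is the same: a question like "What is a Merkle tree?" tends to sit closer to a stored question than to the declarative passage that answers it.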
It’s a great project, going to have a deeper dig today.
No, thanks.
Lots of great discussion going on in this thread. Two things we want to clarify:
1. Search works offline. Chat uses OpenAI.
2. We're working on adding open source LLM support for chat. We're evaluating quality and ease of setup for this.
If you find the project interesting, hop on our Discord and share your thoughts: https://discord.gg/BDgyabRM6e.
We very much want to hear about your experiences and how we can make something more useful for the community.
Feeling for y'all.
https://news.ycombinator.com/item?id=34652921
I think I will have to test both solutions myself...
Is there a way to use a personally owned and hosted LLM? If not, is there an interest in developing such a feature?
For search, we already use an offline/self-hosted model from HuggingFace, and you can easily configure it to use other SentenceTransformer models from HuggingFace.
For chat, follow this issue to see when Khoj gets the ability to use offline/self-hosted chat models: https://github.com/khoj-ai/khoj/issues/201
(1) Upload the Bitcoin white paper. (2) Ask the question “What is the contribution of R.C. Merkle to this research?”
The proper answer should mention “Merkle Trees”.
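A check like this can be automated by grading the chat answer against terms it must mention. A minimal sketch, assuming the answer text has already been obtained from the chat; `EvalCase` and `grade` are hypothetical names, and a case-insensitive substring match is only a crude proxy for answer quality.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    question: str
    must_mention: list = field(default_factory=list)


def grade(case, answer):
    """Return the required terms missing from the answer (an empty list means pass)."""
    lowered = answer.lower()
    return [term for term in case.must_mention if term.lower() not in lowered]


merkle_case = EvalCase(
    question="What is the contribution of R.C. Merkle to this research?",
    # "Merkle tree" also matches the plural "Merkle Trees"
    must_mention=["Merkle tree"],
)
```

Running `grade(merkle_case, answer)` over a handful of such cases gives a quick regression check when swapping chat models.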