Khoj: An AI personal assistant for your digital brain

(khoj.dev)

155 points · activatedgeek · 2y ago · 92 comments

mkumar10 · 2y ago · 11 in thread

- Seen a few of these. Are you all working on providing an easy way to maybe use LLMs for chatting/search without sending my data to OpenAI? If yes, how will you verify the quality is "reasonable"?

- How is this better than Rewind, Needl, Mem, etc., all the personal search engines for various knowledge bases that have been doing the rounds lately? Is the selling point that it's open source? Also, if Apple improves spotlight, I wonder how useful this will be.

sabaimran · 2y ago

Hello! One of the developers of Khoj here.

The way we see it, building in the open is going to be critical for creating an aligned, trustworthy AI assistant.

Note: while all LLM tools look fairly similar on the surface these days, our specific approaches are fairly different. Give us a try and see what you think :-)

badtension · 2y ago

And yet you didn't answer them at all.

weekay · 2y ago

>Are you all working on providing an easy way to maybe use LLMs for chatting/search without sending my data to OpenAI?

From a brief look at the GitHub repo, there seems to be a need to set up an OpenAI API key, so I'm not sure whether this can currently chat or search without sending data to, or needing access to, the OpenAI API.

kossTKR · 2y ago

"The way we see it, building in the open is going to be critical for creating an aligned, trustworthy AI assistant."

Isn't this service just a very thin wrapper around ChatGPT? How on earth do you have any influence on alignment or trustworthiness? That's like saying your coffee cup makes your coffee fair trade.

This whole thread is very disingenuous; it's literally a simple interface for the OpenAI API drenched in fake buzzwords, boosted to the top of HN to scam investors.

ignoramous · 2y ago

> Seen a few of these. Are you all working on providing an easy way to maybe use LLMs for chatting/search without sending my data to OpenAI?

Curious: What informs reservations about the use of OpenAI models? Their API terms state explicitly that they do not use customer data for training and that they delete it after 30 days, anyway.

> Also if Apple improves spotlight, I wonder how useful this will be.

There are 3x more Android phones and PCs than iPhones and Macs. Just sayin'

tourmalinetaco · 2y ago

> What informs reservations about the use of OpenAI models?

Three things. First, I have no reason to take them at their word that they aren’t saving data to train on. Second, OpenAI will shut down one day, and I would like any services I run to outlive them. Third, I have hardware and it’d be a waste not to use it. As a bonus, I find it hypocritical that a company that benefits so heavily from open source would hide away its models as closed source for fear of copycats.

jl6 · 2y ago

This industry has an atrocious track record of claiming to respect privacy, and then doing something entirely different. I have no reason to think OpenAI are lying, but it would still be wise to be extremely cautious of putting sensitive data in their hands.

homarp · 2y ago

>they do not use customer data for training and that they delete it after 30 days, anyway.

I don't use X, just keep it around, 'just in case' for 30 days.

petemir · 2y ago

> Also if Apple improves spotlight, I wonder how useful this will be.

Do you really not see the usefulness of a solution that caters to the remaining 88% (desktop/notebooks) of the market?

villgax · 2y ago

What counts as "reasonable" from OpenAI is again subject to their whims and to changes in what they consider appropriate for you.

I haven't seen a roadmap for Spotlight to include semantic search across my entire local drive. Maybe if they integrate Journal/Freeform/Notes into one thing, it would be deliberate and work with the things I explicitly want it to understand and help me work with, rather than the tools you've listed, which just help you find stuff.

kristiandupont · 2y ago

To me, this makes a significant difference.

While I would prefer to run the LLM locally, being able to see the code that calls the API is a clear second best. At this point in time, I am not going to trust any black box that can read my data and run "AI" on it, because I find the risk too big. If I can self-host something, I might just be willing to try it out.

permalac · 2y ago · 4 in thread

At this point I'm surprised nobody connects these tools to Gmail, G Suite, and/or a POSIX file structure. If it has to be my self-hosted AI assistant, I should be able to provide my documents to it, right?

sabaimran · 2y ago

We already index org, markdown, and PDF files on your file system. We're adding a text connector soon, which will allow you to index any plain text files you care to.

With that, you should be able to index Gmail over Maildir/POP/IMAP?
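
A minimal sketch of what the Maildir route could look like, assuming the mail has already been synced to a local Maildir by some other tool. It uses only Python's standard `mailbox` and `email` modules; the function name `maildir_to_text` is made up for illustration and is not part of Khoj.

```python
import mailbox
from email import policy
from email.parser import BytesParser


def maildir_to_text(maildir_path):
    """Yield (subject, body) pairs for each message, plain-text parts only.

    Hypothetical helper for illustration; not Khoj's actual connector.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    for key in box.iterkeys():
        # Re-parse the raw bytes with the modern policy so get_body() is available.
        msg = BytesParser(policy=policy.default).parsebytes(box.get_bytes(key))
        part = msg.get_body(preferencelist=("plain",))
        body = part.get_content() if part is not None else ""
        yield str(msg["subject"] or ""), body
```

Each yielded body is plain text, so in principle it could be written out as a text file and picked up by a plain-text indexer.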

moneywoes · 2y ago

What about images with OCR?

Shawnj2 · 2y ago

Microsoft announced they were integrating GPT-4 into the Office suite, so I wouldn’t be surprised if Google does something similar with Bard.

JuanPosadas · 2y ago

Google Docs added an AI a few weeks ago.

ShamelessC · 2y ago · 3 in thread

Perhaps I'm in the minority, but seeing open-source used in the description made me think you were using or providing an openly available LLM in addition to the chat/search features. Instead it seems this is "merely" (I don't mean to undermine the level of effort involved) using OpenAI's GPT-4 API for its LLM.

This sort of reeks of a growth mindset where you are using "open-source" for the purposes of looking cool and gaining users, but you are in fact trying to grow as quickly as possible to prove to investors that they should fund you for your next round.

I have no reason to believe that's the case for you in particular; just letting you know that some people may perceive things that way. Maybe you could make it clearer that it is a GPT-4 frontend of sorts?

sabaimran · 2y ago

1. Khoj has been around since early 2021, and both of us have been contributing to open source for several years. Being open source for a project like this just makes sense.

2. Search actually is 100% offline. It uses sentence transformer models from Hugging Face.

3. Chat uses OpenAI's models only because they're currently best in class (and easy to set up). Our plan (more of a 6-month view) is to host our own open source LLM for inference. See this issue for reference: https://github.com/khoj-ai/khoj/issues/201

kossTKR · 2y ago

No, you're not in the minority. This is Twitter get-rich-quick-guru level lousy and fake, and is clearly boosted to the top of HN.

This stuff is so incredibly tiring, because it's already all over social media and HN should be a safe space with actual products.

kristiandupont · 2y ago

Jeez, take it easy with the unwarranted hostility! I get that you disagree with using OpenAI's APIs, but clearly this is an "actual product" in every sense of the word and not some snake oil.

darkteflon · 2y ago · 2 in thread

Just had a look at the code. It’s a cool project that’s clearly had a lot of thought put into it.

If the devs are still around, I’d love to hear about your experiences with embeddings.

110 · 2y ago

1. One of the reasons we created Khoj was to be able to do natural language search with embeddings generated offline using open-source models!

2. We don't use any vector datastores (yet). You can do a lot in memory: it's faster, and it does exact matching (no approximate KNN).

Feel free to ask if you were looking for something more specific?
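
For readers wondering what "exact, in memory" can mean in practice, here is a minimal sketch of a brute-force cosine-similarity search in plain Python. It is not Khoj's code: the function names, file paths, and the toy 3-dimensional vectors are assumptions for illustration, and real embeddings would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(query_vec, corpus, top_k=3):
    """Score every entry (exhaustive scan, no approximate index); return the top_k."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy "embeddings" keyed by hypothetical file paths.
corpus = {
    "notes/ideas.org": [0.9, 0.1, 0.0],
    "notes/taxes.md": [0.0, 0.2, 0.9],
    "papers/bitcoin.pdf": [0.1, 0.9, 0.1],
}
results = exact_search([1.0, 0.0, 0.0], corpus, top_k=2)
```

Because every entry is scored, the ranking is exact. The cost grows linearly with the number of entries, which is usually acceptable for a personal-sized collection.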

darkteflon · 2y ago

Thank you! I’d love to hear more about your experiences with:

1. content / question vector mismatch

2. what types of embedding you experimented with storing per-chunk (text only? Hypothetical question? Metadata?)

3. choice of embeddings model (eg OpenAI vs instructorEmbeddings or an alternative from the MTEB leaderboard)

It’s a great project, going to have a deeper dig today.

thunderbong · 2y ago · 2 in thread

Khoj means 'search' in some Indian languages

pkoird · 2y ago

I wonder if there's a .do TLD. If so, they could've done khoj.do

the_common_man · 2y ago

It's made by Indians.

two_handfuls · 2y ago · 2 in thread

This uses ChatGPT, and the article makes no promise that our personal data will not be sent to ChatGPT.

No, thanks.

sabaimran · 2y ago

The GPT integration only works if you pass Khoj an OpenAI key in your settings, so it's a pretty explicit opt-in. Otherwise, there's no way for Khoj to send data to OpenAI. Does that make sense?
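
A hypothetical sketch of that kind of opt-in gate, to make the idea concrete. This is not Khoj's actual code: `handle_query`, `local_search`, and `remote_chat` are invented names, and the only assumption taken from the comment above is that nothing is sent remotely unless a key is configured.

```python
def handle_query(query, local_search, remote_chat, api_key=None):
    """Always search locally; call the remote chat model only if a key is set.

    All names here are illustrative assumptions, not a real Khoj API.
    """
    hits = local_search(query)  # runs on the local machine in every case
    if not api_key or not api_key.strip():
        # No key configured: the remote callable is never invoked.
        return {"hits": hits, "chat": None, "sent_remote": False}
    answer = remote_chat(query, hits, api_key)
    return {"hits": hits, "chat": answer, "sent_remote": True}
```

With this shape, the question "what leaves my machine?" reduces to auditing the single `remote_chat` call site.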

two_handfuls · 2y ago

But what can Khoj do without that?

sabaimran · 2y ago · 1 in thread

Hey activatedgeek! Thanks for sharing Khoj. @110 and I are the developers.

Lots of great discussion going on in this thread. Two things we want to clarify:

1. Search works offline. Chat uses OpenAI.

2. We're working on adding open source LLM support for chat. We're evaluating quality and ease of setup for this.

If you find the project interesting, hop on our Discord and share your thoughts: https://discord.gg/BDgyabRM6e.

We very much want to hear about your experiences and how we can make something more useful for the community.

ubertaco · 2y ago

Ah, this comment puts a lot into context for me. Y'all didn't _intend_ this to be your big PR push here yet, and now you're caught "mid-flight" explaining why your marketing is still aspirational instead of true-to-current-state.

Feeling for y'all.

nnechm27 · 2y ago · 1 in thread

Nice work! One way I would definitely use it is to ask questions about the Downloads folder on my Mac :). If you are like me, you probably have papers, invoices, proofs of address, passports and stuff like that in there. Would I be able to ask what the passport number of ... is, so I can enter it into the web check-in for a flight? Or what my last electricity bill was?

sabaimran · 2y ago

Those are cool use cases! PDFs with text should work. Maybe I should try and index my download folder too :-).

jeleh · 2y ago · 1 in thread

How does this differ from, e.g., KnowledgeGPT?

https://news.ycombinator.com/item?id=34652921

I think I will have to test both solutions myself...

sabaimran · 2y ago

Quite cool! It looks like this tool is oriented around ephemeral sessions, while Khoj is meant to be personal and local to you.

Mithriil · 2y ago · 1 in thread

Hi there! To the developers:

Is there a way to use a personally owned and hosted LLM? If not, is there an interest in developing such a feature?

110 · 2y ago

Hi Mithriil,

For search we already use an offline/self-hosted model from HuggingFace, and you can easily configure it to use other SentenceTransformer models from HuggingFace.

For chat, follow this issue: https://github.com/khoj-ai/khoj/issues/201 to see when Khoj gets the ability to use offline/self-hosted chat models.

moneywoes · 2y ago · 1 in thread

A Notion plug-in would be fantastic.

110 · 2y ago

We're already on it: https://github.com/khoj-ai/khoj/pull/284 :)

alpaca128 · 2y ago

The example shown doesn't really fit what I associate with "personal assistant". Assistants do tasks; they don't answer questions like "where do good ideas come from?". I can ask ChatGPT that without any third-party middlemen.

abdullin · 2y ago

Here is a test to assess the quality of these assistants.

(1) Upload the Bitcoin white paper. (2) Ask the question “What is the contribution of R.C. Merkle to this research?”

The proper answer should mention “Merkle Trees”.
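
The check above is easy to automate. A minimal sketch, assuming the assistant can be wrapped as a callable that maps a question string to an answer string (the `ask` parameter and both function names are hypothetical):

```python
import re


def mentions_merkle_tree(answer: str) -> bool:
    """True if the answer mentions "Merkle tree(s)", ignoring case and spacing."""
    return re.search(r"merkle[\s-]*trees?", answer, flags=re.IGNORECASE) is not None


def run_check(ask) -> bool:
    """`ask` is any callable taking a question and returning the assistant's answer."""
    question = "What is the contribution of R.C. Merkle to this research?"
    return mentions_merkle_tree(ask(question))
```

A keyword match like this only shows that the term appears; it says nothing about whether the rest of the answer is accurate.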

madmod · 2y ago

It is impossible for me to read this site on my iPhone because the header size keeps changing with the typing animation, so the text moves up and down every second.

alok-g · 2y ago

I would like to see this support Word documents too. It does not sound like those are supported yet.

meghan_rain · 2y ago

Relies on ClosedAI. What's the point of being the 47373rd app that does so?
