Once we made Khoj search incremental, I completely stopped using the default incremental search (C-s) in Emacs. Since then Khoj has grown to support more content types, deeper integrations and chat (using ChatGPT). With Llama 2 released last week, chat models are finally good and easy enough to use on consumer hardware for the chat with docs scenario.
Khoj is a desktop application to search and chat with your personal notes, documents and images. It is accessible from within Emacs, Obsidian or your Web browser. It works with org-mode, markdown, pdf, jpeg files and notion, github repositories. It is open-source and can work without internet access (e.g on a plane).
Our chat feature allows you to extract answers and create content from your existing knowledge base. Example: "What was that book Trillian mentioned at Zaphod's birthday last week". We personally use the chat feature regularly to find links, names and addresses (especially on mobile) and collate content across multiple, messy notes. It works online or offline: you can chat without internet using Llama 2 or with internet using GPT3.5+ depending on your requirements.
Our search feature lets you quickly find relevant notes, documents or images using natural language. It does not use the internet. Example: Search for "bought flowers at grocery store" will find notes about "roses at wholefoods".
Quickstart:
pip install khoj-assistant && khoj
See https://docs.khoj.dev/#/setup for detailed instructionsWe also have desktop apps (in beta) at https://github.com/khoj-ai/khoj/releases/tag/0.10.0 if you want to try them out.
Please do try out Khoj and let us know if it works for your use cases? Looking forward to the feedback!
----
What model size/particular fine-tuning are you using, and how have you observed it to perform for the usecase? I've only started playing with Llama 2 at 7B and 13B sizes, and I feel they're awfully RAM heavy for consumer machines, though I'm really excited by this possibility.
How is the search implemented? Is it just an embedding and vector DB, plus some additional metadata filtering (the date commands)?
Khoj is using the Llama 7B, 4bit quantized, GGML by TheBloke.
It's actually the first offline chat model that gives coherent answers to user queries given notes as context.
And it's interestingly more conversational than GPT3.5+, which is much more formal
Llama 2 gives great answers, even the 7B model. There’s an “uncensored” 7B version as well George Sung has fine-tuned for topics that the default Llama2 model won’t discuss - eg I had trouble having Llama2 review authentication/security code or topics: https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GG...
From just playing around with it the uncensored model still seems to know where to “draw the line” on sensitive topics but YMMV
If you do end up checking out Ollama you can try it with with this command or there’s an API too (it’s not in the docs yet)
ollama run llama2-uncensoredHow are you determining what notes (or snippets of notes?) to be injected as context? Especially given the small 2048 context limit with Llama 1.
I am sufficiently uneducated on the ins and outs of AI integrations to always wonder if projects like this one can be used in local-only mode, i.e. when self-hosted ensuring me that never any of my personal information is sent to a remote service. So it would be very helpful to very explicitly give me that assurance of privacy, if that's the case.
Would be hella nice to connect all the scattered lines of thoughts in various notes on a variety of platforms.
I've tried dozens of notetaking apps and that's the only one that truly felt like a second brain.
It's because of the speed. Infuriatingly, Obsidian for example can search just as fast, but they intentionally programmed in a lag after each keystroke... (I know because I removed it.)
It screencaps your desktop every 5 sec so you can watch a timelapse of how you spent your day. (Assuming it was on the computer!)
I did find it heavy on the disk usage so I wrote a ffmpeg script to convert it to video (much more efficient).
If you can collate your notes into markdown or some such, then messy notes can be handled, at least using Khoj with GPT3.5+.
Do let us know how we can help out and what your current biggest pain-points are?
Would some summary of previous day would be helpful to you? Is your memory problem only episodic, or does it extend to factual and kinesthetic as well?
I got really excited about this and fired it up on my petite little M2 Macbook Air only for it to grind it to a halt. Think the old days when you had a virus on your PC and you'd move the mouse then wait 45 seconds to see the cursor move. It honestly made me feel nostalgic. I guess I have to taper performance expectations with this Air, though this is the first time it's happened.
This is getting very close to my ideal of a personal AI. It's only gonna be a few more years until I can have a digital brain filled with everything I know. I can't wait
Having something that indexes all your digital travels and makes it easily digestible will be gold. Hopefully Khoj can become that :)
There was.
It was called Google Desktop Search, it was awesome, and it was axed.
That said, today I wouldn't use it anyway as both I and Google have changed a lot.
Does anyone have recommendations for a tool that does it?
Or, anyone want to build it together?
You'll just need to configure the asymmetric search model khoj uses to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml config file
See http://docs.khoj.dev/#/advanced?id=search-across-different-l...
It would be awesome if it could also index a directory of PDFs, and if it could do OCR on those PDFs to support indexing scanned documents. Probably outside of the scope of the project for now, but just the other day I was just thinking how nice it would be to have a tool like this.
Khoj can index directory of PDFs for search and chat. But it does not currently work with scanned PDF files (i.e not with ones without selectable text).
Being able to work with those would be awesome. We just need to get to it. Hopefully soon
etc...
I've wanted a "COMPUTER.", uh... I say "COMPUTER!", 'sir, you have to use the keyboard', ah a Keyboard, how quaint.... forever.
Of course, having it be stable enough to not `rm -rf /` soon after is definitely not part of the warranty
A number of apps that are designed for OpenAI’s completion/chat APIs can simply point to the endpoints served by llama-cpp-python [0], and function in (largely) the same way, while using the various models and quants supported by llama.cpp. That would allow folks to run larger models on the hardware of their choice (including Apple Silicon with Metal acceleration or NVIDIA GPUs) or using other proxies like openrouter.io. I enjoy openrouter.io myself because it supports Anthropic’s 100k models.
I'll provide my insight from experimentation integrating Llama V2/GPT4All into Khoj -- Falcon 7b is probably the runner up in models that can be supported on consumer hardware, and it really wasn't good enough (for me) on my machine to be useful. The token consumption with personal notes context is too large, and the content too variable for a small model like that to be able to understand it. It's fine if you're just doing normal question-answering back and forth, but you don't need Khoj for that.
1. If you want better adoption especially among corporations, GPL-3 wont cut it. Maybe think of some business friendly licenses (MIT etc)
2. I understand the excitement about llm's. But how about making something more accessible to people with regular machines and not state of art. I use rip-grep-all (rga) along with fzf [1] that can search all files including pdfs in a specific folders. However, I would like a GUI tool to
(a) search across multiple folders,
(b) provide priority of results across folders, filetypes and
(c) store search histories where I can do a meta-search.
This is sufficient for 95% of my usecases to search locally and I don't need LLM. If khoj can enable such search as default without LLM that will be a gamechanger for many people without a heavy compute machine or who dont want to use OpenAI.[1] https://github.com/phiresky/ripgrep-all/wiki/fzf-Integration
Have a look at how that worked out for the folks who built node and its libraries versus the ones who maintained control of their work (like npm).
For now, local LLMs take up an egregious about of RAM, totally agreed. But we trust the ecosystem is going to keep improving and growing and we'll be able to make improvements over time. They'll probably become efficient enough where we can run them on phones, which will unlock some cool scope for Khoj to integrate with on device, offline assistance.
Or at least models that don’t hog so much RAM.
The RAM usage is kind of the point though; we're trading space for time. It's not a problem that the model is using it, it's just that with the default choice for UI being web based now, the unnecessary memory usage of browsers is actually starting to be a real pain point.
Ideal: 16Gb (GPU) RAM
Less Ideal: 8GB RAM and CPU
What about if I have a GPU with 8GB?
PS. Nice to see an Hindi name for a software. For those who don't speak Hindi: https://en.m.wiktionary.org/wiki/%E0%A4%96%E0%A5%8B%E0%A4%9C...
We use it for understanding usage -- like determining whether people are using markdown or org or more.
Everything is collected entirely anonymized, and no identifiable information is ever sent to the telemetry server.
To opt-out, you set the `should-log-telemetry` value in `khoj.yml` to false. Updated the docs to include these instructions and what we collect -- https://docs.khoj.dev/#/telemetry.
A few observations:
1. Telemetry is enabled by default, and may contain the API and chat queries. I've logged an issue for this along with some suggestions here: https://github.com/khoj-ai/khoj/issues/389
2. It would be advantageous to have configuration in the UI rather than baking it's YAML into the container image. (added a note on that in the aforementioned issue on Github).
3. It's not clear if you can bring your own models, e.g. can I configure a model from huggingface/gpt4all? if so, will it be automatically downloaded based on the name or should I put the .bin (and yaml?) in a volume somewhere?
4. AMD GPU/APU acceleration (CLBLAS) would be really nice, I've logged an issue for this feature request as well. https://github.com/khoj-ai/khoj/issues/390
I responded in the issue, but I'll paste here as well for those also curious:
Khoj does not collect any search or chat queries. As mentioned in the docs, you can see our telemetry server[1]. If you see anything amiss, point it out to me and I'll hotfix it right away. You can see all the telemetry metadata right here[2].
[1]: https://github.com/khoj-ai/khoj/tree/master/src/telemetry
[2]: https://github.com/khoj-ai/khoj/blob/master/src/khoj/routers...
Configuration with the `docker-compose` setup is a little bit particular, see the issue^ for details.
Thanks for the reference points for GPU integration! Just to clarify, we do use GPU optimization for indexing, but not for local chat with Llama. We're looking into getting that working.
This may be more difficult if you are pre-tokenizing the search context.
Very cool project.
My workflow looks like: 1. Search with Khoj search[1]: `C-c s s` <search-query> RET 2. Use speed key to jump to relevant entry[2]: with `n n o 2`
[1]: `C-c s` is bound to `khoj` transient menu [2] https://orgmode.org/manual/Speed-Keys.html
#12 2.017 ERROR: Could not find a version that satisfies the requirement pyside6>=6.5.1 (from khoj-assistant) (from versions: none)
#12 2.017 ERROR: No matching distribution found for pyside6>=6.5.1
------
executor failed running [/bin/sh -c sed -i 's/dynamic = \["version"\]/version = "0.0.0"/' pyproject.toml && pip install --no-cache-dir .]: exit code: 1
I'm particularly interested in your OS/build environment.
The "buildx" flag gets me past that one, and to the next error:
#0 12.37 ERROR: Could not find a version that satisfies the requirement gpt4all>=1.0.7 (from khoj-assistant) (from versions: 0.1.5, 0.1.6, 0.1.7)
#0 12.37 ERROR: No matching distribution found for gpt4all>=1.0.7
lscpu output: Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 36 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8 On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
CPU family: 6
Model: 58
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 9
CPU(s) scaling MHz: 35%
CPU max MHz: 3400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4791.90
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cp
uid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr
_shadow vnmi flexpriority ept vpid fsgsbase smep erms
xsaveopt dtherm ida arat pln pts md_clear flush_l1d L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 6 MiB (1 instance)
NUMA: NUMA node(s): 1
NUMA node0 CPU(s): 0-7And Search can be configured to work with 50+ languages.
You'll just need to configure the asymmetric search model khoj uses to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml config file
For setup details see http://docs.khoj.dev/#/advanced?id=search-across-different-l...
Irrelevant opinion - The logo is beautiful, I like it and so are the colours used.
Lastly, LLMA2 for such use cases, I think is capable enough that paying for ChatGPT won't be as lucrative especially when privacy is of concern.
Keep it up. Good craftsmanship. :)
You've come a good way in both directions: the messaging is clearer about current state vs aspirations, and you've made good progress towards the aspirational parts.
Really glad to see the warm reception you're getting now. Nice job, y'all.
I tried to run it on a pretty beefy machine (8 core cpu/32 GB RAM) to use with ~40 odd PDF documents. My observation is that the queries (chat) takes forever and also getting Segmentation fault (core dumped) for every other or so query.
We have fixes for the seg fault[1] and improvement to the query speed[2] that should be released by end of day today[3].
Update khoj to version 0.10.1 with pip install --upgrade khoj-assistant later today to see if that improves your experience.
The number of documents/pages/entries doesn't scale memory utilization as quickly and doesn't affect the search, chat response time as much
[1]: The seg fault would occur when folks sent multiple chat queries at the same time. A lock and some UX improvements fixed that
[2]: The query time improvements are done by increasing batch size, to trade-off increased memory utilization for more speed
[3]: The relevant pull request for reference: https://github.com/khoj-ai/khoj/pull/393
Please, someone make a home-assistant Alexa clone for this.
We've just been testing integrating over voice, whatsapp over the last few days[1][2] :)
[1]: https://github.com/khoj-ai/khoj/tree/khoj-chat-over-whatsapp...
[2]: https://github.com/khoj-ai/khoj/compare/master...features/wh...
Is there a way to have this bot read from a discord and google drive?
I tried privategpt but results were not great.
I can't wait for software that will take my notes each day and fine tune a LLM model on them so I can use entire context length for my question/answers.
Problem is finetuning does not work that way. Finetuning is useful when you want to teach a model about a certain pattern, not when you want it output it right. Eg: With enough finetuning and prompts, a model will be able to output the result in a certain format that you need, but it does not guarantee that it would not be hallucination prone. The best way to minimize hallucination is still embedding based retrieval passed along with the question/prompt.
In future, there can be a system where you can build a knowledge base for LLMs, and tell it to access that for any knowledge, and finetune it for the patterns you want the output in.
Could you elaborate on the incremental search feature? How did you implement it? Don't you need to re-encode the full query through a SBERT or such as each token is written (perhaps with debouncing)?
Also, having an easily-extended data connector interface would be awesome, to connect to custom data sources.
Yes, we don't do optimizations on the query encoding yet. So SBERT just re-encodes the whole query every time. It gets results in <100ms which is good enough for incremental search.
I did create a plugin system, so that a data plugin just has to convert the source data into a standardized intermeditate jsonl format. But this hasn't been documented or extensively tested yet.
But that would allow you to access Khoj from the web.