But the holy grail is an LLM that can successfully work over a large corpus of documents and data (Slack history, huge wiki installations) and answer useful questions with proper references.
I tried a few, but they don’t really hit the mark. We need the usability of a simple search engine UI with private data sources.
The approach in the paper has rough edges, but the metrics are bonkers (a double-digit percentage-POINT improvement over dual encoders). The paper was written before the LLM craze, and I'm not aware of any further developments in that area. I think it might be ripe for some breakthrough innovation.
The best option, at least for now, is to just use OpenAI's custom GPTs; with some clever (but not hard) setup it's quite good.
But if all you want is a search engine, that's a bit easier.
The problem is that a huge wiki installation will often contain a lot of outdated data, which is still an issue for an LLM. And if you had already cleaned up the data, you might as well just search for what you need, no?
My questions are:
1 - Even if there is so much data that I can no longer find things, how much text data is needed to train an LLM to work OK? I'm not after an AI that can answer general questions, only one that can answer what I already know exists in the data.
2 - I understand that the more structured the data is, the better, but how important is structure when training an LLM? Does it mostly just figure things out anyway?
3 - Any recommendations on where to start, and on how to run an LLM locally and train it on your own data?
`text_splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=4000)`
Does this simple numeric chunking approach actually work? Or do more sophisticated splitting rules make a difference?
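For what it's worth, the "recursive" part of that splitter isn't purely numeric: it tries coarse separators first and only hard-cuts when nothing fits. A minimal stdlib-only sketch of that behavior (my own approximation of the LangChain logic, not its actual implementation):

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Greedy recursive splitter: prefer paragraph breaks, then lines,
    then words; hard-cut at chunk_size only as a last resort."""
    if len(text) <= chunk_size:
        return [text]
    sep = separators[0]
    rest = separators[1:] if len(separators) > 1 else separators
    if sep == "":
        # Last resort: cut at exact character boundaries.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = (current + sep + part) if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # This piece alone is too big: recurse with finer separators.
                chunks.extend(split_recursive(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

So chunk boundaries mostly land on paragraph or sentence-ish breaks anyway; whether domain-aware rules (headings, code blocks) beat that probably depends on the corpus.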
`vector_store_ppt = FAISS.from_documents(text_chunks_ppt, embeddings)`
So we're embedding all 8000 chars behind a single vector per chunk. I wonder whether some documents hold up better at this granularity than others, to say nothing of missed "prompt expansion" opportunities.
Regarding the index: a mix of BM25 and a vector index usually seems to perform best for most generic data.
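A toy stdlib-only sketch of that hybrid idea: BM25 for the lexical side, a bag-of-words cosine standing in for the embedding side, merged with reciprocal rank fusion (RRF). All names and the toy scoring are my own assumptions, not from any library:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Standard Okapi BM25 over whitespace tokens.
    toks = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in query.lower().split():
            df = sum(1 for d in toks if q in d)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def cosine_scores(query, docs):
    # Bag-of-words cosine; a real system would use embedding vectors here.
    qv = Counter(query.lower().split())
    out = []
    for d in docs:
        dv = Counter(d.lower().split())
        dot = sum(qv[w] * dv[w] for w in qv)
        norm = (math.sqrt(sum(v * v for v in qv.values()))
                * math.sqrt(sum(v * v for v in dv.values())))
        out.append(dot / norm if norm else 0.0)
    return out

def hybrid_rank(query, docs, k=60):
    # RRF: each ranking contributes 1/(k + rank), so mismatched score
    # scales between BM25 and cosine don't matter.
    fused = [0.0] * len(docs)
    for scores in (bm25_scores(query, docs), cosine_scores(query, docs)):
        ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
        for rank, i in enumerate(ranked):
            fused[i] += 1.0 / (k + rank + 1)
    return sorted(range(len(docs)), key=lambda i: -fused[i])
```

The RRF merge is the part that makes the mix painless: you never have to normalize BM25 scores against cosine similarities, only combine ranks.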
- how much RAM is needed
- what CPU do you need for decent performance
- can it run on a GPU? And if it does, how much VRAM do you need / does it work only on Nvidia?
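For the RAM/VRAM question, a rough back-of-the-envelope rule (my own rule of thumb, not a benchmark): weight memory is roughly parameter count times bytes per weight, plus some overhead for the KV cache and runtime buffers.

```python
def approx_model_memory_gb(n_params_billion, bits_per_weight, overhead=1.2):
    """Very rough estimate of RAM/VRAM needed just to hold the weights,
    with ~20% slack for KV cache and buffers (assumed, varies by runtime)."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9
```

By that estimate a 7B model quantized to 4 bits needs on the order of 4-5 GB, which is why such models are the usual starting point for consumer GPUs.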
Tinkering with building a RAG pipeline over some of my documents now, using the vector stores and chaining multiple calls.
Coming from ChatGPT-4, it was a huge breath of fresh air to not deal with the Judeo-Christian-biased censorship.
I think this is the ideal localllama setup: an uncensored, unbiased, unlimited (only by hardware) LLM+RAG.
https://shelbyjenkins.github.io/blog/retrieval-is-all-you-ne...
I have:
Processor: Ryzen 5 3600
Video card: GeForce GTX 1660 Ti 6 GB GDDR6 (Zotac)
RAM: 16 GB DDR4 2666 MHz
Any recommendations?