Is anyone using self hosted LLM day to day and training it like a new employee

100 points · reachableceo · 2y ago · 72 comments

I have this idea to use LLM daily. Train it on my emails / notes / chats . Have it draft replies and I edit them as needed and it learns from that.

Is anyone doing anything like that? I have all of the open source stuff downloaded (models , lollms-webui , promptfoo, etc ) and have been experimenting with the interactive chat stuff . Also txtai to make semantic search.

That all seems pretty mature / progressing nicely . A few more months and I expect a clear reference stack will emerge .

What about the assistant stack ? I invest all these resources to self host and feed in all my data. I want to maximize the ROI.

60 comments · 16 top-level

abstrct2y ago· 13 in thread

The most limiting factor I’ve come across is hitting the context window. Eventually your new eager employee starts to forget what you’ve taught them but they’re too confident to admit it.

ijwann2y ago

> Eventually your new eager employee starts to forget what you’ve taught them but they’re too confident to admit it.

Seems very realistic!

jacquesm2y ago

No, it would be realistic if after two weeks on the job they start telling you how to run the company.

1 more reply

bentt2y ago

Are there methods to "summarize what they've learned" and then replace the context window with the shorter version? This seems like pretty much what we do as humans anyway... we need to encode our experiences into stories to make any sense of them. A story is a compression and symbolization of the raw data one experiences.
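A minimal sketch of that "summarize, then replace the context" idea, in Python. The names `compact_history` and `first_sentences`, the character budget, and the number of turns kept verbatim are all assumptions for illustration. The placeholder summarizer only keeps the first sentence of each turn; a real setup would call the LLM there instead.

```python
from typing import Callable, List


def compact_history(
    turns: List[str],
    summarize: Callable[[str], str],
    max_chars: int = 2000,
    keep_recent: int = 4,
) -> List[str]:
    """Fold older turns into one summary once the history exceeds a budget."""
    if sum(len(t) for t in turns) <= max_chars or len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize("\n".join(older))
    # The summary replaces the older turns; recent turns stay verbatim.
    return ["Summary of earlier conversation: " + summary] + recent


def first_sentences(text: str) -> str:
    """Placeholder summarizer (hypothetical): first sentence of each turn."""
    return " ".join(
        line.split(". ")[0].rstrip(".") + "." for line in text.splitlines() if line
    )
```

The compression is lossy by construction, so anything that must survive word for word should live outside the summarized part of the history.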

filterfiber2y ago

Yeah that's a fairly well studied one. Most of these techniques are rather "lossy" compared to extending the context window. The most likely "real solution" is going to be using various tricks and finetuning on higher context lengths to just extend the context window.

Here's a bunch of other related methods,

Summarizing context - https://arxiv.org/abs/2305.14239

continuous finetuning - https://arxiv.org/pdf/2307.02839.pdf

retrieval augmented generation - https://arxiv.org/abs/2005.11401

knowledge graphs - https://arxiv.org/abs/2306.08302

augmenting the network with a side network - https://arxiv.org/abs/2306.07174

another long term memory technique - https://arxiv.org/abs/2307.02738

2 more replies

abstrct2y ago

I’ve absolutely explored this idea but, similar to lossy compression, sometimes important nuance is lost in the process. There is both an art and science to recalling the gently compacted information and being able to recognize when it needs to be repeated back.

1 more reply

nonameiguess2y ago

The animal brain equivalent isn't summarize a context window to account for limited working memory. It's to never leave training mode to go into inference-only mode. The learned models in animal brains never stop learning.

There is nothing stopping someone from keeping an LLM in online-training mode forever. We don't do that because it's economically infeasible, not because it wouldn't work.

abdullin2y ago

Putting too much information in the context window is counter-productive in my experience. Low signal/noise ratio tends to increase the likelihood of model hallucinations, and we don't want that!

What works in my experience: structure the task like a human-driven workflow and break it down into small steps. Each step could be driven by a small prompt, relevant document fragments (if RAG is used) and condensed essays/tutorials/guides that were written by a powerful LLM (ideally, GPT-4 pre-Turbo).

Using this approach, you could stay well below 8k token limit even on the most demanding tasks.

(Big size contexts are leaky on all LLMs anyway)

kqr2y ago

What about some generation-augmented retrieval augmented generation set-up where all your conversations are indexed for regular text search, and then you use the LLM's language knowledge to generate relevant search phrases, the results of which are included in the current prompt?
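A rough sketch of that set-up in Python, assuming a plain in-memory list as the "index" and a stubbed phrase generator in place of the LLM call. `search_conversations`, `build_prompt` and `fake_phrases` are made-up names for illustration, not part of any library.

```python
from typing import Callable, List


def search_conversations(index: List[str], phrase: str) -> List[str]:
    """Regular text search: conversations containing every word of the phrase."""
    words = phrase.lower().split()
    return [doc for doc in index if all(w in doc.lower() for w in words)]


def build_prompt(
    question: str,
    index: List[str],
    generate_phrases: Callable[[str], List[str]],
    max_hits: int = 3,
) -> str:
    """Ask the phrase generator for search phrases, then prepend the hits."""
    hits: List[str] = []
    for phrase in generate_phrases(question):
        for doc in search_conversations(index, phrase):
            if doc not in hits:
                hits.append(doc)
    context = "\n".join(f"- {d}" for d in hits[:max_hits])
    return f"Relevant past conversations:\n{context}\n\nQuestion: {question}"


def fake_phrases(question: str) -> List[str]:
    """Stand-in for the LLM call that would propose search phrases."""
    return ["invoice", "vat"]


conversations = [
    "We agreed to ship the invoice module in March.",
    "Lunch on Friday?",
    "Invoice totals must include VAT.",
]
prompt = build_prompt("What did we decide about invoices?", conversations, fake_phrases)
```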

nerdponx2y ago

I would imagine that daily "training" here involves something more like RLHF than just appending to a big prompt.

coryrc2y ago

I think you'll need to save good responses (and bad responses that you fixed?) and regularly run more training passes.
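One way to collect those responses, sketched in Python: append every draft and the version you actually sent to a JSONL file, then read the pairs back when it is time for another training pass. The file layout and the names `log_feedback` and `load_training_pairs` are assumptions; the training step itself is left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def log_feedback(path: str, prompt: str, draft: str, final: str) -> Dict:
    """Append one record; 'accepted' is True when the draft was sent unedited."""
    record = {
        "prompt": prompt,
        "draft": draft,
        "final": final,
        "accepted": draft.strip() == final.strip(),
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_training_pairs(path: str) -> List[Tuple[str, str]]:
    """Return (prompt, target) pairs, using the edited final text as target."""
    pairs = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            r = json.loads(line)
            pairs.append((r["prompt"], r["final"]))
    return pairs
```

Keeping the rejected draft next to the fixed version also leaves the door open for preference-style training later, since each edited record is a (worse, better) pair.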

abstrct2y ago

Yeah, especially with a large knowledge base I find it important to keep a log of prompts/responses and perform team reviews of both. It’s honestly making more work than it’s saving at the moment with the hope that it’ll be more helpful down the road. On the plus side it’s made the team more interested in tasks around technical documentation and marketing material, so still a win!

more_corn2y ago

The solution is RAG

Belomolo2y ago

Funny but also true in real life :-(

I start to feel like a one-eyed king among the blind.

I even remember sometimes when I told people specific things.

TrevorJ2y ago· 10 in thread

I'm pretty interested in this as well. I have moved from Notion to Obsidian for my personal notes, to-do lists and errata in preparation for this since obsidian uses local plaintext files.

What I would love to get working at some point is allowing an LLM access to my schedule, notes and goals and then have it help prompt me at appropriate times. "Hey, TJ I noticed you haven't worked out this week, it's sunny today this might be a good time". That sort of thing.

There seems to be good tooling around agents, prompt engineering, RAG etc. The 'glue' around getting the LLM to help figure out when appropriate time(s) to check in with me is the bit I am missing, but that's probably mostly down to me being an artist and only a very very JR hobbyist programmer though.

Casteil2y ago

Having come from Notion also, I LOVE Obsidian for its non-proprietary markdown file structure. Incredibly powerful plugins, too.

FWIW, it's not really what you're seeking... but there is a plugin that allows you to invoke an LLM from within Obsidian (via Ollama): https://github.com/hinterdupfinger/obsidian-ollama

In short, it allows you to set up prompts to act on selected text directly within a file, e.g. 'Summarize this selection as a markdown formatted list of key points', 'Write a PRD', 'Translate to [Language]', 'Run this as a prompt', etc.

TrevorJ2y ago

That's a pretty slick plugin I will have to check it out, thanks for the suggestion.

unoti2y ago

> What I would love to get working at some point is allowing an LLM access to my schedule, notes and goals and then have it help prompt me at appropriate times. "Hey, TJ I noticed you haven't worked out this week, it's sunny today this might be a good time". That sort of thing.

If you work in a Microsoft world, this is what GraphAPI is all about: enabling access to all the things using your personal authentication token. This includes emails, calendar, OneNote, One Drive, and essentially everything. I've been working on making an easy to use OneNote provider with GraphAPI underneath it that I can use to work with the LLM.

https://learn.microsoft.com/en-us/graph/overview

https://learn.microsoft.com/en-us/graph/use-the-api

TrevorJ2y ago

interesting, thank you.

willsmith722y ago

I've been thinking about the 'glue' bit too.

I think a cron job run say every hour would be good enough. It would just have to collect all the inputs and make a decision about how "valuable" it would be to check in (haven't worked out in 3 days, very valuable), then it's just about connecting it to the right outputs (email, push noti, ...)
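A toy version of that decision step in Python, meant to be called from an hourly job. The scoring rule (three idle days counts as fully "valuable", sunny weather adds a small bonus) and the 0.8 threshold are invented for illustration; the inputs would come from wherever the schedule and weather data live.

```python
from datetime import date


def check_in_value(last_workout: date, today: date, sunny: bool) -> float:
    """Score from 0.0 to 1.0 for how valuable a check-in would be right now."""
    days_idle = (today - last_workout).days
    score = min(days_idle / 3.0, 1.0)  # assumption: 3 idle days is the maximum
    if sunny:
        score = min(score + 0.2, 1.0)  # assumption: good weather is a small nudge
    return score


def should_check_in(score: float, threshold: float = 0.8) -> bool:
    """Only notify when the score clears the threshold."""
    return score >= threshold
```

The output side (email, push notification) would hang off `should_check_in` returning True.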

The job itself would be cheap and trivial to host on something like lambda

kolinko2y ago

I’m working on this right now, will be open sourcing soon probably :)

TrevorJ2y ago

That's pretty cool - let us know if you do!

sureglymop2y ago

Just be aware of the security implications. Maybe it's unrealistic but what if someone sends you a calendar invite containing a prompt injection? It may not seem like a big issue but at worst (e.g. with github copilot) something like this may lead to remote code execution.

TrevorJ2y ago

Yeah, it's a good point. Have a feeling we will see a lot of sneaky security issues around LLM's over the next few years to say the least.

pcdoodle2y ago

I like it.

valine2y ago· 4 in thread

Training a local LLM on individual facts is a tricky one. Typically it’s not possible to train with a limited quantity of data and expect the model to generalize on that data well. In context learning generalizes well, but it’s a bad fit for an “employee” model that’s supposed to learn over a long stretch of time.

If your goal is to bake new concepts into the model weights, your only real option is a dataset with that concept being used in a wide variety of contexts.

A more feasible approach I think would be retrieval augmented generation. You’d essentially store your conversations in a database and calculate embeddings as you go. This would allow you to later do a natural language search of the database, and insert the most relevant portion of the conversation into your context window.
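A self-contained sketch of that store-and-search loop in Python. To keep it runnable without any model, `embed` here is a toy bag-of-words counter, which is an assumption standing in for a real embedding model; `ConversationStore` is likewise a made-up name, and a real setup would use a proper database or vector store.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ConversationStore:
    """Stores each message with its embedding, computed as messages arrive."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.rows.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return the k stored messages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The strings returned by `search` are what would get inserted into the context window ahead of the new question.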

hushpiper2y ago

Yeah, I think training for facts in general is kind of problematic since you often have to overfit and the model may lose capability in other areas. I suspect that the only situations where it really makes sense to train on facts are where the facts are very nuanced and require a lot of interpretation, or more often where the facts are just so extensive that they can't be crammed effectively into the context window you're working with. Otherwise, you're better off with a vector db and a well-written prompt.

qup2y ago

What if we just train it to respect facts in general, then couldn't we just supply it a list of facts?

Sort of how they made chatGPT way more likely to obey requests?

1 more reply

abdullin2y ago

Embeddings can be tricky. They are just an average semantic vector over a chunk of text.

There is a high chance that a plain similarity search (dot product or cosine distance) will bring a lot of noise and junk into the request. And high noise/signal ratio in the context tends to lead to hallucinations.

valine2y ago

It’s not perfect, if you know of a better alternative I would genuinely love to hear about it.

1 more reply

semireg2y ago· 4 in thread

As a solo developer answering emails that basically point people to various guides and FAQs I’ve published … I need this. Zendesk claims to have an AI component but forces you to input all training data into their own wiki knowledge base. I can see why they don’t want to use prior responses as training (pii concerns), but at least give me some boilerplate responses that I can use to get a head start and further train the model(s).

mtlynch2y ago

Why not do it the old fashioned way and hire a human for this? Humans also have the advantage that they don't just make up answers when they don't know something (or at least if you hire good ones).

I've had good experience hiring support folks and working with them on a shared inbox (we use HelpScout).

nicbou2y ago

I considered both options for allaboutberlin.com. I want to offer either better search, or personalised advice.

Cost is a huge problem. An immigration lawyer would continuously eat into my personal income. If I don't get state funding to cover it, it makes zero business sense. It would also come with all the liability of a first employee.

A GPT that cites my website, German law and a dozen official websites would be a game changer. It could not give advice, but it could find answers like Phind does. It's just tech, with lower running costs and virtually no obligations (unlike an employee).

I just don't know if the result would be useful and trustworthy enough, and it's very expensive to try.

My conclusion is that I should focus on building a good knowledge base, and when the time is ripe, I can augment it with fancy tech.

LegitShady2y ago

he is the human that does this.

1 more reply

htrp2y ago

You just need embedding based search.

MyFirstSass2y ago· 4 in thread

Local models have taken a mind boggling leap over the past months so i'm sure we'll be able to add layers soon by ourselves even on a laptop?

Seriously this is not far from Chat GPT 3.5 in only 6.7GB's and runs on a Macbook Air:

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF

But yeah current context windows are limiting.

rodrigodlu2y ago

I was testing this one days ago. It seems fine to use as a base for extra finetuning, but failed hard questions that chatgpt nailed.

One example was trying to use it as an assistant to beat long games, without immediate rewards.

I was trying to log and simultaneously get feedback playing Stardew Valley. gpt-3.5-turbo-1106 basically went along with me and my daughter in a coop session giving nice suggestions, sometimes with huge gaps, but easy enough to ask more about after giving more context.

Mistral 7b and 13B were basically mixing up Stardew Valley with WoW and Genshin Impact, even when given a lot of context about the day I was on, what the NPCs answered, or things that I know about how to solve a certain quest. They straight up made up non-existent towns (Stardew Valley only has one) etc, etc.

I was running the model on a separate gaming notebook, with nvidia, while playing the game on the one I'm using now.

MyFirstSass2y ago

True, and makes sense that the logic is closing in but the breadth of the data is too narrow in 7GB's to ask questions about niche topics.

Mistral hasn't released their own official 13B/30B's yet, but i'm really looking forward to what they can do.

What is crazy is that Ultrafastbert, Speculative, Jacobi, or lookahead decoding could potentially speed up by up to 80x depending on size which could make GPT-4 like models feasible on entry level macs / Phones if similar wizardry is done memory wise.

..Yes im very optimistic after the insane progress over the last months with models like Mistral, Deepseek etc.

nerpderp822y ago

With RAG and fine tuning (which is cheap), you can fine tune a model on a daily basis so that one isn't trying to stuff everything in the context window.

TrevorJ2y ago

I think with RAG it's pretty reasonable. Put your corpus in pinecone or some other vector store and relevant sections get injected along with your prompt which lessens the burden on context window.

hathawsh2y ago· 3 in thread

I would like to extend the question: is anyone building a homelab for the specific purpose of training a LLM on their personal info? The choice of hardware (for speed, cost, and noise concerns) seems important.

rodrigodlu2y ago

For me, nothing fancy. I just added extra RAM to a gaming notebook to get enough speed on answers, since it already has a good nvidia card, and I keep the API open so I can reach it from another laptop inside my local network.

I have an extra computer for services like filesharing, samba, nfs, git, firewall, etc. For instance, it caches the models I'm downloading with a squid proxy, so I can test several UIs without downloading the same model over again. Not every UI offers an easy way to set a single folder to store all gguf files, or it's lacking documentation.

I'm already having a lot of fun. There's people already doing much more than this. I'm more worried about integrating and gluing in a way that will become transparent after the new year, local models or not.

Also how to glue this with obsidian/logseq/neovim/etc in a way that I can use with the fewest keystrokes possible, instead of just uploading a gigantic context or sensitive source code files.

hushpiper2y ago

I admittedly have not done this myself, but the hardware choices are less complex than you might think. The only truly important part is the GPU, and the number of GPUs on the consumer market that can handle a local LLM is presently quite small. It's pretty much dual-wielding 3090's or 4090's or bust. If you're running but not training a (relatively) small one, you can do well with just one. But if you want to run inference _and_ training on one large enough to be a consistent daily assistant (read: 70b+), cloud hardware for the training is just far more practical and economical. You're generally not gonna wanna drop the $$ for an A100 or two just to do a small training run every couple of weeks on a dataset made up of the amount of RLHF'd conversations one person might have with an LLM in that time.
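Some back-of-the-envelope arithmetic behind the "dual 3090's or 4090's" point, as a small Python helper. It counts memory for the weights only; the KV cache, activations, and (for training) gradients and optimizer state all come on top, so treat the numbers as lower bounds. The function name is made up, and 24 GB is the VRAM on a single 3090 or 4090.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


fp16_70b = weight_memory_gb(70, 16)  # 16-bit weights
q4_70b = weight_memory_gb(70, 4)     # 4-bit quantized weights
dual_gpu_vram = 2 * 24               # two 24 GB consumer cards
```

At 16 bits a 70b model needs 140 GB for weights, far beyond two consumer cards; at 4 bits it needs 35 GB, which fits in 48 GB with some room left for the cache. That gap is roughly why inference on a pair of cards is workable while training a model of that size at home is not.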

spaceywilly2y ago

I think a good use case for this would be auto generation of code documentation. There are many reasons to not want to upload your source code to a cloud AI service, but having an AI that was trained on your local code base so you could ask it “what does this foobar function do anyways?” would be killer.

abdullin2y ago· 2 in thread

I've been building workflow assistants that make existing employees more productive or enable entirely new business models. Some of these assistants use selected local models (due to cost or privacy factors)

Currently the stack gravitates around:

- GPT-4 - either to drive the entire workflow OR generate prompts, plans and guidelines for the local models to execute.

- structured knowledge bases (either derived from existing sources OR curated manually by companies to drive AI assistants).

- embedding search indexes, augmented by full-text search. Usually LLM has access to the search engine and can drive the search as needed, refining the queries if results aren't good enough.
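For the last item, one common way to combine an embedding ranking with a full-text ranking is reciprocal rank fusion. The Python sketch below shows that technique as an illustration only; the thread does not say how the two result sets are actually merged in this stack, and `k=60` is just the conventional default constant.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids; each hit scores 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


embedding_hits = ["a", "b", "c"]  # hypothetical ids from the embedding index
fulltext_hits = ["b", "d"]        # hypothetical ids from full-text search
merged = reciprocal_rank_fusion([embedding_hits, fulltext_hits])
```

Documents found by both searches float to the top, which is the main point of augmenting one index with the other.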

All of that is instrumented with logic to capture user feedback at every single step. This is crucial for the continuous improvement of the model!

Bigger model can use this information once in a while, to improve plans and workflow guidelines to make the overall process more efficient.

AMA, if needed!

Havoc2y ago

Do you use a framework to pull it all together? Like Langchain etc

abdullin2y ago

LangChain is good for the demos and learning, but it is too complex and brittle for my taste.

Using a bit of boilerplate code (a couple of python files) that I copy to new projects.

willsmith722y ago· 2 in thread

Everyone I know just uses the hosted ones, because of the sheer performance gap.

For now, you can do all the custom/manual training you want, but gpt4 will almost always outperform it with the right context.

Hopefully that will change in the future. Even then, I don't expect people to want to self-host as in on their own machines. More like custom training, then host either on SAAS or PAAS, or their own on-prem if they have it. Spending the performance of a personal laptop isn't worth the reduction of performance on other tasks. Again, maybe that will change.

abdullin2y ago

It doesn't need to be an exclusive choice. Hosted and local models can complement each other.

On one of the projects I used Chat GPT-4 to write instructions/tutorials that were then executed by local model on a large chunk of data (cleaning up product catalogues).

GPT-4 then reviewed some results and fine-tuned the instructions.

willsmith722y ago

interesting, what does the local model do?

1 more reply

dmezzetti2y ago· 2 in thread

Cool use case, glad to see txtai [1] is helping (I'm the main dev for txtai).

Since you're using txtai, this article I just wrote yesterday might be helpful: https://neuml.hashnode.dev/build-rag-pipelines-with-txtai

Looks like you've received a lot of great ideas here already though!

1 - https://github.com/neuml/txtai

tibanne2y ago

Thanks. I built something similar (didn't know it was called RAG) about 6 months ago. I found the most difficult part was interfacing with the existing systems and extracting the documentation out of those. Google docs, Notion, Slack. Do you have any advice on easier ways to do this? Are there any libraries around that make this task a bit lighter?

dmezzetti2y ago

Well for Google Docs, Notion, and Slack they all have APIs that are pretty straightforward to use.

xtracto2y ago

I was thinking of doing something along the same lines but with code:

An LLM that is specifically trained for Software Development, and to which I feed the code of all my company's repositories. And I keep feeding commits/pull requests.

The idea is that I can query it about architectural issues, code improvements, and other technical aspects at different levels of abstraction (code, architecture, business, etc).

So far, I've played a bit with CodeRabbit and it's "just ok" but it is more of a very small window to what "could be" than being actually useful.

shanelleroman2y ago

I've been using & contributing to Lightrail (https://github.com/lightrail-ai/lightrail). Each instance comes with a local vectorDB and integrates with apps like Chrome & VSCode, so I can read in content like my notes, emails, etc. It doesn't support self-hosted LLMs yet unfortunately!

gumboshoes2y ago

There's Rewind.ai for macOS, which tracks all audio, video, and text it can see as you work, then lets you query it via its local LLM chat. Works pretty well. Also can summarize meetings and work with your calendar in certain ways. It does not use local documents you have not viewed on-screen; it doesn't index your file directories.

johntash2y ago

I have a similar goal/desire as you. My current project is ingesting this type of data into Elasticsearch with vector embeddings and using a normal search+knn to generate some context when creating a prompt.

This works reasonably well with gpt4, but my context is almost always too large for self-hosted models so far.
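One possible way to squeeze retrieved context into a smaller window is to pack the best-ranked snippets greedily until a token budget is used up. A Python sketch, assuming the snippets arrive sorted best-first from the search step and using a crude whitespace word count as the token estimate (a real tokenizer would give different numbers); `pack_context` is a made-up helper name.

```python
from typing import Callable, List


def pack_context(
    snippets: List[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Keep snippets in rank order, skipping any that would exceed the budget."""
    packed: List[str] = []
    used = 0
    for snippet in snippets:
        cost = count_tokens(snippet)
        if used + cost > budget_tokens:
            continue  # too big for the remaining space; try smaller ones
        packed.append(snippet)
        used += cost
    return packed
```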

TheCaptain48152y ago

This is something I imagine coming out of Autogen or OpenAi Assistants in a few months. You really need multiple agents (as of now) most of the time. IMO multiple GPT4 agents ARE smart enough to accomplish a lot, it's getting them working together and setup that's the issue.

croes2y ago

But you would lose the benefit of self hosting

ChrisArchitect2y ago

Ask HN:
