Beyond what's noted there (contemporary business jargon), English has spread across the globe and has many regional variations that differ from class-signalling/formal American and British usage. Now that we all encounter each other online, it's not always worth over-analyzing word choice when you can understand the intent.
https://www.opensourceshakespeare.org/views/plays/play_view....
From a correctness standpoint, I think a descriptivist would be satisfied with an attested usage, especially from such a source. From a style point of view, I still find myself feeling embarrassed for the author when I encounter this usage (which is my own problem).
IMO in this context it is basically shorthand for “things I learned/lessons learned while tuning LLM…,” and either would be fine. It is sort of an informal list of stuff the author learned.
In my experience (nothing special, just another native speaker) “lessons from <event>” is the more typical American (at least) English phrase. But it is sort of close to “Lessons on.” “Lessons on” would imply more refined material that is more narrowly focused on teaching. So I wonder if the author decided they just didn’t want to worry about any confusion, or the possibility that they might misuse a phrase.
;-)
But where the Swedish word sounds natural in that language, "learnings" just sounds wrong in English, even though it apparently is technically correct.
"Using a half-precision FSDP full shard with a 1024 sequence length and a micro batch size of 2 required 63GB of VRAM on each of the eight A100 80 GB GPUs. The training, lasting three epochs, took just 20 minutes. The total cost for the VM was $8.88 per hour, resulting in $3, not including the time for experiments and bug fixes."
I wondered where you could rent cycles on a machine like that. A quick Google search found that p4d.24xlarge is available on AWS: the on-demand cost is $20.1755 per hour, while Spot is only $8.99 (I guess it's gone up?).
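A quick sanity check of the arithmetic in the quote and the AWS prices above, assuming billing is simply hourly rate × wall-clock time (real providers may round up to the minute or hour, and this ignores storage and the time spent on experiments):

```python
def run_cost(hourly_rate_usd: float, minutes: float) -> float:
    """Cost of a run billed proportionally to wall-clock time (assumption: per-second billing)."""
    return hourly_rate_usd * minutes / 60.0

# A 20-minute training run at the hourly rates mentioned in the thread
for label, rate in [
    ("quoted VM", 8.88),
    ("p4d.24xlarge on-demand", 20.1755),
    ("p4d.24xlarge spot", 8.99),
]:
    print(f"{label}: ${run_cost(rate, 20):.2f}")
# quoted VM: $2.96
# p4d.24xlarge on-demand: $6.73
# p4d.24xlarge spot: $3.00
```

So the "~$3" figure holds at the quoted or Spot rate, and roughly doubles on-demand.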
Cool to know I could fine-tune for only ~$3.
I've been looking for an answer to this every time I check out the current vast.ai console.
My goal is to fine-tune a model on our codebase. I find RAG too clunky; I'd really like to train the model on what each part of the code does and how we do things, and see how it responds with a more complete perspective that goes beyond what fits in the context.
The options I've considered for running the fine-tuning:
- using a service like vast.ai, runpod, gradient or similar
- using Google Colab
- getting a more powerful MacBook, an M3 Max with plenty of RAM
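Whichever option is used, the codebase first has to become plain-text training records. A minimal sketch, assuming a causal-LM fine-tune on raw file contents; the extension list, the character-based chunk size and the `# file:` header line are hypothetical choices for illustration, not something from the thread:

```python
import json
from pathlib import Path

# Hypothetical defaults: adjust to the languages in your repository.
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".java", ".md"}
CHUNK_CHARS = 4000  # rough proxy for a context-window budget

def iter_chunks(repo_root: Path, chunk_chars: int = CHUNK_CHARS):
    """Yield {"text": ...} records, one per chunk of each matching source file."""
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        rel_parts = path.relative_to(repo_root).parts
        if any(part.startswith(".") for part in rel_parts):
            continue  # skip .git, .venv and other hidden directories
        text = path.read_text(encoding="utf-8", errors="ignore")
        rel = path.relative_to(repo_root).as_posix()
        for start in range(0, len(text), chunk_chars):
            # Prefix each chunk with its path so the model can tie code to its location.
            yield {"text": f"# file: {rel}\n{text[start:start + chunk_chars]}"}

def write_jsonl(repo_root: Path, out_path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for record in iter_chunks(repo_root):
            f.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Fixed-size character chunks are the crudest option; splitting on function or class boundaries would likely keep more of the "how we do things" structure intact.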
Disclosure: I collected the data and built the site, but it has a ton of comparison data for GPU clouds.
Except for encrypted chats (which have a bad UI and only work on one device), your messages are stored unencrypted on their servers (and can be handed over to authorities, etc.).
WhatsApp uses end-to-end encryption by default. In fact, it uses the library that Signal developed. It is much more secure than Telegram, unless proven otherwise (which would require some backdoor in the application code to change its behavior).
But that's not what he says.
>I don’t want to use any third-party fine-tuning services
He might be okay with a third party storing his messages, but not with it using them in their models, etc.
If it's so easy, then you don't need to remove it. The model will solve it easily and focus on everything else. At best, you save some parameters and compute; at worst, you damage its ability to learn important things like conversational skills or modeling people. When it comes to LLMs, more is more, and trying to hand-engineer the dataset or think for the LLM can backfire in very subtle and difficult-to-diagnose ways.
> Ok, it is capable of forming coherent sentences. The most noticeable problem is its lack of awareness regarding the context of the conversations which leads to bland and generic replies. The messages lacked any distinct style, feeling quite basic...
>
> Conversations have become more interesting and engaging, although there’s still a risk of losing context. Russian language performance has improved, but errors still occur. I believe that before fine-tuning for a specific task with limited data, like mine, it would be beneficial to first fine-tune the model unsupervised on a large corpus of Russian texts. Additionally, incorporating common conversation partners’ names as separate tokens might enhance the quality. I wouldn’t say it has turned out to be significantly better than LoRA. It might be more effective to focus solely on a single person and calculate the loss based only on my responses (or someone else’s), instead of trying to learn about each and every conversational partner.
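The last idea in that quote, computing the loss only on one person's replies, is usually done by masking the labels of every other token. A minimal sketch in plain Python, assuming the conversation is already tokenized into per-message segments; `-100` mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and the speaker names and token IDs in the test are made up:

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_masked_example(
    segments: Sequence[Tuple[str, List[int]]], target_speaker: str
) -> Tuple[List[int], List[int]]:
    """Concatenate message segments; keep labels only for the target speaker's tokens."""
    input_ids: List[int] = []
    labels: List[int] = []
    for speaker, token_ids in segments:
        input_ids.extend(token_ids)
        if speaker == target_speaker:
            labels.extend(token_ids)  # these tokens contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # visible as context only
    return input_ids, labels

def masked_mean_nll(token_nll: Sequence[float], labels: Sequence[int]) -> float:
    """Average per-token negative log-likelihood over unmasked positions only."""
    kept = [nll for nll, lab in zip(token_nll, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no target-speaker tokens in this example")
    return sum(kept) / len(kept)
```

The other participants' messages still shape the model's context; they just stop being prediction targets. Hugging Face causal-LM heads shift labels internally, so `labels` can stay aligned position-by-position with `input_ids` when passed to them.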
That doesn't make any sense when you're dealing with a model which is so hugely over-parameterized. The model will learn the easy data that you are removing just fine. There's no 'limited data' there.
> If the model quickly reduces loss by picking up predictable phrases, it's hard to tell if it's genuinely learning or just echoing these predictable elements.
You can't interpret the loss qualitatively anyway. It's totally dependent on the details of tokenization, formatting, corpus size, etc. You still have to look at the samples or a downstream task to see if it's working well. Even quantitatively, the loss is only meaningful if you're comparing it to a held-out sample or something, and then it doesn't matter if you were screwing with it like OP.
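To make the held-out point concrete: mean token loss converts to perplexity, but either number mostly carries information when training and held-out values are computed with the same tokenizer and formatting and then compared. A minimal sketch; the loss values in the comments are invented for illustration:

```python
import math
from typing import Sequence

def mean(xs: Sequence[float]) -> float:
    if not xs:
        raise ValueError("empty sequence")
    return sum(xs) / len(xs)

def perplexity(token_nll: Sequence[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(mean(token_nll))

def generalization_gap(train_nll: Sequence[float], heldout_nll: Sequence[float]) -> float:
    """Held-out mean loss minus training mean loss; a widening gap can indicate overfitting."""
    return mean(heldout_nll) - mean(train_nll)

# perplexity([2.1, 1.9, 2.0])                            -> exp(2.0), about 7.389
# generalization_gap([1.2, 0.8, 1.0], [2.1, 1.9, 2.0])   -> about 1.0
```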
Back when the "I forced a bot to watch 1000 hours..." memes were popular (https://knowyourmeme.com/memes/i-forced-a-bot), ages ago in AI/ML time, I tried to do something similar by fine-tuning GPT-2 on messages from a group chat of my friends. Since there were years of chat data, it seemed like a really good opportunity to test whether the language model would capture everyone's personality and generate funny, uncanny-valley versions of our banter.
Turns out that the group chat was used nearly exclusively for sending funny pictures and videos (that the language model obviously couldn't see), and for making plans to meet up. The generated conversations almost exclusively consisted of a random group chat member starting with "there is a party tonight, who wants to go?" and others saying "I'm down" or "when?" or "where?" It was 0% banter, and 100% logistics.
It was pretty hilarious in its own way, but not for the reasons anyone expected! I didn't learn very much about language models with that experiment, but I did learn that my friends' group chat is actually pretty boring.
I guess the best banter happens in real life. Glad to see it worked out somewhat more interestingly for this person, even if they did allude to some similar results in their closing thoughts section.
Additionally, this is why it's important to think ahead about what you're training your model on: the model will always regress to the training data itself, even if that means going backwards in ability.
It's also interesting that бля was translated to 'damn'. :)
Mark: wassup
Andy: just chilling
It simulated our conversational style and topics quite well, though GPT-2 reads like a glorified Markov chain. Sometimes the outputs were absolutely hilarious and inappropriate. GPT-2 was peak comedy.
My friend described GPT-2 as "like watching a toddler learning how to walk. When it stumbles it's cute and funny." GPT-3, not so much...
Also, it was oddly (painfully) accurate as far as personality goes... like looking into a mirror. For one thing, I talk way more, and this was reflected in the model's output. For another, I am constantly trying to turn my life around and failing, but ever optimistic... and talking about creative plans endlessly without much execution. (So GPT-2-andai ended up the same way...)
Ever since fine-tuning became a thing, I've wondered how long it would be before somebody makes a utility where you dump in a giant chat export and an API key, and it fine-tunes a Telegram bot that can imitate any of your friends. It would be fun to play with, and you could even create a group chat with multiple friend-bots talking to each other to see how long it takes to go off the rails.
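The data-preparation half of such a utility is fairly small. A minimal sketch, assuming a Telegram Desktop JSON export (a `result.json` with a `messages` list whose entries carry `type`, `from` and `text`, where `text` may be a plain string or a list mixing strings and formatting objects with their own `text` field); the context length and the prompt/completion record shape are hypothetical choices, and the fine-tuning and bot parts are left out:

```python
from typing import Dict, List

def flatten_text(text) -> str:
    """Formatted messages are assumed to be lists of plain strings and entity dicts."""
    if isinstance(text, str):
        return text
    return "".join(part if isinstance(part, str) else part.get("text", "") for part in text)

def build_examples(export: Dict, friend: str, context_len: int = 6) -> List[Dict[str, str]]:
    """One prompt/completion pair per message sent by `friend`, with recent history as the prompt."""
    history: List[str] = []
    examples: List[Dict[str, str]] = []
    for msg in export.get("messages", []):
        if msg.get("type") != "message":
            continue  # skip service events (joins, pins, ...)
        body = flatten_text(msg.get("text", "")).strip()
        if not body:
            continue  # photos, stickers, etc. without a caption carry no text to learn from
        sender = msg.get("from") or "unknown"
        if sender == friend and history:
            examples.append({
                "prompt": "\n".join(history[-context_len:]) + f"\n{friend}:",
                "completion": " " + body,
            })
        history.append(f"{sender}: {body}")
    return examples
```

Usage would be along the lines of `build_examples(json.load(open("result.json", encoding="utf-8")), friend="Andy")`. Note that a chat dominated by pictures and videos shrinks to very little text after the empty-message filter.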