Essentially, I wrote a small browser extension that takes the content of LinkedIn, Twitter, and YouTube posts/titles and filters them out based on whether they are clickbait, low effort, etc.
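A minimal sketch of what such a filter could look like, assuming the extension hands post text to a local LLM server; the endpoint URL, request schema, labels, and prompt wording are all assumptions, not the actual extension's code:

```python
import json
import urllib.request

FILTER_PROMPT = (
    "Label the following social media post as CLICKBAIT, LOW_EFFORT, or OK. "
    "Reply with the label only.\n\nPost: {post}"
)

def parse_label(reply: str) -> str:
    """Normalize the model's free-form reply to one of the known labels."""
    upper = reply.strip().upper()
    for label in ("CLICKBAIT", "LOW_EFFORT", "OK"):
        if label in upper:
            return label
    return "OK"  # fail open: keep posts we can't classify

def should_hide(reply: str) -> bool:
    return parse_label(reply) != "OK"

def classify_post(post: str, endpoint: str = "http://localhost:8080/completion") -> str:
    """Ask a local LLM server (endpoint and schema are assumptions) for a label."""
    body = json.dumps({"prompt": FILTER_PROMPT.format(post=post)}).encode()
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

Failing open (keeping unclassifiable posts) is a deliberate choice here, so a flaky local model doesn't blank out the whole feed.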
It's liberating :D
So the AIs of the social media sites will end up trying to get the crap past your local AI filters, in a big AI arms race :)
I think this could lead to homogenization of the content-serving layer, since all a site would really need to do is deliver content to a user whose filters travel with them from one site to another, the display layer becoming less relevant (and less differentiating). But let's see, exciting times.
What's your plan for your project: will you turn it into a product for others, open source it, or neither? I'd love it if it were one of the first two!
Also, if this could show stats and graphs on the topics the user has been exposed to and what has been blocked out it would be amazing.
It starts off with some classical computer vision shenanigans to understand the character movement and map layout, and to create the 'desire' to explore. Then the LLM is given images, sound descriptions, and prior thoughts as input, letting Lara remark on the situation, which feels very surreal and, at least for me, very unexpected. E.g. she hears the wolves howl and wonders how they survived in this environment, or makes meta-remarks on game music changes.
[1] https://www.youtube.com/watch?v=bmqUUb80ApQ
[2] https://www.nexusmods.com/skyrimspecialedition/mods/98631
> This video may be inaccurate and is made for entertainment.
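The loop described above (feeding the LLM images, sound descriptions, and prior thoughts) could be sketched as simple prompt assembly with a bounded memory; the function names, prompt wording, and memory size are assumptions, not the mod's actual implementation:

```python
from collections import deque

def build_prompt(scene: str, sounds: list[str], memory: deque) -> str:
    """Assemble one LLM turn from current observations plus recent thoughts."""
    lines = [f"You see: {scene}"]
    if sounds:
        lines.append("You hear: " + ", ".join(sounds))
    if memory:
        lines.append("Your recent thoughts: " + " | ".join(memory))
    lines.append("Remark briefly, in character, on the situation.")
    return "\n".join(lines)

def remember(memory: deque, thought: str, limit: int = 5) -> None:
    """Keep only the last few thoughts so the context window stays small."""
    memory.append(thought)
    while len(memory) > limit:
        memory.popleft()
```

Feeding prior remarks back in is what produces the continuity ("she hears the wolves howl and wonders..."), since each turn sees a trail of its own earlier observations.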
Commonly, these lists are based on just which words appear in the text at the "surface" level. However, words commonly have multiple "senses", or nuances of meaning, in which they are used. Dictionaries list these senses, but it has traditionally been hard to disambiguate which sense a word is used in, given a usage in text.
LLMs make this feasible, so I'm attempting to create a word sense/usage frequency list.
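A sketch of how the counting side could work, assuming the LLM is asked to pick a numbered dictionary sense per usage (the example senses, prompt wording, and function names here are illustrative, not the actual project):

```python
from collections import Counter

# Dictionary senses act as the stable label set (illustrative entries).
SENSES = {
    "medium": {
        1: "a means of mass communication",
        2: "a substance through which a force travels",
        3: "someone who claims to receive messages from the dead",
    }
}

def sense_prompt(word: str, sentence: str) -> str:
    """Prompt asking an LLM to pick the numbered dictionary sense (assumed format)."""
    options = "\n".join(f"{k}. {v}" for k, v in SENSES[word].items())
    return (f'Which sense of "{word}" is used below? Answer with the number only.\n'
            f"{options}\n\nSentence: {sentence}")

def tally_senses(word: str, labeled_usages: list[int]) -> Counter:
    """Aggregate per-sense counts from LLM-labeled usages."""
    counts = Counter(labeled_usages)
    unknown = set(counts) - set(SENSES[word])
    if unknown:
        raise ValueError(f"labels outside the dictionary's sense set: {unknown}")
    return counts
```

Using the dictionary's own numbered senses as the label set is what keeps labels stable across the corpus: the model only ever chooses among fixed options rather than inventing its own categories.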
https://fasttext.cc/docs/en/crawl-vectors.html
https://news.ycombinator.com/item?id=13771292 (6 years ago)
Aligning the fastText vectors of 78 languages
https://github.com/babylonhealth/fastText_multilingual/blob/...
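The linked repo's approach (learning a linear map so vectors from different languages live in one shared space) reduces at lookup time to a matrix-vector product. A stdlib-only sketch, with a toy 2-D matrix standing in for the learned alignment:

```python
def apply_alignment(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Map a word vector into the shared space: result = matrix @ vec."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity used to find cross-lingual nearest neighbors after alignment."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)
```

In practice you would do this with numpy over the full embedding matrix; the element-wise version just makes the operation explicit.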
I used to help prepare study materials for Japanese learners of English. The other editors and I would try to adjust the vocabulary to keep it at an appropriate level for the target learners. Word-frequency lists provided some guidance, but they showed only how often words appeared in the surveyed texts, not the meanings in which they were used. The word “medium,” for example, might have a fairly high frequency, but could we expect the learners to know the meanings “a substance through which a force travels” or “someone who claims to have the power to receive messages from dead people”?
A similar problem arose with multiword idioms. The verb “make” is one of the most common words in English, but how common are “make it,” “make do,” “make up,” “make away with,” or “make out”? Ten years ago, I was unable to find any reliable answers. We had to rely on our gut feelings.
Good luck with your project. LLMs should be a big help.
Are you asking the LLM to annotate text and then counting the number of annotations?
How do you make sure that each disambiguation has a stable label throughout?
The disambiguated senses are provided by the dictionary. Does that answer your question?
As a language learner, I’ve found high-frequency word lists not to be that useful. A word is too atomic a unit, devoid of context. Memorizing word lists doesn’t lead to speaking a language, but learning phrases often does. Even better is to learn phrases within a context, like a restaurant or a lecture.
LLMs might actually add value here. Word frequencies are simply statistical counts, but finding common phrases is a more complicated problem, and the LLM’s structure (attention) might actually be the solution.
(I actually did this with ChatGPT 4 today. I asked it to tell me the highest-value phrases I should learn for a restaurant. I also asked it to break down phrases for me and give me a lesson on conjugations, etc.)
Also, you are absolutely correct that learning "atomic units" in isolation is not good practice. What I'm thinking here is to get some tools to collect the data for the "what". The "how" of the learning needs to happen in context.
He was a well-known tarot reader, mystic and Haskeller in the northern Finnish community; without his help it's very likely I would have been deported from the country before I could get my passport sorted out. We came up with this plan together before he passed mostly out of a really weird shared sense of humor.
I'm trying to understand what the minimum corpus size is, and the minimum architecture size too.
Black Mirror really popularized this idea too.
It uses LLMs to extract, summarize, and tag the front page articles and classify the different perspectives in the comments.
No more FOMO :)
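The last mile of a pipeline like that, turning LLM-extracted records into a readable digest, could be sketched as below; the field names (`title`, `summary`, `tags`) are assumptions about what the extraction step emits, not the service's actual schema:

```python
def format_digest(items: list[dict]) -> str:
    """Render extracted (title, summary, tags) records as a markdown digest."""
    sections = []
    for item in items:
        tags = " ".join(f"#{t}" for t in item.get("tags", []))
        section = f"## {item['title']}\n{item['summary']}\n{tags}".rstrip()
        sections.append(section)
    return "\n\n".join(sections)
```

Keeping the rendering separate from the LLM calls also makes the digest step trivially testable, unlike the extraction itself.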
I'd love to build a niche news service for a small market.
link: https://news.ycombinator.com/item?id=38398563#38407664
Have anything more like this?
It usually manages to create a reasonably coherent and amusing poem from up to 10 completely random words, something I would struggle to do myself. People tell me they enjoy them, although some of the poems turn out a bit odd haha.
Here is an example: https://x.com/SquareWordOrg/status/1660702885154377730?s=20
It’s not our main area of interest, but it’s been interesting to experiment with how human/machine and machine/machine interactions work in real time when you limit how fast agents can move or write. It's much easier to engage in a dialogue with agents that can't create or move tens of sticky notes and graphics faster than you can create one.
You can see a short, old video of the environment at https://www.temin.net
The uncensored one [1] finally gave me instructions for making crack and a bomb. It felt cool that it would answer everything, like a 90s zine.
[1] https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
Note on that page:
> This model is uncensored. I have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant to any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.
A custom GPT with instructions to output issues in markdown according to our issue templates.
It allows me to write horribly typoed bullet-point lists and get surprisingly good issues out.
Gets me 80-90% done in a fraction of the time. I can then just edit them to get them to be what I need.
What I'd really want to get working is a PR description generator.
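Without a custom GPT, roughly the same bullets-to-issue workflow could be sketched as a plain chat-completion message pair; the template text and system prompt here are illustrative, not the actual setup:

```python
ISSUE_TEMPLATE = """\
## Summary
## Steps to reproduce
## Expected behavior
## Actual behavior"""

def issue_messages(bullets: str) -> list[dict]:
    """Chat messages turning rough bullet notes into a templated markdown issue."""
    return [
        {"role": "system",
         "content": "Rewrite the user's rough notes as a GitHub issue in markdown, "
                    "fixing typos, using exactly this template:\n" + ISSUE_TEMPLATE},
        {"role": "user", "content": bullets},
    ]
```

The same shape works for a PR description generator: swap the template for a PR one and pass the diff as the user message.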
I fear everything will be expanded by LLMs soon: "write an email, three paragraphs, about X", instead of just sending X directly. Then the receiver gets a wall of text and uses an LLM to distill it back to X' before reading. I just hope too much doesn't get lost in the inverse compression through the LLM.
Because it is a lot messier and harder to understand when it's not structured. Having clearly structured tickets lessens the cognitive load.
A lot of times it'll even guess the JIRA ticket number from the diff or the branch name.
(disclaimer: I'm a GitButler co-founder)
- pretty-print and indent a “json-like” string (e.g. Python object str) from a log, or JSON with typos (extra commas, wrong quotes, imbalanced brackets…), with a summary of the errors at the end.
- a verbal description (numerically listed) of the changes between two commits of a YAML file, especially when the order has changed, making git diff hard to read.
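The "json with typos" repair in the first bullet could be approximated deterministically for the two most common problems; this sketch handles only single quotes and trailing commas (real Python-repr input has more cases, like `True`/`None`), and the error summary mirrors the "summary of errors" idea:

```python
import json
import re

def repair_json(text: str) -> tuple[object, list[str]]:
    """Best-effort repair of 'json-like' text, returning data plus a list of fixes.
    Only two common problems are handled: single quotes and trailing commas."""
    fixes = []
    if "'" in text:
        text = text.replace("'", '"')
        fixes.append("replaced single quotes with double quotes")
    cleaned = re.sub(r",\s*([}\]])", r"\1", text)  # drop commas before } or ]
    if cleaned != text:
        fixes.append("removed trailing commas")
    return json.loads(cleaned), fixes
```

An LLM handles the long tail (imbalanced brackets, odd reprs) far better, which is presumably why it is the tool of choice here.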
[HelloWonder.ai](https://hellowonder.ai)
The front end looks like a chat bot, but on the backend we're using LLMs to find, parse, rate, classify, and rephrase content on the fly for individuals.
I have an iPhone app I have been using for half a year (and lost 10 kg). If there is interest, write to the email in my bio and I might release it then ;)