Essentially, I wrote a small browser extension that takes the content of LinkedIn, Twitter, and YouTube posts/titles and filters them out based on whether they are clickbait, low effort, etc.
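A minimal sketch of what such a filter could look like, assuming the extension hands post text to a local LLM server; the endpoint URL, request schema, labels, and prompt wording are all assumptions, not the actual extension's code:

```python
import json
import urllib.request

FILTER_PROMPT = (
    "Label the following social media post as CLICKBAIT, LOW_EFFORT, or OK. "
    "Reply with the label only.\n\nPost: {post}"
)

def parse_label(reply: str) -> str:
    """Normalize the model's free-form reply to one of the known labels."""
    upper = reply.strip().upper()
    for label in ("CLICKBAIT", "LOW_EFFORT", "OK"):
        if label in upper:
            return label
    return "OK"  # fail open: keep posts we can't classify

def should_hide(reply: str) -> bool:
    return parse_label(reply) != "OK"

def classify_post(post: str, endpoint: str = "http://localhost:8080/completion") -> str:
    """Ask a local LLM server (endpoint and schema are assumptions) for a label."""
    body = json.dumps({"prompt": FILTER_PROMPT.format(post=post)}).encode()
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

Failing open (keeping unclassifiable posts) is a deliberate choice here, so a flaky local model doesn't blank out the whole feed.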
It's liberating :D
So the AIs of the social media sites will end up trying to get the crap past your local AI filters, in a big AI arms race :)
I think this could lead to homogenization of the content-serving layer, since all a site would really need to do is deliver content to a user whose filters travel with them from one site to another, the display layer becoming less relevant (and less differentiating). But let's see, exciting times.
What's your plan for your project: will you turn it into a product for others, open source it, or neither? I'd love it if it were one of the first two!
Also, if this could show stats and graphs on the topics the user has been exposed to and what has been blocked out it would be amazing.
It starts off with some classical computer vision shenanigans to understand the character movement and map layout, and to create the 'desire' to explore. Then the LLM is given images, sound descriptions, and prior thoughts as input, letting Lara remark on the situation, which feels very surreal and, at least for me, very unexpected. E.g. she hears the wolves howl and wonders how they survived in this environment, or makes meta-remarks on game music changes.
[1] https://www.youtube.com/watch?v=bmqUUb80ApQ
[2] https://www.nexusmods.com/skyrimspecialedition/mods/98631
> This video may be inaccurate and is made for entertainment.
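The loop described above (feeding the LLM images, sound descriptions, and prior thoughts) could be sketched as simple prompt assembly with a bounded memory; the function names, prompt wording, and memory size are assumptions, not the mod's actual implementation:

```python
from collections import deque

def build_prompt(scene: str, sounds: list[str], memory: deque) -> str:
    """Assemble one LLM turn from current observations plus recent thoughts."""
    lines = [f"You see: {scene}"]
    if sounds:
        lines.append("You hear: " + ", ".join(sounds))
    if memory:
        lines.append("Your recent thoughts: " + " | ".join(memory))
    lines.append("Remark briefly, in character, on the situation.")
    return "\n".join(lines)

def remember(memory: deque, thought: str, limit: int = 5) -> None:
    """Keep only the last few thoughts so the context window stays small."""
    memory.append(thought)
    while len(memory) > limit:
        memory.popleft()
```

Feeding prior remarks back in is what produces the continuity ("she hears the wolves howl and wonders..."), since each turn sees a trail of its own earlier observations.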
Commonly, these lists are based on just which words appear in the text at the "surface" level. However, words commonly have multiple "senses", or nuances of meaning, in which they are used. Dictionaries list these senses, but it has traditionally been hard to disambiguate which sense a word is used in, given a usage in text.
LLMs make this feasible, so I'm attempting to create a word sense/usage frequency list.
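A sketch of how the counting side could work, assuming the LLM is asked to pick a numbered dictionary sense per usage (the example senses, prompt wording, and function names here are illustrative, not the actual project):

```python
from collections import Counter

# Dictionary senses act as the stable label set (illustrative entries).
SENSES = {
    "medium": {
        1: "a means of mass communication",
        2: "a substance through which a force travels",
        3: "someone who claims to receive messages from the dead",
    }
}

def sense_prompt(word: str, sentence: str) -> str:
    """Prompt asking an LLM to pick the numbered dictionary sense (assumed format)."""
    options = "\n".join(f"{k}. {v}" for k, v in SENSES[word].items())
    return (f'Which sense of "{word}" is used below? Answer with the number only.\n'
            f"{options}\n\nSentence: {sentence}")

def tally_senses(word: str, labeled_usages: list[int]) -> Counter:
    """Aggregate per-sense counts from LLM-labeled usages."""
    counts = Counter(labeled_usages)
    unknown = set(counts) - set(SENSES[word])
    if unknown:
        raise ValueError(f"labels outside the dictionary's sense set: {unknown}")
    return counts
```

Using the dictionary's own numbered senses as the label set is what keeps labels stable across the corpus: the model only ever chooses among fixed options rather than inventing its own categories.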
https://fasttext.cc/docs/en/crawl-vectors.html
https://news.ycombinator.com/item?id=13771292 (6 years ago)
Aligning the fastText vectors of 78 languages
https://github.com/babylonhealth/fastText_multilingual/blob/...
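The linked repo's approach (learning a linear map so vectors from different languages live in one shared space) reduces at lookup time to a matrix-vector product. A stdlib-only sketch, with a toy 2-D matrix standing in for the learned alignment:

```python
def apply_alignment(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Map a word vector into the shared space: result = matrix @ vec."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity used to find cross-lingual nearest neighbors after alignment."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)
```

In practice you would do this with numpy over the full embedding matrix; the element-wise version just makes the operation explicit.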
I used to help prepare study materials for Japanese learners of English. The other editors and I would try to adjust the vocabulary to keep it at an appropriate level for the target learners. Word-frequency lists provided some guidance, but they showed only how often words appeared in the surveyed texts, not the meanings in which they were used. The word “medium,” for example, might have a fairly high frequency, but could we expect the learners to know the meanings “a substance through which a force travels” or “someone who claims to have the power to receive messages from dead people”?
A similar problem arose with multiword idioms. The verb “make” is one of the most common words in English, but how common are “make it,” “make do,” “make up,” “make away with,” or “make out”? Ten years ago, I was unable to find any reliable answers. We had to rely on our gut feelings.
Good luck with your project. LLMs should be a big help.
Are you asking the LLM to annotate text and then counting the number of annotations?
How do you make sure that each disambiguation has a stable label throughout?
The disambiguated senses are provided by the dictionary. Does that answer your question?
As a language learner, I’ve found high-frequency word lists not to be that useful. A word is too atomic a unit, devoid of context. Memorizing word lists doesn’t lead to speaking a language, but learning phrases often does. Even better is to learn phrases within a context, like a restaurant or a lecture.
LLMs might actually add value here. Word frequencies are simply statistical counts, but finding common phrases is a more complicated problem, and the LLM’s structure (attention) might actually be the solution.
(I actually did this with ChatGPT 4 today. I asked it to tell me the highest-value phrases I should learn for a restaurant. I also asked it to break down phrases for me and give me a lesson on conjugations, etc.)
Also, you are absolutely correct that learning "atomic units" in isolation is not good practice. What I'm thinking here is to get some tools to collect the data for the "what". The "how" of the learning needs to happen in context.
He was a well-known tarot reader, mystic and Haskeller in the northern Finnish community; without his help it's very likely I would have been deported from the country before I could get my passport sorted out. We came up with this plan together before he passed mostly out of a really weird shared sense of humor.
I'm trying to understand what the minimum corpus size is, and the minimum architecture size too.
Black Mirror really popularized this idea too.
It uses LLMs to extract, summarize, and tag the front page articles and classify the different perspectives in the comments.
No more FOMO :)
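The last mile of a pipeline like that, turning LLM-extracted records into a readable digest, could be sketched as below; the field names (`title`, `summary`, `tags`) are assumptions about what the extraction step emits, not the service's actual schema:

```python
def format_digest(items: list[dict]) -> str:
    """Render extracted (title, summary, tags) records as a markdown digest."""
    sections = []
    for item in items:
        tags = " ".join(f"#{t}" for t in item.get("tags", []))
        section = f"## {item['title']}\n{item['summary']}\n{tags}".rstrip()
        sections.append(section)
    return "\n\n".join(sections)
```

Keeping the rendering separate from the LLM calls also makes the digest step trivially testable, unlike the extraction itself.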
I'd love to build a niche news service for a small market.
link: https://news.ycombinator.com/item?id=38398563#38407664
Have anything more like this?
It usually manages to create a reasonably coherent and amusing poem from up to 10 completely random words, something I would struggle to do myself. People tell me they enjoy them, although some of the poems turn out a bit odd haha.
Here is an example: https://x.com/SquareWordOrg/status/1660702885154377730?s=20
It’s not our main area of interest, but it’s been interesting to experiment with how human/machine and machine/machine interactions work in real time when you limit how fast agents can move or write. It's much easier to engage in a dialogue with agents that can't create or move tens of sticky notes and graphics faster than you can create one.
You can see a short, old video of the environment at https://www.temin.net
The uncensored one [1] finally gave me instructions for making crack and a bomb. It felt cool that it would answer everything, like a 90s zine.
[1] https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
Note on that page:
> This model is uncensored. I have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant to any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.
A custom GPT with instructions to output issues in markdown according to our issue templates.
It allows me to write horribly typoed bullet-point lists and get surprisingly good issues out.
Gets me 80-90% done in a fraction of the time. I can then just edit them to get them to be what I need.
What I'd really want to get working is a PR description generator.
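Without a custom GPT, roughly the same bullets-to-issue workflow could be sketched as a plain chat-completion message pair; the template text and system prompt here are illustrative, not the actual setup:

```python
ISSUE_TEMPLATE = """\
## Summary
## Steps to reproduce
## Expected behavior
## Actual behavior"""

def issue_messages(bullets: str) -> list[dict]:
    """Chat messages turning rough bullet notes into a templated markdown issue."""
    return [
        {"role": "system",
         "content": "Rewrite the user's rough notes as a GitHub issue in markdown, "
                    "fixing typos, using exactly this template:\n" + ISSUE_TEMPLATE},
        {"role": "user", "content": bullets},
    ]
```

The same shape works for a PR description generator: swap the template for a PR one and pass the diff as the user message.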
I fear everything will be expanded by LLMs soon: "write an email, three paragraphs, about X", instead of just sending X directly. Then the receiver gets a wall of text and uses an LLM to distill it back to X' before reading. I just hope too much doesn't get lost in the inverse compression through the LLM.
Because it is a lot messier and harder to understand when it's not structured. Having clearly structured tickets lessens the cognitive load.
A lot of times it'll even guess the JIRA ticket number from the diff or the branch name.
(disclaimer: I'm a GitButler co-founder)
- pretty-print and indent a “json-like” string (e.g. Python object str) from a log, or JSON with typos (extra commas, wrong quotes, imbalanced brackets…), with a summary of the errors at the end.
- a verbal description (numerically listed) of the changes between two commits of a YAML file, especially when the order has changed, making git diff hard to read.
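The "json with typos" repair in the first bullet could be approximated deterministically for the two most common problems; this sketch handles only single quotes and trailing commas (real Python-repr input has more cases, like `True`/`None`), and the error summary mirrors the "summary of errors" idea:

```python
import json
import re

def repair_json(text: str) -> tuple[object, list[str]]:
    """Best-effort repair of 'json-like' text, returning data plus a list of fixes.
    Only two common problems are handled: single quotes and trailing commas."""
    fixes = []
    if "'" in text:
        text = text.replace("'", '"')
        fixes.append("replaced single quotes with double quotes")
    cleaned = re.sub(r",\s*([}\]])", r"\1", text)  # drop commas before } or ]
    if cleaned != text:
        fixes.append("removed trailing commas")
    return json.loads(cleaned), fixes
```

An LLM handles the long tail (imbalanced brackets, odd reprs) far better, which is presumably why it is the tool of choice here.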
[HelloWonder.ai](https://hellowonder.ai)
The front end looks like a chat bot, but on the backend we're using LLMs to find, parse, rate, classify, and rephrase content on the fly for individuals.
I have an iPhone app I have been using for half a year (and lost 10 kg). If there is interest, write to the email in my bio and I might release it then ;)