Rather than doing the work of curating stories with specific educational goals, chosen or crafted by native speakers and perhaps even noted authors, which might be an ongoing process with months of preparation and many people working together towards a common vision, one hobbyist can spend $0.035 at a time to accumulate a cache of "good enough" stories in countless languages.
For some, it's exciting to see the barrier to entry lowered so that one hobbyist can create a tool with so much content behind it, and dreams of social uplift ask us to cheer when stuff gets cheaper because it becomes more broadly available. So there are real upsides here, but of course those upsides come as part of a tradeoff against quality when we look at current and near-term LLM technology. Those quality tradeoffs aren't going to suit everyone, especially those who have the luxury of paying more for better things.
If I'm going to pick up speech patterns at all, I would really rather pick them up from a native speaker of the language, since at the very least I'll make the sorts of mistakes that a human might make. I want to sound like a human, not like a language model. Language models sound like the average of several humans at best, and a strange program trying to imitate human speech at worst.
Once I'm fluent in a language, enough to recognize when the language model itself is probably making a mistake, then I might become comfortable using it. But not as my first introduction to the nuances of the language, when I'm still building my own internal representation up from scratch. After all, my goal is to converse with other human speakers. Shouldn't that be my personal training corpus? I'm a neural network too, and I don't want to feed myself bad data.
There are thousands of short stories at every level of language understanding for nearly every language in existence. I would be more interested in using AI for the languages that don't have these (say, endangered or extinct languages, oral languages, etc.).
This seems like a weird hangup for what is essentially a substitute for graded readers for language learners. You're not getting any of the things you mentioned going the "human" route.
No one is saying go read these stories over full-blown novels. There's no complexity difference between full-blown novels and most native short stories either, just length, so that's not really an option.
If you could read at that level, you wouldn't be using this or the non-LLM alternative anyway.
An AI will generate the kind of simplistic stories I need for my current level of proficiency in my target language.
An actual human will more often than not slip in some complex grammar and vocabulary before I am ready for it.
When you are at B2-C1 level, then you can appreciate the nuances of human-generated stories, but for A1-A2, what difference would it make?
The AI is better because it will produce all the content I can consume, ready immediately.
"learn from an LLM who learned from other humans"
I feel like ChatGPT plus voice input (speech to text) plus voice output at different playback speeds (text to speech) would be a fine way to learn some English.
Maybe it shouldn't be the only method one uses, but I think it is a useful method.
So why not for other languages?
I'd love to use a Japanese LLM with voice capability. It could even write stories for me that I could read and listen to.
First, and maybe up for discussion, "gnocchi" in Spanish is written "ñoqui". See [2] for a commercial example or check the story's title.
Second, the sentence "Después de aprender a hacer gnocchis con su abuela, Hendrik nunca le gusta los gnocchis de nadie más" is wrong: the part after the comma should read "a Hendrik nunca le gustaron los ñoquis de nadie más". I'm also unhappy with "Siempre dice que faltaba algo" as it makes a funny mix of present and past tenses.
Third, I think the last paragraph is incoherent as Hendrik learns the nutmeg trick twice (learns from grandma about nutmeg -> finds other gnocchis lacking -> learns about nutmeg).
The well-known LLMs are surprisingly bad in languages that are not English. I'm not sure I would trust them just yet.
[1] https://webbu.app/l/spanish/story/los-%C3%B1oquis-de-la-abue...
[2] https://www.pastasgallo.es/productos/noquis-de-patata-seca/
But I have found a spelling error, inconsistent use of formal and informal voice, and an expression that was not quite right in the context. I wonder about the quality assurance part of the setup.
Details are here: https://paste.chapril.org/?01537019b7d59be5#FVNAMqzsWGmtQTpM...
I agree, though, that a human writer would have probably made this clearer. It would have been made explicit that Hendrik missed the trick the first time around.
>> "Hendrik nunca le gusta los gnocchis de nadie más" Will add the "a" at the beginning! Thanks for spotting that.
>> "Siempre dice que faltaba algo" This is a common way of saying things, at least in my region. That makes Spanish harder, there are so many "versions".
>> "The well-known LLMs are surprisingly bad in languages that are not English" So true.
That makes me sad...
> You should be able to tap on a word and get a translation.
This functionality is available already on Kindle though :) You have the option of uploading your own dictionaries to it too.
That, combined with using https://www.clippings.io/ to manage highlighted text, makes the Kindle an all-round great tool for learning languages from books, or any text really. (You can use calibre to convert into and between most ebook formats.)
I look forward to seeing where this goes! I imagine the real difficulty will be applying it to more than just German, especially when you branch outside of Indo-European languages.
Beyond Kindle capabilities, it also tracks the words, kanji, sentences you read to show you analytics based on that, chart your progress against JLPT levels simply by reading, and uses that data (all in your own iCloud/device storage) to coordinate flashcard review. I recently added early Anki integration. Still working hard on it, bringing more languages soon and cross-plat via SwiftWASM a bit later.
Ultimately, I got so sick of splicing together different home-grown tools and different web sites that I started to build my own to provide a more integrated learning experience that adapts to how students want to learn.
I am working on a Japanese course with a well-known Japanese teacher, and we're hoping to start rolling that out soon. Feel free to DM me if you're interested in that or want to chat about learning Japanese.
LingQ's killer feature for me is that as you click on words (or phrases - which I find really helpful btw) to translate them, they are added to your vocabulary list. It will automatically create flashcards for you from this list for SRS. Plus when you're reading a new story, words that are in your vocab list are highlighted yellow and new words are highlighted blue.
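The highlighting described above can be sketched roughly as follows. This is only an illustration of the idea, not LingQ's actual implementation: the function names `highlight` and `look_up`, the colour labels, and the simple regex tokenisation are all assumptions.

```python
import re

def highlight(text, known_words):
    """Tag each word 'yellow' if it is already in the vocab list, else 'blue'."""
    tagged = []
    for word in re.findall(r"\w+", text):
        colour = "yellow" if word.lower() in known_words else "blue"
        tagged.append((word, colour))
    return tagged

def look_up(word, known_words):
    """Hypothetical click handler: a looked-up word joins the vocab list."""
    known_words.add(word.lower())

vocab = {"der", "hund"}
before = highlight("Der Hund schläft", vocab)
look_up("schläft", vocab)   # the reader taps the unknown word
after = highlight("Der Hund schläft", vocab)
```

Here `before` marks "schläft" as blue and `after` marks it as yellow, since looking it up added it to the list.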
(2) it had a Show HN a few months ago:
Show HN: Learn German with Short Stories - https://news.ycombinator.com/item?id=35713852 - April 2023 (119 comments)
I'm currently working on a German vocabulary learning app, https://vokabeln.io/, which uses LLMs in a similar way. The app allows you to paste text and extracts the vocabulary to learn. The vocabulary is then repeated with spaced repetition, audio is generated, and users can generate infinitely more examples.
I might have made something that works only for me, as getting users seems to be extremely difficult, but I enjoy it much more than anything else I've tried.
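For readers unfamiliar with spaced repetition, here is a minimal sketch of one common scheme, a Leitner-style box system. The comment above does not say which algorithm the app uses, so the box count, the day intervals, and the `review` function are assumptions chosen purely for illustration.

```python
from datetime import date, timedelta

# Assumed intervals: days to wait before the next review, per box.
INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}

def review(card, correct, today):
    """Move the card up one box on a correct answer, back to box 1 otherwise."""
    box = min(card["box"] + 1, 5) if correct else 1
    return {**card, "box": box, "due": today + timedelta(days=INTERVALS[box])}

card = {"front": "der Hund", "box": 1, "due": date(2023, 4, 1)}
promoted = review(card, True, date(2023, 4, 1))
demoted = review(promoted, False, date(2023, 4, 3))
```

A correct answer pushes the next review further out; a wrong answer brings the card back the next day.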
It's quite difficult to get attention just off flashcards. Anecdotally, my flashcard app gets 1/10 the traffic/interest of my "learn by reading native texts" app, which integrates with the flashcard app but also lets people use the one they prefer. A lot of people like to avoid using more than one flashcard app and might already have settled on one, but may be open to other learning techniques besides flashcards, or to ones that can extend their existing flashcard system/data.
It's also hard to charge money just for flashcards when there are so many free options and the paid options can be pretty good already.
For example, I use it to learn from YouTube videos by copying the transcript into the app, which then processes the words and creates flashcards for the words that I don't already know.
I'm thinking of integrating YouTube transcripts, song lyrics, and open books such as the Grimms' stories directly into it. Should not be that much work. But I needed to release instead of infinitely adding features to my dev build.
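The transcript workflow described above might look something like this in its simplest form. It is a sketch under stated assumptions, not the app's real pipeline: `new_flashcards` is a hypothetical helper, words are matched by exact lowercase form with no lemmatisation, and the `back` of each card is left empty because the translation would come from a dictionary or an LLM.

```python
import re
from collections import Counter

def new_flashcards(transcript, known_words):
    """Build one card per unknown word, most frequent words first."""
    # Letters only: drops digits, underscores and punctuation.
    words = re.findall(r"[^\W\d_]+", transcript.lower())
    counts = Counter(w for w in words if w not in known_words)
    return [{"front": w, "back": None, "seen": n}
            for w, n in counts.most_common()]

cards = new_flashcards("Ich lerne Deutsch. Ich lerne gern.", {"ich"})
fronts = [c["front"] for c in cards]
```

Sorting by frequency is a design choice, so that the words that matter most for the pasted text are learned first.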
Might be worth a pass through to proofread; I saw another couple of typos, but the page is now throwing a WordPress database error, so that might be more urgent to look at.
I make a tool in the same space (readlang.com). I started it before the current LLM wave but I've recently added LLM-generated explanations and several users have been uploading LLM generated texts to read (including me!). I've been considering adding LLM based practice too, similar to what you've done with the comprehension questions, but haven't got around to that yet.
Feel free to reach out if you'd like a chat :-)
In terms of the content that you're creating, I always thought it would be very interesting to have leveled plays written in the target language. I always thought that reading how people actually speak might be more useful than reading prose.
It's an illustration-heavy picture book with multiple difficulty levels and recordings by native speakers for each line of text. The goal is for students to be able to learn languages while they read, ideally without having to translate words, though we do provide a lookup tool. Each line in the story is read by a native speaker, and the reader has the option to record themselves and play it back to check their pronunciation.
Our first story, "Ari & Chali go to the Market", teaches Thai classifiers and is perfect for someone who has completed Reading Thai Made Easy or is just learning how to read.
We hope to develop more content like this for Thai, as well as other languages.
Reading Thai Made Easy: https://emurse.io/course/TdXFGTB/lessons
Ari & Chali go to the Market: https://emurse.io/presentation/BnkcnXB