I've had great success using it to read more while reducing eye strain and computer usage. I think I've probably read 30 or so books this way now over the past year. Being able to listen to any content you want in audio form free and offline while going for a walk is extremely handy.
I hope it helps you as well!
Cheers
- ebook-convert is not a small dependency; it seems that it only comes bundled with calibre. And calibre has a huge number of Python dependencies (>400 packages on openSUSE). Don't know about you, but I'm not polluting my install with that for a small tool. So I grabbed the AppImage version of calibre, extracted it, and added a symlink to the bundled ebook-convert. It is still ~500 MB of wasted space, but at least it's local to a single folder.
Could you replace it with another tool/library, or include only necessary stuff with binary?
- Then I encountered another problem. I have no piper installed on my system, but the README says:
> You don't need to have piper installed. This program manages piper and the associated models.
It didn't download a piper release and proceeded without errors. Then it did download some models. After that it errored out trying to change directory to the non-existent "~/.config/QuickPiperAudiobook/piper". So naturally, I looked in the source code, found the link to the piper tarball, and extracted it myself.
A-ha! Now it works. Until..
- Done. Saved audiobook as /home/archargelod/Audiobooks/text.wav
You could try to guess what the problem was, but I'm going to tell you right away: it didn't create the "Audiobooks" folder, and again there were no errors.
Thankfully, that was the last issue and after I created ~/Audiobooks manually, my generated wav was there.
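For anyone scripting around this, the missing-folder failure described above is easy to guard against. Here is a minimal Python sketch, not the tool's actual code (QuickPiperAudiobook is written in Go); the helper name prepare_output_path is hypothetical:

```python
from pathlib import Path


def prepare_output_path(output_dir: str, filename: str) -> Path:
    """Return the full output path, creating the directory first if needed."""
    # expanduser() turns a leading "~" into the user's home directory.
    directory = Path(output_dir).expanduser()
    # parents=True creates intermediate directories as well;
    # exist_ok=True makes this a no-op when the directory already exists.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename
```

Calling prepare_output_path("~/Audiobooks", "text.wav") before writing means the folder exists by the time the wav is saved, and a permissions problem raises an exception instead of passing silently.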
Your feedback on ebook-convert is very valid. I can take a look at breaking it up. (Granted I am not sure how much of a lift that would be)
https://gist.github.com/avelican/8602b417e810f8dd4e31e8e3fbb...
...at which point I did some more digging and realized that (for my purposes anyway, operating on txt files) QPA can simply be replaced with piper itself!
cat book.txt | piper --model [model] --output_file book.wav
(which I found kind of funny)
Re: the ebook-convert dependency, I wonder if there are any feasible alternatives? My first thought was pandoc, which is ~140MB. That is not tiny, but I guess it's a lot smaller than Calibre's ~1400MB (!!!).
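If you would rather call the piper one-liner above from a script, a rough Python equivalent could look like this. It is only a sketch: it assumes a piper binary on your PATH and uses just the two flags from the command above (--model and --output_file); the function names are made up for illustration.

```python
import subprocess


def piper_command(model: str, output_file: str, piper_bin: str = "piper") -> list[str]:
    """Build the argument list for: piper --model MODEL --output_file FILE."""
    return [piper_bin, "--model", model, "--output_file", output_file]


def text_to_wav(text_path: str, model: str, output_file: str) -> None:
    """Feed a text file to piper on stdin, like `cat book.txt | piper ...`."""
    with open(text_path, "rb") as src:
        # check=True raises CalledProcessError if piper exits with an error,
        # so a failed run does not go unnoticed.
        subprocess.run(piper_command(model, output_file), stdin=src, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.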
In this case you already have the input file and the audio output file, but I guess there could be an app that takes these two files to provide a good reading experience. As they are based on the same source, it should be possible to keep the reading progress matched between them.
Use Calibre's e-book viewer[^0] which uses Piper for text-to-speech.
[^0]: https://manual.calibre-ebook.com/viewer.html#read-aloud
On a more serious note, this is a cool application of the technological advancement in AI voice models, and inevitable in today's society. It just really sucks to watch this race to the bottom actively put people out of work.
But hey, at least we can save a few bucks on an audiobook, right?
The entire progress of civilization has depended on putting people out of work by increasing productivity and efficiency. Subsistence hunter-gatherers and subsistence farmers were put out of work by cheaper agriculture systems, and some of those unemployed realized they could support themselves by reading books to other people, a task they enjoyed much more.
The replacement of hunter-gatherers by farming is a change that took centuries to take hold. Nobody lost their ability to feed their family because their ability to hunt and gather was automated away. Ironically, the move away from hunter/gatherer subsistence took free time away (for things like storytelling) instead of adding to it, in exchange for greater reliability in their sustenance.
The loss of entire swaths of employment is a fairly new development. As is the lack of safety nets (US-centric for obvious reasons) for those who become injured or otherwise unable to sustain themselves.
This superficial thinking is full of holes at first examination and actively harms others. It is also an excuse to ignore the statements of an audiobook narrator here.
This will only do narration, and the engagement is probably still not 100% there yet (sorry, can't try it right now).
This kind of thing is very useful to consume high-level information on the side, while driving, cooking, gardening or doing exercise. So it can be useful to make previously curated and written content more accessible. Including content people have curated themselves, or got a bot to curate for them.
For example, I listened to the entire FT weekend edition while cycling on the weekend, using their text-to-audio function. This allowed me to take in even parts of the paper I normally do not have time to read. Before the advent of the text-to-speech function, I would have had to choose between health and information. Now I can have both.
The ability to change voices to one that suits a person's taste is hardly a race to the bottom. It is a HUGE value add.
I am sure lamplighters were not happy about the light bulb either.
c'est la vie.
Breaking the audible monopoly sounds like a nice side effect too.
I'll be writing some music for the intros of chapters and some special sections for suspense.
There are a few good folks on YouTube who discuss some of the more nitty-gritty details, if you're interested.
But to this date I cannot use Apple's Books app on the watch to listen to audiobooks I have on mp3/mp4a/... It only works with audiobooks you have purchased in their walled garden.
[1] https://rhasspy.github.io/piper-samples/
[2] https://huggingface.co/spaces/coqui/xtts
[3] https://github.com/rhasspy/piper
[4] https://github.com/coqui-ai/TTS
Looks like there's an effort to keep an actively maintained fork here, though: https://github.com/idiap/coqui-ai-TTS
What would it take to add a specific language to piper? And do you know a good speech to text model?
With regard to adding languages, first check if support already exists [0]. Then there are a few tutorials that might be relevant [1] [2] [3]. Once you have the onnx model you can just put it in the QuickPiperAudiobook model directory and specify it via the CLI args.
[0] https://rhasspy.github.io/piper-samples/
[1] https://github.com/rhasspy/piper/issues/51
[2] https://github.com/rhasspy/piper/blob/master/TRAINING.md
[3] https://www.youtube.com/watch?v=b_we_jma220
For speech-to-text, try OpenAI's Whisper: both the code and the model are available, and multiple projects have built on it. You could try this wrapper: https://github.com/m-bain/whisperX or, for short utterances on a smartphone, https://github.com/futo-org/whisper-acft
I have just one question/note: I tried a book in Mexican Spanish and noticed that it fails to catch the accents on the words (the stress on syllables marked with a written accent). I think this is caused by the .pdf parsing, since the Piper voice samples on their webpage handle it properly (with both available voices).
Do you have an idea of what could exactly be happening and how I can try to solve it?
Thank you very much for the tool again!!!
Update: Ohh ok, I just checked the repo issues and found the one about Polish accents. I tried "--speak-diacritics" but got the same "Error: failed to read file passed as input to piper: read /tmp/ebook-convert-xxxxxxx.txt file already closed". If I skip the diacritics option it converts fine.
I realized the removal of diacritics was happening at the function RemoveDiacritics inside lib/textProcessing.go on line 26 and modified the definition(?) to not modify special characters, compiled again and voila! It worked great.
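To show why stripping diacritics hurts Spanish pronunciation, here is a small Python sketch of one common way to remove them (Unicode decomposition, then dropping the combining marks). This is an assumption about the general technique, not a port of the Go RemoveDiacritics function, which may work differently:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accents by decomposing characters and dropping combining marks."""
    # NFD splits e.g. "ó" into "o" plus a separate combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are filtered out.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, "canción" becomes "cancion" and "año" becomes "ano", so the voice model never sees the information that marks the stressed syllable or the ñ sound. Leaving the text untouched, as in the modification above, keeps it.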
After that I used Calibre to convert a couple of .pdfs to .txt, and with a pretty simple Python script got rid of page footnotes/headers/page numbers. I ended up with pretty decent audiobooks.
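A cleanup script of the kind mentioned above might look roughly like this. It is a guess at the approach, not the script the commenter actually used: it drops lines that are only a page number and lines that repeat many times (typical for running headers and footers). Footnotes are not handled, and the min_repeats threshold is an arbitrary assumption you would tune per book.

```python
import re
from collections import Counter

# Matches lines such as "12" or "Page 12" (case-insensitive).
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def clean_pages(text: str, min_repeats: int = 3) -> str:
    """Drop page-number lines and lines repeated at least min_repeats times."""
    lines = text.splitlines()
    # Count how often each non-empty line occurs; headers/footers repeat a lot.
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if PAGE_NUMBER.match(stripped):
            continue  # bare page number
        if stripped and counts[stripped] >= min_repeats:
            continue  # likely a running header or footer
        kept.append(line)
    return "\n".join(kept)
```

Note that any short line that legitimately repeats (for example a "***" scene break) would also be removed, so it is worth checking the output before generating audio.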
Thanks again for the great tool!