This is my use-case:
Kindle has a feature called "Audible Narration." You buy a Kindle book, and the Audible audio book, which allows you to play the audio book while it highlights the words on the Kindle book as you're listening. This effortless switching between audio and text enables some interesting reading behavior. Certain books become easier to read. Note taking also gets much easier (Highlighting text is much easier than bookmarking timestamps on an audio book).
The problem is, getting your annotations and highlights and other data out of Kindle is very difficult, because Kindle does not have a public API. Same with Audible.
So I'm thinking of emulating Audible narration with a hybrid ebook/audiobook reader app. The ebook would be a simple HTML page (converted from epub, formatting be damned) and a simple audio player. As the audio plays, the HTML page would scroll and words would be highlighted.
Challenge is to timestamp tag the HTML with the audio track. I'd guess I could TTS the audio track and then somehow diff the generated text with the epub content. Given that some audiobooks are abridged, some read the footnotes on each mention, and some explain the visuals, I would assume diffing would not be very straightforward.
Do you know of any solutions I could look into?