Fun tidbit: for TV actors, regularly reading pilot scripts and then watching the produced pilot for comparison is a common educational technique. You get to imagine what kind of acting and directorial choices you'd make, and then see what was actually done. Oftentimes you'll realize you had totally misinterpreted what a scene was even about.
It's also fun to see how every script is filled with lines that are "unactable" -- there's just no way any real person would ever say anything like that. Then nine times out of ten, those lines are cut from the final product, because even the best actors couldn't make them work.
House (the character) is often being plainly racist and sexist. Presenting it as sarcasm is a vehicle he uses to make his racism more difficult to challenge.
Also check out LanguageLearningWithNetflix [0] which lets you watch videos with two subs in different languages, displays the subs as HTML so select/copy/define will work (and it has a built-in dictionary too). It also allows you to quickly jump to the beginning of each sentence so you can hear it multiple times, which helps improve your listening skills. For me, it has been a fun way to improve my German.
On a side note, please notice how none of these great features are available to mobile users. iOS, for example, is technically perfectly capable of supporting this kind of extensibility, but the App Store model limits it to a few narrow and specific use-cases.
Injecting third-party code into a third-party app that has to deal with DRM sounds like a recipe for disaster.
But this particular use-case would still play well with DRM as it's implemented today. Netflix on the browser still has DRM, but since the <video> element is standard, it can still be hooked into and decorated.
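To make "hooked into and decorated" concrete, here's a minimal sketch of the idea from a content script; the overlay class and the timed snippets are made up for illustration, not the extension's actual code:

    // Hypothetical lookup table: screenplay snippets keyed by start time (seconds).
    const timedSnippets = [
      { start: 0, text: 'FADE IN:' },
      { start: 12.5, text: 'INT. HOSPITAL - DAY' },
    ];

    function contentAt(seconds: number): string {
      // Latest snippet whose start time has already passed.
      const current = timedSnippets.filter(s => s.start <= seconds).pop();
      return current ? current.text : '';
    }

    // The DRM-protected stream stays untouched; we only read currentTime from
    // the standard <video> element and attach our own overlay next to it.
    const video = document.querySelector<HTMLVideoElement>('video');
    if (video) {
      const overlay = document.createElement('div');
      overlay.className = 'screenplay-overlay'; // styled by the extension's CSS
      video.parentElement?.appendChild(overlay);
      video.addEventListener('timeupdate', () => {
        overlay.textContent = contentAt(video.currentTime);
      });
    }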
Regarding your comment about extensibility, this applies to mostly every software platform other than the web, unfortunately. And even there, it feels like a happy accident. There's much work to be done in this area.
[ Dear Netflix, let's be friends. There's a lot of work to do still, and we can go further, faster, with some small helps. Can we get a test account? Regards, David. languagelearningextension@gmail.com ]
TL;DR: ScreenplaySubs fetches the subtitles from Netflix, parses the PDF-formatted screenplays into JSON, and syncs by calculating the sentence similarities between subtitle and screenplay dialogue.
In particular, we use the Universal Sentence Encoder to decide whether a subtitle matches a screenplay dialogue. If a screenplay dialogue is similar enough to a subtitle, the former is tagged with the timestamp provided by the latter.
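As a rough sketch of that matching step using the TensorFlow.js Universal Sentence Encoder package (the cosine threshold and the simple best-match pairing here are illustrative, not our exact production values or strategy):

    import '@tensorflow/tfjs';
    import * as use from '@tensorflow-models/universal-sentence-encoder';

    interface Subtitle { text: string; start: number }  // start time in seconds

    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Tag each screenplay dialogue with the timestamp of its best-matching
    // subtitle, as long as the match clears a similarity threshold.
    async function tagDialogues(dialogues: string[], subs: Subtitle[], threshold = 0.7) {
      const model = await use.load();
      const embeddings = await model.embed([...dialogues, ...subs.map(s => s.text)]);
      const vecs = (await embeddings.array()) as number[][];  // shape [n, 512]
      const dialogueVecs = vecs.slice(0, dialogues.length);
      const subVecs = vecs.slice(dialogues.length);

      return dialogues.map((text, i) => {
        let bestIdx = -1;
        let bestScore = -Infinity;
        subVecs.forEach((v, j) => {
          const score = cosine(dialogueVecs[i], v);
          if (score > bestScore) { bestScore = score; bestIdx = j; }
        });
        return bestScore >= threshold
          ? { text, timestamp: subs[bestIdx].start }
          : { text, timestamp: null };  // no confident match; left for manual sync
      });
    }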
A lot of the underlying problems at each step sound deceptively simple at first, but turn out to be quite challenging and fun to research. E.g. parsing PDFs in general is not straightforward (https://filingdb.com/b/pdf-text-extraction), and there are only a few resources on parsing PDF screenplays besides a handful of research papers (https://github.com/drwiner/ScreenPy/blob/master/INT17_screen...), which led us to create our own open source repo for this (https://github.com/SMASH-CUT/screenplay-pdf-to-json).
Our screenplay-pdf-to-JSON converter captures all dialogues, transitions, and actions within a particular screenplay scene. With this, we treat scenes as atomic units, which lets us detect changes in scene ordering based on the tagged scene timestamps. It also means that if dialogues are swapped within a scene in the movie, there will be some syncing inconsistencies.
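Roughly, the scene-level data allows checks like the one below; the types here are a simplified stand-in for illustration, not the exact screenplay-pdf-to-json schema:

    // Simplified stand-in for a parsed scene; the real schema differs.
    interface Scene {
      heading: string;  // e.g. "INT. HOSPITAL - DAY"
      dialogues: { character: string; text: string; timestamp?: number }[];
      actions: string[];
      transitions: string[];
    }

    // A scene's timestamp is the earliest tagged dialogue it contains. If those
    // timestamps don't increase monotonically in screenplay order, the movie
    // plays the scenes in a different order than the script lists them.
    function scenesWereReordered(scenes: Scene[]): boolean {
      const starts = scenes
        .map(s => s.dialogues.find(d => d.timestamp !== undefined)?.timestamp)
        .filter((t): t is number => t !== undefined);
      return starts.some((t, i) => i > 0 && t < starts[i - 1]);
    }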
Some scenes have little to no dialogue, which pretty much forces the extension to work on a best-effort basis. E.g. the opening scene of There Will Be Blood has minimal dialogue, if any at all. That's where I need to jump in and sync up the screenplay manually. OTOH, the opening scene of Inglourious Basterds works very well, since there's tons of dialogue in it. This is why I can't just add movies and instantly upload them to the site.
Would you be interested in more details? I was thinking of writing a series of technical blog posts if there's enough interest!
Over the last several years I've imagined a lot of projects (both serious utilities, and the absurd/artistic) in roughly the territory you're exploring...
- For my MFA thesis (2012) I used plaintext (thankfully, though they had plenty of their own problems) transcripts of a TV show as a corpus for generating poems from, and at the time I thought it would be an interesting follow-up project to turn them back into video clips.
- Mapping film quotes/citations back to the script/film and accuracy-checking movie quotes. (can imagine both of these being useful for film forums like the movies/sci-fi stack-exchange sites).
- Generating script-cuts of movies that re-order/drop scenes and just show the printed script on-screen where scenes were cut.
- A film-analysis/screenwriting-class sort of interface oriented around reading a segment and then playing it (could be particularly interesting when there happen to be multiple known script drafts?)
- Re-constructing a character monologue from lines spoken by an actor that turned down the role.
- Generating a super-cut of actor X saying Y.
- Generating focused cuts of a film that cover, say, every scene a given character does/doesn't appear in, or every scene that mentions X.
This is my use-case:
Kindle has a feature called "Audible Narration." You buy a Kindle book and the Audible audiobook, which lets you play the audiobook while the matching words are highlighted in the Kindle book as you listen. This effortless switching between audio and text enables some interesting reading behavior. Certain books become easier to read. Note-taking also gets much easier (highlighting text is much easier than bookmarking timestamps in an audiobook).
The problem is, getting your annotations and highlights and other data out of Kindle is very difficult, because Kindle does not have a public API. Same with Audible.
So I'm thinking of emulating Audible narration with a hybrid ebook/audiobook reader app. The ebook would be a simple HTML page (converted from epub, formatting be damned) and a simple audio player. As the audio plays, the HTML page would scroll and words would be highlighted.
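A minimal sketch of that playback side, assuming each word span in the converted HTML already carries a (hypothetical) data-start attribute in seconds; producing those timestamps is the hard part, described next:

    // Assumes markup like <span data-start="12.4">word</span> for each word.
    const audio = document.querySelector<HTMLAudioElement>('audio');
    const words = Array.from(document.querySelectorAll<HTMLElement>('span[data-start]'));
    let lastWord: HTMLElement | undefined;

    audio?.addEventListener('timeupdate', () => {
      const t = audio.currentTime;
      // Latest word whose start time has already passed.
      const current = words.filter(w => Number(w.dataset.start) <= t).pop();
      if (current && current !== lastWord) {
        lastWord?.classList.remove('highlight');
        current.classList.add('highlight');
        current.scrollIntoView({ behavior: 'smooth', block: 'center' });
        lastWord = current;
      }
    });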
The challenge is to timestamp-tag the HTML against the audio track. I'd guess I could run the audio track through speech-to-text and then somehow diff the generated transcript with the epub content. Given that some audiobooks are abridged, some read the footnotes on each mention, and some explain the visuals, I would assume diffing would not be very straightforward.
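This kind of audio-to-text matching is usually called forced alignment. As a rough illustration of the diff idea, here's a word-level LCS alignment that copies timestamps from a speech-to-text transcript (assuming per-word times, which most ASR services can emit) onto matching ebook words, leaving abridged or mismatched words untimed:

    interface AsrWord { text: string; start: number }  // start time in seconds

    // Align ebook words to transcript words with a classic LCS table, then copy
    // timestamps across matched pairs. Unmatched ebook words (abridgements,
    // ASR errors, added narration) stay null and can be interpolated later.
    // The table is O(n*m), so run it per chapter rather than on the whole book.
    function alignTimestamps(ebookWords: string[], asr: AsrWord[]): (number | null)[] {
      const norm = (w: string) => w.toLowerCase().replace(/[^a-z0-9']/g, '');
      const n = ebookWords.length;
      const m = asr.length;
      const dp: number[][] = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
      for (let i = n - 1; i >= 0; i--) {
        for (let j = m - 1; j >= 0; j--) {
          dp[i][j] = norm(ebookWords[i]) === norm(asr[j].text)
            ? dp[i + 1][j + 1] + 1
            : Math.max(dp[i + 1][j], dp[i][j + 1]);
        }
      }
      // Backtrack to recover matched pairs and transfer their timestamps.
      const times: (number | null)[] = new Array(n).fill(null);
      let i = 0, j = 0;
      while (i < n && j < m) {
        if (norm(ebookWords[i]) === norm(asr[j].text)) {
          times[i] = asr[j].start;
          i++; j++;
        } else if (dp[i + 1][j] >= dp[i][j + 1]) {
          i++;
        } else {
          j++;
        }
      }
      return times;
    }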
Do you know of any solutions I could look into?
How is this done? Isn't everything on Netflix protected by DRM?
Anyway, I see you have a comment here where you say you use the closed captions to figure out where to staple in the script. It would be cool to be able to staple in arbitrary other media: text, audio, video, whatever.
I'm certainly not suggesting this is done by this author and I applaud the creation of the tool, but I'd be interested to hear opinions as to whether my interpretation above is correct or if I'm overly cautious/overlooking something.
For what it's worth, we can confidently say that our extension only does UI modifications and is never involved with sensitive user info. Regardless, we will definitely open source the extension. Hopefully this will win some users' trust. Stay tuned!
The demo on the page looks great, and this is stuff which should be automatable at some point by AI.
Maybe you could implement smooth scroll and some sort of an overlay mode.
One of the reasons we decided not to implement that for now is to leave more room for error, since our algorithm is still not perfect. Sometimes the extension chooses to focus on one or two sentences next to the accurate dialogue. Using the entire viewport height to show the screenplay means that even if some inconsistencies occur, the user can still see the accurate dialogue.
Yeah, I like this example better, but I was thinking of an actual overlay.
Nerd out with your word out.