So I wrote some code to scrape his nearly 8000 reviews from rogerebert.com and then import them into Letterboxd:
(I only put the first two paragraphs of each review on Letterboxd, then link to his full review on his site.)
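The excerpt step could look roughly like this Python sketch. It is not the author's code. It assumes the review body's HTML has already been cut out of the page (rogerebert.com's actual markup isn't described here) and treats every `<p>` element as a paragraph. The `letterboxd_excerpt` name and the "Full review:" link format are made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the whitespace-normalized text of each <p> element, in order."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # an unclosed <p> is implicitly ended by the next one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # text inside <em>, <a>, etc. within a <p> is kept too
            self._buf.append(data)


def letterboxd_excerpt(review_html: str, review_url: str, n: int = 2) -> str:
    """Hypothetical helper: first n paragraphs plus a link to the full review.

    Assumes review_html contains only the review body, not the whole page.
    """
    parser = ParagraphExtractor()
    parser.feed(review_html)
    parser.close()
    excerpt = "\n\n".join(parser.paragraphs[:n])
    return f"{excerpt}\n\nFull review: {review_url}"
```

Using the standard-library parser keeps the sketch dependency-free. A real scraper might prefer a CSS-selector library to find the review container first.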
The hard parts of this were:
- Extracting the text of his reviews correctly from his site's HTML. That wasn't too terrible, though.
- Matching his reviews to the correct movies on TMDB. This just took a bunch of trial and error and about 20-30 manual corrections. I used various matching strategies based on the movie title, the year of the review, the movie's release year (when his review listed it, though it was often off by a year or two), and the director, producer, and cast (when his review listed them).
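A scoring-based matcher in that spirit might look like the Python sketch below. The `Review`/`Candidate` shapes, the weights, and the threshold are all hypothetical; `Candidate` stands in for whatever a TMDB title search returns, and no TMDB API call is shown. The review-year rule assumes (as a labeled guess) that a review usually appears within about two years of release. Ties and low scores return `None` so they can go to manual correction.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


def normalize(s: str) -> str:
    """Lowercase and strip accents/punctuation so minor title differences don't matter."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = re.sub(r"[^a-z0-9 ]+", "", s.lower())
    return " ".join(s.split())


@dataclass
class Review:  # facts scraped from one review page (hypothetical shape)
    title: str
    review_year: int
    release_year: Optional[int] = None  # only when the review lists it
    director: Optional[str] = None
    cast: list[str] = field(default_factory=list)


@dataclass
class Candidate:  # one movie from a TMDB search (hypothetical shape)
    tmdb_id: int
    title: str
    year: Optional[int]
    directors: list[str] = field(default_factory=list)
    cast: list[str] = field(default_factory=list)


def score(review: Review, cand: Candidate) -> int:
    """Hypothetical weights: title and director count most, year and cast less."""
    s = 0
    if normalize(review.title) == normalize(cand.title):
        s += 3
    if cand.year is not None:
        if review.release_year is not None:
            gap = abs(review.release_year - cand.year)
            s += 2 if gap == 0 else 1 if gap <= 2 else 0  # listed year may be off by 1-2
        elif 0 <= review.review_year - cand.year <= 2:
            s += 1  # assumption: reviewed within ~2 years of release
    if review.director and normalize(review.director) in {normalize(d) for d in cand.directors}:
        s += 3
    shared = {normalize(n) for n in review.cast} & {normalize(n) for n in cand.cast}
    s += min(len(shared), 2)
    return s


def best_match(review: Review, candidates: list[Candidate], threshold: int = 4) -> Optional[Candidate]:
    """Best-scoring candidate, or None when nothing scores high enough or the top two tie."""
    scored = sorted(((score(review, c), c) for c in candidates), key=lambda t: t[0], reverse=True)
    if not scored or scored[0][0] < threshold:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: leave it for a manual correction
    return scored[0][1]
```

For example, with two "Psycho" candidates (1960 and 1998, made-up IDs), a review listing a 1998 release and Gus Van Sant as director picks the remake. A review from 1960 with no other details picks the original.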
I also built this for myself:
https://github.com/jaysoffian/eap_proxy
I should put my bin directory full of random scripts up on GitHub. I tend to build them as I need them. They're often very simple things like:
- jqpaste -- which is just "pbpaste | jq"
- jsonl [jq|gron --stream] -- which takes its input and, if it isn't valid JSON, converts it to a JSON string, so that I can paste random log output (which is sometimes a mix of JSON and non-JSON) into jq or gron.
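The jsonl script itself isn't shown, so here is a hypothetical Python reimplementation of the idea. It assumes the conversion happens line by line, which fits `gron --stream` treating each input line as a separate JSON value. It also skips blank lines. The `to_json_line` and `run` names are invented.

```python
import json
import subprocess
import sys
from typing import Iterable


def to_json_line(line: str) -> str:
    """Return the line unchanged if it parses as JSON, else quote it as a JSON string."""
    line = line.rstrip("\r\n")
    try:
        json.loads(line)
        return line
    except json.JSONDecodeError:
        return json.dumps(line)


def run(lines: Iterable[str], cmd: list[str]) -> int:
    """Convert every non-blank line; print the result, or pipe it into cmd if one is given."""
    out = "".join(to_json_line(l) + "\n" for l in lines if l.strip())
    if not cmd:
        sys.stdout.write(out)
        return 0
    # e.g. cmd == ["jq", "."] or ["gron", "--stream"]
    return subprocess.run(cmd, input=out, text=True).returncode
```

To use it as a command, end the file with `sys.exit(run(sys.stdin, sys.argv[1:]))` and pipe logs into it, e.g. `... | jsonl jq .`. One quirk of this approach: a line such as `42` or `true` is already valid JSON, so it passes through unquoted.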
Those are just a couple off the top of my head.