I'm thinking about a script that takes a URL and saves the page's main content (e.g., extracted with Mozilla's Readability) in a text file. It stores the original URL on the first line, and maybe also asks Archive.org to take a snapshot of the page and writes the snapshot's URL on the second line. Then, whenever I need something, I can search the content of those files (I use the Silver Searcher) and find what I'm looking for. If the main content stored in the file is not enough, I can open the original URL or the Archive's snapshot.
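A rough sketch of that script, using only the Python standard library. The `TextExtractor` here is a deliberately crude stand-in for Readability (it just drops text inside obvious chrome tags), and the Archive.org details are assumptions: hitting `https://web.archive.org/save/<url>` asks the Wayback Machine for a capture, and `https://web.archive.org/web/<url>` redirects to the latest capture, so that is what gets stored on line 2.

```python
import re
import sys
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Crude stand-in for Readability: keeps text outside obvious chrome tags."""
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip = 0          # depth inside tags we want to ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def save_page(url: str, out_dir: Path) -> Path:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    # Best-effort snapshot request; failure shouldn't lose the note itself.
    try:
        urllib.request.urlopen("https://web.archive.org/save/" + url)
    except OSError:
        pass
    archive_url = "https://web.archive.org/web/" + url
    # Derive a filename from the URL (naming scheme is an assumption).
    name = re.sub(r"\W+", "-", url).strip("-")[:80] + ".txt"
    path = out_dir / name
    # Line 1: original URL, line 2: snapshot URL, then the extracted content.
    path.write_text(url + "\n" + archive_url + "\n\n" + extract_text(html),
                    encoding="utf-8")
    return path


if __name__ == "__main__":
    print(save_page(sys.argv[1], Path(".")))
```

For a real setup I'd swap `extract_text` for an actual Readability port (e.g. readability-lxml) and keep the rest.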
I think I won't need to categorize or tag them; searching seems enough to me.
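Since each file starts with the original URL and the snapshot URL, a search hit immediately gives both escape hatches. With the Silver Searcher that's just `ag -il "term" notes/`; a rough Python equivalent that also pulls the two header lines out of each matching file (the directory layout and `.txt` extension are assumptions):

```python
from pathlib import Path


def search_notes(term: str, notes_dir: Path):
    """Yield (original_url, archive_url, path) for files whose body mentions term."""
    for path in sorted(notes_dir.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if len(lines) < 3:
            continue
        # Line 1: original URL, line 2: Archive.org snapshot, rest: content.
        if any(term.lower() in line.lower() for line in lines[2:]):
            yield lines[0], lines[1], path
```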
The only difficulty I can think of is that extracting the main content of pages is not easy; Mozilla's Readability, for example, doesn't work well all the time. A manual copy-and-paste fallback may be needed for those pages.