Fastly Engineer 2: I have some very bad news...
With Reddit however, these days almost all comments are locked behind “view entire discussion” or “continue this thread”. In fact, just now I searched for something for which the most relevant discussion was on Reddit; Reddit was down so I opened the cached version, and was literally greeted by five “continue this thread”s and nothing else. What a joke.
Arrange the HTML so that the list of comments comes last in the source (repositioned visually via CSS). Keep the HTTP connection open, have the "show more" button send some sort of request, and when you receive that request, send the rest of the page over the original HTTP connection.
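A minimal sketch of the reordering half of that trick, assuming a flexbox layout (the element names are illustrative, not anyone's actual markup): the comment list is last in the source, so it can be streamed late over the still-open connection, while CSS `order` puts it back in its visual place.

```html
<body style="display: flex; flex-direction: column">
  <header style="order: 1">post title, sidebar, etc.</header>
  <footer style="order: 3">footer</footer>
  <!-- Last in the source, so the server can delay sending it,
       yet it renders between the header and the footer -->
  <main style="order: 2">list of comments</main>
</body>
```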
As usual, solve people problems via people, not tech.
“View entire discussion” couldn’t be implemented perfectly with <details> in its present form, but you can get quite close to it with a couple of different approaches.
I think the infinite scrolling of subreddits is about the only thing that would really be lost by shedding JavaScript. Even inline replies can be implemented quite successfully with <details> if you really want.
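As a rough sketch of what that could look like, nested `<details>` elements give collapsible threads with zero JavaScript (the structure below is hypothetical, not Reddit's real markup):

```html
<details open>
  <summary>user1 | 152 points</summary>
  <p>Top-level comment…</p>
  <!-- A collapsed subthread: expands inline, no script needed -->
  <details>
    <summary>continue this thread (3 replies)</summary>
    <p>Nested reply, rendered when the reader expands it…</p>
  </details>
</details>
```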
https://old.reddit.com/robots.txt
is very different from this:
I guess there is a market for a search engine (maybe accessed through Tor) which does not care about robots.txt, DMCA takedowns, the right to be forgotten, etc. Bootstrapping it should not be that hard, since it can also provide better results for some queries: nobody is fighting over ranking positions until the engine is widely known.
I'm not sure how far we are from being able to do full-text internet search. Or rather exact-quote search, preferably with some fuzziness options. That would be cool; Google's quotation marks were really neat back when they worked.
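A toy sketch of what "quote search with fuzziness" could mean, using Python's standard difflib (the 0.8 similarity threshold is an arbitrary assumption):

```python
from difflib import SequenceMatcher

def fuzzy_quote_match(quote: str, text: str, threshold: float = 0.8) -> bool:
    """Slide a quote-sized window over the text and report whether any
    window is similar enough to the quote."""
    q, t = quote.lower(), text.lower()
    n = len(q)
    if n == 0 or n > len(t):
        return False
    return any(
        SequenceMatcher(None, q, t[i:i + n]).ratio() >= threshold
        for i in range(len(t) - n + 1)
    )

# Tolerates the typo "quotaton":
print(fuzzy_quote_match("quotation marks", "Google's quotaton marks were neat"))
```

A real engine would index n-grams rather than scan linearly, but the scoring idea is the same.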
User-Agent: bender
Disallow: /my_shiny_metal_ass
User-Agent: Gort
Disallow: /earth
That’s not going to happen before Cloudflare is dethroned. See this recent thread for some perspective: https://news.ycombinator.com/item?id=27153603
And even if there’s no Cloudflare, large sites that people want to search will always find ways to block bad bots.
The only thing I can think of that might work is using crowd-sourced data, with all the problems that come with crowdsourcing.
/etc/hosts
reddit.com old.reddit.com
www.reddit.com old.reddit.com
np.reddit.com old.reddit.com
$ curl https://old.reddit.com/robots.txt
User-Agent: *
Disallow: /
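That policy can be checked mechanically with Python's standard urllib.robotparser (a quick sketch; the crawler name is made up):

```python
from urllib.robotparser import RobotFileParser

# Feed it the exact rules served at old.reddit.com/robots.txt
rp = RobotFileParser()
rp.parse(["User-Agent: *", "Disallow: /"])

# Every path is off-limits to every crawler
print(rp.can_fetch("SomeCrawler", "https://old.reddit.com/r/programming/"))
```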
Also, even if search engines are allowed, old.reddit.com pages are not canonical (<link rel="canonical"> points to the www.reddit.com version, which is actually reasonable behavior), so pages there would not be crawled as often, or at all.

int main() {
    int arr[100][200][100]; // 8 MB (100*200*100 4-byte ints) allocated on the stack
    return 0;
}

It is open-source software that allows you to keep and read offline static versions of websites in a specialized archive format (ZIM files).
It was originally designed to let you read Wikipedia offline, but there are also dumps of Stack Overflow available on the relevant page: https://wiki.kiwix.org/wiki/Content_in_all_languages