pg, what's up?
Grumble...
For example, check the headers sent back for:
http://news.ycombinator.com/no/way/this/is/possibly/valid
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Connection: close
If a search engine sees that, it will periodically revisit it forever.
At the very least, add a search engine to HN itself so I can search for old links.
Why produce a great site and then not let your users access its valuable content?
Things scroll by the front page so fast it's not really fair not to provide a search feature.
I don't care if they let external search engines (Google et al) index it, but they should at least provide /some/ kind of internal search feature themselves.
IMO providing a search feature is probably at least as effective, if not more so, than the "noprocrast" flag.
After all, if I know I can search past items at any point in the future, I don't feel a pressure to load up the front page several times a day.
Whereas if the fast-scrolling front page is the only way to access items, it encourages that sort of rat-hitting-the-bar behavior I learned about in college psych classes ;)
Much better to set a temporary Crawl-Delay directive. Otherwise you're not just telling the engines to pause crawling you, you're telling them "take all of my pages out of your index."
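A rough sketch of what such a directive could look like, and how a polite crawler might read it, using Python's standard urllib.robotparser. The 10-second delay and the single Disallow rule are assumptions for illustration, not HN's actual settings, and Crawl-delay is a non-standard extension that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow crawlers down instead of shutting them out.
# The 10-second value is an assumption chosen for illustration.
ROBOTS_WITH_DELAY = """\
User-agent: *
Crawl-delay: 10
Disallow: /vote?
"""


def read_policy(robots_txt, agent="*"):
    """Return (crawl delay or None, whether an item page may still be fetched)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    item_ok = parser.can_fetch(agent, "http://news.ycombinator.com/item?id=165279")
    return delay, item_ok


delay, item_ok = read_policy(ROBOTS_WITH_DELAY)
print(delay, item_ok)
```

The point of the sketch is that item pages stay fetchable while the crawler is asked to wait between requests, so nothing has to drop out of the index.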
I know that this is one of the reasons that vote-links look the way they do.
If that's not the reason, then my #2 guess is that getting the whole site crawled is moving too many comments into the cache. If I remember correctly, whenever an uncached comment gets accessed, it gets cached (basically adding it to a hash), and thus added to the in-memory ram. That gradually raises the memory requirements for running the app, until pg does a restart, which resets the cache as well.
Talking with the entire world by having my comments associated in ways I don't expect with keywords: less comfortable.
,[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>++++++++++++++<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>>>+++++[<<----->>-]<<<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>++++++++++++++<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>++++++++++++++<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>>>+++++[<<----->>-]<<<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>+<<-[>>++++++++++++++<<-[>>+<<-]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]>>.[-]<<,]
Maybe this will finally bring about an HN API.
What about the "official" HNSearch (http://www.webmynd.com/html/hackernews.html )?
I just installed it again for a laugh, and it actually redirects to searchyc.com results when you click on its "Hacker News" header — even though pg staunchly refuses to ever link to searchyc. I made fun of him for it at his party before Startup School and he tried to laugh it off with "they're a YC company!" as if that weren't obvious on the face of it.
1. I use Google site: searches a LOT. I can remember reading about something I need now a long time ago on HN and I go find it.
2. One item on my "if I have time" projects list is to do "something" "fun" with HN submissions. Now I'm officially barred, regardless of what I would have thought up.
HN is a good source of google juice for interesting new startups, and it would be a shame to see that go away...
There's a lot of reasons I'm bummed about this, but I have to say: startups missing out on free PageRank is not one of them.
This is actually a very neat and simple way to exercise editorial control to remove the nofollow: if enough people in the community like it, there is something useful about it and it is vetted.
edit: Oh wait, seems that only comment links are nofollow'd.
The only reason to do this is if you don't want your site indexed by Google, and I really can't think of a legitimate reason for that.
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /threads?
This just disallows those pages... not the home page, and not the /item? action (note the URL of this page).
and found this: http://news.ycombinator.com/item?id=165279
one of the top replies:
My vote is to constrain growth as much as possible, at least that which comes from stupid sources. Smart hackers will find this site just fine without Yahoo or MSN, probably even Google. As "evil" as blocking sites and crawlers may sound, I think these types of measures will be necessary to preserve the quality of content here. Whatever actions further that objective have my vote.
perhaps smart hackers shouldn't block everything out like this without any discretion whatsoever?
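The claim earlier in this exchange, that the quoted rules leave the home page and /item? pages crawlable, can be checked with a small sketch using Python's standard urllib.robotparser. The rules are copied from the thread; the /vote?for=1 query string is made up for illustration, and real crawlers may match rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in the thread.
QUOTED_RULES = """\
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /threads?
"""

BASE = "http://news.ycombinator.com"


def allowed_paths(robots_txt, paths, agent="*"):
    """Map each path to True if the given agent may fetch it under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, BASE + path) for path in paths}


# "/vote?for=1" is a hypothetical URL used only to exercise a Disallow rule.
result = allowed_paths(QUOTED_RULES, ["/", "/item?id=165279", "/vote?for=1"])
for path, ok in result.items():
    print(path, "allowed" if ok else "blocked")
```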
HN just needs a super simple caching proxy in front of it to reduce load on the app server. Alternatively, just generate static pages for all topics older than 5 days.
Will sysadmin for YC dinner invites.
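To make the caching idea above concrete, here is a minimal in-process sketch, not a real proxy: pages are served from a dictionary until they are older than a time-to-live. The names TTLCache and render_page and the 60-second TTL are assumptions for illustration; nothing here reflects how HN is actually built.

```python
import time


class TTLCache:
    """Serve a stored page until it is older than ttl seconds, then re-render."""

    def __init__(self, render, ttl=60.0, clock=time.monotonic):
        self._render = render  # function that builds the page (the expensive part)
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._store = {}       # path -> (expiry time, body)

    def get(self, path):
        now = self._clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]      # still fresh: the app server is not touched
        body = self._render(path)
        self._store[path] = (now + self._ttl, body)
        return body


# Hypothetical stand-in for the app server; records every time it is hit.
calls = []


def render_page(path):
    calls.append(path)
    return f"<html>{path}</html>"


cache = TTLCache(render_page, ttl=60.0)
first = cache.get("/item?id=165279")
second = cache.get("/item?id=165279")  # served from the cache
print(len(calls))
```

This sketch never evicts entries, so a real deployment would also need a size bound or an off-the-shelf caching proxy in front of the app.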
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /threads?
Also, things like http://hacker-newspaper.gilesb.com/ and http://hnsort.com/ become less legitimate due to this. If the reason is to reduce the SEO benefits of getting a link on HN, just "nofollow" everything instead.
(Update: Googling on this topic brought up a page of my own where a Google Reader engineer explained how Google Reader deals with robots.txt - http://www.petercooper.co.uk/google-reader-ignores-robottxt-... - though their definition of Web robot is far from universal)
On the other hand, if the goal is to push HN back into semi-obscurity by making it harder to find, it might work.
I can't wait to hear the reason behind that decision.
Long term PG signs a deal with Bing to be the exclusive search engine for Hacker News that pays for the servers and bandwidth.
Down mod me if you will but it's simply brilliant.