detaro · 7y ago

Do you have a robots.txt entry that's stopping Google from fetching them? That can counter-intuitively cause Google to index pages.

5 comments · 2 top-level

scalesolved · 7y ago

I don't have a Disallow set for this page, but based on this article it seems I might have to set one:

https://www.deepcrawl.com/blog/best-practice/noindex-disallo...

Specifically the part:

>Noindex (robots.txt) + Disallow: This prevents pages appearing in the index, and also prevents the pages being crawled. However, remember that no PageRank can pass through this page.

detaro (OP) · 7y ago

Google specifically warns that a noindex header doesn't work if there is a robots.txt Disallow for the page: the crawler obeys the Disallow, so it never fetches the page and never sees the noindex. That's why I asked.
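
To illustrate that interaction, here is a minimal sketch using Python's standard `urllib.robotparser`. It only checks whether a crawler would be allowed to fetch a URL, which is the precondition for it ever seeing a noindex on that page. The two robots.txt bodies, the `example.com` URL, and the helper name `noindex_is_visible` are made up for illustration, and this is not Google's own parser.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawler obeying robots_txt may fetch url.

    Only a page that can be fetched can show its noindex to the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the search pages.
blocked = "User-agent: *\nDisallow: /search\n"
# Hypothetical robots.txt that blocks nothing (an empty Disallow allows everything).
unblocked = "User-agent: *\nDisallow:\n"

url = "https://example.com/search?q=x"
print(noindex_is_visible(blocked, url))    # False: the noindex is never seen
print(noindex_is_visible(unblocked, url))  # True: the crawler can read the noindex
```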

r1ch · 7y ago

There's a non-standard Noindex: /URL directive that Google accepts in robots.txt. Unfortunately it breaks many other crawlers that don't understand it, so you have to rely on user-agent sniffing.
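
A rough sketch of what that user-agent sniffing could look like: serve the non-standard line only to Googlebot and a plain file to everyone else. The function name `robots_txt_for`, the paths, and the rule sets are hypothetical, and the substring check is deliberately naive. Note that Google announced in 2019 that it would stop honoring noindex in robots.txt, so this only shows the mechanism described in the comment.

```python
# Rules every crawler gets (hypothetical paths).
BASE_RULES = "User-agent: *\nDisallow: /private/\n"

# Extra block served only to Googlebot, carrying the non-standard directive.
GOOGLE_ONLY_RULES = "User-agent: Googlebot\nDisallow: /private/\nNoindex: /search\n"


def robots_txt_for(user_agent: str) -> str:
    """Pick a robots.txt body based on the requesting User-Agent header."""
    if "googlebot" in user_agent.lower():
        return GOOGLE_ONLY_RULES + "\n" + BASE_RULES
    return BASE_RULES


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
other_bot = "Mozilla/5.0 (compatible; bingbot/2.0)"

print("Noindex" in robots_txt_for(googlebot))  # True
print("Noindex" in robots_txt_for(other_bot))  # False
```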

stevenicr · 7y ago

In my limited experience, robots.txt is helpful, but not a stop-all. The big G still indexed a bunch of my pages because a certain group was creating links (off of my site) to the spammy pages, which makes G index them. Even if you have something like: Disallow: spamresult Disallow: search

They can still end up in the index, just with a note that says "no description is available for this page"

I remember a debate years ago where Matt Cutts asked whether G should index such pages and pointed out that other engines were indexing pages that were blocked by robots.txt... meh.

I had to set up a 301-to-homepage redirect system to zap all the pages I took out, although some other engines still spider looking for those pages even though I removed them with 301s over a year ago. Perhaps the spammers still have links going to them?

I started by just blocking all indexing from sogu (or whatever it's called) and similar bots in robots.txt. I had thought they would get the hint after several months; when they didn't, I started looking at IPs / CIDRs to block further.

Hope your situation is different.

stevenicr · 7y ago

Just realized that I had put an asterisk * before and after each of the two words in the Disallow lines up above, but that kicked in HN formatting instead of displaying them. What I meant to show is Disallow: *search with another asterisk after it.
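
For anyone wondering what the asterisks change: Google documents `*` in a robots.txt path as matching any sequence of characters and a trailing `$` as matching the end of the URL, with rules otherwise matched as prefixes of the path. The sketch below is a simplified matcher written under those assumptions, not Google's actual parser. The helper name `rule_matches` and the sample paths are hypothetical. Python's built-in `urllib.robotparser` does plain prefix matching and does not interpret these wildcards.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each asterisk.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(rule_matches("*search*", "/site/search?q=spam"))  # True: "search" anywhere in the path
print(rule_matches("/search", "/site/search?q=spam"))   # False: plain rules match prefixes only
print(rule_matches("/*.pdf$", "/docs/report.pdf"))      # True: wildcard plus end anchor
```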
