It seems like a solvable problem. Why don't they let webmasters implement some kind of time-based cryptographic signature?
It seems so lame that this problem has gone on so long, especially when there must be some kind of technical solution.
For real businesses, spending a few days implementing some authentication protocol would not be particularly burdensome.
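To make the "time-based cryptographic signature" idea concrete, here is a minimal sketch of what a webmaster-side signing step might look like. Everything in it is an assumption for illustration: no search engine is known to offer such a protocol, the shared secret and the function names sign_content and verify_content are hypothetical, and HMAC is used only because it ships with the Python standard library.

```python
import hashlib
import hmac


def sign_content(secret: bytes, url: str, body: str, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 tag binding the URL, publish time and body hash.

    `secret` is a hypothetical key the site owner would have registered
    with the search engine beforehand.
    """
    body_digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    message = f"{url}\n{timestamp}\n{body_digest}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_content(secret: bytes, url: str, body: str, timestamp: int, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_content(secret, url, body, timestamp)
    return hmac.compare_digest(expected, tag)
```

Note the limit of the sketch: a tag like this only shows that the key holder signed this text with this claimed timestamp. A scraper with its own registered key could sign the copied text too, so the timestamp would still have to be recorded by someone other than the signer before it said anything about who published first.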
Site 1:
* Content ABC
* Content DEF
* Content GHI
Site 2:
* Content JKL
* Content MNO
* Content PQR
Site 3:
* Content STU
* Content VWX
* Content YZ0
Site 4:
* Content ABC
* Content DEF
* Content MNO
* Content PQR
* Content STU
Which of these is a scraper?
It just seems like a purely algorithmic solution doesn't really scale and some human intervention is really necessary.
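The four-site example above can be written down as a small overlap check. This is only a sketch of one naive heuristic (count how many other sites each site shares content with); the function name overlap_partners is made up for illustration and nothing here is claimed to be how Google actually does it.

```python
from typing import Dict, List, Set

# The four sites from the example, with each piece of content as a label.
SITES: Dict[str, Set[str]] = {
    "Site 1": {"ABC", "DEF", "GHI"},
    "Site 2": {"JKL", "MNO", "PQR"},
    "Site 3": {"STU", "VWX", "YZ0"},
    "Site 4": {"ABC", "DEF", "MNO", "PQR", "STU"},
}


def overlap_partners(sites: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each site, list the other sites that share at least one item with it."""
    partners: Dict[str, List[str]] = {}
    for name, content in sites.items():
        partners[name] = sorted(
            other
            for other, other_content in sites.items()
            if other != name and content & other_content
        )
    return partners


if __name__ == "__main__":
    for site, shared_with in overlap_partners(SITES).items():
        print(site, "shares content with", shared_with)
```

Site 4 overlaps with all three other sites, while each of them overlaps only with Site 4. That makes Site 4 look like the aggregator, but the overlap is symmetric: the same data fits a world where Site 4 is the original and Sites 1 to 3 each copied part of it. Without a trustworthy "who published first" signal, the counts alone can't settle it.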
Google cares a great deal about putting the original source of a piece of content first. If we're doing that incorrectly, it's because we screwed up, not because that's how things are designed. It's a hard problem and an area we are still working on intensely. It would be great if someone involved could post the queries on which we are screwing up so we can debug what's causing it.
The search:
["a superb app for iPad and iPhone that lets you quickly and easily transfer photos and videos between iOS devices and computers – has been updated this week, to Version 2.3."]
returned results from content scrapers above the original content.
For me, the original content doesn't even show up in search results, even though it's in Google's index:
http://ipadinsight.com/ipad-apps/photo-transfer-app-updateda...
Further, Google wouldn't let the site owner buy AdWords to drive traffic to their site.
Google owns both search and AdWords. This makes the headline here:
Google penalizes original content site because of scrapers
accurate, as far as I'm concerned.
The headline implies that he's penalized in search due to scrapers, which isn't happening.
I personally don't see what is news about this. It has been known for a long time that newer or less frequently updated sites can get beaten by scrapers, though it usually resolves itself later unless the scraper is a decently reputable source like the Huffington Post.
The headline conflates the two things, so is inaccurate.
I've never seen an actual instance of a site with original content being penalized because it is getting scraped (though it is theoretically possible). Our systems for this are robust and quite conservative. When a scraper outranks the original site, it's because we weren't aggressive enough in demoting the scraper, or don't have enough data about it, not because the original was penalized.
https://encrypted.google.com/search?q=Quick+Look+%E2%80%93+W...
It appears he has more details about specific queries on this page: http://www.google.com/support/forum/p/AdWords/thread?tid=0bb...
Soon after all that hassle, my site suddenly lost 60% of its traffic. From what I can gather, mine is one of the quality sites that produce original content that has been mistakenly penalized in the Panda / Farmer / Whichever Other updates.
Among the reasons I say my site is a quality site that produces original content, in accordance with this post at the Google Webmaster Central Blog (http://googlewebmastercentral.blogspot.com/2011/05/more-guid...) and with all the logic I can apply to the subject, are:
-- The site contains over 1,700 posts published in the last 15 months. I wrote around 1,550 of them myself. The remainder are written by three other occasional authors, who are colleagues and friends of mine. There's no 'outsourcing' of content creation or anything of that ilk.
-- I spend tons of hours every day researching and writing the content that appears on my site. Every app review on the site is 100% original content (http://ipadinsight.com/category/ipad-app-reviews), as are all posts published.
-- I do consider myself an expert on the subject my site covers - the iPad. I have been writing app reviews, accessory reviews, tips, and how-to posts on it ever since it launched. I've appeared on ABC World News and numerous radio programs as an iPad and Apple expert. I've been a contributing author for iPhone and iPad Life magazine (printed publication) since their debut issue - writing expert tips and tricks posts, buyer's guide articles, and more. I'm listed in Robert Scoble's Twitter list of best tech people to follow. Blue-chip app publishers and accessory vendors approach me to write about their products. The Daily (the first iPad-only newspaper) contacted me before their app even hit the App Store, as do many leading publishers. I've been a beta tester for many top iOS apps for years. I participate regularly at several leading iPad and iOS forums. I'm not saying any of this to boast, but in an effort to establish that I'm a blogger who is enormously passionate about the subject I cover, and someone who is respected in the area (mobile tech) that I write on.
-- My site is a long-standing member of the Got-OATS group of sites (http://www.gotoats.org/) that seek to uphold and promote the highest ethics in app reviews. We never accept money for reviews or coverage, and add disclosure statements to our reviews to indicate whether we received a promo code for an app reviewed, or a sample unit of an accessory reviewed.
-- I spend a lot of time on every single post, on researching, on testing apps and whatever else I'm covering, on ensuring that spelling and grammar are spot-on, on providing good screencaps of apps in action, and every other detail I can think of.
-- I use a great caching plugin on my site and do my best, with help from a few WordPress experts, to keep the site fast and clean.
-- I currently have close to 4,500 RSS subscribers and over 3,000 Twitter followers for the site's account.
-- Before my traffic recently fell off a cliff due to Panda, my site had around 80,000-100,000 unique visitors per month.
As for search results and scraper sites, I am still often seeing horrendous spam sites ranking above me for recent posts. Here is just one quick example on a recent post I wrote about iPad rivals, where several scraper sites rank above mine, including one (ipads101.com) which I have submitted 3 spam reports on via Google Webmaster over the last two months, and had zero response:
http://www.google.com/search?sourceid=chrome&ie=UTF-8...
I run a good site. I pour hours of effort and my heart and soul into it. And I think it has been very wrongly assessed by whichever new algorithm.
You say you wrote 100 articles a month for over a year. Were these all quality posts? Even a site like smashingmagazine, with a very general subject and a team of writers, can't output 3 articles a day. If you were to put the time and effort of 3 articles into 1 article, would that increase the number of links the piece attracts and the number of times it is shared?
I understand that you spend time each day updating your blog, but if that is not original research, do you believe you should be rewarded with more than "amateur" status? How many people do you think write about Justin Bieber? How many of those blogs should rank in the top 100 for Justin Bieber terms?
Next up: Could you become an affiliate of Apple for your Apple product site? Really unlikely as your domain name violates their TOS and trademark. Why should Google allow a site to advertise for Apple products, when Apple wouldn't allow that same site to do the same? Apple doesn't consider you an expert, they consider you a legal hassle. And are you really an Apple expert? Or do you just own some Apple products and keep up-to-date?
Also, your shareasale footer link should disqualify you for AdSense, as dofollowing a non-editorial affiliate link is against the Quality Guidelines.
On my 100 articles a month and whether they were all quality posts: yes, they were and are. Is every single one of them a new in-depth app review or how-to post? No. I sometimes write shorter pieces offering my opinion on a major bit of iPad news or similar. I sometimes write lighter pieces, about anything from iPad-related humor to how my big goofy Labrador is my work colleague. I think that provides a nice mix of content for readers.
If Apple announces the date that a new iPad is going to be released, or that they are finally going to support subscription plans for iPad magazines and newspapers, that's of interest to me and to my readers, so I mix in a small percentage of original news coverage as well. The fact that many sites may cover the same story does not mean that mine is not original or professionally written content. When there is a major news story, it is covered by The New York Times, The Washington Post, and other quality broadsheet papers. When there are major political and economic stories, they are all covered by Time, by Newsweek, by The Economist, and so on. Does that mean that none of these titles are producing original or professional content? When any of these big titles pick up stories from the AP, Reuters, and smaller local news outlets, does that mean they are suddenly low-quality titles? I'd say absolutely not. And when a small percentage of my posts cover major news, it doesn't mean I'm a low-quality site either. I always have my own take on any iPad-related news or rumors that I choose to write on. I don't 'borrow' anybody else's take - I write my own thoughts on the very few news or rumor items that interest me.
On the question of how many people write about Justin Bieber: there are not a lot of sites that focus solely on the iPad, as mine does. And very few indeed that do so and are quality sites - the majority are scrapers that just continually rip off content from sites like mine. Mine is one of the very few sites that covers only the iPad and does so with 100% original content.
Could I become an affiliate for Apple? I have no desire to be one. That's not at all what my site is about and it's not at all relevant to this discussion. "Why should Google allow a site to advertise for Apple products?" I don't do that at all. I am often critical of Apple, of App Store policies, of iPad related decisions etc. When I post App Store links for apps that I cover, I don't even use the affiliate links that many sites do, I just use straight-up links.
Am I really an Apple expert? Yes. And particularly an iPad and iOS expert. Again, I've written for the leading print title (iPhone and iPad Life Magazine) in this space since its debut issue, I've appeared on ABC World News and various radio programs and podcasts as an iPad and iPhone expert, I've been working with mobile devices since back in the days of the Palm Pilot, many leading publishers come to me to beta test and assess pre-release builds of their apps, Robert Scoble lists me among his most influential tech writers. I've also worked in tech support, network management, and IT consulting for over 15 years. So yes, I'm very confident in saying I am an expert on the very tightly focused subject I cover.
On the footer links, I honestly didn't even realize what kind of links they were. I have used the Thesis theme for years and Rackspace for hosting for years. I think very highly of both of them, so I was happy having a small link to them. I've made exactly zero dollars via those links. I took them down when I saw it mentioned here that they are not a good idea.
This is a quite common problem actually when webmasters have reason to hate certain sites. They click on those sites in search results a lot and often see them promoted above their own, even though they're the only one who sees them promoted.
1. usedipadforsale.net/ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless.htm
2. www.ipads101.com/ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/
3. ipads2nd.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/
4. catsmakemebats.micasaessucasagermania.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/
5. ipad.thedailyglobe.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/
6. ipadinsight.com/category/ipad-rivals-2
That looks pretty bad to me.
My main point here is that my site doesn't match any of the criteria for getting slapped by Panda. It's a site that has its content ripped off a ton, and every time I report the offending spam sites to Google there is no response at all.
If the sites that scrape your content do it from servers with fixed IPs, then you could go hunting and try to block them.
As long as there are chumps like me that write content for our own personal reasons, or who drive traffic through sites like HN and Twitter referrals, scrapers can piggyback on my work and Google can do whatever it likes. It won't affect my motivation to create content.
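For what it's worth, the fixed-IP blocking idea could be sketched like this at the application level. The addresses below are documentation-only ranges standing in for scraper servers (they are placeholders, not real offenders), and is_blocked is a hypothetical helper; in practice you would more likely do this in the web server or firewall config.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges used as stand-ins
# for the fixed addresses a scraper was seen fetching from.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

This only helps against scrapers that fetch pages directly from a stable address. One that pulls the RSS feed through a third-party service, or rotates addresses, would sail past it.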
And oh, your post pushed a hot button of mine: comments by people who think they're being incredibly insightful by saying "the world is not fair" in different ways. On HN, I think we can assume that people are adults and understand such things. Sorry about the flame.
For an analogy, consider a murder trial. Someone stands up and says, "The accused stands to profit from the victim's death." Isn't that suggestion more likely to come from the prosecution than from the defense?
Mmmm... no, unfortunately Google is supposed to make money, and they make money mainly from AdSense. I'm not prepared to accept that random algo modifications having a big impact on their revenue are made without concern for that impact.
And when the same company is both driving people to web sites (search) and profiting from the ads shown on those web pages (AdSense), something bad can happen, as it is not a free market setup.
Our site went from ranking #8 for our target search "artist websites" to PAGE 440 of the results. Our listing for "how to sell art" just went away. There's been nothing but original content on our site for 10 years, and, among artists, we're considered one of the best sources of art marketing information, given that I owned an art gallery for 20 years and all of our other writers are professional artists. (and yet Google still has ehow ranked for "how to sell art".....yeah, I'm sure ehow knows a whole lot more than we do).
I'm saying this not to vent, but to concur with Aaron and others that there is something wrong. It may not hurt Google's business....the latest algorithms probably improve AdSense revenue, and that's fine, it's their business. Fortunately I've read HN long enough to know not to build my entire company on top of someone else's platform and, as much as it upsets me, we don't need Google. Bing (and Yahoo) have us at #3 for that same search ("artist websites"). We don't depend on search engines as our only source of marketing leads....nor even our main source.
The most frustrating thing is not even that it happens, but that they do not communicate. There's no way to find out WHAT happened. Nothing in Webmaster Tools. No way to pay for search support. I read the Google blog post with guidelines on how to structure content after Panda and, none of that applied to us, at least not that I could tell.
They say "just focus on users" and that's what we do, but I guess, that's BS.
I, frankly, think Google's gotten too big for their britches and, as unlikely as it is to happen, I hope Bing, Blekko and yes, DuckDuckGo take some market share away. Windows is better for having OSX and Linux to compete with. Maybe Google would be a bit less "evil" with more competition too.
Sorry for the bit of a rant, I'm usually only a lurker here, but this article of Aaron's really hit close to home this week. At least there are a couple of relevant points buried in my little rant....I hope ;-)
There's no reason to jump to that conclusion. We don't make ranking changes to improve AdSense revenue, and don't use it as a metric to evaluate ranking algorithms. We don't even have a mechanism to collect the data.
I realize you can't guarantee any results, but honestly, we really have absolutely no clue what to do next. Every change we've made to our site in the past 2 years has been to try to do everything we read that Google wants from official Google channels. We truly just don't know what else to do.
Edit to add: we did 3 months ago change from a very long domain name to a short one (faso.com) because it is shorter and we own a federal trademark on the word "FASO" and thought that Google wanted to place more emphasis on brands - our rankings stayed the same and even improved....until Monday.
I did ask a Google employee, who said it was because we weren't using canonical tags, but this doesn't make much sense and fixing this doesn't seem to have done anything to improve the situation.
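For anyone wanting to check the canonical tag point on their own pages, here is a rough sketch that pulls the canonical link element out of a page's HTML using only the standard library. The names CanonicalFinder and find_canonical are made up for this example, and it says nothing about how much weight Google gives the tag.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; match case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

If the same article is reachable under several URLs (comment pages, category views, tracking parameters), each variant would be expected to return the same canonical URL from this check.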
I really don't think it's being done out of malicious intent. I think it's very likely that it's just being done because of negligence, since service/app reviews happen to be frequently scraped.
Google puts some effort into making sure that (i) people aren't annoyed by ads directing them to content they didn't want, and (ii) the top ads are good quality and likely to have a good click-through rate. Ads that have a high quality score cost less to place for that reason; ads with a low quality score don't get placed, or get placed on the second page of ads.
Google could issue verified sites. If someone copied a verified site the content would be automatically removed from the index.
Now they would have to hire some staffers to research the applications and handle complaints. But this is beginning to cost them far more than adding a few more staffers.
A couple of days ago I did some searching online and found that a fair number of websites had copied some of my articles in their entirety. And sadly, a lot of these 'websites' were actually Google Blogger (Blogspot) blogs. And whilst some of these copied articles weren't appearing in Google search (I guess since the entire site contained copied/scraped content, thus giving them a Google SERP penalty?), some of the copied articles were appearing in the SERPs. And a couple of these websites even had Google AdSense on them.
So there was the crazy situation whereby my content had been stolen/scraped illegally, and put on a Google Blogger blog with Google Ads on it, and (in some cases) that blog then received traffic from Google Search. Hrmph.
In the interest of balance, I will point out that I filed Google DMCA requests after finding these scraped articles, and Google did promptly reply (a non-automated reply around 30 hours later, which is quick considering how many DMCAs Google must get).
They only removed the individual blog post (and not the blogs overall, even though they were clearly spam blogs), but nonetheless I am happy with Google's quick response.
I just wish that content scraping weren't (in some cases) a profitable endeavor.
I'd agree that the denying of AdSense was (obviously?) wrong, if this is all there is to the picture. As for looking at iPad information on the internet, after a manual inspection of that site, I, as a user that cares for quality and relevance, don't need any of the results on justanotheripadblog.com in my top 100.
The order of relevance, discovery and editorial quality seems to flow from:
http://reviews.cnet.com/8301-19512_7-20023976-233.html
>
http://ipadinsight.com/ipad-tips-tricks/how-to-make-airprint...
>
http://www.info4arab.com/how-to-make-airprint-work-with-just...
With a lot of intermediate steps.
iPadInsight.com is not a cheap scraper site, but is it a site that does original research, beyond rehashing what is hot in the industry? I think Panda might have judged correctly in not assigning higher rankings to this site.
The site seems to have had a canonical problem with the comments in 2010, inflating the site's size in the index by a factor of 10. The depth of these comments is usually not much more than: "Great! Interesting Article! Love this! Thank you!" and might just as well have been auto-generated.
Also, the shareasale footer link "Thesis Theme for WordPress" alone might disqualify you from running AdSense, as you dofollow an affiliate link (and this is not allowed in the webmaster quality guidelines).
The trademark inside domain name might be another issue.