Google has turned into a cesspool. Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov to search strings, searching by time periods to eliminate new "articles" that have been SEOed to the hilt, or taking out yelp and other chronic abusers that hijack local business results.
That's a bit harsh but I agree that it is starting to fail to live up to the expectations I had with Google when it came out and destroyed Altavista in a spectacular shower of sparks.
Could I tender: "uBlacklist" as a stop gap, amongst others as we await Google being given a right old kicking?
Despite being a staunch Arch Linux user I have to deal with rather a lot of MS Windows related stuff. Being able to filter out that bloody awful Microsoft Social thing gets me closer to decent results. The majority of the next 10-100 results will be CnP clones of someone's blog but a human is able to get in reasonably quickly. I'm toying with blocking Stackoverflow and other cough stalwarts to see if results get better for me.
In my opinion: the www has hit a crossroads or perhaps a Spaghetti Junction or a Magic Roundabout for the last five years or so and continuing. However the exits are connected to the entrances on these road systems (take a look at them - they are real junctions. The MR is particularly terrifying but it works really well.)
I still won't use words like cesspool for this but I am increasingly losing my patience over the standard of results from Google. Those featured things (not the Ads - that's fine) at the top which add #blah_blah to the URL to colour search terms yellow are not working for me. The quality of the returns featured in a box is often rubbish too. It would be nice to be able to turn all that stuff off.
I understand that Google are trying to "be" the internet to try and keep the stock ticker pointing north but there seems to be a point when they have overreached themselves and I think that was passed several years ago. I also increasingly feel that Google thinks that it knows best and has removed many choices from their various UIs - that comes across as a bit arrogant.
Many years ago I left Altavista behind for Google. I will move again if I feel I have to. Of course that's not much in the grand scheme of things and I'll probably only take around 100,000 people with me but they have friends - still probably not a big deal.
> not the Ads - that's fine
In my strongly held opinion, push advertising is not fine and it's the root cause of all the problems you are discussing. We will only exit this mess that the web has become when everyone blocks push advertising by default. People should only see advertising when they are interested in being advertised to, e.g. sites you consciously choose to go to that advertise products & services, like the old Yellow Pages phonebooks.
We’re moving to the vision of information services that were pioneered by AOL, Prodigy, etc. Honestly, we’re there already.
! Hide low-quality results on DuckDuckGo
duckduckgo.com##[data-domain="w3schools.com"]
duckduckgo.com##[data-domain$=".w3schools.com"]
duckduckgo.com##[data-domain="w3schools.in"]
duckduckgo.com##[data-domain$=".w3schools.in"]
duckduckgo.com##[data-domain="download.cnet.com"]
!! Stack Exchange mirrors
duckduckgo.com##[data-domain="exceptionshub.com"]
duckduckgo.com##[data-domain="intellipaat.com"]
Let's say, if I search for a python builtin library, I want to go to the python website, not some "Python 101" blog post about it.
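For anyone who wants to maintain a longer blocklist, here is a minimal sketch that generates the same two rules per domain (exact match plus subdomain suffix match) as the filters above. The `ddg_hide_rules` name is made up for illustration, and it assumes DuckDuckGo keeps exposing the `data-domain` attribute those filters rely on.

```python
def ddg_hide_rules(domains):
    """Build cosmetic filter lines that hide DuckDuckGo results per domain.

    Assumption: results carry a data-domain attribute, as in the
    hand-written filters above.
    """
    rules = []
    for domain in domains:
        # Exact match on the domain itself.
        rules.append(f'duckduckgo.com##[data-domain="{domain}"]')
        # Suffix match to also catch subdomains such as www.<domain>.
        rules.append(f'duckduckgo.com##[data-domain$=".{domain}"]')
    return rules
```

Printing `"\n".join(ddg_hide_rules(["w3schools.com", "w3schools.in"]))` gives four lines you can paste into your custom filters.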
Honestly, I don't believe for a minute they "can't fix it." They do this sort of thing all the time, for instance when ML shows dark skinned people for a search for gorilla, they obviously have recourse.
Yes, they can. They should simply stop measuring only positives, and start measuring negatives - e.g. people that press the back button of their browser, or click the second, third, fourth result afterwards...which should hint the ML classifiers that the first result was total crap in the first place.
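To make the "measure negatives" idea concrete, here is a rough sketch of a per-result bounce rate. The log format is entirely hypothetical (each session is a list of `(url, returned_to_results)` pairs), and nothing here reflects how Google actually processes click data.

```python
from collections import defaultdict

def bounce_rates(sessions):
    """Return url -> fraction of clicks after which the user came back.

    sessions: list of sessions; each session is a list of
    (url, returned_to_results) tuples in click order (hypothetical format).
    """
    clicks = defaultdict(int)
    bounces = defaultdict(int)
    for session in sessions:
        for url, returned in session:
            clicks[url] += 1
            if returned:
                # The user hit "back" and kept looking: a negative signal.
                bounces[url] += 1
    return {url: bounces[url] / clicks[url] for url in clicks}
```

A result with a high bounce rate would be a candidate for demotion; a real system would also have to deal with position bias and with people gaming the signal.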
But I guess this is exactly what happens when your business model rewards leads to sites where you serve the ads: it gives you weird ethics, as your company profits more from those scammers than from legit websites.
From an ML point of view google's search results are the perfect example of overfitting. Kinda ironic that they lead the data science research field and don't realize this in their own product, but teach this flaw everywhere.
Makes it really difficult to find old pages about something that recently exploded in popularity, because the age filter just doesn't work.
I was briefly going to write "I'm surprised that DMOZ[1] still exists" but it says "Copyright 2017 AOL" at the bottom so maybe it doesn't.
Edit: ...and using the search box results in a 404 so I guess it's really dead huh.
Edit 2: Apparently this is the successor! https://curlie.org/en
[1]: https://dmoz-odp.org
Halt and Catch Fire [1] (As a nerd, I can say it's one of the few TV series that got the hackers spirit correctly) had a few episodes about the Google disruption.
Like some people often say here, things come and go in circles...
[1]: https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(TV_series...
I am in the pre-release program. The hardest initial thing to get used to was not immediately scrolling down to the bottom to avoid all of the spam.
I suspect that their methods are not much different than Google, but the experience has been so much better.
*: and themes for other web applications, but mostly WordPress these days
I bet it's that we do different types of searches.
It's literally never the original source for anything, but you can bet it's most of the first 10 pages of results. Then it doesn't even let you right click to open the image file, and dumps you to a login prompt if you click on anything. THAT'S NOT EVEN YOUR IMAGE STOP TELLING ME WHAT I CAN DO WITH IT.
All these same sites appear near the top of Bing searches too. There's nothing particularly Google-specific to this story. It's about SEO hacking that will work against anyone with a PageRank-style system.
(I just checked and this copycat documentation site has, thankfully, now been pushed down a bit in DDG results.)
Those fake shops are part of discussions in politics right now. Usually they're registered in Ireland or Malta as companies due to their specific banking laws. They make millions with those scams and people can't tell legit online shops from fake ones - because the legit ones actually look crappier than the fake ones when it comes to the website designs.
In Germany, we have at least for hardware the "geizhals" website which is kind of an index for all kinds of electronics shops and they try to verify as much as possible.
But for other online shop sectors (e.g. clothing or home stuff) I wouldn't trust anything. Even on Amazon I got scammed a lot and heard absurd things from others...like getting packages with no content in them and Amazon refusing to see that the seller is a scammer etc.
The only time something will change is when traffic starts decreasing to their site, but it's good enough such that people won't change. Look at Facebook, I don't know anyone who uses it as much as they used to 10 years ago, but it's making the most money it ever has. Why on earth would any behavior change? From their points of view, everyone is happy with it!
being willing to make other things in order to have more money always creates cesspools.
Of course there are scammers, that’s part of what makes organizing so hard.
Cynically, I think Google is worse at filtering scammers because they care less now. Half the page is ads so they make money either way.
basically, like everything in modernity, it's a race to the bottom of the infinite dullards of popular
A great opportunity for students and public servants to sell premium URLs.
It's the same thing as the tweaks you have to perform for SEO optimisation, some have questionable value to the end user but you jump through the hoops anyway because it's what is done, by pleasing the robots you're rewarded with a higher search position.
I mean it's pretty reasonable, if a site has been around a long time it's going to be generally 'good'.
The "fiddle with H1" or "write X amount of words" or "buy Y number of links with a % of anchor text" is silly.
Semantic HTML has been created to help screen readers and browsers understand content organization, it having been hijacked by SE is just a side-effect.
Now, when the website needs to not only contain content, but also be its own advertisement, writing it in a way that will maximize virality is the natural course of action to make sure the site actually gets seen.
This will likely be true until there is a method of finding webpages that is not based on automated scraping or the page itself.
The result is that ACTUALLY USEFUL articles are buried on page 5. Any slightly helpful bit of content in the top articles is repeated (using different grammar of course) in all the other "top" articles.
The format goes like this: Lately people are searching for XYZ but is it safe to search for XYZ? What experts say for XYZ? To find out continue to read our article.
Then it's followed by wall of text made of keywords(in sentences that don't make sense), if you are lucky there would be the opening hours(which are often not accurate) somewhere down the text.
But it doesn't stop there. Even actual news articles are written for the consumption of the Google bot, the sentences often don't make sense, they are repeated multiple times with the synonyms of one of the words, making it into a lengthy article that doesn't have any meat beyond the title.
I argue that the problem is not SEO experts with low ethics, the problem is the way the business is structured. SEO experts don't do it for the sake of the art but because they are paid to do it. They are paid to do it because it has a positive ROI on bringing eyeballs and people pay Google for eyeballs, then Google pays those who generate the eyeballs.
Isn't it better for Google and everyone involved if you can't find what you are looking for, continuing your search brings more eyeballs? It's not like you are going to switch to Bing? You are also not going to abandon the internet and go to a library.
Entertainment/news sites are chock full of pages like "<whatever>, what we know so far, release date, cast, will it be renewed, has it been cancelled..." pages that spend many paragraphs saying "we know nothing, randomly plucking crap out of thin air we could guess something-or-other but that remains to be confirmed". A new news story, film, show, or even just a hint of something, and the pages go up to try capture early clicks. Irritatingly they are often not updated quickly when real information becomes available or that information changes (particularly over the last year that has affected release dates). I have several sites DNS blocked because that annoys me less than getting one of these useless/out-of-date pages more often than not when I follow one of their links.
From personal experience, I switched to another tool (DDG) a couple of years ago. When I occasionally try Google, for 95% of common requests I'm appalled by the results: the top is only SEO garbage. For very specific and precise searches (where people are not trying to game the system), Google is still the best, though.
I've noticed a rise of that as well. With some searches such spam is all I've received. But that's really a problem in all languages Google supports I think.
There's even malware that infects websites and generates such content, not sure what the point of that is. Anyone know?
I changed the default search engine from Google to Bing and DDG in all browsers. Google does have better results, so sometimes I still need to use them. But for 90% of generic queries such as the weather, product information, or finding a company's website, Bing is good enough.
It would need an option to ignore any form of news media in search results.
SEO used to be extremely gameable (seniority of site, keyword stuffing, backlinks), but these levers aren't as obvious now, if at all.
A decade from now, Google will have made no improvement.
One day Google may introduce multiple search rankings, where one of them is SEO and another is the "useful things". But I don't hold my breath.
Maybe it's just because I'm searching for technical stuff but DDG and Google are both a big source of frustration for me,
DDG thinks I mistype most of my queries and will desperately try to correct my 'mistake' because "surely nobody is really searching for documentation about ARM32 bootloaders, they just mistyped when they were really trying to look for a webshop that sells 32 different ARMchairs and ARMy boots.".
Google will understand my input at least half of the time but uses that power to show me the power of websites that do some article/keyword scraping and run GPT on it, or this great new Medium blogpost with two paragraphs of someone copying a Wikipedia summary of what ARM is and copy pasting build instructions from a GitHub README.
I've tried searching github.com itself but that's just a nice way to find out that apparently most of the data they store is just scraped websites, input for ML models or dictionaries and they will happily show me all 9K forks of the one repo that contains the highest density of these keywords.
/rant
Good thing /etc/hosts has no size limit.
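If you go the hosts-file route, a small sketch like this turns a list of domains into entries. The `hosts_lines` helper and the `0.0.0.0` sink address are my own choices for illustration; note that hosts files have no wildcard support, so every subdomain needs its own line.

```python
def hosts_lines(domains, sink="0.0.0.0"):
    """Turn domain names into hosts-file entries, skipping duplicates."""
    seen = set()
    lines = []
    for domain in domains:
        name = domain.strip().lower()
        if not name or name in seen:
            continue
        seen.add(name)
        # Each entry maps the name to an unroutable address.
        lines.append(f"{sink} {name}")
    return lines
```

Appending the output to /etc/hosts (as root) blocks those names for the whole machine, not just the browser.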
Using google search console you can determine if a manual action has been applied to your own website: https://support.google.com/webmasters/answer/9044175?hl=en
Rather than determine the ranks, these actions remove / punish offending websites from the ranks, effectively making room for 'good' actors.
Manual actions often come after a significant change in ranking algorithm or policy, and can be reverted / resolved in some cases. This usually requires removing or disavowing (in the case of unauthorized or unresponsive sites) the links pointing to a website.
wow that's amazing, I guess I sort of quit reading blogs like this when all the RSS readers died.
>Google issues a manual action against a site when a human reviewer at Google has determined that pages on the site are not compliant with Google's webmaster quality guidelines. Most manual actions address attempts to manipulate our search index. Most issues reported here will result in pages or sites being ranked lower or omitted from search results without any visual indication to the user.
Doesn't say much for Google's ability to determine relevancy in linking or recognizing suspicious link growth. Or perhaps it just takes some time ...
Nowadays, unnatural links are mostly ignored.
And, buying or otherwise, I am not sure what the mechanism is for bringing this to Google's attention.
I doubt there is another acquisition channel for a project like this that would compare to SEO (and not just Google).
quite a strange thing to say about a company whose business is based on selling links (to ads)
If they're not owned by the same entity, then this blog post is rather odd: https://html-online.com/articles/scoreboard/
(To be fair, that entire blog seems odd...)
I've been making websites for 24 years. Making a website has always been quite hard, especially for a nontechnical user, and there have always been scammers happy to take their money. What's worse is that a lot of the time the scammers believe they're actually selling a good service. There have always been people happy to chuck any old rubbish up on a domain and call it a website, even if it was full of scammy links, stuffed keywords the same color as the background or in tiny text, with JS that overwrote your browser history and blocked the back button, with no context menu, etc etc.
It's annoying, and sad, for those of us who care and consider ourselves professional. But it definitely wasn't any better years ago.
Why is Shopify worth $150 billion? Well, other than the bubble, this effect is why. People can't easily build their own ecommerce sites, can't integrate everything they need to, in a way that doesn't cost them a small fortune.
Wix is a pretty mediocre service, clunky and slow. It's worth $15 billion? How in the world does that happen. Well, building sites is super difficult for most people. The opportunity to make that problem better is, apparently, huge.
Could someone inject links into content in such a way that you cannot find the link in your own source or even your hosting stack?
But even more imaginative would be to work it into the kernel or the ssl layer somehow.
Using other search engines is the only way to do something.
https://www.researchsquare.com/article/rs-8615/v1
(It's on page 24, at the bottom of the References section.)
The network tab in devtools isn't loading Google Analytics on your site. I think the bigger conspiracy is that Google isn't giving high search result rankings to websites that don't include Google Analytics. Part of the reason is they use time on site after following through a search result link as a dimension of quality for that search result. If that makes sense? They give 10 search results and their algorithm can tell if the search result satisfies the end user's request if they don't go back to the search results but rather continue on that site.
Lastly, clicking through a search result to your site might not give the searching user what they are looking for. Amazon discovered every time a person has to click they are far less likely to purchase an item so they created one click. Your competition makes it visually clear what their site does. You probably would get far more retention on the original click to your site if you have an image of what the end product looks like in a hero, front and center (with all the meta tags described in Google's document on SEO of course.) That way people won't click back to the search results page which Google is tracking as a dimension.
[0] https://static.googleusercontent.com/media/www.google.dk/en/...
While I like the thought progression you're going through, this is a "not really." Google has confirmed a number of times over the past 15+ years (going back to the Matt Cutts era) and even in the document you linked that the meta description does nothing to influence ranking in the SERPs. However, the meta title does influence ranking.
I'm on mobile, so unable to dig in right now - but my guess is either this has something to do with the meta title, or the specific anchor text of the backlinks that are getting inserted via the app in question.
Aside from that, agree 100% with your other assessments.
It's 2021 and surprisingly for all the billion dollar A.I. it can still be gamed with a bunch of unrelated links with little or no connection from the article to the site.
Also it's pretty unnatural and shady to get these backlinks. For my own SaaS site almost every blogger I contacted for a review just straight up asked me for money in exchange for a link. What the software did was of no consequence to this exchange. Most sites which have these "list of 10 XYZ" are just similar money making scams yet they rank so highly on Google.
P.S. And likewise I too get dozens of emails daily with "offers" from free article to actual dollar amounts just for putting a paid link. These SEO guys are just relentless because such shenanigans are working great at beating Google so far.
Google (and others) keep up the narrative that they're important so that black and grey hat SEO folks keep focusing effort in the wrong places.
Source: ran the web spam detection team on a different well known search engine
I was just talking to my SO about this the other day when we were trying to find an air purifier for allergies. I'm the kind of person that likes to compare products a ton before dropping more than about ~$100 on anything. The way the internet has become in the last 10-15 years has made this increasingly more difficult. You really have to dig to find in-depth unbiased content on anything someone stands to make money from. For every 1 good review there are 100 'top 10 best ranked' blogspam sites..
Wow, embarrassing for Kaspersky as a computer security focused site to be a victim of this.
When I searched for "Rubiks" as it said to do, I couldn't find it though. Has the Kaspersky post been changed?
But in the meantime, yep... It sucks.
Or paid the entity running the malware HTML editor. It's probably injecting links to a variety of sites who paid them for placement.
Edit: Wow, this is much bigger than just those two sites. Looks like half the internet is down. https://downdetector.com/
In 2015 I was fired over some issues on a site I was working on and some friction with the company owner. Two months before I was fired I reported that some links to other sites unrelated to our service (some porn and some scam pages) were on the home page. After that I heard from my ex-coworkers that a manager from another area of the company said I was fired because I was linking porn on some pages of our service. I didn't know at the time that those tools existed, and only today I realized that this is a possible explanation.
I was really sad about that manager and didn't understand why he lied to my friends about the reason for my dismissal. But it is nice to know what may have caused the issue. Better late than never hahaha.
I also wrote a tutorial on how you can build an infecting proxy too [2]. Doesn't work anymore though since HTTPS is everywhere. Thank god
[1] https://blog.haschek.at/2015-analyzing-443-free-proxies [2] https://blog.haschek.at/2013/05/why-free-proxies-are-free-js...
Google's old link-based authority algorithm, pagerank, isn't analysing the same web anymore. I think there's barely any signal in links these days.
The first major event was Google itself. Once you use something as a metric, it becomes currency. SEO vs anti-spam became a defining cat and mouse game. This kind of stuff was born then, and antispam was meant to curb it.
The second major event was user generated content. The old link pages and blogrolls die slowly. Comments, twitter, and such become the way links are shared. High signal, but extremely spam prone. Google tapped out of this early, and mostly ignores user generated content.
The third major event is facebook, and facebook like ways of doing things. This made most regular people's content unindexable. Search for esoteric keywords used to return a lot of forum results. Still does, to an extent. The thread is usually years, or decades old. What's left on the open web is a subset, a non random subset.
Wikipedia is one of the last sites that does "hypertext" the way pagerank assumes the web works.
In any case, I feel like search (or what search used to be) is in decline. There isn't as much web to search anymore, in a sense. The broad brush way of doing antispam (eg user generated content is just ignored) makes more sense. Why deal with all that noise/spam, just to search what's left of the old web.
What's left? User behaviour, a la analytics. That makes for more feedback loops and winner takes most dynamics. Localisation became localisation to your bubble. Meanwhile "officialness" measures aren't against google's ethic/aesthetic anymore. They got burned by the "fake news^" crisis, and the quick fix was officialness. In for a penny. In for a pound.
Meanwhile, web search is increasingly just another thing that google search does. It searches "your" data, content of your devices, search history and NN generated whatnot. It searches news, ads, returns answers to questions, does math... There's nothing new about seo scams, antispam just isn't Google's primary solution anymore. Just default to other ways of returning results.
I'm calling it. Web search is dead. Long live the new websearch.
^Circa 2015 usage, not the current
IIRC with PageRank there were very specific values associated with 'toolbar PageRank', e.g. a PR7 link could be sold for $1K a month. Understandable because at that time there was no context to PageRank at all, it was simply about being linked to by an "authority". This was 20 years ago though.
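For anyone who hasn't seen it, this is roughly the textbook form of PageRank as a power iteration. It is a sketch of the published idea, not of anything Google runs today; the damping factor of 0.85 and the iteration count are conventional illustrative defaults.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

The score depends only on who links to you and how highly they score, with no notion of topic or context, which is exactly why a link from a high-PR page was worth paying for.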
Let’s face it... the early internet was interesting because the only people who could use it (and publish on it) were smart eccentrics. That was its charm. The technological hurdle served as the curator: you might have been a crazy white supremacist, anarchist, conspiracy theorist, or ‘expert’ in how to grow radishes or some other bizarrely eclectic field... but all of them were necessarily a bit smarter than the average bear just by virtue of knowing how to host content and access it; not a trivial task in the late 90’s.
Maybe it’s time to think up some convoluted alternate network that is a royal pain-in-the-ass to use. Perhaps there the eclectic and useful content creators will once again arise (and searching their trove will be a snap as most everything there will be fresh, unique, and interesting.) It will exist, I suppose, for a few years before tools are made to enable grandma to easily use it.
By the way, I develop proprietary software. I hope someone at Google reads this and stops indexing all those pirate websites where people steal from others. Not torrents, I'm talking about those websites where they even sell you paid access to stolen stuff.
Seriously Google? You can't filter "nulled"?
Once I discovered that everything I would ever need was better explained on the MDN my life as a webdeveloper strongly improved.
Surely some kind of fairly trivial NN/Not very deep learning system can classify HTML content so that out of context links (like "Learn how to solve a Rubic Cube" in a Seventh Day Adventists sabbath lesson) and content that is copied is ignored or marked down.
Whilst I'm sure GPT-3 could be used to create more realistic looking fake content - this would eliminate 99% of the script kiddies creating low value SEO spamming sites.
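As a toy illustration of how little it takes to flag the most blatant cases, here is a plain word-overlap heuristic. To be clear, this is not a neural network and not anything Google is known to use; the function names and the stopword list are made up for the sketch, and the page text is assumed to be passed in without the link's own anchor text.

```python
import re

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in", "for", "on"}

def tokens(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def anchor_relevance(anchor_text, page_text):
    """Fraction of meaningful anchor words that also occur in the page.

    page_text should exclude the anchor itself. A score near 0 suggests
    the link has nothing to do with the page it sits on.
    """
    anchor = tokens(anchor_text) - STOPWORDS
    if not anchor:
        return 0.0
    return len(anchor & tokens(page_text)) / len(anchor)
```

A cube-puzzle link on a sabbath lesson page scores 0.0 with this, while a link to the lesson archive scores well above that. Spun or paraphrased content would of course need something smarter.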
https://en.wikipedia.org/wiki/Accelerated_Mobile_Pages
* forgive my RAS syndrome
As others have pointed out and the author acknowledges, he is technically injecting links when his users embed their scoreboard on their website through an auto-included link-back to his site.
Now, I don't frown upon this. It is not deceptive and its placement is more than relevant.
The same cannot be said for the scheme the author uncovered. But whether it is violating Google's TOS is another question. I'm not sure of the answer.
Any notes on how to reproduce?
Maybe clear cookies and try from a different browser?
Here's an example of one https://html-cleaner.com/
I hope someone figures out which other campaigns were run with these tools. Also, whether you can find output with the link injections in source code, like on GitHub or distro packages.
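If you want to check your own pages or a repo of HTML files for injected links, a minimal sketch using only the Python standard library could look like this. The `unexpected_links` helper and the allowlist approach are my own assumptions about how you'd want to audit; it only sees literal `<a href>` tags, not links added by scripts at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (href, has_nofollow) for every <a> tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href:
            rel = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel))

def unexpected_links(html, allowed_hosts):
    """Return external links whose host is not in allowed_hosts."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href, nofollow in parser.links:
        host = urlparse(href).hostname  # None for relative links
        if host and host not in allowed_hosts:
            found.append((href, nofollow))
    return found
```

Run it over the HTML you exported from an online editor with the hosts you expect to link to; anything it returns is worth a manual look.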
I found uBlacklist from this thread, and the subscription functionality enables some collaborative effort.
So I've started making a list, but unfortunately there aren't many uBlacklist subscription lists out there yet.
Be interested to see how far this could go: https://github.com/rjaus/awesome-ublacklist/
Google just seems to give way too much weight to domain name matches with the search keyword.
And then you get re-direct to some prize-winning spam site.
I love getting a search result that includes Google Books because those are usually useful. That’s what Google was best at, bringing in things that weren’t regular web pages.
This unknown exchange of value for “free” products and services is what everyone from Facebook and Google down to malware-like browser extensions do to extract difficult-to-acquire resources.
People don’t understand how their personal data, internet connection (residential proxy network node), or in this case, publicly displayed website are being monetized or used indirectly for monetization.
People don’t know or are tricked into allowing themselves or their resources to serve as an ugly cost externality to some other clean-looking business endeavor.
Well, unfortunately this is basically how every freemium tool works. They have some way of advertising, in exchange for free use of the tool.
Even reputable CMS tools like WordPress include back links to WordPress on new sites and themes.
Although, this is much less common with open-source free tools, as the community resists these kinds of changes.
No such thing as a free lunch!
https://html-online.com/editor/
In case you cannot view it the banner across the site says now "Goodbye!
This site has been penalized for unnatural link building and will be removed from Google Search
Please bookmark if you wish to continue use of the site.
We are sorry and are working on fixing the problem to recover from the penalty. "
They are only sorry they got caught
More importantly, what I've also learned is that Bing search results are less of an affiliate link cesspool because fewer SEO spammers are working at gaming Bing's results.
Great.
Nobody cares about the content apparently. Nobody checks if the generated HTML makes sense. It's all about spinning the wheel.
Sigh.
A year later Google's John Mueller, a trends analyst who often also acts as a liaison between Google and the webmaster community, stated that Google might automatically apply a 'nofollow' attribute to these types of links, effectively killing their ability to siphon SEO link value to improve themselves: https://www.seroundtable.com/google-auto-nofollow-widget-lin...
We have noted in our agency research for clients several similar usages over the past few years that appear to be giving websites positive value instead of either being ignored or penalized, including a WordPress plugin that injects links on government and collegiate websites. The way Google assigns value based on links has changed quite a bit over the past 5 years and there is a chance they no longer penalize for widget links (unlikely) OR that their ability to detect them has degraded significantly (my guess is the latter).
One thing is for certain: Google absolutely retains the ability to manually devalue links and penalize a website for violating their guidelines. They do not enjoy negative press or community discussions on search quality like this one and in the past have taken swift action when such issues arose in the media.
At our agency we advise clients against this type of link building as it has no long-term value for a brand and could cause long-term pain instead. SEO should be used to help new brands gain a competitive advantage against more established incumbents such as a startup taking on Amazon or a new SaaS tool providing valuable data to an industry.
Developers paste their data to online websites too frequently these days.
The problem is that people will always try to game the system :/
If you're decided on googling for a suggestion of a tool, at least include "open source". Even if you're searching for proprietary tools, you'll probably find the traditional "it has better X, Y, compared to proprietary tool W" review.
We had a competitor who stuffs his page full of SEO garbage words. Our software is used 100 times more than his; more people search for our software, click it, use it and link to it. But who is in 1st place in the search results? Right, the SEO spammer, with the slower page, full of shiny SEO words that have nothing to do with the software.
@google I'm waiting for a working AI that detects such garbage sites!
You can't pretend this isn't funny as fuck lol.