Anonymous Source Shared Leaked Google Search API Documents

(sparktoro.com)

333 points | andrewfong | 1y ago | 296 comments


This just proves all the "suspicions" privacy-conscious users have had about large corporations fingerprinting users, often in very obvious ways. There's often no better place to find ideas for surveillance than the people conscious about being surveilled.

p3rls1y ago

Many of the SEO suspicions were confirmed too.

I found it VERY amusing that on r/SEO just yesterday there were moderators and flaired users (you know, the elites of the SEO community, lol) insisting much of this was "debunked" years ago.

They of course deleted their posts, but the threads are still up. What a den of scammers over there.

https://www.reddit.com/r/SEO/comments/1d1eqjj/comment/l5tvfw...

https://www.reddit.com/user/WebLinkr/

I love how reddit is turning into the new SEO scam overnight because of this stuff. Great work as always Danny Sullivan!

p3rls1y ago

It's just endlessly fascinating to me, the grift on r/SEO.

How these types first gain moderator status on a few subs and then the spam begins (picture of spam https://pixeldrain.com/u/a6qUPjTq )

I haven't been able to find a single legitimate expert in the entire sub, and I've checked about every flaired user and moderator.

You have lots of people like the above, or https://www.reddit.com/user/jesustellezllc/ that claim to run an agency in Fresno, California called Ozelot Media, but when you look him up there's nothing. When you google "SEO" + "Fresno California", Ozelot Media isn't even in the top 100 results. Lol, I thought that was the job of an SEO type? Why let that stop the grift though?

phone86753091y ago

SEO is vandalism and I one day hope the majority of Internet users see that

harry81y ago

SEO is just another form of advertising, with all the costs, benefits and externalities of any other form.


bobthepanda1y ago

Most people are aware but are powerless to do anything about it.

tyingq1y ago

Perhaps, though a world without SEO doesn't necessarily surface the best content either. Not everything about Google's algorithm that's subpar is because of spam or SEO.


theolivenbaum1y ago

Seems like a lot of it came from them inadvertently posting some internal API to GitHub: https://github.com/googleapis/elixir-google-api/commit/078b4...

renegade-otter1y ago

I guess too many people got laid off to do the whole "three reviewers per PR" thing!

eru1y ago

When I was at Google (about a decade ago by now), we had two reviews per PR; not three. Could you tell me more about the third review?

RandomBK1y ago

I think GP meant <Author>, <Reviewer 1>, <Reviewer 2>

dontdoxxme1y ago

And it's Apache licensed, which grants a patent license. Some of the comments refer to specific aspects of how PageRank is calculated. PageRank itself is past patent protection, but I wonder if this also accidentally might grant licenses to other patents.

yencabulator1y ago

There's still an angle where the copyright owner claims that the person who caused this to happen did not have the authority to apply the license to it.

ec1096851y ago

Oops, someone’s script was too greedy when uploading those elixir api documents.

xnx1y ago

I believe these are the leaked docs: https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

precompute1y ago

> My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they’ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.

What answer do the engineers at google working on this have for this violation of privacy?

GuB-421y ago

I am not an engineer at Google, but this is what I would say if I were.

We don't know who you are, you are just a number in a database, and we don't even know what number, we just get the total number of visits for each website, not who visited it. It is like counting cars on a highway, not following your car. Plus, it serves the useful purpose of providing you with better search results, the terms and conditions allow it, and it can be disabled.

voltaireodactyl1y ago

The obvious response being that counting cars on the highway is a necessary first step on the road to identifying and then tracking their movements.

Similar to how insurance companies have offered voluntary, “anonymized” data dongles for discounts that are now being used (or at least revealed to be used) to collect data most often used to reject claims.

Ferret74461y ago

Agriculture is a necessary first step toward a dystopian society, so clearly we should ban agriculture.

The logic does not follow. "A is required for B, B is bad, so A is bad" is not logically valid.


lolinder1y ago

> we don't even know what number, we just get the total number of visits for each website, not who visited it

This is not what a clickstream is. A clickstream requires that the sequence of clicks be preserved, and preserving that sequence undermines anonymity.

tommiegannert1y ago

It can be pseudonymous. It doesn't have to undermine anonymity.

Google researchers spend time ensuring k-anonymity (for reasonably large k) when using data.
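To make the distinction in this subthread concrete, here is a minimal Python sketch contrasting an aggregate visit count (the "counting cars" view) with a clickstream that preserves per-user sequences, plus a k-anonymity-style threshold that only releases click paths shared by at least k pseudonymous users. The data, function names, and threshold are all hypothetical illustrations; nothing here is taken from the leaked documents or from how Google actually processes data.

```python
from collections import Counter

# Hypothetical pseudonymous clickstreams: user id -> ordered list of visited sites.
clickstreams = {
    "u1": ["news.example", "shop.example"],
    "u2": ["news.example", "shop.example"],
    "u3": ["news.example", "clinic.example", "lawyer.example"],
}

def visit_counts(streams):
    """Aggregate view: total visits per site, with order and user discarded."""
    return Counter(site for path in streams.values() for site in path)

def k_anonymous_paths(streams, k):
    """Sequence view: keep only click paths shared by at least k users."""
    path_counts = Counter(tuple(path) for path in streams.values())
    return {path: n for path, n in path_counts.items() if n >= k}

counts = visit_counts(clickstreams)
released = k_anonymous_paths(clickstreams, k=2)
```

In this toy data, the path unique to `u3` is suppressed at `k=2` but would be released at `k=1`, which is the re-identification concern raised above: a rare enough sequence can point to one person even without a name attached.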


raxxorraxor1y ago

That would be money. If someone has another excuse, they are naive or lying to themselves.

It certainly is not "to improve the net or advertising" - that would be the lying part.

Google has done some good for the net, but the scales of their contributions slowly but steadily move to the negative side.

azemetre1y ago

Reminds me of the studies they’ve done on cognitive dissonance/lying.

Basically if you believe lies you tell yourself, they tend to turn into truths in your mind over time. Even if you were doing it “ironically.”

danpalmer1y ago

Personal (not work-related) opinion: This basically can’t happen with things like DMA and GDPR. DMA in particular means you can’t share data across “products” without explicit consent. So you could for example collect websites that don’t work for the purposes of improving Chrome, but not then share that with the Ads/Search orgs for personalisation or targeting, as far as I understand the legislation.

Personal opinion about work at Google (still not Google's opinion): I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally, and that user choice is respected. The engineers on the ground are absolutely making sure this all works, and most of us care deeply about user privacy. I have personally worked both on implementing new features that significantly push forward privacy, and on implementing privacy controls for regulatory purposes.

BrenBarn1y ago

The thing is that preventing "sharing" isn't sufficient. People who are concerned about privacy don't want any such data collected or stored in the first place, ever. The implicit "sharing" of my data with Google (or whatever company) is a problem in itself. Regardless of how "seriously" Google (or whatever company) takes it, for a lot of the data I don't want them to ever have it in the first place.

specialist1y ago

Yes and:

Require opt-in by default. In all cases.

All PII data at rest must be encrypted at the field level. Like how passwords should be stored. aka Translucent Database techniques. Not just in transit. Not just encrypting the whole database. But encrypt the actual fields within a database.

Constitutional privacy means personal sovereignty over oneself. (A superset of the folk definition of keeping secrets.) Meaning any and all data about me is owned by me. Any one using my data for any purpose has to pay me. (See opt-in by default above.)
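As a rough illustration of the field-level idea above, here is a minimal Python sketch. It deliberately uses one-way keyed hashing (HMAC-SHA256) rather than reversible encryption, in the spirit of the "like how passwords should be stored" comparison, because the Python standard library has no symmetric cipher. The key handling, field names, and helper functions are hypothetical, and a real system would need proper key management and, where the value must be read back, an actual encryption library.

```python
import hashlib
import hmac
import os

# Hypothetical secret key; in practice it would live outside the database.
FIELD_KEY = os.urandom(32)

def protect(value: str, field: str) -> str:
    """One-way keyed hash of a single field value (HMAC-SHA256)."""
    message = f"{field}:{value.strip().lower()}".encode("utf-8")
    return hmac.new(FIELD_KEY, message, hashlib.sha256).hexdigest()

def store_user(email: str, phone: str) -> dict:
    """Build the row as it would be written to the database."""
    return {"email": protect(email, "email"), "phone": protect(phone, "phone")}

def matches(row: dict, field: str, candidate: str) -> bool:
    """Check a candidate value against a stored field without recovering it."""
    return hmac.compare_digest(row[field], protect(candidate, field))

row = store_user("Alice@example.com", "555-0100")
```

With this layout, someone holding only a database dump sees opaque digests per field, while the application can still answer "is this the same email?" via `matches(row, "email", "alice@example.com")`.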

troyvit1y ago

> The thing is that preventing "sharing" isn't sufficient.

Exactly this. It doesn't matter that google doesn't "share" what they gather if they own so many conversion funnels from top to bottom anyway.

danpalmer1y ago

This is a fair position to take, but assuming good faith all round, one that I think will typically be a minority. If you ask a user if they're willing to share crash reports only to improve the reliability of the software, I'd bet most people would be ok with this. In fact it's sufficiently reasonable that I believe GDPR allows this to be opt-out, something I broadly agree with. I do think opt-outs should be available, I do think there should be configuration available for those who do not wish to share anything, but if the laws are being met, in the right spirit, then I would hope it would provide little actual benefit.


verteu1y ago

> I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally

I believe the law is violated when it's sufficiently profitable -- it just requires VP permission.

No public sources for this except Jedi Blue, the old anti-poaching case, etc.

noprocrasted1y ago

> This basically can’t happen with things like DMA and GDPR

I'm sorry but this is just wishful thinking. It might be what the spirit of the DMA & GDPR want but definitely not the reality thanks to inadequate or outright non-existent enforcement.

There are businesses out there whose entire business model and revenue stream are based on violating the GDPR. Not some kind of internal conspiracy or rogue employee, but the entire company is doing it in the open and the result of its doings (targeted ads or spam) are visible out there in the open for all to see.

Facebook, credit bureaus, data brokers, "consent management platforms", etc. All these companies' business models are big, obvious breaches of the GDPR. Yet, they are... still alive and kicking?

There is no chance that a concealed GDPR breach (whether intentional or accidental) will get addressed when the biggest intentional breaches are still allowed to continue out there in the open.

I suspect something very similar is going to happen with the DMA - Apple is already acting in bad faith but have yet to see any consequences.

marcinzm1y ago

> What answer do the engineers at google working on this have for this violation of privacy?

The same answer you probably have for the millions of questions about the things you do that some other people find offensive to their personal views and beliefs.

bdlowery1y ago

How is it a violation of privacy. Did you read the terms of service?

precompute1y ago

It's a privacy violation regardless of the ToS.

y421y ago

A ToS announcement is not explicit consent. I doubt that this will help in court, even pre-GDPR.

HelloNurse1y ago

Further, a TOS announcement can be easily construed as an admission of intent to fuck users.

9dev1y ago

See, that’s the nice thing about the GDPR: You cannot hide unexpected hostile stuff in the ToS anymore. If you don’t tell me what you do with my data in a way that is obvious, easy to understand, and most importantly easy to disable, it’s illegal.

vouaobrasil1y ago

Sometimes I wonder how much better the internet would be if hits on Google weren't directly tied to revenue from Google itself through its ad program. I am certain Google has made the internet and the world a worse place to live.

eitland1y ago

As a user of Kagi and search.marginalia.nu I can tell you:

Quite a bit.

So much that now that I have what "everyone" asked Google for for years - that is blacklists - I hardly use them.

Why? Because with Kagi I get much better results out of the box.

I am fairly sure Googlers will tell me there are multiple safeguards to prevent the inclusion of Google ads from affecting ranking, to which I just have to say that the results speak for themselves.

Please note: I have only used Kagi for two years. I am only one user. But I am a user with 20 years of experience with Google and that's got to count for something.

Nevolihs1y ago

I actually use pinning, blocking and raising/lowering the value of individual sites every day. I wish this is the direction search engines went in the first place and it's the direction I hope Kagi continues. I want a personalized search engine that's personalized by me, not by a company trying to profile me and make money off of my clicks.

the_snooze1y ago

When each user can personalize the results themselves, you make SEO completely impractical because they can no longer target a single monolithic algorithm controlled by one entity. Websites would actually have to have organic appeal to users, who get the final say to hide away bad sites from the results page (looking at you, Quora, Pinterest, and Fandom).
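For readers who have not used this kind of feature, the pin/block/raise/lower behavior described in the two comments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the preference table, the weights, and the `personalize` function are hypothetical, and they are not how Kagi or any other engine actually implements it.

```python
from urllib.parse import urlparse

# Hypothetical per-user preferences: domain -> action.
prefs = {
    "pinterest.com": "block",
    "quora.com": "lower",
    "developer.mozilla.org": "raise",
    "docs.python.org": "pin",
}

# Illustrative weights only; real engines' scoring is not public.
WEIGHT = {"raise": 2.0, "lower": 0.5}

def domain(url):
    """Extract the host name, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def personalize(results, prefs):
    """results: list of (url, score). Returns URLs reordered by user prefs."""
    pinned, ranked = [], []
    for url, score in results:
        action = prefs.get(domain(url))
        if action == "block":
            continue  # the user never sees blocked domains
        if action == "pin":
            pinned.append((url, score))
        else:
            ranked.append((url, score * WEIGHT.get(action, 1.0)))
    by_score = lambda item: item[1]
    ordered = sorted(pinned, key=by_score, reverse=True)
    ordered += sorted(ranked, key=by_score, reverse=True)
    return [url for url, _ in ordered]

results = [
    ("https://www.pinterest.com/pin/1", 0.9),
    ("https://quora.com/q/1", 0.8),
    ("https://developer.mozilla.org/en-US/docs/Web", 0.5),
    ("https://docs.python.org/3/", 0.3),
    ("https://example.org/post", 0.6),
]
personalized = personalize(results, prefs)
```

Because every user carries a different `prefs` table, the final ordering differs per user even when the engine's base scores are identical, which is the point made above about SEO losing a single target to optimize for.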

eitland1y ago

I am all for Kagi keeping that feature. If for nothing else then to rub it in the face of every googler who has argued that it was impossible.

And if you use it I am happy, that gives Kagi an incentive to keep it around.

I'm just saying that for me the results are so good out of the box that with a couple of exceptions I never had to block anything.

scutrell1y ago

I was excited to try Kagi, but I couldn't justify the cost. I find DDG with the occasional Google search to function almost as well. I'll try Kagi again at some point, but it wasn't the panacea people here made it out to be

p3rls1y ago

Kagi is the same garbage as google in my niche. Even worse, maybe. It looks like it's weighing backlinks and SEO garbage even higher. Well done.

I don't know how people keep talking about it. The results, as you say, speak for themselves.

eitland1y ago

Well, don't use it then.

I am happy for alternatives, otherwise I guess Kagi wouldn't improve so fast in areas I care about.


abhijat1y ago

I switched to Kagi in June last year. I just realized I tried it initially because I wanted to try out blocking sites in search results, and I have only ever needed to block three domains.

eitland1y ago

That's exactly what I am talking about.

Kagi is kind of like Google in 2009, seriously good coverage, good ranking

... but also:

- more modern

- more features (summarizer, bangs like in DDG, FastGPT and probably a few I forgot)

- blocklists for websites (and also options to pin, raise and lower)

- with actual support: report a bug and you get an answer from a real engineer, a follow up when it is fixed and a shout out in the relevant release notes

- no tracking


beeboobaa31y ago

Is kagi good for finding things like old forum posts (not reddit)? I know some of those sites are still up but google seems to ignore them.

nalinidash1y ago

Try search.marginalia.nu

From the website about: "This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of in favor of the sort of sites you probably already knew existed."

eitland1y ago

There is a separate "lens" (think like "images", "videos" and "news" in Google, only there are more and you can create your own) for "Small Web" which only includes what they describe as "Results that favor noncommercial domains and topics".

(Other standard lenses include

- Forums

- Fediverse forums

- Usenet/archive

and I think 7 others.)

stuffoverflow1y ago

In my experience kagi is decent, definitely gives more forum results than google. I've found yandex to be the best at finding all kinds of forum /discussion sites.

packetlost1y ago

I dunno, the first thing I did was blacklist G**ksf*rG**ks from my search results (and others, of course) and I couldn't be happier.

esperent1y ago

Kagi is worth the money, but it isn't magic. It's about as good as Google was ~five years ago, before they made all the search operators stop working. There's also a whole bunch of things it's worse at than Google - especially local search and shopping. Plus I still get plenty of blogspam and AI generated crap from Kagi.

mozman1y ago

The search operators make a big difference in result quality. I also don’t like how Google now returns zero results for something obscure. In the past I could find something peripheral and eventually get to what I was looking for.


karma_pharmer1y ago

Kagi is simply reselling google search results.

super2561y ago

They do more than that: https://help.kagi.com/kagi/search-details/search-sources.htm...


ptman1y ago

They have many sources https://help.kagi.com/kagi/search-details/search-sources.htm...


amelius1y ago

How do you know that Kagi won't become as bad as Google at some point?

duckmysick1y ago

What's the argument you're trying to make?

That because there's a chance Kagi will become bad, there's no point in using it now and thus we should stick to Google, which is already bad? That doesn't make sense.

The same line of thought can be applied to anything. We don't know what will happen in the future, therefore we can't be sure that things won't go bad. Is there even a way to have such a guarantee?

Assuming I don't want to use Google (because it's bad) and I won't use Kagi or Perplexity or others (because they might become bad), what's the realistic solution? Roll my own search engine? I don't have resources and I don't trust the future me to maintain it.

breakfastduck1y ago

We don't, but a model where a user pays for a service rather than being free and ad supported is significantly less likely to enact user unfriendly changes.

If the way you make money is by convincing people to pay, you are highly incentivised to make the product good, especially where there are many other free competitors who are ad supported.


eitland1y ago

Kagi is worth the price every month. Unlike a number of other things I have supported, it is not an investment made in the hope that it will one day be worth it, but rather a service that I pay a small sum for, which in turn removes lots of frustration from my life every month.

If they do become bad then at least I have had a fantastic search engine for another few years of my life like I had from 2002 until 2009 ish.

And also, already at this point, they and marginalia have proven that it isn't impossible to enter the search engine market even now. This was long considered impossible, at least here on HN.

spacebanana71y ago

Also it’s unlikely Kagi will ever become big enough for SEO people to specifically target with manipulative content.

Even if they got 100 million active paying users it’d still be a tiny fraction of overall search traffic.


Workaccount21y ago

The fundamental problem with the Internet is that people don't want to pay for things on it.

No matter what, whatever we ended up with was going to be shitty and exploitive.

eitland1y ago

Now you have a chance. Kagi is there.

I made my decision two years ago and I would probably do it even if it was just on par with Google, to support competition and to avoid supporting Google.

But in hindsight it is just exceptionally much better. There is no going back unless Kagi does something monumentally stupid.

jacob0191y ago

I'm a Kagi user too. I like your enthusiasm, but I can't say it's been all that life changing for me. DuckDuckGo is ok too, I still use it on some machines when I don't feel like logging in. GPT has been more life changing.


tjpnz1y ago

How much of that is due to ad-tech companies like Google conditioning people into thinking that way? What if online payments weren't so god awful and allowed people to throw in a few dollars as easily as they might at a toll booth? That's still an unsolved problem too. Credit card companies have solidified their involvement in every facet of the process and the alternatives are non-starters for frictionless commerce.

I'm still happy to put my money where my mouth is and do pay for services which are genuinely useful to me. But this is not the kind of internet I imagined when growing up.

L-four1y ago

It's not that people don't want to pay, it's that it's difficult to pay small sums. The web browsers could solve this problem but they make money from ads so it's not in their best interest.

nradov1y ago

Jakob Nielsen's article "The Case for Micropayments" from 1998 still seems relevant. Nothing has really changed in 26 years so I'm skeptical whether it could ever work, but in principle it would be great for users and website owners to have that option.

https://www.nngroup.com/articles/the-case-for-micropayments/

UI_at_80x241y ago

And this was one of the hopes/goals/dreams that crypto offered that I really wish had come true.

wslh1y ago

Google was really great and revolutionary, they helped zillions of small companies to thrive. It was another cycle.

Then, now, it is like media before the 90s: you need to pay a lot of money to be in the center page of the newspaper.

But, hopefully we are talking about LLMs now, seems like one of the answers to search engines in general. Beyond AI, I see LLMs as a good evolution from PageRank.

A little bit general but lately I use the expression: "Complexity as Scam". Google always pointed to their "algorithms" and played with this term as if algorithms couldn't be adjusted to whatever you want them to be. Initially the coined term was sound because it was based on a scientific paper and its eventual evolution, but it seems like the original PageRank idea has detoured from being a "pure" graph algorithm.

Another context where I use "Complexity as Scam" is Web3. It is like Matryoshka dolls where there is always one more step of complexity to prove a point, but it never ends.
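For context on what a "pure" graph algorithm means here, below is a minimal Python sketch of the textbook PageRank power iteration with a damping factor. It is the commonly taught formulation only; the toy graph, the default damping of 0.85, and the iteration count are illustrative assumptions, and nothing in it reflects how Google ranks pages today or anything in the leaked documents.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Toy link graph (hypothetical): "c" is linked to by every other page.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

The only inputs are the link structure and one damping parameter, which is why the original idea reads as a pure graph computation; any signal beyond links (clicks, brand, site-level scores) is a departure from it.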

benterix1y ago

It's not black and white. There was a lot of junk that was forced on us and that was removed thanks to Google. But I agree the direct relationship is inherently corrupting.

GTP1y ago

Larry Page and Sergey Brin even stated very clearly in their original paper that using ads as a revenue source can impact the quality of results returned from the search engine.

DarkNova61y ago

You mean the way Google worked originally? The founders were very careful in creating a barrier between ads and search.

A barrier whose erosion has been well documented over the last 10 years.

vouaobrasil1y ago

A barrier whose only purpose was to establish trust so that it could be later taken advantage of.

DarkNova61y ago

As much of a cynic as I typically am, there is a well established record of events which shows that this is not true.

Google search was taken over by an ambitious clique of failed yahoo managers that successfully destroyed their former company for their own financial advantage then did the same at google.

Acting as parasites on society at large.

heresie-dabord1y ago

Instead of a semantic Web of knowledge, we got "grep the HTML... with ads".

josefx1y ago

You dropped the -v . Modern day Google seems fine tuned to return results that contain everything except for the words I searched for.

greg_V1y ago

I mean... maybe, but not really. The first problem of the internet was that there wasn't that much content specifically. The first internet companies were the broadband providers who were developing content themselves, like AOL.

Google and the ad ecosystem they acquired was basically the flywheel that spurred content creation at scale. Anyone could jump in, follow a few guidelines and earn a living by producing content on the internet. The Youtube acquisition and monetization followed the same pattern.

Over time the market consolidated and got less and less competitive: less platforms with complete control of traffic and one-sided revenue sharing agreements. The guidelines so to speak on how content should look and feel like were algorithmically made stricter and stricter until everything looks, feels, sounds and reads the same.

The problem right now is that the platforms are still tightening their grip, and it's all tied to the approach of using AI to replace the content creators on the platforms from Google to Spotify to Meta, and carving the spared money to shareholders. And while the web has been shitty for a few years now, we're now seeing a sudden drop in quality because the average user has no recourse or alternative, and neither does the average creator have the means of distribution and monetization (not just publishing, that's been solved) to even find, let alone meet the new kinds of demand.

I'm certain that in a few years this will even out: new search engines, new aggregators and new feeds will emerge, but the content - money - network problem triangle remains as a fundamental problem of the internet.

linsomniac1y ago

Did you experience the Internet before google? The idea of a world where Alta Vista won is truly chilling.

thsksbd1y ago

You mean a world where people still knew how to use a library catalog, still relied on more than one source of information and curious crazy tid bits are still out there?

The internet is boring. And the trash is still there. It's just become reputable instead.

linsomniac1y ago

There's a lot to unpack here...

Can you expand on how a card catalog improved the world? As a kid I used the card catalog a lot, both the physical version and the later electronic versions. Full text search definitely leads to pulling in information from a wider selection of sources.

I remember a lot of stratification of news sources pre-Google (which news channel you watched, which papers/magazines you read). Did Google cause reliance on one source of information, or does Google simply exist in a world where people tend towards echo chambers? How would Alta Vista have improved that?


badpun1y ago

> still relied on more than one source of information and curious crazy tid bits are still out there?

I think the curious crazy tid bits are still there.

washadjeffmad1y ago

I'd be okay with a world in which everyone else in search didn't lose, too.

msk-lywenn1y ago

In some way, didn't Google become Alta Vista?

linsomniac1y ago

How so? My memory of Alta Vista was so-so search results with a top page littered with garbage.


vouaobrasil1y ago

Yes, I did! I used to use Yahoo search where the results were more hand-curated and people did not create websites for intensive commercial purposes with useless SEO fluff like it is today.

linsomniac1y ago

I have been thinking a lot about Yahoo (pre yahoo-search, largely) lately. I don't fully understand how we lost the curated catalog, especially considering the success of Wikipedia. The latter demonstrates users' willingness to curate knowledge bases... We have "awesome" lists, but I rarely seem to use them.


blowski1y ago

I imagine it would be a different flavour to what we have today, but the same intensity. Anything that so deeply penetrates daily life across the globe is going to bring enormous problems with it.

1vuio0pswjnm71y ago

There is something truly strange about the idea that people "trust" a website operator and can rely on it to provide them with useful information when that same operator is well-known to be secretive, deceptive and dishonest in order to protect its own interests. It's like imagining that a fact witness who tells the truth on some occasions and lies on others is credible.

https://ipullrank.com/google-algo-leak

nsmog7671y ago

I work in search and didn't find anything surprising in here. But that's mostly because I've just assumed Google has been lying for years about many things, such as not using click data or Chrome data.

I've directly seen people who have successfully manipulated search rankings by having logged-in chrome users search for a term, and then click on a given page. Works like a charm (though may not stick once the manipulation is done, unless organic users also prefer it).

ec1096851y ago

If anyone is surprised about chrome sending urls to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google chrome settings.

Creepy.

HenryBemis1y ago

Or, and hear me out, you never use Chrome again, in any platform.. like ever ever again.

smegger0011y ago

I only have chrome installed for a couple of work related sites that don't display correctly on firefox. I don't get to choose not to use the work related site, and MS edge likely isn't any safer and also is not available on my choice of operating system.

SSLy1y ago

you could use ungoogled-chromium, brave, vivaldi

Terr_1y ago

"But what if I don't want my own computer to build and share a detailed profile of everyone I know, everywhere I go, all my preferences, and how to manipulate me?"

"Well obviously it's your fault for not picking the 'Don't Be Cool' option on subpage 27b-6, duh!"

ralfn1y ago

Yeah. It's victim blaming. Reminds me of "they should have shouted louder".

The confusing thing is the crime itself is small on an individual level. The question is: does it add up cumulatively if a small crime is committed against many?

juleiie1y ago

A small crime can result in massive power. Knowledge is power.

Barring the ethics, you can single-handedly use such data to manipulate the stock market, countries, etc.

It’s just too much power

kulshan1y ago

I don't know if it's "Victim Blaming"...I teach Digital Literacy courses for seniors new to technology. While I do set them up with Firefox and Ublock, we generally have them use Gmail as they are all Android Devices. Google sends a confirmation email to walk each one of them through their security settings. Of course most users just ignore this email (like I used to have students do) but now we go through it and uncheck this setting in all my courses, and unpersonalize ads as well. Feel like the most basic user who has even the tiniest concern of data privacy should know how to look at their Google Account settings. These are 80 year olds who don't even know what a "click" is but they know to be skeptical of using Google.


andrybak1y ago

> unchecking “Make searches and browsing better”

Before that, you can make it audible: <https://github.com/berthubert/googerteller>

precompute1y ago

Is that part of Chrome not open-source?

alexvitkov1y ago

Presumably no, I haven't seen any overly creepy shit in Chromium. There's a project called ungoogled-chromium that tracks all the Google junk in Chromium and gets rid of it, their patch set is actually surprisingly small:

[1] https://github.com/ungoogled-software/ungoogled-chromium/tre...

noman-land1y ago

Imagine thinking you can escape your abuser by living in their house and asking them politely to stop.


thih91y ago

> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot

Does anyone know more about yoshi-code-bot and how these documents were suddenly published?

Was it a script misconfiguration? A manual push? Something else?

chx1y ago

https://github.com/yoshi-code-bot

Created 1,891 commits in 19 repositories

All 19 are under googleapis

This looks like a bot Google uses to publish their stuff on github and so likely it's a misconfiguration.

ilrwbwrkhv1y ago

And that's why if a developer doesn't use Firefox and uses Chrome, they are just helping a monopoly take over everything and make a mess.

dgellow1y ago

Any user, not just developers

olliej1y ago

Developers just replaced IE as the only thing they develop for with chrome, users then _have_ to use chrome because of web developers who only develop for chrome and consider any behaviour other than "it works in chrome" as a bug in other browsers, just as they did with IE.

Then there's the relentless parade of "alternative browsers" that are just chrome skins - a period IE also went through - that intentionally try to trick people into believing they're not just using chrome but with less security engineering, and more scams.

barbariangrunge1y ago

It became trendy recently to break compatibility with Firefox. Blogs almost bragging about how they boldly made the choice. Very embarrassing stuff

dgellow1y ago

You’re conflating lots of unrelated things. IE was a horrible browser to support because Microsoft deliberately implemented their own incompatible version of web standards, or refused to implement modern standards. The push to deprecate IE was because it was creating a massive burden. I personally dealt with IE6 support in the corporate world and can attest its deprecation was necessary.

What you call chrome skins isn’t a thing; people are building software on top of Blink, the rendering engine used by Chrome. The issue here is the risk of ending up with a single rendering engine for the majority of the browser market. A diversity of engines ensures that web standards are respected; that has nothing to do with privacy or security.

When you say “they just replaced IE”, that was >10 years ago…

metadigm1y ago

As soon as they add the ability to configure shortcuts, I'd be more than happy to. After several years of requests, we're finally seeing some movement on their end.

ilrwbwrkhv1y ago

I hear you. But at the same time our duty on this planet as developers is to take one for the team when it comes to minor issues like this, which I am assuming this is for you. Otherwise the world will be consumed in the flame of the monopoly which others do not care about who do not understand browser engines.

HeatrayEnjoyer1y ago

Shortcuts? Like bookmarks?

cpeterso1y ago

Or search engine keywords in the address bar? https://support.mozilla.org/en-US/kb/how-search-from-address...

Or adding “top site” shortcuts on the Firefox New Tab page? https://support.mozilla.org/en-US/kb/customize-items-on-fire...

precompute1y ago

From the article:

Boosting "organic traffic":

- Brand matters more than anything else

- Experience, expertise, authoritativeness, and trustworthiness (“E-E-A-T”) might not matter as directly as some SEOs think.

- Content and links are secondary when user intention around navigation (and the patterns that intent creates) are present.

- Classic ranking factors: PageRank, anchors (topical PageRank based on the anchor text of the link), and text-matching have been waning in importance for years. But Page Titles are still quite important.

- For most small and medium businesses and newer creators/publishers, SEO is likely to show poor returns until you’ve established credibility, navigational demand, and a strong reputation among a sizable audience.

TL;DR: Clickbait + bot farms are the way to go. No wonder the internet is going to shit.

BillFranklin1y ago

FYI, it's much easier to read the linked GitHub code via the published docs at https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

BillFranklin1y ago

In particular, https://hexdocs.pm/google_api_content_warehouse/0.4.0/Google...

Notably, for people on HN, it looks like there is indeed an internal initiative to promote small personal blogs :-)

> smallPersonalSite (type: number(), default: nil) - Score of small personal site promotion go/promoting-personal-blogs-v1

SquareWheel1y ago

Well, maybe. It's a factor that a twiddler can influence, but we don't know if that's done positively or negatively. It might also be more conditional, like for specific types of queries.

For example, a small, personal blog might be great for solving a specific technical problem ("my dishwasher of model XXX has YYY problem"), but might be terrible for something like giving public health advice.

iamacyborg1y ago

We don’t know whether that particular module was used to promote or downgrade small sites in the SERPs.

llmblockchain1y ago

> GoogleApi.ContentWarehouse.V1.Model.AppsPeopleOzExternalMergedpeopleapiAboutMeExtendedDataPhotosCompareDataDiffData

Java, is that you?!

deely31y ago

https://news.ycombinator.com/newsguidelines.html

> Omit internet tropes.

ziddoap1y ago

Indeed. Eschew humor. Avoid anything not super serious. Laughs aren't allowed on HN.

shepherdjerred1y ago

There's a difference between humor and pattern-matching memes like you see in Reddit threads.

lazide1y ago

Missing the ‘ManagerAgentUtil’ at the end.

resolutebat1y ago

FactoryFactoryImpl

lazide1y ago

Builder

isaacfrond1y ago

Most of the factors in ranking a page are no surprise. But I was surprised that having product reviews on your site is apparently a demotion? Surely, many people are searching to find just that?

unnamed76ri1y ago

Years ago I had a site for deep fryer reviews. The whole thing existed to make money from Amazon’s affiliate program. I hadn’t personally used ANY of the deep fryers. Was just writing reviews based on features and other people’s reviews. In short, I ranked high in Google and added nothing of value to the world with that site.

There was a brief period of time where I made decent money with it until Google deranked all the product review websites.

b1121y ago

This is likely more about reviews with affiliate links. 99.99% of those are people reviewing absolutely nothing, just copying reviews and putting their own affiliate link.

zeroCalories1y ago

Sites spam low quality product reviews with affiliate links to Amazon. This is done by "reputable" sites as well. I don't blame Google for down ranking this meta.

nottorp1y ago

We are, but I’m not sure there are any real product reviews left on the internet.

sidewndr461y ago

Other than reviews of Google search itself obviously

nottorp1y ago

Are there? I can't [1] write an objective review, I can just subjectively say that it's been more and more useless to me in the past ... 7-8 years now?

[1] Or maybe can't be bothered because I stopped caring ages ago.

cqqxo4zV46cp1y ago

“xx,xxx five star reviews” I’ve found is a modern day over-marketed product trope. It feels well within the realm of reason that this ends up serving as a useful heuristic.

yieldcrv1y ago

I don’t trust conflicts of interest. If that’s about a site selling its own product and having reviews, I’m glad to find that results in a demotion.

While bigger marketplaces have other ways of driving ranking

ren_engineer1y ago

most of these have been outright publicly denied by Google employees, despite people showing with A/B tests that things like CTR and backlinks impacted rankings

skilled1y ago

I would usually call this a dupe but this article and the other one from SparkToro are completely different even if they are on the same topic.

Haven’t had a chance to look at the API myself but the first impressions are that a lot of this was suspected by SEOs, but Google kept rejecting the ideas. Looks like clicks increase ranking for sure, which means click farms definitely have a legitimate business solution to offer.

JSDevOps1y ago

Seriously considering switching back to Firefox after all these years.

jasonsb1y ago

What's stopping you? I use both browsers and I see no reason why someone would pick Chrome over Firefox at this point in time.

4gotunameagain1y ago

While the reasons someone would pick Firefox:

  - Privacy
  - Tree style tabs

SushiHippie1y ago

- uBo works better in firefox https://github.com/gorhill/uBlock/wiki/uBlock-Origin-works-b...

blitzar1y ago

(Some) sites don't work on Firefox.

Sure it isn't frequent, but it is frequent enough that once a day or so I have to open chrome to do something.

elaus1y ago

Seriously curious what sites those are, especially if it's not the same page every day. It literally never occurs to me (using Firefox again for 3-4 years now) but I mostly browse dev-related websites.

ilikehurdles1y ago

Once a day? That’s huge. What sites? (I use Firefox daily for about the last year and haven’t had this kind of issue)

sangeeth961y ago

ICYDK, do consider reporting on https://webcompat.com if you see them.

Nuzzerino1y ago

Have people never heard of Brave?

thisisit1y ago

for now the seamless extension switching using Extensity. I am yet to find an extension on Firefox which can deliver this functionality.

metadigm1y ago

No shortcut configuration.

GuB-421y ago

I have used both for many years, and now, I see little difference in practice. I am leaning more towards Firefox these days. Main change is that I now use Firefox as my main mobile browser for ad blocking reasons. A few websites don't work on Firefox, I use Chrome for these few.

I don't consider it a problem to use two browsers at the same time. I usually don't do the same thing with them, so having separate profiles can be an advantage.

Note that privacy is not the reason why I am using Firefox. It is just that I think that knowing both is a good thing, and they are both good browsers, so why not? In some case, Firefox is better, in others Chrome is better, most of the times, they are interchangeable.

mind-blight1y ago

I've been using Firefox since Chrome forced users to sign in to the browser with their Google account, and I'm quite happy.

The only time it's a problem is when a site detects Firefox and won't display unless you're using Chrome or IE. I've only seen that a couple of times in the years since I switched back.

Frank23121y ago

Even in that case, there are Firefox extensions to change your user agent. Suddenly the app requesting Chrome/Edge works perfectly, even though we are running in Firefox.

kernal1y ago

How did Chrome force you to log in? I've been using it signed out for the longest time.

mind-blight1y ago

Back in 2018, Chrome released an update that automatically logged users into the browser if they'd logged into a Google service. It was done silently and automatically, and it was a pain to log out of. They faced a ton of backlash (https://www.pcmag.com/news/google-faces-privacy-backlash-ove...) and rolled the feature back, but that was the tipping point for me. I've been a happy Firefox user since.

WhyNotHugo1y ago

Firefox is better than Chrome [in the privacy aspect]... but still pretty terrible.

It sends a lot of "analytics" and "tracking" to some of Mozilla's servers, but if you inspect the requests, those servers are actually behind Google's CDN, and Google does the TLS termination.

So... Google has access to all the data that Mozilla sends when it phones home. Some of it even has a unique identifying id.

Ringz1y ago

I've been using Firefox since the days when it had the other name. Meanwhile, I use Floorp [1], which is based on Firefox, but offers much more possibilities for customization. I am very satisfied, except for the stupid name...

[1]: https://floorp.app/en/

MrAlex941y ago

Not to be a stickler, but just a note it no longer counts as FOSS or even open source I believe, with their new licence: https://github.com/Floorp-Projects/Floorp-private-components...

It’s left a bad taste in my mouth since they used the work of others to get to where they are, then when others do the same, they don’t like it.

rpgbr1y ago

Go for Firefox and keep ungoogled-chromium[0] for those sites that refuse to work properly on non-Chromium browsers.

[0] https://github.com/ungoogled-software/ungoogled-chromium

garbagewoman1y ago

… just considering?!? What is it gonna take

9dev1y ago

I found it interesting that the docs mention "site2vec" scores. This implies, I think, a variant of word2vec or document2vec, but for the full site; so probably a vector sum of the doc2vec scores of all individual pages?
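To make that guess concrete, here is a minimal Python sketch that pools per-page vectors into one site vector by averaging. The leaked docs only name "site2vec"; the pooling method, the function name `site_vector`, and the toy 3-dimensional page vectors are all assumptions for illustration, not a description of what Google does.

```python
from typing import Sequence


def site_vector(page_vectors: Sequence[Sequence[float]]) -> list[float]:
    """Average per-page embeddings into a single site-level embedding.

    Hypothetical helper: mean pooling is an assumption, not something
    the leaked documentation describes.
    """
    if not page_vectors:
        raise ValueError("need at least one page vector")
    dim = len(page_vectors[0])
    if any(len(v) != dim for v in page_vectors):
        raise ValueError("all page vectors must have the same dimension")
    n = len(page_vectors)
    # Sum each dimension across pages, then divide by the page count.
    return [sum(v[i] for v in page_vectors) / n for i in range(dim)]


# Toy doc2vec-style vectors for three pages of one site (made-up numbers).
pages = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 4.0],
]
site = site_vector(pages)
```

Averaging instead of a raw sum keeps a site from getting a larger vector just because it has more pages, but either choice is speculation here.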

HankB991y ago

> Successful clicks matter.

I wonder about this. If I click a link and read it and I find that it's garbage (e.g. got ranked based on SEO rather than useful content) does it count as a successful click? Worse yet, some of these sites have blatant errors that are only discovered after examination.

This is relative to technical subject matter. Other searches, such as shopping may not suffer this kind of problem (or I have not noticed it.)

I also wonder how Google knows a click is successful. If I open a link in another tab, does the browser tell Google how long I lingered on the site? Perhaps Chrome does but I use Firefox.

EcommerceFlow1y ago

Once you get to the top 1-3 results, CTR (click through rate) is a much bigger ranking factor. Google knows how long people stay on pages and whether they click and back out immediately. This is important for E-Commerce, because Google doesn't want Site #1 to be mostly out of stock even though they have better links.

HankB991y ago

> Google knows how long people stay on pages and whether they click and back out immediately.

What if I <ctrl><click> to keep the search page open and open the "found" page in another tab?

yencabulator1y ago

Can the on-page javascript detect the difference between click and control-click? If so, you can count just the former, and wait for the back button press, to get a sense of visit duration.

I think control-click is a power user feature that they just don't care to track. Average consumer is the target audience of the advertising...
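On the first question: yes. In the browser, a click handler receives a `MouseEvent` whose `ctrlKey`, `metaKey`, and `button` properties distinguish a plain click from a ctrl-click or middle-click. The rest is guesswork, but the "wait for the back button" idea could look roughly like the Python sketch below. The event format and the names `ClickEvent` and `dwell_times` are hypothetical, and nothing here is taken from the leaked docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClickEvent:
    """One logged event on a results page (hypothetical log format)."""
    timestamp: float          # seconds since the search was issued
    kind: str                 # "click" or "return" (user came back to results)
    url: Optional[str] = None
    modified: bool = False    # True for ctrl/cmd-click or middle-click


def dwell_times(events: list[ClickEvent]) -> dict[str, float]:
    """Estimate per-URL dwell time from plain clicks followed by a return.

    Modified clicks open a new tab, so the results page never sees a
    "return" for them; this sketch simply skips them.
    """
    dwell: dict[str, float] = {}
    pending: Optional[ClickEvent] = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "click":
            # Track only plain clicks; modified clicks are ignored.
            if not event.modified:
                pending = event
        elif event.kind == "return" and pending is not None:
            dwell[pending.url] = event.timestamp - pending.timestamp
            pending = None
    return dwell


log = [
    ClickEvent(2.0, "click", "https://example.com/a"),
    ClickEvent(5.0, "return"),                        # came back after 3 s
    ClickEvent(6.0, "click", "https://example.com/b", modified=True),
    ClickEvent(8.0, "click", "https://example.com/c"),
    ClickEvent(98.0, "return"),                       # came back after 90 s
]
result = dwell_times(log)
```

In this toy log, the ctrl-clicked result never gets a dwell time at all, which matches the guess that new-tab clicks are simply not measured this way.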

badgersnake1y ago

Something like this I guess:

var words = query.split(" ")

var results = executeQuery("SELECT al.url, al.desc FROM AdWords aw INNER JOIN adlinks al ON aw.id = al.id WHERE aw.word IN (:words)", words)

if (results.size < 30) { /* TODO: call search engine */ }

return results

ilyazub1y ago

It doesn't look like a leak but a misdeployment.

Same service wrappers from two years ago: https://github.com/googleapis/google-api-php-client-services...

usui1y ago

> Prior to the email and call, I had neither met nor heard of the person who emailed me about this leak. They asked that their identity remain veiled

And yet the journalist included a screenshot with one of the weakest blurs I've ever seen... Why would you not excise the person's video portion completely? What good does it serve to have it included in the story? Even if that portion is faked, why would you offer potential signals like skin complexion, hair color, background picture, etc.? Why...

mtlynch1y ago

The author is Rand Fishkin, who's not a journalist. He's the founder of SparkToro and Moz, both companies that provide tooling and analytics for SEO.

I haven't looked deeply into Fishkin's companies, but I wouldn't expect either to be on the user's side when it comes to privacy. Both companies seem to monetize clickstream data and personal information from users who probably didn't give informed consent.

If the source was trying to get this information to a responsible journalist who cares about privacy, I have no idea why they'd approach a company (not even a news organization) who seems to fund the erosion of user privacy.

phs1y ago

> Both companies seem to monetize clickstream data and personal information from users who probably didn't give informed consent.

I don't think you know what you're talking about. During Rand's tenure Moz was a subscription business selling access to marketing analytics tools. Those tools focused on the structure of the clients' sites themselves rather than any analytics they might have consumed.

Source: I worked at Moz for several of those years, and helped maintain those tools.

yencabulator1y ago

And since then, the person on the call has revealed their identity. This was an SEO bro talking to an SEO bro about something they found on Github, not an insider leak.

krackers1y ago

>weakest blurs I've ever seen

Isn't this the same type of "swirl" blur that Interpol was able to reverse even 10 years back? With advancements since then you're basically handing evidence on a silver platter.

txomon1y ago

To make it worse, he made clear when the call had happened, and you have: 1) Who was in the call 2) When the call happened 3) A blur instead of a complete black out

I'm not sure I would feel safe reporting stuff to journalists nowadays.

mrguyorama1y ago

This person is not a journalist.

roastedpeacock1y ago

That also struck me as odd. And seemingly a violation of journalistic best-practices of protecting sources. I sure hope this was done with consent of the anonymous source.

Control88941y ago

It's a fake background.

It's also clearly from Google Meet so... yeah. If he was worried about retribution (from Google, anyway) then they probably wouldn't have been using a Google service.

adrianvincent1y ago

The algorithm is probably so complex and bloated at this point I doubt even Google knows how it really works

stonogo1y ago

We call that "AI" in the web world nowadays. It's a feature! You can't game a system you can't understand.

cyanydeez1y ago

If($) return true

// TODO: search

adamgordonbell1y ago

Where is the link to the document?

pr337h4m1y ago

https://github.com/googleapis/elixir-google-api/commit/078b4...

https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

skilled1y ago

Thanks! I couldn’t find the links so this is super useful.

zarathustreal1y ago

Hopefully this doesn’t surprise anyone..if Google actually told us correct information about how the search algorithm works it would be abused immediately

pembrook1y ago

What I find most interesting about this is that a lot of supposed "smart" algorithms of Big Tech are in fact a patchwork of "dumb" rules and human-picked winners. This would explain why the quality of search results is failing to keep up with developments in LLMs.

This also explains why it's impossible for incumbents to unseat the winners in many search categories -- because they've literally been picked as the winners by humans at Google.

Looking at my Twitter/X feed, I also see an oddly similar dynamic. Certain accounts appear to have been manually boosted, showing up all the time -- whereas others posting even the same exact content will never appear.

Silicon valley will loudly tell you all about how wonderful they are at "democratizing," however, if you look under the surface it appears they're just hand picking the winners.

trogdor1y ago

> because they've literally been picked as the winners by humans at Google

Is there evidence of that in the leaked documents?

pembrook1y ago

Yes, it’s in the linked article.

trogdor1y ago

I read the linked article. It doesn’t say that.

alun1y ago

Maybe this is an unpopular opinion, but if a search algorithm is truly designed to showcase the best content, then making it transparent shouldn't lead to manipulation

8note1y ago

For those out of the know, what's a "crap" in this? A "crap crap"?

throwaway7431y ago

... why the hell would an anonymous source use google meet to share info on google? ... so much for remaining anonymous :/

jgalt2121y ago

> A sample of statements from Google representatives (Matt Cutts, Gary Ilyes, and John Mueller) denying the use of click-based user signals in rankings over the years.

renegade-otter1y ago

There are so many Kagi fans on HN that it's a matter of time before the Big G buys it and shuts it down, like hundreds of its products before.

SadCordDrone1y ago

Didn't read the article fully, but since these are protocol buffer definitions, what if these fields are there for backward compatibility?

Havoc1y ago

Does it also recommend eating at least two stones a day?

StevenNunez1y ago

Wait... There's Elixir to be done at Google?!

dentemple1y ago

TL;DR Google lies about how its search algorithm works.

eitland1y ago

Would be interesting to see if any relevant authorities could be interested now that this is out?

I understand some of this is a direct contradiction of things they have said in court previously?

Aldipower1y ago

If there are really 14,000 attributes, most of them will have a weight near 0 and thus be irrelevant. If they were all heavily weighted, the ranking would be rendered irrelevant due to the sheer number of attributes.
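A toy illustration of that argument, assuming a plain linear scoring model (an assumption; the leak lists attributes but not how they are combined or weighted). With 14,000 features and only a handful of non-negligible weights, the score is driven almost entirely by those few. All numbers and names below are made up.

```python
import random

NUM_FEATURES = 14_000
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical weights: 10 features matter, the rest are near zero.
weights = [1e-6] * NUM_FEATURES
heavy = list(range(10))
for i in heavy:
    weights[i] = 1.0

# Hypothetical feature values for one page, each in [0, 1).
features = [rng.random() for _ in range(NUM_FEATURES)]


def linear_score(w: list[float], x: list[float]) -> float:
    """Weighted sum of feature values."""
    return sum(wi * xi for wi, xi in zip(w, x))


total = linear_score(weights, features)
from_heavy = sum(weights[i] * features[i] for i in heavy)
share = from_heavy / total  # fraction of the score from the 10 heavy features
```

Here the 13,990 near-zero weights together can contribute at most about 0.014 to the score, so `share` ends up very close to 1.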

beejiu1y ago

Isn't that where deep learning comes into play?

ozehlaw1y ago

Yes, this makes sense. I think the only good thing from the leak for Google is that the scoring values are not present

tyingq1y ago

Perhaps, though a world without SEO doesn't necessarily surface the best content either. Not everything about Google's algorithm that's subpar is because of spam or SEO.

theolivenbaum1y ago

Seems like a lot of it came from them inadvertently posting some internal API to GitHub: https://github.com/googleapis/elixir-google-api/commit/078b4...

renegade-otter1y ago

I guess too many people got laid off to do the whole "three reviewers per PR" thing!

eru1y ago

When I was at Google (about a decade ago by now), we had two reviews per PR; not three. Could you tell me more about the third review?

RandomBK1y ago

I think GP meant <Author>, <Reviewer 1>, <Reviewer 2>

dontdoxxme1y ago

yencabulator1y ago

There's still an angle where the copyright owner claims that the person who caused this to happen did not have the authority to apply the license to it.

ec1096851y ago

Oops, someone’s script was too greedy when uploading those elixir api documents.

xnx1y ago

I believe these are the leaked docs: https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

precompute1y ago

What answer do the engineers at google working on this have for this violation of privacy?

GuB-421y ago

I am not an engineer at Google, but this is what I would say if I were.

voltaireodactyl1y ago

The obvious response being that counting cars on the highway is a necessary first step on the road to identifying and then tracking their movements.

Ferret74461y ago

Agriculture is a necessary first step toward a dystopian society, so clearly we should ban agriculture.

The logic does not follow. "A is required for B, B is bad, so A is bad" is not logically valid.

lolinder1y ago

> we don't even know what number, we just get the total number of visits for each website, not who visited it

This is not what a clickstream is. A clickstream requires that the sequence of clicks be preserved, and preserving that sequence undermines anonymity.

tommiegannert1y ago

It can be pseudonymous. It doesn't have to undermine anonymity.

Google researchers spend time ensuring k-anonymity (for reasonably large k) when using data.
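As a rough illustration of the k-anonymity idea (simplified, and not Google's actual pipeline, which isn't described anywhere in this thread): keep a click sequence only if at least k distinct users produced the same sequence. The function name `k_anonymous_paths` and the sample data are hypothetical.

```python
from collections import defaultdict


def k_anonymous_paths(
    sessions: dict[str, tuple[str, ...]], k: int
) -> dict[tuple[str, ...], int]:
    """Return click paths shared by at least k distinct users, with counts.

    `sessions` maps a pseudonymous user id to that user's ordered click path.
    Paths seen from fewer than k users are dropped, so every released path
    is shared by a group of at least k users.
    """
    users_by_path: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for user_id, path in sessions.items():
        users_by_path[path].add(user_id)
    return {
        path: len(users)
        for path, users in users_by_path.items()
        if len(users) >= k
    }


sessions = {
    "u1": ("search", "site-a", "site-b"),
    "u2": ("search", "site-a", "site-b"),
    "u3": ("search", "site-a", "site-b"),
    "u4": ("search", "site-c"),            # unique path: would single out u4
}
released = k_anonymous_paths(sessions, k=3)
```

The click order is preserved for the paths that survive, but rare sequences are suppressed, which is the trade-off being argued about above.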

raxxorraxor1y ago

That would be money. If someone has another excuse, they are naive or lying to themselves.

It certainly is not "to improve the net or advertising" - that would be the lying part.

Google has done some good for the net, but the scales of their contributions slowly but steadily move to the negative side.

azemetre1y ago

Reminds me of the studies they’ve done on cognitive dissonance/lying.

Basically if you believe lies you tell yourself, they tend to turn into truths in your mind over time. Even if you were doing it “ironically.”

danpalmer1y ago

BrenBarn1y ago

specialist1y ago

Yes and:

Require opt-in by default. In all cases.

troyvit1y ago

> The thing is that preventing "sharing" isn't sufficient.

Exactly this. It doesn't matter that google doesn't "share" what they gather if they own so many conversion funnels from top to bottom anyway.

danpalmer1y ago

verteu1y ago

> I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally

I believe the law is violated when it's sufficiently profitable -- it just requires VP permission.

No public sources for this except Jedi Blue, the old anti-poaching case, etc.

noprocrasted1y ago

> This basically can’t happen with things like DMA and GDPR

I'm sorry but this is just wishful thinking. It might be what the spirit of the DMA & GDPR want but definitely not the reality thanks to inadequate or outright non-existent enforcement.

Facebook, credit bureaus, data brokers, "consent management platforms", etc. All these companies' business models are big, obvious breaches of the GDPR. Yet, they are... still alive and kicking?

There is no chance that a concealed GDPR breach (whether intentional or accidental) will get addressed when the biggest intentional breaches are still allowed to continue out there in the open.

I suspect something very similar is going to happen with the DMA - Apple is already acting in bad faith but have yet to see any consequences.

marcinzm1y ago

> What answer do the engineers at google working on this have for this violation of privacy?

The same answer you probably have for the millions of questions about the things you do that some other people find offensive to their personal views and beliefs.

bdlowery1y ago

How is it a violation of privacy? Did you read the terms of service?

precompute1y ago

It's a privacy violation regardless of the ToS.

y421y ago

A ToS announcement is not explicit consent. I doubt that this will help in court, even pre-GDPR.

HelloNurse1y ago

Further, a TOS announcement can be easily construed as an admission of intent to fuck users.

9dev1y ago

vouaobrasil1y ago

eitland1y ago

As a user of Kagi and search.marginalia.nu I can tell you:

Quite a bit.

So much that now that I have what "everyone" asked Google for for years - that is blacklists - I hardly use them.

Why? Because with Kagi I get much better results out of the box.

I am fairly sure Googlers will tell me there are multiple safeguards to prevent the inclusion of Google ads from affecting ranking, to which I just have to say that the results speak for themselves.

Please note: I have only used Kagi for two years. I am only one user. But I am a user with 20 years of experience with Google and that's got to count for something.

Nevolihs1y ago

the_snooze1y ago

eitland1y ago

I am all for Kagi keeping that feature. If for nothing else then to rub it in the face of every Googler who has argued that it was impossible.

And if you use it I am happy, that gives Kagi an incentive to keep it around.

I'm just saying that for me the results are so good out of the box that with a couple of exceptions I never had to block anything.

scutrell1y ago

p3rls1y ago

Kagi is the same garbage as google in my niche. Even worse, maybe. It looks like it's weighing backlinks and SEO garbage even higher. Well done.

I don't know how people keep talking about it. The results, as you say, speak for themselves.

eitland1y ago

Well, don't use it then.

I am happy for alternatives, otherwise I guess Kagi wouldn't improve so fast in areas I care about.

abhijat1y ago

I switched to Kagi in June last year. I just realized I tried it initially because I wanted to try out blocking sites in search results, and I have only ever needed to block three domains.

eitland1y ago

That's exactly what I am talking about.

Kagi is kind of like Google in 2009, seriously good coverage, good ranking

... but also:

- more modern

- more features (summarizer, bangs like in DDG, FastGPT and probably a few I forgot)

- blocklists for websites (and also options to pin, raise and lower)

- with actual support: report a bug and you get an answer from a real engineer, a follow up when it is fixed and a shout out in the relevant release notes

- no tracking

beeboobaa31y ago

Is kagi good for finding things like old forum posts (not reddit)? I know some of those sites are still up but google seems to ignore them.

nalinidash1y ago

Try search.marginalia.nu

eitland1y ago

(Other standard lenses include

- Forums

- Fediverse forums

- Usenet/archive

and I think 7 others.)

stuffoverflow1y ago

In my experience Kagi is decent, and definitely gives more forum results than Google. I've found Yandex to be the best at finding all kinds of forum/discussion sites.

packetlost1y ago

I dunno, the first thing I did was blacklist G**ksf*rG**ks from my search results (and others, of course) and I couldn't be happier.

esperent1y ago

mozman1y ago

karma_pharmer1y ago

Kagi is simply reselling google search results.

super2561y ago

They do more than that: https://help.kagi.com/kagi/search-details/search-sources.htm...

ptman1y ago

They have many sources https://help.kagi.com/kagi/search-details/search-sources.htm...

amelius1y ago

How do you know that Kagi won't become as bad as Google at some point?

duckmysick1y ago

What's the argument you're trying to make?

That because there's a chance Kagi will become bad, there's no point in using it now and thus we should stick to Google, which is already bad? That doesn't make sense.

The same line of thought can be applied to anything. We don't know what will happen in the future, therefore we can't be sure that things won't go bad. Is there even a way to have such a guarantee?

breakfastduck1y ago

We don't, but a model where a user pays for a service rather than being free and ad supported is significantly less likely to enact user unfriendly changes.

If the way you make money is by convincing people to pay, you are highly incentivised to make the product good, especially where there are many other free competitors who are ad supported.

eitland1y ago

If they do become bad then at least I have had a fantastic search engine for another few years of my life like I had from 2002 until 2009 ish.

And also, already at this point, they and Marginalia have proven that it isn't impossible to enter the search engine market even now. This was long considered impossible, at least here on HN.

spacebanana71y ago

Also it’s unlikely Kagi will ever become big enough for SEO people to specifically target with manipulative content.

Even if they got 100 million active paying users it’d still be a tiny fraction of overall search traffic.

Workaccount21y ago

The fundamental problem with the Internet is that people don't want to pay for things on it.

No matter what, whatever we ended up with was going to be shitty and exploitive.

eitland1y ago

Now you have a chance. Kagi is there.

I made my decision two years ago and I would probably do it even if it was just on par with Google, to support competition and to avoid supporting Google.

But in hindsight it is just exceptionally much better. There is no going back unless Kagi does something monumentally stupid.

jacob0191y ago

tjpnz1y ago

I'm still happy to put my money where my mouth is and do pay for services which are genuinely useful to me. But this is not the kind of internet I imagined when growing up.

L-four1y ago

It's not that people don't want to pay, it's that it's difficult to pay small sums. The web browsers could solve this problem but they make money from ads so it's not in their best interest.

nradov1y ago

https://www.nngroup.com/articles/the-case-for-micropayments/

UI_at_80x241y ago

And this was one of the hopes/goals/dreams that crypto offered that I really wish had come true.

wslh1y ago

Google was really great and revolutionary, they helped zillions of small companies to thrive. It was another cycle.

Then, now, it is like media before the 90s: you need to pay a lot of money to be in the center page of the newspaper.

But, hopefully we are talking about LLMs now, seems like one of the answers to search engines in general. Beyond AI, I see LLMs as a good evolution from PageRank.

Another context where I use "Complexity as Scam" is Web3. It is like Matryoshka dolls where there is always one more step of complexity to prove a point, but it never ends.

benterix1y ago

It's not black and white. There was a lot of junk that was forced on us and that was removed thanks to Google. But I agree the direct relationship is inherently corrupting.

GTP1y ago

Larry Page and Sergey Brin even stated very clearly in their original paper that using ads as a revenue source can impact the quality of results returned from the search engine.

DarkNova61y ago

You mean the way Google worked originally? The founders were very careful in creating a barrier between ads and search.

A barrier whose erosion has been well documented over the last 10 years.

vouaobrasil1y ago

A barrier whose only purpose was to establish trust so that it could be later taken advantage of.

DarkNova61y ago

As much of a cynic as I typically am, there is a well-established record of events which shows that this is not true.

Google search was taken over by an ambitious clique of failed Yahoo managers that successfully destroyed their former company for their own financial advantage, then did the same at Google.

Acting as parasites on society at large.

heresie-dabord1y ago

Instead of a semantic Web of knowledge, we got "grep the HTML... with ads".

josefx1y ago

You dropped the -v . Modern day Google seems fine tuned to return results that contain everything except for the words I searched for.

greg_V1y ago

linsomniac1y ago

Did you experience the Internet before google? The idea of a world where Alta Vista won is truly chilling.

thsksbd1y ago

You mean a world where people still knew how to use a library catalog, still relied on more than one source of information and curious crazy tidbits are still out there?

The internet is boring. And the trash is still there. It's just become reputable instead.

linsomniac1y ago

There's a lot to unpack here...

badpun1y ago

> still relied on more than one source of information and curious crazy tidbits are still out there?

I think the curious crazy tidbits are still there.

washadjeffmad1y ago

I'd be okay with a world in which everyone else in search didn't lose, too.

msk-lywenn1y ago

In some way, didn't Google become Alta Vista?

linsomniac1y ago

How so? My memory of Alta Vista was so-so search results with a top page littered with garbage.

vouaobrasil1y ago

Yes, I did! I used to use Yahoo search, where the results were more hand-curated and people did not create websites for intensive commercial purposes with useless SEO fluff like they do today.

linsomniac1y ago

blowski1y ago

I imagine it would be a different flavour to what we have today, but the same intensity. Anything that so deeply penetrates daily life across the globe is going to bring enormous problems with it.

1vuio0pswjnm71y ago

https://ipullrank.com/google-algo-leak

nsmog7671y ago

ec1096851y ago

If anyone is surprised about Chrome sending URLs to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google Chrome settings.

Creepy.

HenryBemis1y ago

Or, and hear me out, you never use Chrome again, on any platform... like ever, ever again.

smegger0011y ago

SSLy1y ago

you could use ungoogled-chromium, brave, vivaldi

Terr_1y ago

"But what if I don't want my own computer to build and share a detailed profile of everyone I know, everywhere I go, all my preferences, and how to manipulate me?"

"Well obviously it's your fault for not picking the 'Don't Be Cool' option on subpage 27b-6, duh!"

ralfn1y ago

Yeah. It's victim blaming. Reminds me of "they should have shouted louder".

The confusing thing is the crime itself is small on an individual level. The question is: does it add up cumulatively if a small crime is committed against many?

juleiie1y ago

A small crime can result in massive power. Knowledge is power.

Barring the ethics, you can single-handedly use such data to manipulate the stock market, countries, etc.

It’s just too much power.

kulshan1y ago

andrybak1y ago

> unchecking “Make searches and browsing better”

Before that, you can make it audible: <https://github.com/berthubert/googerteller>

precompute1y ago

Is that part of Chrome not open-source?

alexvitkov1y ago

[1] https://github.com/ungoogled-software/ungoogled-chromium/tre...

noman-land1y ago

Imagine thinking you can escape your abuser by living in their house and asking them politely to stop.

thih91y ago

> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot

Does anyone know more about yoshi-code-bot and how these documents were suddenly published?

Was it a script misconfiguration? A manual push? Something else?

chx1y ago

https://github.com/yoshi-code-bot

Created 1,891 commits in 19 repositories

All 19 are under googleapis

This looks like a bot Google uses to publish their stuff on GitHub, so it's likely a misconfiguration.

ilrwbwrkhv1y ago

And that's why if a developer doesn't use Firefox and uses Chrome, they are just helping a monopoly take over everything and make a mess.

dgellow1y ago

Any user, not just developers

olliej1y ago

barbariangrunge1y ago

It became trendy recently to break compatibility with Firefox. Blogs almost bragging about how they boldly made the choice. Very embarrassing stuff

dgellow1y ago

When you say “they just replaced IE”, that was >10 years ago…

metadigm1y ago

As soon as they add the ability to configure shortcuts, I'd be more than happy to. After several years of requests, we're finally seeing some movement on their end.

ilrwbwrkhv1y ago

HeatrayEnjoyer1y ago

Shortcuts? Like bookmarks?

cpeterso1y ago

Or search engine keywords in the address bar? https://support.mozilla.org/en-US/kb/how-search-from-address...

Or adding “top site” shortcuts on the Firefox New Tab page? https://support.mozilla.org/en-US/kb/customize-items-on-fire...

precompute1y ago

From the article:

Boosting "organic traffic":

- Brand matters more than anything else

- Experience, expertise, authoritativeness, and trustworthiness (“E-E-A-T”) might not matter as directly as some SEOs think.

- Content and links are secondary when user intention around navigation (and the patterns that intent creates) are present.

TL;DR: Clickbait + bot farms are the way to go. No wonder the internet is going to shit.

BillFranklin1y ago

FYI, it's much easier to read the linked GitHub code via the published docs at https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

BillFranklin1y ago

In particular, https://hexdocs.pm/google_api_content_warehouse/0.4.0/Google...

Notably, for people on HN, it looks like there is indeed an internal initiative to promote small personal blogs :-)

> smallPersonalSite (type: number(), default: nil) - Score of small personal site promotion go/promoting-personal-blogs-v1

SquareWheel1y ago

Well, maybe. It's a factor that a twiddler can influence, but we don't know if that's done positively or negatively. It might also be more conditional, like for specific types of queries.

iamacyborg1y ago

We don’t know whether that particular module was used to promote or downgrade small sites in the SERPs.

llmblockchain1y ago

> GoogleApi.ContentWarehouse.V1.Model.AppsPeopleOzExternalMergedpeopleapiAboutMeExtendedDataPhotosCompareDataDiffData

Java, is that you?!

deely31y ago

https://news.ycombinator.com/newsguidelines.html

> Omit internet tropes.

ziddoap1y ago

Indeed. Eschew humor. Avoid anything not super serious. Laughs aren't allowed on HN.

shepherdjerred1y ago

There's a difference between humor and pattern-matching memes like you see in Reddit threads.

lazide1y ago

Missing the ‘ManagerAgentUtil’ at the end.

resolutebat1y ago

FactoryFactoryImpl

lazide1y ago

Builder

isaacfrond1y ago

Most of the factors in ranking a page are no surprise. But I was surprised that having product reviews on your site is apparently a demotion? Surely, many people are searching to find just that?

unnamed76ri1y ago

There was a brief period of time where I made decent money with it until Google deranked all the product review websites.

b1121y ago

This is likely more about reviews with affiliate links. 99.99% of those are people reviewing absolutely nothing, just copying reviews and putting their own affiliate link.

zeroCalories1y ago

Sites spam low quality product reviews with affiliate links to Amazon. This is done by "reputable" sites as well. I don't blame Google for down ranking this meta.

nottorp1y ago

We are, but I’m not sure there are any real product reviews left on the internet.

sidewndr461y ago

Other than reviews of Google search itself obviously

nottorp1y ago

Are there? I can't [1] write an objective review, I can just subjectively say that it's been more and more useless to me in the past ... 7-8 years now?

[1] Or maybe can't be bothered because I stopped caring ages ago.

cqqxo4zV46cp1y ago

“xx,xxx five star reviews” I’ve found is a modern-day over-marketed product trope. It feels well within the realm of reason that this ends up serving as a useful heuristic.

yieldcrv1y ago

I don’t trust conflicts of interest. If that’s about a site selling its own product and having reviews, I’m glad to find that results in a demotion.

While bigger marketplaces have other ways of driving ranking

ren_engineer1y ago

Most of these have been outright publicly denied by Google employees, despite people showing with A/B tests that things like CTR and backlinks impacted rankings.

skilled1y ago

I would usually call this a dupe but this article and the other one from SparkToro are completely different even if they are on the same topic.

JSDevOps1y ago

Seriously considering switching back to Firefox after all these years.

jasonsb1y ago

What's stopping you? I use both browsers and I see no reason why someone would pick Chrome over Firefox at this point in time.

4gotunameagain1y ago

While the reasons someone would pick Firefox:

  - Privacy
  - Tree style tabs

SushiHippie1y ago

- uBo works better in firefox https://github.com/gorhill/uBlock/wiki/uBlock-Origin-works-b...

blitzar1y ago

(Some) sites don't work on Firefox.

Sure it isn't frequent, but it is frequent enough that once a day or so I have to open chrome to do something.

elaus1y ago

Seriously curious what sites those are, especially if it's not the same page every day. It literally never happens to me (I've been using Firefox again for 3-4 years), but I mostly browse dev-related websites.

ilikehurdles1y ago

Once a day? That’s huge. What sites? (I use Firefox daily for about the last year and haven’t had this kind of issue)

sangeeth961y ago

ICYDK, do consider reporting on https://webcompat.com if you see them.

Nuzzerino1y ago

Have people never heard of Brave?

thisisit1y ago

For now, the seamless extension switching using Extensity. I have yet to find an extension on Firefox which can deliver this functionality.

metadigm1y ago

No shortcut configuration.

GuB-421y ago

I don't consider it a problem to use two browsers at the same time. I usually don't do the same thing with them, so having separate profiles can be an advantage.

mind-blight1y ago

I've been using Firefox since Chrome forced users to sign in to the browser with their Google account, and I'm quite happy.

The only time it's a problem is when a site detects Firefox and won't display unless you're using Chrome or IE. I've only seen that a couple of times in the years since I switched back.

Frank23121y ago

Even in that case, there are Firefox extensions to change your user agent. Suddenly the app requesting Chrome/Edge works perfectly, even though we are running in Firefox.
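To illustrate why changing the user agent is often enough, here is a minimal sketch of the kind of naive check a site might run. The function name `claimsToBeChrome` and the two sample strings are assumptions made up for illustration; real sites may use different or additional detection.

```typescript
// Hypothetical server- or client-side gate: accept only user agents that
// contain a Chrome token and no Firefox token.
function claimsToBeChrome(userAgent: string): boolean {
  return /Chrome\/\d+/.test(userAgent) && !/Firefox\/\d+/.test(userAgent);
}

// A typical Firefox user-agent string (version numbers are illustrative).
const firefoxUA =
  "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0";

// What a user-agent switcher extension might send instead.
const spoofedUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36";

console.log(claimsToBeChrome(firefoxUA)); // rejected by the gate
console.log(claimsToBeChrome(spoofedUA)); // accepted by the gate
```

A check like this only inspects a string the browser chooses to send, so it says nothing about whether the page would actually render correctly in Firefox.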

kernal1y ago

How did Chrome force you to log in? I've been using it signed out for the longest time.

mind-blight1y ago

WhyNotHugo1y ago

Firefox is better than Chrome [in the privacy aspect]... but still pretty terrible.

It sends a lot of "analytics" and "tracking" to some of Mozilla's servers, but if you inspect the requests, those servers are actually behind Google's CDN, and Google does the TLS termination.

So... Google has access to all the data that Mozilla sends when it phones home. Some of it even has a unique identifying ID.

Ringz1y ago

[1]: https://floorp.app/en/

MrAlex941y ago

Not to be a stickler, but just a note it no longer counts as FOSS or even open source I believe, with their new licence: https://github.com/Floorp-Projects/Floorp-private-components...

It’s left a bad taste in my mouth since they used the work of others to get to where they are, then when others do the same, they don’t like it.

rpgbr1y ago

Go for Firefox and keep ungoogled-chromium[0] for those sites that refuse to work properly on non-Chromium browsers.

[0] https://github.com/ungoogled-software/ungoogled-chromium

garbagewoman1y ago

… just considering?!? What is it gonna take

9dev1y ago

HankB991y ago

> Successful clicks matter.

This is relative to technical subject matter. Other searches, such as shopping, may not suffer this kind of problem (or I have not noticed it).

I also wonder how Google knows a click is successful. If I open a link in another tab, does the browser tell Google how long I lingered on the site? Perhaps Chrome does but I use Firefox.

EcommerceFlow1y ago

HankB991y ago

> Google knows how long people stay on pages and whether they click and back out immediately.

What if I <ctrl><click> to keep the search page open and open the "found" page in another tab?

yencabulator1y ago

Can the on-page javascript detect the difference between click and control-click? If so, you can count just the former, and wait for the back button press, to get a sense of visit duration.

I think control-click is a power user feature that they just don't care to track. Average consumer is the target audience of the advertising...
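On the first question: yes, a click handler can tell the two apart, because the standard `MouseEvent` exposes `button`, `ctrlKey`, `metaKey`, and `shiftKey`. Below is a minimal sketch of the idea described above. The `ClickLike` interface and the `dwellMs` helper are hypothetical names for illustration, and nothing here is taken from the leaked documents or from Google's actual code.

```typescript
// Minimal shape of the MouseEvent fields used here; a real DOM MouseEvent
// satisfies this interface.
interface ClickLike {
  button: number;   // 0 = primary button, 1 = middle (auxiliary) button
  ctrlKey: boolean;
  metaKey: boolean; // Cmd on macOS
  shiftKey: boolean;
}

// True when the click will probably open the link in a new tab or window,
// so the search page never sees a "back" navigation for it.
function opensElsewhere(e: ClickLike): boolean {
  return e.button === 1 || e.ctrlKey || e.metaKey || e.shiftKey;
}

// Hypothetical helper: dwell time in milliseconds for a plain same-tab click,
// or null when it cannot be measured this way.
function dwellMs(
  click: ClickLike,
  clickedAt: number,
  returnedAt: number | null
): number | null {
  if (opensElsewhere(click) || returnedAt === null) return null;
  return returnedAt - clickedAt;
}

const plainClick: ClickLike = { button: 0, ctrlKey: false, metaKey: false, shiftKey: false };
const ctrlClick: ClickLike = { ...plainClick, ctrlKey: true };

console.log(dwellMs(plainClick, 1000, 4500)); // measurable: same-tab visit
console.log(dwellMs(ctrlClick, 1000, 4500));  // not measurable: opened elsewhere
```

In a browser this would be wired up with something like `link.addEventListener("click", handler)`; middle clicks are typically delivered as a separate `auxclick` event, so a real page would need to listen for both.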

badgersnake1y ago

Something like this I guess:

var words = query.split(" ")

var results = executeQuery("SELECT al.url, al.desc FROM AdWords aw INNER JOIN adlinks al ON aw.id = al.id WHERE aw.word IN (:words)", words)

if (results.size < 30) { /* TODO: call search engine */ }

return results

ilyazub1y ago

It doesn't look like a leak but a misdeployment.

Same service wrappers from two years ago: https://github.com/googleapis/google-api-php-client-services...

usui1y ago

> Prior to the email and call, I had neither met nor heard of the person who emailed me about this leak. They asked that their identity remain veiled

mtlynch1y ago

The author is Rand Fishkin, who's not a journalist. He's the founder of SparkToro and Moz, both companies that provide tooling and analytics for SEO.

phs1y ago

> Both companies seem to monetize clickstream data and personal information from users who probably didn't give informed consent.

Source: I worked at Moz for several of those years, and helped maintain those tools.

yencabulator1y ago

And since then, the person on the call has revealed their identity. This was an SEO bro talking to an SEO bro about something they found on Github, not an insider leak.

krackers1y ago

>weakest blurs I've ever seen

Isn't this the same type of "swirl" blur that Interpol was able to reverse even 10 years back? With advancements since then you're basically handing evidence on a silver platter.

txomon1y ago

To make it worse, he made clear when the call had happened, and you have: 1) Who was in the call 2) When the call happened 3) A blur instead of a complete black out

I'm not sure I would feel safe reporting stuff to journalists nowadays.

mrguyorama1y ago

This person is not a journalist.

roastedpeacock1y ago

That also struck me as odd. And seemingly a violation of journalistic best-practices of protecting sources. I sure hope this was done with consent of the anonymous source.

Control88941y ago

It's a fake background.

It's also clearly from Google Meet so... yeah. If he was worried about retribution (from Google, anyway) then they probably wouldn't have been using a Google service.

adrianvincent1y ago

The algorithm is probably so complex and bloated at this point I doubt even Google knows how it really works

stonogo1y ago

We call that "AI" in the web world nowadays. It's a feature! You can't game a system you can't understand.

cyanydeez1y ago

If($) return true

// TODO: search

adamgordonbell1y ago

Where is the link to the document?

pr337h4m1y ago

https://github.com/googleapis/elixir-google-api/commit/078b4...

https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...

skilled1y ago

Thanks! I couldn’t find the links so this is super useful.

zarathustreal1y ago

Hopefully this doesn’t surprise anyone. If Google actually told us correct information about how the search algorithm works, it would be abused immediately.

pembrook1y ago

This also explains why it's impossible for newcomers to unseat the winners in many search categories -- because they've literally been picked as the winners by humans at Google.

Silicon valley will loudly tell you all about how wonderful they are at "democratizing," however, if you look under the surface it appears they're just hand picking the winners.

trogdor1y ago

> because they've literally been picked as the winners by humans at Google

Is there evidence of that in the leaked documents?

pembrook1y ago

Yes, it’s in the linked article.

trogdor1y ago

I read the linked article. It doesn’t say that.

alun1y ago

Maybe this is an unpopular opinion, but if a search algorithm is truly designed to showcase the best content, then making it transparent shouldn't lead to manipulation

8note1y ago

For those out of the know, what's a "crap" in this? A "crap crap"?

throwaway7431y ago

... why the hell would an anonymous source use google meet to share info on google? ... so much for remaining anonymous :/

jgalt2121y ago

> A sample of statements from Google representatives (Matt Cutts, Gary Illyes, and John Mueller) denying the use of click-based user signals in rankings over the years.

renegade-otter1y ago

There are so many Kagi fans on HN that it's a matter of time before the Big G buys it and shuts it down, like hundreds of its products before.

SadCordDrone1y ago

Didn't read the article fully, but since these are protocol buffer definitions, what if these fields are there for backward compatibility?

Havoc1y ago

Does it also recommend eating at least two stones a day?

StevenNunez1y ago

Wait... There's Elixir to be done at Google?!

dentemple1y ago

TL;DR Google lies about how its search algorithm works.

eitland1y ago

Would be interesting to see if any relevant authorities could be interested now that this is out?

I understand some of this is a direct contradiction of things they have said in court previously?

Aldipower1y ago

beejiu1y ago

Isn't that where deep learning comes into play?

ozehlaw1y ago

Yes, this makes sense. I think the only good thing from the leak for Google is that the scoring values are not present
