Even worse, they state that Brave Search will skip a page only if other search engines are also barred from indexing it. Morally, that is not their call to make. A publisher should have full control over which search engines index the website's content. That's the very heart of why the Robots Exclusion Protocol exists, and Brave is brazenly ignoring it.
Even worse than that, the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training. Who gave them the right to distribute the content that way?
I wrote about all this here:
https://searchengineland.com/crawlers-search-engines-generat...
and more references elsewhere in this thread:
https://news.ycombinator.com/item?id=36989129
Amusingly, while I was writing my article, this got posted to their forums, asking about how to block their crawler:
https://community.brave.com/t/stop-website-being-shown-in-br...
No reply so far.
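For readers unfamiliar with the per-crawler control the Robots Exclusion Protocol is meant to give, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the crawler name `ExampleBot` are hypothetical, chosen only for illustration. Note that the protocol is advisory: it only works for crawlers that announce a distinct user-agent token and choose to honor the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: welcome one crawler, refuse another,
# and keep /private/ off limits for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/article"))
```

A crawler that sends only a generic browser user-agent string cannot be singled out this way, which is the crux of the complaint above.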
If you post something to the open web, what's it to you who reads it and how? You can block some IPs but that's about it.
I don't know if Brave has a knowledge graph - if they do, I would understand objecting if they filled it in with “stolen” content. But I don't see what's the problem with search.
By the way, isn't everyone's favourite archive.is doing the same thing?
I have no strong opinion on this, curious to hear counter arguments.
> A publisher should have full control to discriminate which search engine indexes the website's content
If you want someone to not see what you publish, block him yourself. Also, why would you want to do that? Do you want Google to own the web or something?
I share your concern about Google having this much power, and I'd add that Microsoft Bing is equally bad but gets away with it because they're smaller. Still, the final decision about which search engine indexes a website is purely the publisher's.
And to use that analogy even further, if you want to block Chinese visitors you block Chinese IPs. You do not add a file called "countries.txt" containing "China block" and then expect Chinese users to see it and voluntarily cease to use it, and threaten to sue them if they don't.
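The "block the IPs yourself" approach from the comment above can be sketched as a server-side check with Python's standard `ipaddress` module. The networks listed are reserved documentation ranges used as placeholders, not real allocations for any country or crawler, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), standing in
# for whatever address blocks a site operator decides to refuse.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if `client_ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when the IP versions differ.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.7", "198.51.100.1", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Unlike a robots.txt entry, this is enforced by the server and does not depend on the visitor's cooperation.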
Repeatedly asserting that "the final decision is with the publisher" is stupid. That is the point you seem to want to defend. Defend it! Give us a reason. Just saying the same thing over and over again doesn't make it true.
Much of the problem with search today arises from websites showing googlebot what it wants to see and showing real users something else. I have to manually remove entire domains from google search as they often appear 1st yet don't show any content without me signing up for an account. Clearly that's not what they are showing to google.
There should be no differentiation between a crawler and a human being with regards to what is being served.
It simply doesn't sound right to say which tool a user can use. It's literally the same as arguing that you should be able to block Firefox from accessing your website and it's Mozilla's fault that they don't respect your wishes as a webmaster to block Firefox exclusively. Or that a VPN doesn't publish its IP addresses so that you can block it. Or a screen reader that processes the text to speech in a way that you disagree with.
Philosophically it seems intuitive to say "I should be able to block a third party that is abusing my site" but it's ignoring the broader context of what "open web" and "net neutrality" actually mean.
I run a service for podcasters. There are podcast apps and directories that either ignorantly make unnecessary requests for content or have software bugs that cause redownloads. I could trivially block them, but I don't because doing so penalizes the end user who is ultimately innocent, rather than the badly behaved service operator. The better solution is primitives like rate limiting, which I use liberally. Plus, blocking anyone literally has a direct effect of incentivizing centralization on Apple, Spotify, etc. and making the state of open tech in podcasting even worse.
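As an illustration of the rate-limiting primitive mentioned above, here is a minimal token-bucket sketch. It is not the commenter's actual setup; the class name `TokenBucket`, its parameters, and the numbers in the demo are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock        # injectable so the limiter can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; return False when the client should wait."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    # One bucket per client key (for example, per user agent or per IP).
    buckets = {}
    client = "some-podcast-app"
    bucket = buckets.setdefault(client, TokenBucket(rate=1.0, capacity=3))
    print([bucket.allow() for _ in range(5)])
```

A misbehaving client is slowed down rather than cut off, so its end users still get their content once the bucket refills.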
> the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training? Who allowed them the right to distribute the content that way?
I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee. Services like this have existed for decades. If you don't want your content scraped, put it behind a login. Especially considering this only applies when you allow other search engines and if you think Google and Bing aren't using your content to train AI, you're off your rocker.
1. User agents should identify themselves
2. A crawler is not a User agent - it's an agent for Brave
>I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee.
You can't end-run copyright like this: just because something is publicly available doesn't mean anyone can redistribute it. Look at the legal issues & cases relating to Library Genesis.
There is no rule that this is true, and many user agents exist _specifically to not be identified_. See Tor and other privacy-centric user agents.
> A crawler is not a User agent - it's an agent for Brave
You know, I thought "what does Wikipedia have to say on this matter?" and sure enough:
> Examples include all common web browsers, such as Google Chrome, Mozilla Firefox, and Safari, as well as some email readers, command-line utilities like cURL, and arguably headless services that power part of a larger application, such as a web crawler.
I can't even make that up.
> just because something is publicly available doesn't mean anyone can redistribute it
You're mistaking reselling content for providing access to it. By your logic, caching proxy servers would be illegal on the grounds of copyright. The physical act of downloading files necessarily creates copies of the data every step of the journey from the source server to you. There's a material difference between paying someone for a copy of some content and paying someone to fetch content for you on your behalf. Nothing about copyright law specifically requires that the person physically acquiring the content be the one who ends up consuming it.
You're certainly allowed to try, but I don't see why indexers should be mandated to collaborate with you. They serve their users, not you.
What if I consider (some or any of) my ideas to be un-indexable, not directly suitable to representation in any hierarchy other than those I may set them in?
example.com/correcthorsebatterystaple
If you consider "word of mouth" to be public posts on a forum which millions can read at any time, then block googlebot IPs.
To me as a search engine end user, this kind of behavior is undesirable. Why would I want a website to selectively degrade my experience because of my choice in search engine or browser?
Brings back horrible flashbacks of “this website is only compatible with IE6”.
Also, these search crawls by the browser do not identify themselves beyond the Brave standard UA header, namely a plain Chrome user-agent string.
How many Chrome users have opted in to sending data to Google?
Sometimes uninformed consent is not actually consent. These so-called "tech" companies love to toe that line.
Baidu
Bing
Brave
Google
Yandex
You can compare their results on this search comparison page I maintain: https://www.gnod.com/search/?engines=p,o,br,n,q&nw=1
(If you want to also search image libraries like Flickr and Pexels, click on "more engines" to select all places you want to search)
Bamboo sign
Give it a try on Google images. You'll see that nearly all the results are x-rays of people with ankylosing spondylitis, a form of which is commonly referred to as "bamboo spine".
I tested out Brave search - about 90% of its results are correctly signs; however, it does also show a few spines.
Google still shows incorrect results months later. It's by far the worst of all the search engines in the list for this simple and obvious search.
I actually like Brave here for my test better than Google. I typed in a few cities, just wanting to see the skylines and such.
Brave gives me good photos, some stock photos, etc. Google gives me pictures from recent news articles, which isn't overly helpful IMO.
That already makes it worthy of support.
But Google having become so bad of late has made switching quite easy. Even if Brave is not getting better super fast, Google unfortunately is getting worse and making up for it.
What are the censored image searches you found?
That tends to upset some engines including Bing I think.
See: https://help.kagi.com/kagi/why-kagi/kagi-vs-google.html
Click two links down in the same menu:
https://help.kagi.com/kagi/why-kagi/kagi-vs-brave.html
Kagi Search includes anonymized requests to traditional search indexes like Google and Bing as well as sources like Wikipedia, DeepL, and other APIs. We also have our own non-commercial index (Teclis), news index (TinyGem), and an AI for instant answers. Teclis and TinyGem are a result of our crawl through millions of domains, focusing primarily on non-commercial, high-quality content.
Our unique results combined from all of these sources help you discover the best content you can possibly find online, sometimes from the quieter places on the web.
cf. yep.com
First I've heard of a "metasearch" engine. I just tried SearX and it gave me no results for my projects and only a couple for "Mastodon", but the animal not the software.
Brave search was my default for quite a while, until a few weeks after they got rid of bing results. As soon as that happened, stuff just wasn't showing up that I'd expect to be there, and 90% of the time I'd follow up searches with !g or !ddg just to get something decent to show up. The index just felt severely lacking, or the relevancy was pretty far off base.
Would you say search has greatly improved over the past month or two?
It's not as good as Google was at its peak (and Google itself has degraded severely in quality, IMO), but it's good enough that I can generally find what I'm looking for with a minimum of effort.
I run maybe 1 in 40-50 searches with "!g" because Brave is insufficient, for context.
Exactly my experience. I hadn't connected it to them getting rid of Bing results, but it makes sense. I've had to use the bang redirects to other search engines a lot more too, to a level I haven't had to in more than a year.
A recent example from my search history, "doors of stone release date". The author has announced a new novella releasing Nov. 2023, but not the actual book Doors of Stone. The google infobox gets this wrong, but the first result is correct. Brave accidentally gets it right that there's no release date for the book, but misses the novella announcement and all but one of the results are blogspam.
The difference might be that they (including myself) don't ask search engines for facts like "doors of stone release date". They'll search for "doors of stone", find personally reliable sources like Wikipedia, Fandom, Goodreads, browse them and decide on an answer. When sources fail to appear, they'll either refine the search (like "doors of stone rothfuss") or call it a failure and maybe try a different search engine.
This is one of the reasons why Brave has been good for me so far. When a relevant Wikipedia article exists, it shows it, even if the title doesn't match. Whereas lately DDG and others don't. In fact, you can see this with "doors of stone". Brave shows "The Kingkiller Chronicle", DDG doesn't at all, Google has it low down in the results.
It also shows Reddit discussions without needing to explicitly filter for it. And I use ad block to remove the AI summariser that takes up half the screen, it's not what I want from a search engine.
I will say there are times when it just falls flat. Like I will search for a brand or specific thing, expecting to get to the home / login page for that brand, and it just flat out gives me weird results.
But when I put the !g in front of the query, the first result is always what I wanted.
On the other hand, when doing more general searches, Brave is on par with or better than Google.
I'm so skeptical that I'm just now starting to develop a feeling of trust towards DuckDuckGo.
In the browser domain, Mozilla is the only company of which I feel that it is genuinely pro-customer.
So I give all my stuff to Google and hope that they at least just protect it from hackers, while I am aware that they analyze my data in order to see how they can monetize me better, but at least with anonymity in regards to 3rd parties. I just hope I'm not wrong.
My god, there is so much telemetry in FF now, and it's tricky to hunt down all the about:configs to disable it. Not friendly or privacy conscious at all. Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!
Quick FF enshittification example from 2 months ago:
Alert HN: Mozilla puts advertising into Firefox AGAIN
https://news.ycombinator.com/item?id=36351322 (48 comments)
Mozilla stops Firefox fullscreen VPN ads after user outrage
https://news.ycombinator.com/item?id=36085642 (220 comments)
They literally can't win. One group of vocal users is outraged by how much money Mozilla takes from Google, while the other screams about how Mozilla is trying to *gasp* gain additional revenue streams that don't involve taking money from their biggest competitor.
> Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!
No, but I really don't give a shit either. At a certain point, I looked in the mirror and said life is too short to care about stupid shit like that. If I was a spy or a journalist in some state like China or Iran, maybe I would care. But this feels odd to hone in on when any website you go to is collecting all sorts of info of this sort.
Nobody's asking you to personally fight for everything. Life truly is too short for that. All we're asking for is your tacit support, or failing even that, your abstinence from the conversation.
I don't personally have time to look after stray dogs in my area, for example. But I sure as hell don't come out with "I looked in the mirror and decided I don't give a shit" when I meet someone who does care about that. Not online and not offline. Instead I'll be supportive and tell them how amazing they are for spending their valuable time on this.
Even if it's something I don't personally care about at all, or even if I think it's a massive waste of time but I can see it's important to someone else, I'd still never tell them that I don't give a shit.
Is it too much to ask you to have the same respect for people who care about important issues like online privacy? If it's not important to you, that's fine! Go be somewhere else instead of interrupting people who do care about this. There's about 500 million other conversations happening on the internet at this very second. Surely one of them is something you actually do care about enough to engage with in a positive way?
At this point, Brave is a more promising bet than MZ.
Mozilla doesn't go about it in as upfront way as Brave does, IME, but stuff like VPN, Pocket and other browser-related services I mind not at all.
I have no sympathy to the current political shitfest that Mozilla is as an organization, but as makers of Firefox I feel like Mozilla is in an impossible bind: Their users expect a fairytale of an independent, donation-funded browser that people spontaneously adopt, and go nuts about stuff like the inclusion of Pocket. I know, I used to be one of those people back when Pocket was introduced. But reeing about Mozilla trying to have independent funding by giving people useful services is just strange. It's exactly what they should be doing, and Brave setting up revenue streams like Talk and Search is great. Especially because they operate in the normal money universe for those of us who aren't terribly enthusiastic about crypto.
My problems with Mozilla are:
- Misuse of money: the browser team have brought in lots of money over the years (we talk billions) and the foundation is milking it dry. If the income created by the browser had stayed with the browser team they would have had funding for years to come.
- Being dishonest: Mozilla has sought donations for Firefox and I think many of us have donated thinking we supported Firefox, while in reality the Firefox team funds itself and the rest of Mozilla and Mozilla isn't even allowed to send money the other way.
- Not being up front about what they do: they more or less lied about their relationship with Pocket. I like Pocket, both the product and as a way to bring in income, but whenever it comes up, everyone who was there starts thinking about their lies.
- Nerfing the extension API.
- Writing "dear community members" in emails begging for money while simultaneously being rude to us in responses to real issues in Bugzilla.
Now, if anyone thinks I use Chrome, think again.
I am still optimistically waiting for authorities to wake up and punish Google the same way they punished Microsoft - huge fines and browser ballots - but that does not mean I give Mozilla a free pass ;-)
I'm curious to see what you think about this. If you're not okay with Firefox telling Mozilla your IP address every time you connect, does the same go for Brave sending entire pages of your search results to them? This also includes which results you've clicked on.
Brave offers a simple deal: if you believe or have audited their technical claims, you give them fully anonymized snippets of URLs and web pages, and they improve their search index for everyone
The thing is that it will always get worse. Every company needs to grow in order to stay alive and eventually quality and your user experience will suffer because of this, but here's the thing: Always choose the upstart competition but just be prepared to jump to the next up-and-comer after that. For me, I found DuckDuckGo getting worse over time (probably not on purpose, just spam) and somehow Brave is better so I'm sticking with that, but as soon as Brave decides to fuck me (and they will!) then I'll be jumping to whoever the new underdog is at that time.
On the other hand, the Google lawyers seem to have found an excuse to link some proprietary code into Chrome (that's not part of Chromium). Does anybody know what that excuse is, and if it provides a loophole large enough to close off Chrome development?
It also explains why Brave Search is slow today I suspect... :)
Usually google+tineye+bing+yandex will yield some useful results
It surprisingly comes up for image searching. Google for example has been known to censor images of Tiananmen Square.
(though I've recently switched away from Brave Search, since the Goggles subscriptions (the reason I switched to Brave) put a big bar right at the top, and the whole top settings+goggles area shifts the search results on load!)
Now... I would just like to see the full https url on searches.
I already love the look, Brave summarizer AI and general results!
The current policies don't allow it, sure. But then there will be gradual changes to those policies, and all sorts of dark patterns that make 99% of users leak their data (think "enabled by default but opt-out").
Is there any guarantee against this?
You have to realize data is worth less over time -- a lot less. So the issue would be whether users stick if (heaven forbid) a change of control left Brave in the hands of unethical owners. By design, open source, network sniffing, and other auditors would catch wise and flame such a corrupt Brave into the ground. That's the best we can do here. There is no safety property ("X" holds for all future program states) enforceable on a company.
Then you need to gate your content such that it is not available openly to the public.
This falls in line with many objections to Google's WEI. If you host content openly and allow access freely, then don't be surprised when people access it at will and use it for free.
Loosely related XKCD: https://xkcd.com/927/