Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features that indicate a request/resource is likely "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notations ratio in JS", and a number of measures computed over the graph of scripts.
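To make two of these features concrete, here is a rough sketch of how the bracket-to-dot ratio and the average identifier length might be computed from a script's source. It is not the paper's extraction code. It scans raw text with regular expressions, so string contents and comments skew the counts, whereas features such as AST depth need a real JavaScript parser. The function names and the keyword list are assumptions for illustration.

```python
import re

# Crude text-level approximations; a faithful version would walk a JS AST.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# A "." directly after a name, ")" or "]" and directly before a name start: obj.prop
DOT_ACCESS = re.compile(r"(?<=[\w$)\]])\.(?=[A-Za-z_$])")
# A "[" directly after a name, ")" or "]": obj[...]
BRACKET_ACCESS = re.compile(r"(?<=[\w$)\]])\[")
# Assumed, incomplete keyword list so keywords don't count as identifiers.
JS_KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
               "for", "while", "new", "this", "typeof", "true", "false", "null"}

def bracket_to_dot_ratio(source: str) -> float:
    """Ratio of bracket-notation to dot-notation property accesses (0.0 if no dots)."""
    dots = len(DOT_ACCESS.findall(source))
    brackets = len(BRACKET_ACCESS.findall(source))
    return brackets / dots if dots else 0.0

def average_identifier_length(source: str) -> float:
    """Mean length of identifier-like tokens, excluding the keywords above."""
    names = [t for t in IDENTIFIER.findall(source) if t not in JS_KEYWORDS]
    return sum(map(len, names)) / len(names) if names else 0.0

# Obfuscated scripts often favor window['loc'+'ation'] over window.location.
sample = "var a=window['loc'+'ation'];a.href=b.c;"
print(bracket_to_dot_ratio(sample), average_identifier_length(sample))
```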
And contrary to what some comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin, but even they admit they don't reduce overall page load time because their algorithm is expensive.
AdFlush's superior score was an F1 of 0.86, versus 0.84 for uBlock Origin, and it's not clear to me that this difference is statistically significant. They do not claim it is.
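For reference, F1 is the harmonic mean of precision and recall, 2PR/(P+R). One common way to check whether a 0.02 gap means anything is a paired bootstrap: resample the same labeled requests with replacement many times, recompute both scores on each resample, and count how often the apparent winner fails to win. The sketch below assumes per-request ground-truth labels and both blockers' verdicts are available; it uses no data from the paper, and the function names are hypothetical.

```python
import random

def f1_score(labels: list[bool], preds: list[bool]) -> float:
    """Binary F1 where True means "ad"; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def paired_bootstrap(labels: list[bool], preds_a: list[bool], preds_b: list[bool],
                     rounds: int = 1000, seed: int = 0) -> float:
    """Fraction of resamples in which blocker A does NOT beat blocker B on F1.

    A small value (say, under 0.05) suggests A's lead is not just resampling
    noise on this test set; a large value means the gap is not robust.
    """
    rng = random.Random(seed)
    n = len(labels)
    not_better = 0
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both blockers
        ys = [labels[i] for i in idx]
        fa = f1_score(ys, [preds_a[i] for i in idx])
        fb = f1_score(ys, [preds_b[i] for i in idx])
        if fa <= fb:
            not_better += 1
    return not_better / rounds
```

Pairing matters here: both blockers are scored on the identical resample each round, so the test measures their difference rather than two independent noisy estimates.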
If you use Brave, you can see which hardcoded lists you're using at brave://adblock.
As for the whole cat-and-mouse game, how do you detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.
This has been my red line for where I allow ads vs. block them. If a site hosts its own ads, that's acceptable to me. If it uses an ad provider, it's not. The newspaper is my go-to example. If you wanted your ad in a paper, you called the paper and took out an ad. Today's equivalent would be a slight delay every time you opened the paper while it picked the highest bids for the ad space, potentially also inserting something that would slowly eat your hands. That's a nope.
You are obviously in the camp that wants to block all ads regardless of their origin, and feels entitled to read anything at any time without allowing a website to earn money.
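That policy maps roughly onto the first-party/third-party distinction that blockers like uBO already expose. A minimal sketch of the check, assuming a naive "last two DNS labels" notion of a site: real blockers consult the Public Suffix List, so this version gets hosts like news.example.co.uk wrong. The helper names are hypothetical.

```python
from urllib.parse import urlsplit

def naive_site(host: str) -> str:
    """Approximate the registrable domain as the last two DNS labels.

    Assumption: ignores the Public Suffix List, so "news.example.co.uk"
    would wrongly collapse to "co.uk".
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def is_third_party(page_url: str, request_url: str) -> bool:
    """True if the request goes to a different (naively computed) site than the page."""
    page_host = urlsplit(page_url).hostname or ""
    req_host = urlsplit(request_url).hostname or ""
    return naive_site(page_host) != naive_site(req_host)

def allow_ad(page_url: str, ad_url: str) -> bool:
    """Self-hosted ads pass; ads fetched from another site (an ad provider) do not."""
    return not is_third_party(page_url, ad_url)
```

Note that this only sees where an ad is fetched from. A site that proxies an ad network's content through its own domain would pass this check.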
I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:
> However, manual maintenance of these filter lists requires significant human effort
Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.
[1] https://arstechnica.com/gadgets/2023/11/google-chrome-will-l...
Is it that easy? Sounds very abusable.
It's easier to get a domain added than removed. And as for the "corruption"/"racketeering" part, it's a "win-win" for the adblockers and the list maintainers.
Adblockers also often pay browsers to be integrated by default (AdGuard, Adblock Plus, etc.), and then they negotiate with publishers to whitelist some domains (not necessarily the most obvious ones; they can just be analytics).
"We offer to unblock your domain by default on xx million devices; this will give you a revenue uplift of +yy%"
I had to create a ticket in a repo explaining why blocking a whole domain instead of a single subdomain was actually pretty bad. They approved it and reverted the change.
Finding exactly where to open the ticket and what to write was a “down the rabbit hole” experience.
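To illustrate why the scope matters: in Adblock Plus/uBO filter syntax, a rule like ||example.com^ matches example.com and every subdomain of it, while ||ads.example.com^ matches only that subdomain (and anything beneath it). The sketch below models just that hostname-anchor behavior for domain-only rules; it ignores paths, wildcards and $options, and the function name is hypothetical.

```python
def rule_matches_host(rule: str, host: str) -> bool:
    """Match a domain-only rule of the form "||domain^" against a hostname.

    Simplification: only the "||" hostname anchor is modeled; the rest of the
    filter syntax (path patterns, wildcards, $options) is not handled.
    """
    if not (rule.startswith("||") and rule.endswith("^")):
        raise ValueError(f"unsupported rule: {rule!r}")
    domain = rule[2:-1].lower()
    host = host.lower()
    # The rule covers the domain itself and any subdomain of it.
    return host == domain or host.endswith("." + domain)

hosts = ["example.com", "www.example.com", "ads.example.com", "cdn.example.com"]
for rule in ("||example.com^", "||ads.example.com^"):
    print(rule, "blocks", [h for h in hosts if rule_matches_host(rule, h)])
```

The whole-domain rule also takes out www and cdn, which is how a site's main content or assets end up broken by a rule aimed at its ad subdomain.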
The only way to avoid native ads is to stop consuming content that relies on ads.
Neat results. I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?
If I recall correctly, there's a part in Permutation City where somebody uses AI to deal with spam. The user runs a simulation to listen to potential spam and filter it, while the spam tries to figure out whether a real person is listening and only pitches when one is.
Or something along those lines, it's been a long time since I read it.
The harder, more pernicious ads are the modals that pop up when your cursor moves toward the back button, or when you scroll a certain distance down the page. "Wait! Before you go, take a moment to give us your email address!"
Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.
I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.
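One possibly reusable angle is to ignore how the popup is triggered and look at what it does to the page: exit-intent and scroll-triggered modals are usually fixed-position, high z-index elements that suddenly cover most of the viewport. The sketch below is a pure heuristic over an element's computed style and box. The snapshot type, field names and thresholds are assumptions, not tested values.

```python
from dataclasses import dataclass

@dataclass
class ElementSnapshot:
    """Hypothetical snapshot of a newly inserted element's computed style and box."""
    position: str   # CSS "position" value, e.g. "fixed"
    z_index: int    # computed z-index (treat "auto" as 0)
    width: float    # bounding-box width in CSS pixels
    height: float   # bounding-box height in CSS pixels

# Assumed thresholds; these would need tuning against real pages.
MIN_COVERAGE = 0.5    # fraction of the viewport the element must cover
MIN_Z_INDEX = 1000    # overlays tend to use very large z-index values

def looks_like_interstitial(el: ElementSnapshot, viewport_w: float, viewport_h: float) -> bool:
    """Flag fixed-position elements that sit on top of the page and cover most of the viewport."""
    if el.position != "fixed":
        return False
    visible_area = min(el.width, viewport_w) * min(el.height, viewport_h)
    coverage = visible_area / (viewport_w * viewport_h)
    return el.z_index >= MIN_Z_INDEX and coverage >= MIN_COVERAGE
```

In a browser extension the inputs would come from getComputedStyle() and getBoundingClientRect() on nodes reported by a MutationObserver. Since those callbacks can run before the next paint, the overlay could in principle be hidden before it is ever seen, though that is untested here. Thin cookie bars and sticky headers fall below the coverage threshold by design.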
There are few things I feel radical about, and ads are one of them. I believe they are a drain in several ways:
They waste computational resources and electricity on both ends. They compromise the visual design and layout of webpages. They distract the user and take mental energy away. They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing, which negatively impacts mental health. They often sell low-quality services/products or outright scams, which harms the least educated and poorest people.
Death to advertisement! On billboards! On television! On the internet!
Ads are a parasite on the human mind that need to go away, forever.
It's a spectrum: some level is an unavoidable part of communication ("I like dogs" forces you to think of dogs), some more is considered normal, traditional manipulation ("My food smells nice, that makes you hungry, wanna buy?"), and then it moves into grey areas, scams, and eventually potential extremes like "this image induces nausea" or "this sound knocks you out".
I’m building a Pi-hole-type solution for myself and essentially want all the filtering and blocking to happen at my firewall and not on my clients (phone, laptop, tablet).
However, you might end up using:
1. Pi-hole on the router
2. AdGuard as device-level DNS
3. uBO on Firefox (Android only)
It is possible, but not recommended and wasteful. Either 1 or 2, plus 3, is enough.
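The DNS-level layer (1 or 2) works by answering blocked names with a non-routable address instead of forwarding the query. A minimal sketch of that lookup logic, assuming a hosts-file-style blocklist and exact-name matching; it is not Pi-hole's or AdGuard's code, and the sample list and upstream address are made up.

```python
from collections.abc import Callable

SINKHOLE = "0.0.0.0"  # non-routable answer returned for blocked names

def parse_hosts_blocklist(text: str) -> set[str]:
    """Collect names from hosts-format lines such as "0.0.0.0 ads.example.com"."""
    blocked: set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if address in ("0.0.0.0", "127.0.0.1"):
            blocked.update(n.lower().rstrip(".") for n in names if n != "localhost")
    return blocked

def resolve(name: str, blocked: set[str], upstream: Callable[[str], str]) -> str:
    """Answer blocked names locally (exact match only); forward everything else."""
    if name.lower().rstrip(".") in blocked:
        return SINKHOLE
    return upstream(name)

# Usage with a made-up list and a fake upstream resolver (documentation-range IP).
blocklist = parse_hosts_blocklist("""
# hypothetical list
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # trailing comment
""")
print(resolve("ads.example.com", blocklist, lambda n: "203.0.113.10"))
print(resolve("www.example.com", blocklist, lambda n: "203.0.113.10"))
```

This also shows why uBO still earns its place on top: DNS only ever sees hostnames, so ads served from the same host as the content get through, while an in-browser blocker sees full URLs and page context.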
That's why I use uBlock and Pi-hole, which I consider enough.
I'd like to see a tandem uBO+AdFlush extension that enables uBO by default, with an "I still see ads!" button in the extension UI that refreshes the page with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.
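The proposed flow could be sketched as: let the list-based blocker decide first, run the expensive classifier only on requests the list let through, and turn confident ad verdicts into candidate rules for the hypothetical FlushList. Both callables are stand-ins (neither is uBO's nor AdFlush's real interface), and the threshold and names are assumptions.

```python
from collections.abc import Callable, Iterable
from urllib.parse import urlsplit

def flushlist_candidates(
    request_urls: Iterable[str],
    list_blocks: Callable[[str], bool],       # stand-in for the filter-list verdict
    ad_probability: Callable[[str], float],   # stand-in for an AdFlush-style model
    threshold: float = 0.9,                   # assumed confidence cutoff
) -> list[str]:
    """Return "||host^" rules for requests the list missed but the model flags as ads."""
    rules: list[str] = []
    for url in request_urls:
        if list_blocks(url):
            continue  # already handled cheaply by the list; skip the expensive model
        if ad_probability(url) >= threshold:
            host = urlsplit(url).hostname
            rule = f"||{host}^" if host else None
            if rule and rule not in rules:
                rules.append(rule)
    return rules
```

Running the model only on what the list misses keeps its cost off most requests. Emitting whole-host rules is the bluntest option, so candidates would want human review before they reach other users.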
> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!
https://github.com/SKKU-SecLab/AdFlush/tree/main?tab=readme-...
But since the first webpage I tried still had huge ads, I turned uBlock back on ;)
... Has anyone even heard of these ad blockers before?