AdFlush

(dl.acm.org)

276 points | grac3 | 1y ago | 108 comments

What's fascinating here is AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.

Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features that will likely make a request/resource "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notation ratio in JS", and a number of graph measures for the graph of scripts.
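To make those features a bit more concrete, here is a rough sketch of how three of them could be approximated. It is not the paper's extraction code: a real implementation would parse the script into an AST, while this uses regexes and bracket counting as crude stand-ins (keywords count as identifiers, and string contents are not skipped). All function names are made up for illustration.

```python
import re

# ASSUMPTION: crude, ASCII-only approximations; not the paper's feature extractor.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# obj.prop (dot notation) versus obj["prop"] or obj[expr] (bracket notation)
DOT_ACCESS = re.compile(r"[A-Za-z0-9_$\)\]]\.[A-Za-z_$]")
BRACKET_ACCESS = re.compile(r"[A-Za-z0-9_$\)\]]\[")


def avg_identifier_length(source: str) -> float:
    """Mean length of identifier-like tokens (keywords are included)."""
    names = IDENTIFIER.findall(source)
    return sum(len(n) for n in names) / len(names) if names else 0.0


def bracket_to_dot_ratio(source: str) -> float:
    """Bracket accesses per dot access; the denominator is clamped to 1."""
    dots = len(DOT_ACCESS.findall(source))
    brackets = len(BRACKET_ACCESS.findall(source))
    return brackets / max(dots, 1)


def max_nesting_depth(source: str) -> int:
    """Deepest (), [], {} nesting: a rough proxy for AST depth."""
    depth = deepest = 0
    for ch in source:
        if ch in "([{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in ")]}":
            depth = max(depth - 1, 0)
    return deepest


plain = "window.location.href = user.name;"
obfuscated = "_0x1['a'](_0x2['b'][_0x3['c']]);"
for label, src in [("plain", plain), ("obfuscated", obfuscated)]:
    print(label, avg_identifier_length(src), bracket_to_dot_ratio(src), max_nesting_depth(src))
```

The obfuscated-looking snippet scores higher on bracket-to-dot ratio and nesting depth, which is the kind of signal a classifier could pick up on.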

And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time because their algorithm is expensive.

[1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698

tofof1y ago

More specifically, page load time was 2.7 seconds without an adblocker, decreased to 2.1 with uBlock Origin, but increased to 6.6 seconds (about 2.4x the baseline) with AdFlush, or to 3.4 seconds with AdFlush retaining prior predictions.

The superior score was an F1 of 0.86 vs 0.84 for AdFlush vs uBlock Origin, and it's not clear to me that this is a statistically significant difference. They do not claim it is.
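For anyone who wants to check the arithmetic, here is a small sketch. The load times are the ones quoted in this comment; the confusion-matrix counts passed to `f1_score` are hypothetical, chosen only to show how an F1 of 0.86 can arise, and are not taken from the paper.

```python
def relative_change(baseline: float, measured: float) -> float:
    """Signed change relative to the baseline (0.5 means 50% slower)."""
    return (measured - baseline) / baseline


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


baseline = 2.7  # seconds with no ad blocker, as quoted above
measurements = [
    ("uBlock Origin", 2.1),
    ("AdFlush", 6.6),
    ("AdFlush, cached predictions", 3.4),
]
for label, seconds in measurements:
    ratio = seconds / baseline
    change = relative_change(baseline, seconds)
    print(f"{label}: {ratio:.2f}x baseline, {change:+.0%}")
# uBlock Origin: 0.78x baseline, -22%
# AdFlush: 2.44x baseline, +144%
# AdFlush, cached predictions: 1.26x baseline, +26%

# HYPOTHETICAL counts, for illustration only: 86 ads caught,
# 14 false alarms, 14 ads missed.
print(f"F1 = {f1_score(86, 14, 14):.2f}")  # F1 = 0.86
```

Because F1 folds precision and recall into one number, a 0.86 vs 0.84 gap says nothing on its own about which kind of error each blocker makes more often.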

blacksmith_tb1y ago

That seems to argue for a first pass with a blocklist to filter out the well-known ad providers, and then possibly a followup step with the ML to catch things that are trying harder to slip by? But the extensions would have to cooperate to make that possible.
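A minimal sketch of that two-pass idea, assuming a plain hostname blocklist and some scoring function standing in for the ML model. The hostnames, the `toy_classifier` stub, and the 0.5 threshold are all invented for illustration; this is not how either extension is actually built.

```python
from typing import Callable, Set
from urllib.parse import urlsplit

# HYPOTHETICAL blocklist entries.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}


def host_is_listed(url: str, blocklist: Set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is on the list."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


def should_block(
    url: str,
    classifier: Callable[[str], float],
    blocklist: Set[str] = BLOCKLIST,
    threshold: float = 0.5,
) -> bool:
    # Pass 1: cheap set lookup catches the well-known ad hosts.
    if host_is_listed(url, blocklist):
        return True
    # Pass 2: only unlisted requests pay for the expensive model.
    return classifier(url) >= threshold


def toy_classifier(url: str) -> float:
    """Stand-in for a real model: returns a made-up ad probability."""
    return 0.9 if "banner" in url else 0.1


print(should_block("https://cdn.ads.example.com/x.js", toy_classifier))
print(should_block("https://news.example.org/banner-top.js", toy_classifier))
print(should_block("https://news.example.org/app.js", toy_classifier))
```

The point of the ordering is that the classifier never runs for requests the list already catches, so its cost is only paid on the leftovers.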

pradn1y ago

Thanks for extracting the details. It doesn't seem like they'll be competitive with blocklist-based approaches like uBlock Origin, because their features are fundamentally expensive to compute - parsing JS and such, not just matching URLs against a list of regexes.

andirk1y ago

I like the strategy of using flags to say "look into this suspicious part of the code" over a hardcoded block list. And also block shitty JS via "JS AST depth, average JS identifier length" etc even if it's not an ad but just bad code.

For Brave browser users, you can see what hardcoded lists you're using at brave://adblock .

As for the whole cat and mouse game, how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

dylan6041y ago

> how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

This has been my red line on where I will allow ads vs blocking them. If a site is hosting their own ads, that's acceptable to me. If they are using an ad provider, that is not. The newspaper example is my go to. If you wanted your ad in a paper, you called the paper and took out an ad. Today's equivalent would be every time you opened the paper, a slight delay while it randomly chose the highest bids for the ad space while potentially also inserting something that would slowly eat your hands. That's a nope.

By wanting to block all ads regardless of their origin, you are obviously in the camp that feels entitled to read anything at any time without allowing a website to earn money.

nomilk1y ago

AdFlush (F1 Score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it begs the question: why not compare to the most popular adblockers: uBlock Origin, Adblock Plus etc.

I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:

> However, manual maintenance of these filter lists requires significant human effort

Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.

Cthulhu_1y ago

The filter-based adblockers are at risk though, with Google's new extension thingy that - at least a few years ago, I haven't heard from it since - limited the number of rules. If there's a non-rule-based system that is 98% effective then that would circumvent the arbitrary rule limits that Google set.

AlexandrB1y ago

My understanding is that under manifest v3[1] only a list of rules is allowed. An algorithmic ad blocker wouldn't be able to work at all.

[1] https://arstechnica.com/gadgets/2023/11/google-chrome-will-l...

Centigonal1y ago

If Google's goal is to thwart adblockers by creating limitations on what browser extensions can do, then creating a browser extension that blocks ads within the current set of limitations is a temporary solution at best.

4ggr01y ago

I guess that's why uBO Lite exists :) I started using it a couple of months ago instead of uBlock Origin, and still haven't seen any ads since.

https://github.com/uBlockOrigin/uBOL-home

Gud1y ago

The day Google starts blocking ad blocking users is the day the exodus starts from Google services.

babypuncher1y ago

Real easy problem to solve by just switching back to Firefox

klaussilveira1y ago

Isn't this the case for a bloom filter (vacuum maybe)? You can have very few rules.
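For readers who haven't met one, here is a rough sketch of a Bloom filter holding blocked hostnames. The sizes and hostnames are arbitrary illustration values, and the sketch says nothing about whether an extension API's rule limits would actually permit this.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# HYPOTHETICAL hostnames, for illustration only.
blocked = BloomFilter()
for host in ["ads.example.com", "tracker.example.net"]:
    blocked.add(host)

print("ads.example.com" in blocked)   # True: added items are always found
print(len(blocked.bits), "bytes, regardless of how many hosts are added")
```

The trade-off is that lookups can occasionally say "blocked" for a host that was never added, so a false positive here would mean a legitimate request gets dropped.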

RamRodification1y ago

> only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others

Is it that easy? Sounds very abusable

rvnx1y ago

Yes, and some list maintainers accept money to add or remove you from the list (officially, or unofficially through a secondary maintainer, depending on the list), but otherwise it's no different from getting a domain marked as malware or phishing (with a few paid editors on Phishtank or VirusTotal).

It's easier to get a domain added than removed, and as for the "corruption"/"racketeering" part, it's a "win-win" for the adblockers and the list maintainers.

Adblockers also often pay browsers to be integrated by default (AdGuard, Adblock Plus, etc), and then they negotiate with publishers to whitelist some domains (not necessarily the most obvious, can just be analytics).

"We offer your domain to be unblocked on xx millions of devices by default, this will give you a revenue uplift of +yy%"

kmlx1y ago

yes, one of my clients was hit by this and i was tasked with solving the situation.

i had to create a ticket in a repo explaining why blocking a whole domain instead of a single subdomain was actually pretty bad. they approved it and reverted the change.

finding where exactly i had to open the ticket and what to write was a “down the rabbit hole” experience.

fckgw1y ago

Yes, but the effects of that abuse are observable and easily fixable. If suddenly a whole site goes offline for a bunch of people a change like that is likely to get reversed very quickly.

_al_1y ago

there is an entire section in the paper sub-titled: Comparison with uBlock Origin..

1oooqooq1y ago

practical solutions don't get you published

ko271y ago

"Practical solutions" also leave you vulnerable to cat and mouse games against sites that block or bypass adblockers (even with uBlock Origin). The end game is to have heuristic/AI adblocking which would directly hook into browser rendering so that it becomes undetectable. Obviously leading browsers do not support this for extensions, but forking Chromium wouldn't be so hard.

YmiYugy1y ago

Without comparison to the accuracy of crowdsourced blocklists it's not that valuable. Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page. From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.

JAlexoid1y ago

There will never be a solution to native ads. It's part of the content you choose to consume, that someone produced.

The only way to avoid native ads is to stop consuming content that relies on ads.

nemomarx1y ago

Stuff like sponsor block works pretty well? If the native ad is separable from the rest of it you can just skip ahead, and most of those things are still a signposted sponsor break for now. I can imagine extensions to do something similar in articles by removing affiliate links, etc.

YmiYugy1y ago

I think it depends on what solution space you are willing to explore. There is the possibility for regulatory action that restricts native ads. It seems plausible that a flood of AI content tanks the prices for native ads, so some might pivot to original content + regular ads, which might also become more profitable if regulatory action weakens the oligopolies of that space. Aside from high level market shifts and regulatory action, there is of course also the possibility of technical solutions that can help you to avoid native ads.

yjftsjthsd-h1y ago

That really depends on what you mean by "native ads"; if you mean "blog posts that appear legitimate but push a product" then maybe not (although I wouldn't totally rule it out with LLMs), but if you just mean that the ads are inline I have to disagree since ex. SponsorBlock already exists.

cess111y ago

In some jurisdictions advertising has to be named as such; there it will be at least theoretically possible to create filters if the platform is compliant.

93po1y ago

or have LLMs recreate the content without the native ad

beefnugs1y ago

That is nonsense, if we know about 10 exact brands by name, then we can block their mentioning anywhere

3abiton1y ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98

Neat results, I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?

Mkengine1y ago

You can find the comparison to uBO under 5.5

dale_glass1y ago

The future is here.

If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.

Or something along those lines, it's been a long time since I read it.

karaterobot1y ago

Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.

The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"

Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.

I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.

Night_Thastus1y ago

Always a joy to see efforts in the ongoing battle against advertisements.

There are few things I feel radical about, and Ads are one of them. I believe they are a drain in several ways:

They waste computational resources and electricity on both ends. They compromise the visual design and layout of webpages. They distract and take mental energy away from the user. They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing - which negatively impacts mental health. They often sell low-quality services/products or outright scams, which harms those least educated and poorest individuals.

Death to advertisement! On billboards! On television! On the internet!

Ads are a parasite on the human mind that need to go away, forever.

Terr_1y ago

Ultimately it's about where we draw the line for hacking other people's brains.

It's a spectrum: Some level is an unavoidable part of communication ("I like dogs" forces you to think of dogs) some more is considered normal and traditional manipulation ("My food smells nice, that makes you hungry, wanna buy?") and then it goes on into grey-areas, scams, and eventually to potential extremes like "this image induces nausea" or "this sound knocks you out".

btbuildem1y ago

They are a scourge and a tell-tale sign that we've grown far beyond excess and into absurd territory where more effort is spent on bending our minds to consume a thing than it took to make the thing in the first place.

CatWChainsaw1y ago

Careful, apparently not wanting your mind polluted with psychological manipulation makes you a filthy communist..

p3rls1y ago

Death to small media companies! You should have gotten some VC money if you wanted to make products for people, you poor pieces of shit.

tjpnz1y ago

I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?

rgrmrts1y ago

I’m curious why you’re using 3 separate methods. Do you miss things with just one? AFAIK all 3 use similar block lists and are configurable.

I’m building a pi-hole type solution for myself and essentially want all the filtering and blocking to happen at my firewall and not on my client (phone, laptop, tablet).

bluish291y ago

I think pi-hole (or AdGuard Home) is a useful DNS-level ad blocker which can be used at the network/router level. But it is limited; UBO gives you more flexibility to block cosmetic elements and certain ads that cannot be blocked via DNS. There will be overlap of course but it is worth it. I agree that AdGuard here seems redundant, and UBO itself recommends against using another ad blocker to avoid interference and adblock detection by websites.

However you might end up using

1. pi-hole on router

2. Adguard as device level DNS

3. UBO on Firefox (android only)

It is possible but not recommended and wasteful. 1/2 and 3 is enough.
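A rough illustration of why the DNS layer and the browser layer don't fully overlap: a DNS-level blocker only ever sees the hostname being resolved, while a browser extension can also look at the path. The hostnames and path rules below are made up, and real tools use much richer filter syntax than substring checks.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL rules, for illustration only.
DNS_BLOCKLIST = {"ads.example.net"}
PATH_RULES = ("/ads/", "/banner")


def dns_level_blocks(url: str) -> bool:
    """A DNS sinkhole can only decide based on the hostname."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in DNS_BLOCKLIST)


def browser_level_blocks(url: str) -> bool:
    """An in-browser blocker sees the whole URL, so it can match paths too."""
    path = urlsplit(url).path
    return dns_level_blocks(url) or any(rule in path for rule in PATH_RULES)


urls = [
    "https://cdn.ads.example.net/creative.js",   # third-party ad host
    "https://news.example.org/ads/banner.js",    # ad served from the site itself
    "https://news.example.org/story.html",       # ordinary content
]
for url in urls:
    print(url, dns_level_blocks(url), browser_level_blocks(url))
```

The second URL is the interesting case: blocking it at the DNS level would mean blocking the whole site, so only the browser-level rule can catch it.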

tjpnz1y ago

AdGuard is for things I take off the home network, for example when I'm at work. It's true I could use AdGuard for both scenarios but I do like the additional visibility and configurability Pi-Hole provides.

Night_Thastus1y ago

uBlock only works in web browsers. It doesn't work in phone apps, smart TVs, anything integrated into the OS, etc.

That's why I use uBlock and PiHole, which I deem is enough.

alexcason1y ago

Looks like this is the associated repo on GitHub: https://github.com/SKKU-SecLab/AdFlush

KennyBlanken1y ago

....and of course only a chrome plugin is available.

infogulch1y ago

So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).

I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with a "I still see ADs!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.

jarbus1y ago

I didn't realize this was an active area of research, love this.

cimnine1y ago

So, this begs the question when we'll see ML put in place to avoid AdBlocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.

h4kor1y ago

How does this compare to list based solutions? An overblocking/underblocking comparison would be great

gastonmorixe1y ago

Nice! I’d love to know if AI-Ad / tracking / telemetry / etc blocking could be improved for MITM network layer filtering not just the browser.

rpastuszak1y ago

Oh boy, that didn't take long. Just last year I made Butter https://butter.sonnet.io as an excuse to talk about this:

> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!

mannycalavera421y ago

https://chromewebstore.google.com/search/adflush

https://imgflip.com/i/8s3nur

marcod1y ago

The instructions are on their GitHub page

https://github.com/SKKU-SecLab/AdFlush/tree/main?tab=readme-...

But since the first webpage I tried still had huge ads, I turned uBlock back on ;)

Havoc1y ago

How realtime is this? Or at least fast enough to not be noticeable while browsing?

mrbluecoat1y ago

I'd be okay with a hybrid approach: lists for real-time blocking and machine learning for passive analysis to augment the lists over time.
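One way that passive loop might look, as a sketch only: the classifier runs in the background, and a host is promoted to the list once it has been flagged on enough distinct pages. The class name, the threshold of three pages, and the hostnames are all assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ListAugmenter:
    """Promotes hosts to a blocklist after repeated background detections."""

    def __init__(self, min_pages: int = 3):
        self.min_pages = min_pages  # ASSUMPTION: arbitrary promotion threshold
        self.sightings: DefaultDict[str, Set[str]] = defaultdict(set)
        self.blocklist: Set[str] = set()

    def record(self, host: str, page: str, flagged: bool) -> None:
        """Store one background verdict; never blocks anything in real time."""
        if not flagged:
            return
        self.sightings[host].add(page)
        # Requiring several distinct pages limits one-off false positives.
        if len(self.sightings[host]) >= self.min_pages:
            self.blocklist.add(host)


augmenter = ListAugmenter()
for page in ["a.example", "b.example", "c.example"]:
    augmenter.record("sneaky-ads.example", page, flagged=True)
augmenter.record("cdn.example", "a.example", flagged=True)

print(sorted(augmenter.blocklist))  # ['sneaky-ads.example']
```

Since the model only feeds the list, its latency never sits on the page-load path; the cost is that a new ad host gets through until it has been seen often enough.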

flakiness1y ago

This can be a Copilot+PC's killer feature :-)

seized1y ago

... Has anyone even heard of these ad blockers before?

flakiness1y ago

These are all academic research projects.
