Maybe there's some one-off exercise where this is useful, but it's very rare and I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge.
> I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge
You just store the ja4 on requests and build a catalogue of known JA4s over time using statistics. Outlier JA4s you treat with suspicion by default and challenge. It shouldn't be manual.
> If someone invests time/money in using a captcha solver, they're already dedicated enough and will easily get around a JA4 signature block.
Obviously, not for the regular user but captcha solvers are also blockable: - proxy detection - detection by running DNS server and capturing real IP over UDP request - abnormal TLS handshake latency - repeat behaviour at scale - rendering captcha on a fake origin instead of in the real page
Now the question of how motivated are noisy AI scrapers is still open. Even a solution that cuts down 50 percent of the dumbest scraping attempts will still provide much needed relief to a struggling site.
From my experience, there are all kinds of levels of bots. Add them all together and they can produce a ridiculous load on a site (especially a fragile one that you have to secure anyway). So I look at the volume, trying to block anything stupid I can get away with.
It is a game of whack-a-mole. It also can cut down the overall traffic to a fraction of the original, which has tangible infra costs benefits.
And yes, captcha works better in a lot of cases. Fortunately I'm not selling JA4, I'm just curious.
And yes, IP rate limits and ASN checks work really well in plenty cases. Side note: I got a high-throughput free offline asn-checker too! https://blog.miloslavhomer.cz/asn-check/
Now, to differentiate between spoofed and non-spoofed header, I need to check the "valid" JA4 signature for a given browser and then proclaim that the rest of them are wrong. The "valid" JA4 signature can be observed, but I've found that sometimes browsers tweak their handshake a bit, so it's not 100% consistent.
The JA4 DB was recently taken down, I've requested full access, but no response (as expected). There might be some issues in getting those valid headers for the browsers, the hardware and software varies a lot (PC, Mac, Android, Iphone of all kinds of versions and browsers).
I was hoping for a quick win to share, but it doesn't seem like so and I'll have to do it properly. That should be my next post on JA4.
As a quick note, approx 30% of traffic claims to use http2 and approx 60% of that traffic has a non-bot user-agent (you know, along the lines of "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/149.0.7827.102 Safari/537.36"). I suspect majority of those are spoofed as I know how many readers I have on my blog.