And when it does get more dangerous, is over zealous tracking the best counter for this?
I've dealt with a lot of these threats as well, and a lot are countered with rather common tools, from simple fail2ban rules to application firewalls and private subnets and whatnot. E.g. a large fai2ban rule to just ban anything that attempts to HTTP GET /admin.php or /phpmyadmin etc, even just once, gets rid of almost all nefarious bot traffic.
So, I think the amount of attacks indeed can be insane. But the amount that need over zealous tracking is to be countered, is, AFAICS, rather small.
All requests produced by those bots were valid ones, nothing that could be flagged by tools like fail2ban etc (my assumption is that it would be the same for financial systems).
Any blocking or rate limiting by IP is useless, we saw about 2-3 requests per minute per IP, and those actors had access to ridiculous number of large CIDRs, blocking any IP caused it instantly replace it with another.
blocking by AS number was also mixed bag, as this list growed up really quickly, most of that were registered to suspicious looking Gmail addresses. (I feel that such activity might own significant percentage of total ipv4 space)
This was basically cat and mouse game of finding some specific characteristic in requests that matches all that traffic and filtering it, but the other side would adapt next day or on Sunday.
aggregated amount of traffic was in range of 2-20k r/s to basically heaviest endpoint in the shop, with was the main reason we needed to block that traffic (it generated 20-40x load of organic traffic)
cloudflare was also not really successful with default configuration, we had to basically challenge everyone by default with whitelist of most common regions from where we expected customers.
So best solution is to track everyone and calculate long term reputation.
But that protection depends on the use case. And that in many of my use-cases, a simple f2b with a large hardcoded list of URL paths I guarantee to never have, will drop bot-traffic with 90% or more. The last 10% then split into "hits because the IP is new" and "other, more sophisticated bots". Bots, in those cases are mostly just stupid worms, just trying out known WP exploits, default passwords on often used tools (nextcloud, phpmyadmin, etc) and so on.
I've done something similar with a large list of known harvest/scraper bots, based on their user-agent (the nice ones), or their movements. Nothing complex, just things like "/hidden-page.html that's linked, but hidden with css/js.
And with spam bots, where certain post-requests can only come from repeatedly submitting the contact form.
This, obviously isn't going to give any protection against targeted attacks. Nor will it protect against more sophisticated bots. But in some -in my case, most- use-cases, it's enough to drop bot-traffic significantly.
As stated before the main reason we needed to block it was volume of the traffic, you migh imagine identical scenario for dealing with DDoS attack.
unfortunately fail2ban wouldn't even make a dent in the attack traffic hitting the endpoints in my day-to-day work, these are attackers utilizing residential proxy infrastructure that are increasingly capable of solving JS/client-puzzle challenges.. the arms race is always escalating
[img]https://example.com/phpmyadmin/whatever.png[/img]While I don't have experience with a great number of WAFs I'm sure sophisticated ones let you be quite specific on where you are matching text to identify bad requests.
As an aside, another "easy win" is assuming any incoming HTTP request for a dotfile is malicious. I see constant unsolicitied attempts to access `.env`, for example.
You wind up having to use things like tls fingerprinting with other heuristics to identify what to traffic to reject. These all take engineering hours and require infrastructure. It is SO MUCH SIMPLER to require auth and reject everything else outright.
I know that the BigCo's want to track us and you originally mentioned tracking not auth. But my point is yeah, they have malicious reasons for locking things down, but there are legitimate reasons too.
get /token
Returns token with timestamp in salted hash
get /resource?token=abc123xyz
Check for valid token and drop or deny.
...and we've circled back to the post's subject - a version of curl that impersonates browsers TLS handshake behavior to bypass such fingerprinting.
But also, what you wrote is basically nonsense. Clients don't need "an approved cert authority". Nor are there any "approved gatekeepers", all major browsers are equally happy connecting to your Raspberry Pi as they are connecting to Cloudflare.
If you're fighting adversaries that go for scale, AKA trying to hack as many targets as possible, mostly low-sophistication, using techniques requiring 0 human work and seeing what sticks, yes, blocking those simple techniques works.
Those attackers don't ever expect to hack Facebook or your bank, that's just not the business they're in. They're fine with posting unsavory ads on your local church's website, blackmailing a school principal with the explicit pictures he stores on the school server, or encrypting all the data on that server and demanding a ransom.
If your company does something that is specifically valuable to someone, and there are people whose literal job it is to attack your company's specific systems, no, those simple techniques won't be enough.
If you're protecting a Church with 150 members, the simple techniques are probably fine, if you're working for a major bank or a retailer that sells gaming consoles or concert tickets, they're laughably inadequate.
Today for example I changed energy company†. I made a telephone call, from a number the company has never seen before. I told them my name (truthfully but I could have lied) and address (likewise). I agreed to about five minutes of parameters, conditions, etc. and I made one actual meaningful choice (a specific tariff, they offer two). I then provided 12 digits identifying a bank account (they will eventually check this account exists and ask it to pay them money, which by default will just work) and I'm done.
Notice that anybody could call from a burner and that would work too. They could move Aunt Sarah's energy to some random outfit, assign payments to Jim's bank account, and cause maybe an hour of stress and confusion for both Sarah and Jim when months or years later they realise the problem.
We know how to do this properly, but it would be high friction and that's not in the interests of either the "energy companies" or the politicians who created this needlessly complicated "Free Market" for energy. We could abolish that Free Market, but again that's not in their interests. So, we're stuck with this waste of our time and money, indefinitely.
There have been simpler versions of this system, which had even worse outcomes. They're clumsier to use, they cause more people to get scammed AND they result in higher cost to consumers, so that's not great. And there are better systems we can't deploy because in practice too few consumers will use them, so you'd have 0% failure but lower total engagement and that's what matters.
† They don't actually supply either gas or electricity, that's a last mile problem solved by a regulated monopoly, nor do they make electricity or drill for gas - but they do bill me for the gas and electricity I use - they're an artefact of Capitalism.