I can't help but feel like these are the dying breaths of the open Internet though. All the megacorps (Google, Microsoft, Apple, CloudFlare, et al) are doing their damndest to make sure everyone is only using software approved by them, and to ensure that they can identify you. From multiple angles too (security, bots, DDoS, etc.), and it's not just limited to browsers either.
End goal seems to be: prove your identity to the megacorps so they can track everything you do and also ensure you are only doing things they approve of. I think the security arguments are just convenient rationalizations in service of this goal.
I agree with the over zealous tracking by the megacorps but this is also due to bad actors, I work for a financial company and the amount of API abuse, ATO, DDoS, nefarious bot traffic, etc. we see on a daily basis is absolutely insane
And when it does get more dangerous, is over zealous tracking the best counter for this?
I've dealt with a lot of these threats as well, and a lot are countered with rather common tools, from simple fail2ban rules to application firewalls and private subnets and whatnot. E.g. a large fai2ban rule to just ban anything that attempts to HTTP GET /admin.php or /phpmyadmin etc, even just once, gets rid of almost all nefarious bot traffic.
So, I think the amount of attacks indeed can be insane. But the amount that need over zealous tracking is to be countered, is, AFAICS, rather small.
I'm guessing investors actually like a healthy dose of open access and a healthy dose of defence. We see them (YC, as an example) betting on multiple teams addressing the same problem. The difference is their execution, the angle they attack.
If, say, the financial company you work for is capable in both product and technical aspect, I assume it leaves no gap. It's the main place to access the service and all the side benefits.
AI will replace any search you would want to do to find information, the only reason to scour the internet now is for social purposes: finding comments and forums or content from other users, and you don’t really need to be untracked to do all that.
A megacorp’s main motivation for tracking your identity is to sell you shit or sell your data to other people who want to sell you things. But if you’re using AI the amount of ads and SEO spam that you have to sift through will dramatically reduce, rendering most of those efforts pointless.
And most people aren’t using the internet like in the old days: stumbling across quaint cozy boutique websites made by hobbyists about some favorite topic. People just jump on social platforms and consume content until satisfied.
There is no money to be made anymore in mass web scraping at scale with impersonated clients, it’s all been consumed.
Computers used to be empowering. Cryptography used to be empowering. Then these corporations started using both against us. They own the computers now. Hardware cryptography ensures the computers only run their software now, software that does their the corporation's bidding and enforces their controls. And if we somehow gain control of the computer we are denied every service and essentially ostracized. I don't think it will be long before we are banned from the internet proper for using "unauthorized" devices.
It's an incredibly depressing state of affairs. Everything the word "hacker" ever stood for is pretty much dying. It feels like there's no way out.
Absolutely. They might not care about individuals, though. It's their approach to shape "markets". The Apple, Google, Amazon, and Microsoft tax is not inevitable and that's their problem. They will fight toe and nail to keep you locked in, call it "innovation", and even cooperate with governments (which otherwise are their natural enemy in the fight for digital control). It's the people that a) don't care much and b) don't have any options.
In the end, a large share of our wealth is just pulled from us to these ever more ridiculous rent seeking schemes.
These days I just tell friends & family to assume that nothing they do is private.
Child safety, as always, was the sugar that made the medicine go down in freedom-loving USA. I imagine these states' approaches will try to move to the federal level after Section 230 dies an ignominious death.
Keep an eye out for Free Speech Coalition v. Paxton to hit SCOTUS in January: https://www.oyez.org/cases/2024/23-1122
For those less informed, add “to impersonate the fingerprints of a browser.”
One can, obviously, make requests without a browser stack.
https://en.wikipedia.org/wiki/Next-Generation_Secure_Computi...
...and we're seeing the puzzle pieces fall into place. Mandated driver signing, TPMs, and more recently remote attestation. "Security" has always been the excuse --- securing their control over you.
At the time I wrote this up, r1-api.rabbit.tech required TLS client fingerprints to match an expected value, and not much else: https://gist.github.com/DavidBuchanan314/aafce6ba7fc49b19206...
(I haven't paid attention to what they've done since so it might no longer be the case)
> I doubt sites without serious anti-bot detection will do TLS fingerprinting
They don't set it up themselves. CloudFlare offer such thing by default (?).
Ultimately I was not able to get it to build because the BoringSSL disto it downloaded failed to build even though I made sure all of the dependencies the INSTALL.md listed are installed. This might be because the machine I was trying to build it on is an older Ubuntu 20 release.
Edit: Tried it on Ubuntu 22, but BoringSSL again failed to build. The make script did work better however, only requiring a single invocation of make chrome-build before blowing up.
Looks like a classic case of "don't ship -Werror because compiler warnings are unpredictable".
Died on:
/extensions.cc:3416:16: error: ‘ext_index’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
The good news is that removing -Werror from the CMakeLists.txt in BoringSSL got around that issue. Bad news is that the dependency list is incomplete. You will also need libc++-XX-dev and libc++abi-XX-dev where the XX is the major version number of GCC on your machine. Once you fix that it will successfully build, but the install process is slightly incomplete. It doesn't run ldconfig for you, you have to do it yourself.
On a final note, despite the name BoringSSL is huge library that takes a surprisingly long time to build. I thought it would be like LibreSSL where they trim it down to the core to keep the attack surface samll, but apparently Google went in the opposite direction.
The original repo was already full of hacks, and on top of that, I added more hacks to keep up with the latest browsers. The main purpose of my fork is to serve as a foundation of the python binding, which I think is easier to use. So I haven't tried to make the whole process more streamlined as long as it works on the CI. You can use the prebuilt binaries on the release page, though. I guess I should find some time to clean up the whole thing.
Worked around it by modifying the patch: https://github.com/jakeogh/jakeogh/blob/master/net-misc/curl...
Considering the complexity, this project, and it's upstream parent and grandparent(curl proper) are downright amazing.
These can work well in some cases but it's always a tradeoff.
When I have to do HTTP requests these days, I default to a headless browser right away, because that seems to be the best bet. Even then, some website are not readable because they use captchas and whatnot.
Headless browsers consume orders of magnitude more resources, and execute far more requests (e.g. fetching images) than a common webscraping job would require. Having run webscraping at scale myself, the cost of operating headless browsers made us only use them as a last resort.
Evade captchas. curl user agent / heuristics are blocked by many sites these days - I'd guess many popular CDNs have pre-defined "block bots" stuff that blocks everything automated that is not a well-known search engine indexer.
How close is it? If I ran wireshark, would the bytes be exactly the same in the exact same packets?
It could mean that the packets are the same, but timing is off by a few milliseconds.
It could mean a single HTTP request exactly matches, but when doing two requests the real browser uses a connection pool but curl doesn't. Or uses HTTP/3's fast-open abilities, etc.
etc.
Why is this?
The following browsers can be impersonated.
...unfortunately no Firefox to be seen.
I've had to fight this too, since I use a filtering proxy. User-agent discrimination should be illegal. One may think the EU could have some power to change things, but then again, they're also hugely into the whole "digital identity" thing.
You can find support for old firefox versions in the original repo.
It's now pretty common to have cloudflare, AWS, etc WAFs as main endpoints, and these do anti bots (TLS fingerprinting, header fingerprinting, Javascript checks, capt has, etc).
Is there a way to request impersonization of the current version of Chrome (or whatever)?
$ curl_chrome <TAB><TAB>
curl_chrome100
curl_chrome101
curl_chrome104
curl_chrome107
curl_chrome110
curl_chrome116
curl_chrome119
curl_chrome120
curl_chrome123
curl_chrome124
curl_chrome131
curl_chrome131_android
curl_chrome99
curl_chrome99_android