It's all classic doxxing/profiling techniques, even things like spelling differences acting as regional signals, or recurring interest in the specific topics being discussed.
It's why one has to think about what is being posted to which community if using different identities, rather than posting the same things across all of them. Though any such effort is wasted if it relies on non-public info that is later exposed in a database breach, tying together previously unrelated profiles.
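To make the "spelling differences as regional signals" idea concrete, here is a minimal sketch that counts US versus UK/Commonwealth spelling variants in a piece of text. The word list, the function names `spelling_signal` and `likely_region`, and the simple majority rule are all hypothetical illustrations, not any particular profiling tool's method.

```python
import re

# Tiny, hypothetical list of (US, UK) spelling pairs; a real profiler would need far more.
SPELLING_PAIRS = [
    ("color", "colour"),
    ("favorite", "favourite"),
    ("organize", "organise"),
    ("center", "centre"),
    ("gray", "grey"),
]


def spelling_signal(text):
    """Return (us_hits, uk_hits): how many words match each variant list exactly."""
    words = re.findall(r"[a-z]+", text.lower())
    us_words = {us for us, _ in SPELLING_PAIRS}
    uk_words = {uk for _, uk in SPELLING_PAIRS}
    us_hits = sum(1 for w in words if w in us_words)
    uk_hits = sum(1 for w in words if w in uk_words)
    return us_hits, uk_hits


def likely_region(text):
    """Naive majority vote over the variant counts; ties (including zero hits) are 'unknown'."""
    us_hits, uk_hits = spelling_signal(text)
    if us_hits == uk_hits:
        return "unknown"
    return "US-style" if us_hits > uk_hits else "UK/Commonwealth-style"
```

For example, `likely_region("My favourite colour is grey")` returns `"UK/Commonwealth-style"`. A single comment is weak evidence on its own; the signal only becomes meaningful when aggregated over many posts from the same account.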
Way simpler than hnprofile from the sibling comment. This one used cosine similarity between users' vocabularies - https://web.archive.org/web/20221126225241/https://stylometr...
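A minimal sketch of vocabulary-based cosine similarity, assuming plain lowercased word counts per user. The linked tool may tokenize or weight words differently (e.g. TF-IDF); the function names and sample accounts here are hypothetical.

```python
import math
import re
from collections import Counter


def vocab_vector(comments):
    """Bag-of-words counts over all of one user's comments (lowercased word tokens)."""
    counts = Counter()
    for text in comments:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (1.0 = identical direction)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(target_comments, candidates):
    """Rank candidate accounts by vocabulary similarity to the target account, best first."""
    target = vocab_vector(target_comments)
    scores = {name: cosine_similarity(target, vocab_vector(c)) for name, c in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


alt = ["Honestly the colour of the sky is grey today, innit"]
candidates = {
    "user_a": ["the colour grey is honestly my favourite, innit"],
    "user_b": ["Compile the Rust crate with cargo build --release"],
}
print(rank_candidates(alt, candidates)[0][0])  # user_a
```

Raw counts let very common words like "the" dominate; down-weighting them (or restricting to rarer words) is the usual refinement.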
Right up there with Skynet, for me, has been the idea of disparate databases all being linked up by bad actors.
It appears as though DOGE illegally obtained taxpayer data from the IRS. I don’t trust DOGE to safeguard anything.
And the penalties do not seem to be very severe outside of HIPAA.
https://democracyforward.org/news/press-releases/new-details...
But I think it would generally be easier to counter it the same way.
Use an LLM or heuristics to pose as someone else.
Not only do you erase your traces, you add false positives into the system, which reduces the overall effectiveness of these techniques in the future. A bit of poisoning the well.
I hope an easy-to-use tool, maybe built around a small local LLM, eventually makes this easy enough that any future deanonymization attacks would be too unreliable to depend on.
It may actually be a fine line. You may be flagged as an LLM later if your style is too generic and identified if your style is too unique.
If you can figure out who I am or what city I live in just from this username or my comments (with proof), I'll personally send you 500,000 JPY. I'm quite confident that's not going to happen though.
The paper referenced in the article does not even explain its exact testing methodology (such as the tools or exact prompts used) because the authors claim it would be misused for evil. In other words, "trust me bro."
Also see the previous discussion here: https://news.ycombinator.com/item?id=47139716
Unless I am misreading something. Take a look at surveillance capitalism to see what's possible right now. It's going to be 100x worse as LLMs become more advanced.
It's not the things you post online, it's the nuances of the way you type and other behavioral signals that allow them to build these kinds of profiles.
From what I can tell, the article/paper in question does not appear to utilize any of the techniques you mention, but I'd be interested to learn more about it.
> it's the nuances behind the way you type
I found this paper which talks about some of those methods.
https://www.audiolabs-erlangen.de/content/04_fraunhofer/assi...
For example, the "Text" section on page 91.
Not that I care, and that could be wildly off, but opsec is a wide term… and Claude one-shotted that… so stay safe out there bro, AI is wild
LLM as the sickness and the cure...
(Above 99% accuracy)
Probably not GDPR-compliant then if comments can be deanonymised by LLMs.
The alternative is what you see on reddit. A lot of threads from the past have posts deleted or overwritten with some script. You now have to dig through archive sites to find the comments, and you usually do find them.
I participate in Signal chats with self-destructing messages, too. But I post different things here and on Signal, under different usernames. Heck, after a few weeks I'll make another account here, anyway.
Even if you somehow deanonymize me, it's a risk I willingly took when I started posting.
Finally, if you go after HN for deleting comments, will you go after the many archive sites?
From gdpr-info.eu: “Subjective information such as opinions, judgements or estimates can be personal data.”
So yes. HN is in violation of the GDPR. I have already filed a complaint about this policy with my local GDPR authority.
I can already see Palantir as the new man in the middle, telling services: this guy with the same IP just posted xxx on yyy
Should I, like, just ask Claude Code to come up with this idea this weekend?