Why is this even here?
There is nothing in it that would be outside of the capabilities of the NSA.
I tend to believe it is not fiction, even without looking at the author's resume.
It passes many smell tests that a lot of HN articles clearly fail.
If it is declared as fiction, not if it is stated as fact. Right now that piece is as much fake news as any other fake news.
BTW Mr. Muse also omits how, once he was identified by the NSA, Mr. Satoshi Nakamoto was forced to leave the Illuminati ...
1. That the NSA has access to a sizable chunk of all Internet communications, enough to supply the "known" communications to match against the "unknown" samples.
a. The story references PRISM, but public reports describe PRISM as only an implementation of Section 702 authority, and probably not bulk (i.e. 100%, or whatever) collection of all email.
b. It also references MUSCULAR, which was bulk interception, but frankly, the way this is written, it comes off as name-dropping rather than a meaningful reference. (It's like when a sci-fi author says that faster-than-light travel was made possible once scientists perfected hyper-tachyon capture: it sounds cool, but I just made up those words.)
2. That NSA interception makes it easy to tie existing samples to known identities. This is a massive data mining problem. I could imagine an implementation where someone can (in a computationally expensive manner) rank writing samples by similarity to a target; an analyst might then manually pore over the most similar samples for clues to identity. So in this case, Satoshi's writing might turn out to be very close to emails sent by some.real.name@example.com, which would spill the beans quite quickly, but there's no reason to assume it would be that easy.
3. That stylometrics are sufficiently advanced to distinguish between 1B English-speaking Internet users. I'm not well-read on this aspect, but it seems like the most improbable part; see e.g. https://www.researchgate.net/profile/Ahmed_Abbasi4/publicati.... The few papers I just dug up abstracts for seem to suggest error rates in the range of 5-15% among a limited number (~100) of authors, so the idea that the NSA could attribute writing to one of 1B potential authors is pretty far-fetched.
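For what it's worth, the "rank writing samples by similarity to a target" idea from point 2 can be sketched in a few lines of Python. This is only a toy under my own assumptions: the character-trigram features, the cosine score, and the names `char_ngrams`, `cosine`, and `rank_by_similarity` are all made up for illustration, not anything the story or the NSA is known to use.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(target, samples, n=3):
    """Return (score, label) pairs, most similar to the target first.

    `samples` maps a hypothetical label (e.g. an email address) to known text.
    """
    target_profile = char_ngrams(target, n)
    scored = [
        (cosine(target_profile, char_ngrams(text, n)), label)
        for label, text in samples.items()
    ]
    return sorted(scored, reverse=True)
```

The output is just an ordered list for an analyst to pore over, not an identification, and scoring one target against every known sample costs time linear in the size of the corpus, which is where the "computationally expensive" part comes in at Internet scale.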
I assumed the point of this story was to use fiction to illustrate something that's sort of hypothetically plausible, and to illustrate the privacy risks such a scenario poses. But I was annoyed by the author's claim to be writing fact. :)
I'd guess any tool that could do it well could also translate perfectly between languages, since it would need a perfect understanding of the writing. Otherwise you introduce leaks.
I've always thought this as well. It really makes the most sense.