The Apple PSI System is spyware.
They are providing all of this info to justify putting spyware on our devices. They are attempting to put spyware on our devices to see if we can be sent to jail. That's all that matters. That is the end effect.
Apple is justifying putting SPYWARE ON ALL OF THEIR PHONES. Any discussion of the technical merits of a SPYWARE system implemented against you is missing the point. It should not exist.
Humans at Apple thought nobody would have percent signs in their wifi names.
Programming is hard, not because knowing how to tell the computer what you want is hard, but because thinking about all the ways people will use your software forever is hard.
And if it's a semi-exact match algorithm like they say, it won't actually prevent any new abuse from happening.
That's not far from saying "only criminals have something to hide".
Seeing this come from Apple, a company which has won a lot of popularity with its stance on privacy, is absolutely astounding and definitely makes one wonder whether there is an ulterior motive.
They once had a marketing stunt where they refused to unlock an iPhone for the FBI, but only after they had already handed over decrypted backups from the very same phone.
It is ripe for abuse by governments, which Apple absolutely has a history of bowing to.
Saudi Arabia WILL use this to track dissidents and homosexuals. China WILL use this for whatever they need, on a daily basis. Hell even the US will use this, I'm sure.
The only way this would be OK was if the input data (the CSAM database) was cryptographically signed and no government, no single entity and not even Apple, could change the content, with the only signing key split in 10 parts to be held personally by Bruce Schneier, the Pope, Linus Torvalds, the Orthodox Patriarch, Keanu Reeves, a couple head Rabbis and a few of whatever their equivalent in Islam is, and they had to personally review the images one by one and certify that perceptual hash 0FB89C8A7DF6AA1945B is indeed CSAM content and agree to collectively sign it for addition.
This would only work if the full iOS source was fully published, and compiled as a reproducible build, so everyone could confirm the scanning code does what it's supposed to, and is not altered with any subsequent update.
P.S. Don't nitpick on the names, it was a deliberately absurd list of people either with a good reputation or with a lot to lose in the respectively chosen afterlife.
This race to the bottom of 'ownership' ends nowhere. Do you really own your device if it uses NDA or licensed parts? (be it software or hardware)
Do you really own your device if it depends on communications with other devices that you do not own? What about the first communication with a cell tower? The eNodeB? The RAN? The HLR? Or if you are communicating with someone, should you own their device as well?
Say your boundary is communications, what communications are we talking about? SPI? I2C? MII? The interface between the baseband and the application processor? The matrix scanner on a physical keyboard?
I'm not saying it's amazing to have an appliance where you don't control every aspect, but it's also not realistic to have a free-for-all at scale either. At the end of the day, when you live in a diverse society, not every device you pay for is a device that is your device. And that can be fine.
(Diverse doesn't universally mean religion, shades of skin or nationality - we're talking about carpenters, mechanics, teachers, artists, bus drivers, bakers, none of those will ever 'own' a device or software stack at scale; if you put a bunch of people together, organise them and specialise, not everyone will 'do tech' to the same degree, if any)
- They have a database of file hashes.
- They can’t validate its contents by design.
- They can’t explain who supplies it in detail.
- The suppliers of these data work with / close to the US gov.
- They match your own files on your own device against this database that contains who knows what.
- When it matches a couple of times, they alert the authorities.
- There is zero fucking visibility for both Apple and you by design and this is a very good thing for Apple.
- I assume the source of this data can update / add new hashes in time and your device will happily comply.
And their only concern is to say that the algorithm is so perfect, they can not see what’s happening and there won’t be false positives (hopefully).
You know what, I trust your ability to do it properly. No need to explain more.
Problem is that the very thing you are building is fucked up by design. And obviously Apple does not address it in any way.
But think about the children…
Who is supplying the hashes? It seems like a LOT of power will be concentrated there. They can basically decide who gets in trouble with seemingly no accountability whatsoever?
Actually, I am thinking of the children. Most child sexual abuse happens inside the home, committed by people who personally know the child, and this crypto mumbojumbo does nothing to address this!!!
What we actually need is for every parent to be required to install surveillance equipment, cameras, sensors and whatnot into their home.
Then *something*something* AI/ML can filter out all the excess information, but send the child abuse to the authorities…
(/s)
If one of your pictures is a false positive hash collision, you'll have no idea until your front door gets broken down.
> Privacy for the client: Let X be the server’s input from which pdata is derived. A malicious server must learn nothing about the client’s Y̅ beyond the output of ftPSIAD with respect to this set X.
Apple can't check whether a hash match is a false positive or not, because they only get the matching hashes and not the pictures that triggered them. So if you have a bunch of false positives, your front door is getting broken down, with no opportunity for a human to realize the problem and intervene.
> The protocol need not provide correct output against a malicious client. That is, the protocol need not prevent a malicious client from causing the server to obtain an incorrect ftPSI-AD output when the protocol terminates. The reason for this is that a malicious client can always choose to hide some of its data from the PSI system in order to cause an undercount of the intersection.
Their protocol isn't (and can't be) secure against the one attack that the people that this system is supposed to catch would actually commit.
> Moreover, a malicious client that attempts to cause an overcount of the intersection will be detected by mechanisms outside of the cryptographic protocol.
This seems eerie to me but I can't put my finger on why.
Well, technically speaking, once enough safety vouchers have been submitted to reach the stated threshold, a report will be sent to an Apple employee (somewhere). The vouchers, combined, then become decryptable and contain grayscale low-res versions of the original images for confirmation, in which case NCMEC (the National Center for Missing & Exploited Children) and law enforcement will be alerted.
I'm just pointing out that your door would only get broken down after multiple false positives, and a human will get a chance to "realize the problem and intervene" before it goes to the FBI or whatever. Not saying I like this system, just making a nitpick on your criticism. I don't really know how else you could have a human intervene without "breaking down your door."
But what happens when the false positives are erroneously confirmed as legitimate CSAM? What's the system in place for removing all the safety vouchers on someone's account because the vision system flagged a bunch of false positives? What's the process for unfucking a person's life because the employee confirming the CSAM was in a bad mood that day? Is Apple going to pay the legal bills of someone they effectively SWATed?
For the ones of people this system might actually catch with legitimate CSAM there will be at least as many false positives slipping through to ruinous consequences. Law enforcement shouldn't be trusted as a backstop against abuse because LEOs and DAs are incentivized for "good numbers" and "results", not for actually meting out justice. If someone flagged by false positives gets to the stage of law enforcement being involved their lives will be ruined.
This is a completely false statement. Besides the fact that it requires more than one match for an account to be flagged and flagged accounts are reviewed manually by Apple before sending a report off to NCMEC thereby catching false positives, you’re notified if your account is flagged and can file a challenge.
This is just nice window dressing.
You can trust them. You have to trust them.
Oh, "provided by NCMEC and other child-safety organizations." [1]
Other unnamed organizations. Don't worry about who they are, it's not relevant. Stop thinking so hard. Stop Screeching. Trust Apple.
[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
If some men in black walk into the offices of some nice little child protecting service in Ohio and demand that they put some additional hashes into the database because otherwise something bad could happen to them or their families, does Apple really think they would decline?
It doesn't matter how secure the system is, the vulnerability is that virtually anyone can input virtually anything into the database without Apple even knowing and expose selected users that way.
This failure point is now at the heart of iOS and macOS and it's baffling that Apple doesn't see that or doesn't want to see it.
My guess is that they're somehow forced to implement this and try to talk their way out of it with some strange PR pieces which are only convincing their most naive users.
- It assumes that the server would not tamper with its dataset (i.e., the list of CSAM). So it is OK to disclose the information if the client has enough matches. But in reality, nothing prevents a malicious server from adding arbitrary content to the list.
- It fails to consider the vulnerabilities of the perceptual hash. This includes false positives and adversarial collision attacks (https://arxiv.org/abs/2011.09473).
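To make the false-positive point concrete, here is a minimal sketch of how a perceptual hash can map different images to the same value. It uses a toy "average hash" on 2x2 grayscale grids, which is an assumption for illustration and is not Apple's NeuralHash; the function names and pixel values are made up.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is above the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return sum(1 << i for i, v in enumerate(flat) if v > mean)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Hypothetical 2x2 grayscale "images" (values 0-255).
original   = [[10, 200], [220, 30]]
brightened = [[40, 230], [250, 60]]   # same picture, brighter: intended match
unrelated  = [[0, 90], [100, 5]]      # different picture, same light/dark layout
inverted   = [[200, 10], [30, 220]]   # opposite layout

h_orig = average_hash(original)
h_bright = average_hash(brightened)
h_unrel = average_hash(unrelated)
h_inv = average_hash(inverted)
```

Here `h_orig`, `h_bright` and `h_unrel` all come out equal: the hash tolerates the brightness change as intended, but the unrelated image collides too. A real perceptual hash is far more elaborate, but the trade-off between tolerating edits and avoiding collisions is the same kind of trade-off.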
Another potential long-term issue is that it is unclear how long Apple will store the safety vouchers. As a storage service, Apple may store them forever. The system is based on elliptic-curve cryptography. Although it is the current state-of-the-art encryption technique, it will be broken when quantum computers become a reality in the future. So it is possible that every encrypted safety voucher can be decrypted in the next 50 years.
So a theoretical attacker could send you a message on WA with a set of adversarial collision images (not even child pornography potentially), and effectively SWAT you.
One could argue which is more likely, but the fact remains that the entire premise of this system is flawed and it seems the only way to play this game is to not use iCloud photos at all (though it's possible that malware could bypass that and turn on iCloud sync as well as upload the photos too...)
https://www.apple.com/child-safety/pdf/Alternative_Security_...
Basically it's a second opinion on the mathematics from a different perspective. The original post link is the formal proof by Apple employees and Stanford, the Alternative Proof is by the University of California.
A human at Apple needs to inspect it in order to quash your fourth amendment protection against a warrantless search.
Apple inspecting your images destroys your privacy, the extra step of review doesn't protect your privacy -- it's there to destroy the constitutional protection of your privacy.
Noteworthy is Dan Boneh's contribution on this. Given his reputation in the crypto community, it seems he really does believe in this, despite all the recent controversy.
They discuss and build upon several security properties, but the most contentious point-- privacy/leakage/scanning of your photos-- is addressed as follows:
Firstly some background-- the private set intersection[1] (PSI) technique in general permits 2 sets to be compared with both parties learning *only* the intersections. In a nutshell, Apple uses this concept such that, if the number of intersecting elements is greater than a threshold, they're notified.
There are several modifications (Shamir's secret sharing, Cuckoo tables, PKI-ish schemes) to create what Apple calls an ftPSI-AD protocol to optimize desired properties-- performance, integrity, and most notably to me, zero false positives. That is, the number of innocent people flagged will be minimized at the cost of real child-pornographic images slipping by.
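For the threshold part, a minimal sketch of Shamir's secret sharing may help. The idea, as I read it, is that each matching voucher carries one share of a key, and the key can only be reconstructed once at least the threshold number of shares is available. The field prime, function names and parameters below are assumptions for illustration, not taken from the paper.

```python
import secrets

PRIME = 2**127 - 1   # a Mersenne prime; field size chosen for illustration only

def split(secret, threshold, n_shares):
    """Hide `secret` as the constant term of a random degree-(threshold-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):        # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n_shares + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0; correct only with >= threshold shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat's little theorem)
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = split(123456789, threshold=3, n_shares=5)
```

Any three of the five shares give back the secret via `recover`; with only two, interpolation yields an unrelated-looking value except with negligible probability.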
Couple noteworthy things that still raise red-flags--
1) they prove that, for honest servers <--> malicious clients, and vice-versa, privacy is not violated for either party, but to me this considers the client as the phone and Apple the server. I'd argue that the phone is actually "Mallory", and you are the client.
You might be honest, but how do you trust the Apple + phone short of reverse engineering it? This is the biggest hole to me, and so I don't fully understand this proof (or, perhaps I have the parties mixed up).
2) Several things are handwaved and/or left "variable to implementation". E.g. Section 5, on "near-duplicate images" that may count twice to this threshold--
>> Several solutions to this were considered, but ultimately, this issue is addressed by a mechanism outside of the cryptographic protocol
What the?? Hello?? Perhaps this is addressed in another whitepaper, given this is a theory/protocol heavy paper, but this does not instill confidence.
Or, take this bit from remark 3--
>> If needed, these false negatives can be eliminated with a tweak to the data structure used
Uh, I thought not sending innocent people to jail was a pretty critical property. You're telling me the server/Apple, who controls the Cuckoo table, can just change this on a whim? How would I hold them responsible/be notified of this?
These "variations" are remarked on several times in the paper. Again, not exactly confidence building.
Overall, while I really applaud this effort, and I'm not as outraged as I initially was, I'm only slightly less so and have a handful more questions than before.
Again, please correct me if my annoyance might be misguided, given these technical details.
I'll give you an EL18 description of a basic private set intersection:
I have a database of image fingerprints which I want you to test your images against and tell me if there are matches. I can assume that you're going to faithfully run my protocol because I use DRM to control the software that runs on your computing device. The obvious way to accomplish the matching would be for me to just send you the database of fingerprints-- they are hashes after all and don't tell you anything about the images other than letting you match them-- and for you to tell me about the matches.
But I don't want to tell you the database or tell you when an image matches because if I do you'll realize that I'm targeting images connected with a particular ethnicity, which I intend to mass murder. So, instead I tell you I want to search for child porn and I get you to agree to the following protocol, and you're foolish enough to let me keep the hashes secret for no obvious reason.
The first building block we need is an encryption scheme which is additively homomorphic, such as (exponential) ElGamal encryption. With this special encryption scheme the following properties hold: Enc(Data1, key) + Enc(Data2, key) = Enc(Data1+Data2,key) and x*Enc(Data1,key) = Enc(Data1*x,key). Or, in English, the sum of two ciphertexts gives you a ciphertext of the sum of the plaintexts, and a ciphertext multiplied by a value gives you a ciphertext for the plaintext multiplied by that value.
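A minimal sketch of those two properties, assuming exponential ElGamal over a toy multiplicative group. The parameters P, Q, G and all function names are my own choices for illustration and are far too small to be secure. In this group the "+" on ciphertexts is componentwise multiplication and the "x*" is componentwise exponentiation.

```python
import secrets

# Toy group, assumed for illustration only: P = 2*Q + 1 with P, Q prime,
# and G = 4 generating the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4

def keygen():
    sk = secrets.randbelow(Q - 1) + 1          # private key in [1, Q-1]
    return sk, pow(G, sk, P)                   # (private, public)

def encrypt(m, pk):
    r = secrets.randbelow(Q - 1) + 1           # fresh randomness per ciphertext
    return (pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P)

def decrypt_exp(ct, sk):
    """Returns G**m mod P (the plaintext 'in the exponent'), not m itself."""
    c1, c2 = ct
    return c2 * pow(c1, Q - sk, P) % P         # c1**(Q - sk) == c1**(-sk) in this subgroup

def add(ct_a, ct_b):
    """Enc(a) + Enc(b) = Enc(a + b)."""
    return (ct_a[0] * ct_b[0] % P, ct_a[1] * ct_b[1] % P)

def scale(ct, x):
    """x * Enc(a) = Enc(a * x)."""
    return (pow(ct[0], x, P), pow(ct[1], x, P))

sk, pk = keygen()
sum_ct = add(encrypt(5, pk), encrypt(7, pk))   # encrypts 12
scaled_ct = scale(encrypt(5, pk), 3)           # encrypts 15
```

Decryption only hands back G**m, which is all the matching protocol needs: a plaintext of zero decrypts to exactly 1.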
With that in hand we can build a private set intersection.
(1) I pick a private key, send you the public key, and I encrypt each of the hashes in my database. I send you the encryptions-- which, thanks to the encryption, teaches you nothing about the database except an upper bound on its size.
(2) For each database entry you take the hash of image you want to test, encrypt it with the same key and subtract it from the encrypted database entry. If they matched you have an encryption of zero (which you can't tell is zero, due to encryption), if they didn't match you have an encryption of some non-zero value-- the difference between the image hash and the database entry. You then pick a new random number and multiply the result with it. You now either have an encryption of a totally random number (if there was no match) or an encryption of zero (since random*0=0).
(3) You send that to me, I decrypt it... and if it decrypts to zero I add you to the list of people to be executed at some time in the future. If it doesn't decrypt to zero I learn absolutely nothing about your hash, other than it didn't match, because the result is literally a random number.
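Steps (1) to (3) can be sketched end to end as follows. This is a toy under stated assumptions: exponential ElGamal in a tiny group (P, Q, G are my own illustrative parameters, nowhere near secure), hashes are small integers below Q, and every function name is hypothetical.

```python
import secrets

P, Q, G = 2039, 1019, 4     # toy group: P = 2*Q + 1, G has prime order Q

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1        # uniform in [1, Q-1]

def encrypt(m, pk):
    r = rand_scalar()
    return (pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P)

def is_zero(ct, sk):
    c1, c2 = ct
    return c2 * pow(c1, Q - sk, P) % P == 1    # G**0 == 1 means the plaintext was 0

def server_setup(db_hashes):
    """Step 1: the server picks a key and encrypts every database entry."""
    sk = rand_scalar()
    pk = pow(G, sk, P)
    return sk, pk, [encrypt(h, pk) for h in db_hashes]

def client_test(image_hash, pk, enc_db):
    """Step 2: per entry, compute Enc(entry - image_hash), then blind it."""
    out = []
    for d1, d2 in enc_db:
        e1, e2 = encrypt(image_hash, pk)
        # "Subtract" by multiplying with the inverse (x**(Q-1) == x**-1 here).
        diff = (d1 * pow(e1, Q - 1, P) % P, d2 * pow(e2, Q - 1, P) % P)
        k = rand_scalar()                      # random nonzero multiplier
        out.append((pow(diff[0], k, P), pow(diff[1], k, P)))
    return out

def server_check(responses, sk):
    """Step 3: a response that decrypts to zero means a match."""
    return any(is_zero(ct, sk) for ct in responses)

sk, pk, enc_db = server_setup([17, 420, 999])
hit = server_check(client_test(420, pk, enc_db), sk)
miss = server_check(client_test(421, pk, enc_db), sk)
```

The client never sees the database and cannot tell whether `hit` or `miss` occurred; only the key holder learns the result, which is exactly the asymmetry being complained about.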
The Apple scheme makes a number of elaborations on this basic idea to improve efficiency (the database is a cuckoo hash table, so instead of sending you one encryption per database entry per image of mine I only need to send you a few encryptions per image-- however much fanout the hash table has), to make it so that matches result in leaking a decryption key so they can decrypt the image, and additional complexity to make it so that the matching isn't fully triggered unless you have more than some threshold number of matching images and to partially obscure the exact number of sub-threshold matches.
However, I predict that within the next 2 years, the Chinese government will force Apple to use its own database of "objectionable" content and will require that the on-device photo roll be scanned, not just iCloud (they already have access to that).
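To illustrate the fanout point, here is a minimal cuckoo hash table sketch in which every item can live in only one of two slots, so a lookup touches two positions regardless of table size. The hashing scheme, class name and eviction limit are assumptions for illustration; this is not the data structure from the paper.

```python
import hashlib

def _slot(item, seed, size):
    """Hypothetical slot function: seeded SHA-256 reduced modulo the table size."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size

class CuckooTable:
    def __init__(self, size, max_kicks=50):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size

    def candidates(self, item):
        """The only two positions where `item` may ever be stored."""
        return (_slot(item, 0, self.size), _slot(item, 1, self.size))

    def contains(self, item):
        return any(self.slots[i] == item for i in self.candidates(item))

    def insert(self, item):
        if self.contains(item):
            return True
        idx = self.candidates(item)[0]
        for _ in range(self.max_kicks):
            if self.slots[idx] is None:
                self.slots[idx] = item
                return True
            # Evict the occupant and move it to its other candidate slot.
            item, self.slots[idx] = self.slots[idx], item
            a, b = self.candidates(item)
            idx = b if idx == a else a
        return False   # gave up: the item currently in hand is dropped

table = CuckooTable(size=64)
for h in ["hash-a", "hash-b", "hash-c", "hash-d"]:
    table.insert(h)
```

Because a lookup only ever inspects `table.candidates(x)`, a client testing one image needs the encrypted contents of just those two slots rather than the whole database.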
And probably not just China, because any LEA and secret services would love to have the ability to use such a system for ad-hoc searches: dear Apple, for those users, please extend the database source to this new one we maintain and enable scanning of all on-device pictures.
If it is, that's bad for human rights. The crypto part protects Apple and their under-specified list sources from accountability.
Without the crypto the system would just send you the database of bad hashes and snitch on you based on it. Your privacy would be no worse, but at least researchers would have some chance of detecting when the system was being used off-label to enable genocide.
The part that worries me the most is that no one outside of Apple can verify that the hash set they're pushing hasn't been tampered with (Section 4, Remark 5). This allows them for example to add leaked product image hashes to hunt down and prosecute people who share info about their products before release.
In fact, the system seems designed to be impossible to audit, with only a subset of the whole hash set being pushed to clients, so that researchers can't even tell when more hashes have been added. As a consequence of that design, they acknowledge that a "small number" of false negatives will occur, and justify that with an argument that it improves performance (Section 2, Remark 3).
False positives on the other hand will be common (as detailed in Section 5, "Duplicate images") - simply copying a file on two client devices that don't share a cloud owner ID will count towards the threshold, and the paper again falls back on "mechanisms outside of the cryptographic protocol".
And last but not least, let's spare a thought for the Apple employees that will be required to sift through potentially traumatizing imagery (assuming the company doesn't outsource that to a third party.)
Not that I am an open source extremist, but the moment where we couldn't control the way our own machines run is the moment where they stopped belonging to us. Supporting a true open source phone OS might be a good idea even if you don't use it. Because one day you might have to.
Is it really worth giving up our fundamental privacy rights when the police already routinely ignore CSA?
HN Link: https://www.naag.org/find-my-ag/
How long will it take before your hard drives are scanned for matching hashes of copyrighted material?
It's still an interesting read from a cryptography point of view though.
Just wow.