The thing that really gets me is that internally there are 4 levels of data: 1 being public domain shit (the sky is blue), up to 4, which is private user data or anything that is sensitive if leaked or shared.
I was told that by default all user data is level 4, as in if you do anything with it without proper approval, you're insta-fired. There are many stories of at least one person a month accessing user data during boot camp and getting escorted out of the building within hours.
In the part where I worked, visual research, we had to jump through a year's worth of legal hoops to get permission to record videos in public. We had to build an anonymisation pipeline and a bulletproof audit trail, and delete as much data as possible, with auto-delete if something went wrong.
We had rigid rules about where that data could be stored and _who_ could access it. We were not allowed to share "wild" footage (i.e. data that might have a hint of anyone who hadn't signed a contract) for annotation because it would be given to a third party. The public datasets we released all had traceable people and locations, all with legal waivers signed.
Then I hear they just started fucking hosing private data to annotators to _train_ on? Without any fucking basic controls at all? Just shows that whenever Zuck or monetisation wants something, the rules don't apply.
I look forward to that entire industry collapsing in on itself.