It’s a terrible job, I wouldn’t want to do it, but someone needs to. Perhaps one day, AI will be accurate enough to not need it, but even then you need someone to process complaints and waivers (like someone’s home photos being inaccurately flagged).
The correct way to organize social media is in a federated way. Each server only holds on average a few hundred or a few thousand people. Server moderators should be legally responsible for content on their server. CSAM on social media will be 100x suppressed because banning people is way easier on small servers.
Not many moderators will have to look at CSAM because the structure of the system makes it unappealing to even try sharing CSAM, knowing you will be immediately blocked.
That's a tradeoff you can choose to make, but you need to enter into it with open eyes.
It doesn't matter how many are shared but how many are viewed. On a small server community policing works just fine, bad actors are easier and faster to block, and to top it off, the smaller reach of each server makes it unprofitable to target multiple servers, fish for their weak points, etc. The dirty jobs become unprofitable, which is what matters most.
With the help of AI, small players can do a better job at removing CSAM.
No it's not. It's certainly not my choice. No one asked me if it's okay for Facebook to distribute CSAM because you insist it would be worse if it didn't.
And therefore anything that is remotely questionable will be blocked. Not just kiddie porn. Pissed off a local business with a bad review? Blocked.
Child abusers are twisted people, and I really don’t care much what happens to them, but making it impossible for them to use the internet means sterilizing the whole thing.
This is already the case. There is a lot of lawful, useful, medical or educational content that is actively censored on social media because it includes words or pictures of organs, while the same social media platforms actively encourage and develop algorithms to push underage girls (and possibly boys) posting pictures of themselves in sexual poses, attire and contexts.
Big tech and social media networks love and push CSAM, they just hide the genitals but the content really is the same.
But that's not possible in today's oligopoly of social media. An invisible algorithm will ban you, and there is no way back, and few alternatives. Big Social Media is way worse from a sanitizing perspective than some federated social media.
Moderators need to actually understand the context of the picture/video, which requires knowledge of culture and language of the people sharing the pictures. It's really difficult to do that without hiring moderators from every culture in the world.
But small federated servers can often align along real world human social networks, so it's easier for the server admin to understand what should be removed.
My impression is it would take Manhattan-Project levels of effort and funds to come close to "solving" this problem, especially without someone getting on a watchlist for having a telehealth-first primary care provider insurance plan and asking for advice on their toddler's chickenpox.
Human review? Meta has small armies worth of content moderators already that tend to burn out with psychological problems and have a suicide rate where you're probably better off going to fight in a real war. (This includes workers hired by Sama in Kenya, to link back to the OP.)
I will reluctantly grant Meta that they're up against a really hard problem here.
No it isn't. Small servers often don't have paid security or moderation, are run in anonymous fashion, and have no profit motive that can even be used to incentivize them against hosting illegal content.
That's visible when it comes to porn. There are a million bootleg porn sites hosted on the internet that show off illegal content. The only site that was ever forced to curate its content was Pornhub, because they're sufficiently large, work in a jurisdiction that has laws, and can be held accountable. From a content moderation standpoint, going after a million web forums is an absolute pain in the ass compared to going after Facebook.
Which is the first argument any decentralization advocate always brings up (and they're correct to do so), censorship is harder and evasion of law enforcement easier when dealing with a network of independent actors.
You now have 100x the total human effort for mods to review and ban him.
So if you want to send someone to jail, just talk your way into joining their server, upload some illegal content, and report them for it?
> Not many moderators will have to look at CSAM because the structure of the system makes it unappealing to even try sharing CSAM, knowing you will be immediately blocked.
Why would someone join a server with active moderation if they wanted to share CSAM with their social media friends?
They would seek out one of those servers that was set up specifically for those groups, where it was known to be a safe space.
This is what many people don't get about federated networks: The people in those little servers DGAF if you block them. They want to be surrounded by their likeminded friends away from the rules of some bigger service like Facebook or Twitter. Federated social media is the perfect platform for them because they can find someone who set up a server in some other country with their own idea of rules and join that, not be subject to the regulations of mainstream social media.
It also makes it relatively easy to avoid, as server admins share blocklists. I know a dozen servers offhand that i'd block if i ran another fediverse server.
Fosstodon fediverse server doesn't have this issue, for example.
I replied this way because the way you wrote it, it sounds like an indictment of a system that's designed to avoid advertisers getting user profiles, over all else.
The problem is the people who participate in this (the illegal and immoral), and not "the network."
Anecdotally, when I was a young adult I was a volunteer moderator for a large forum. We got reports of CSAM several times a month and had a process for escalating and reporting it to the FBI IC3 - we retained a lot of information about the users that posted it.
One of the administrators of the website mentioned to me that over the years since the inception of the forum, they'd reported almost a thousand incidents of CSAM distribution - and the FBI followed up with them to get information less than 10 times in total.
Actually, companies should be bullied about privacy and copyright so they are unable to share any content at scale with 3rd parties. Then they have to solve it on their own and are forced to realize their business model is shit.
Big “citation needed” here. My bet is that Meta have far better moderation systems than any other social media company on the planet.
That's more what i got from that pull-quote. I know a company that has hundreds of individual forums, and those are all moderated quickly and correctly (last i heard). They're moderated so effectively they often get DDoSed from Russian IPs for banning users for scam posts from that country.
Different situation.
Facebook has to do CSAM moderation because it's a publishing platform. People will post CSAM on Facebook, so they must do moderation.
And "just don't have Facebook" isn't a solution because every publication of any sort has to deal with this problem; any newspaper accepting mail has this problem (albeit in a much more scaled-down version). People were nailing obscene things to bulletin boards for all recorded history.
---
In contrast, OpenAI has no such problem. It did not have CSAM pushed onto it, it actively collected such data itself. It could have, at any point before and after, simply stopped scraping all of the web indiscriminately and switched to using more curated sources of scraped data.
The downside would be "worse LLMs" or "LLMs being created later", which is a perfectly acceptable compromise.
---
This is not to say that genuine content flagging firms have no reason to curate such data & build tools to automatically flag content before human moderators have to. (But then they also shouldn't be outsourcing this and traumatizing contract workers for $2-3 an hour)
But OpenAI is not such a firm. It's a general AI company.
Is there an hourly rate at which this should be acceptable?
The current support systems for police in this subject are already insufficient. Facebook's treatment of their moderation staff is abhorrent. The point of including the pay figure is to further illustrate just how damning this subcontracting practice is.
Not only is there an acceptable market rate for trauma, it’s sometimes competitive and requires licensing.
^ i originally said "triage doctors" but i meant the resident ER doc.
They have access to better counselling and are ostensibly trained for the job. But there are still suicides.
The core Facebook product is users' posts. It's not possible to separate those two. Nor can one downscale Facebook in a way that stops the problem; as mentioned, Facebook has this problem because it's a problem we've had since the medieval days of a town bulletin board.
With OpenAI, the way ChatGPT was built and user submissions are separate things. The GPT models could have been trained without this mess. OpenAI could be more selective in what data it scrapes.
While OpenAI cannot stop users sending god knows what in their prompt text and images, OpenAI can choose to not interact with that data beyond the minimum legal retention, by e.g. not using it for training the next generation of models. This would massively downscale the problem.
AI output is another such problem, where A) maybe this'd be less of a problem if they didn't recklessly include a bunch of CSAM in the training data by accident, and B) LLMs just aren't the kind of fundamental human right that "having a public opinion" is. It would be fine if they were less good, invented years later, or even not invented at all.
The main counterargument to the latter has been the "But China is inventing evil AI" spiel, which is fairly weak. If China builds an orphaned baby crushing machine, we do not need to build an orphaned baby crushing machine of our own. (And the reality is that China is only chasing AI so aggressively because the west does. They're reasonable people, it would have been entirely possible for both the west and China to make a mutual "no orphan crushing" agreement and just accept slower rollout of technology. This is exactly what has been done with human genetic engineering, and China did in fact enforce these norms.)
I guess that they process billions of images every day.
I don’t think they’re getting CSAM from scraping (thankfully, I expect there isn’t much publicly available CSAM).
They aren’t as big as facebook, but they must have this functionality or many users will be hurt.
You've just thrown the garbage over your fence. Instead of OpenAI contracting Sama to classify CSAM, the "Curators" have to.
At the end of the day, someone needs to classify it. If you say the platforms need to, and they miss some, and it ends up in OAI training data, OAI is going to be the entity paying the price.
Any website that allows users to upload videos needs some sort of service that can identify and report CSAM.
This is of course incredibly illegal, but megacorps (by valuation) and oligarchy members are above the law so who cares. I assume there could be a regulatory framework which can make this legal for an extremely specific purpose, but there is zero chance that OpenAI was part of this/abiding by this in 2022, absolutely none.
Westerners are too expensive and unwilling to do it. AI is a business model that requires poverty and extreme inequality to function. Yes, other businesses do that too, but they don't claim it's a solution to everything while it actually has very special human requirements.
There are more reasons why these jobs are located in developing countries, it's not only the price of labour. Imagine for a second, these annotations would have to be done in the US. The public outrage would probably be audible across the Atlantic. This is another form of imperialism.
Granted the latter is kinda happening distantly on YouTube where you can’t talk about “suicide” so everyone self censors…
You must be extremely privileged to think that way; even in the EU I would be glad to do it for the minimum salary. For your info, a terrible job for most humans is a job that is extremely hard physically, to the point of destroying your health. That said, like many people, I would find it much more interesting than many boring jobs. [If someone reads this, please hire me for this; in exchange I would work the first 5 hours for free]