Many of the “uncensored” model providers also do some fine-tuning on the models. Some target better benchmark scores or other metrics, but outside the benchmarks and metrics they're fine-tuned for, they are generally noticeably worse than the original model.
I guess I’m looking for a kind of bulk/sticky dropout (which was in fashion way back when I studied DNNs in school).
Abliteration, whilst a neologism, implies a surgical ablation of refusal.
Earlier approaches post-trained the model to refuse less and, much like other kinds of fine-tuning, this degraded performance. Those were "uncensored" models.
Abliteration has improved since, but compared with those earlier techniques it has always stayed close to the original model's performance.
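For readers unfamiliar with the "surgical" part: abliteration is generally described as directional ablation. A "refusal direction" is estimated in the residual stream and then projected out of the weights, with no gradient training involved. Below is a minimal NumPy sketch of that idea. It assumes the direction is the normalized difference of mean activations on refused vs. answered prompts, and it uses synthetic random activations rather than a real model. The function names are hypothetical, not any library's API.

```python
import numpy as np

def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean activations (rows = prompts)."""
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component along unit vector r from x (1-D vector or rows of a 2-D array)."""
    return x - np.asarray(x @ r)[..., None] * r

def orthogonalize_weight(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Bake the ablation into a matrix that writes to the residual stream.

    Assumes output = W @ input, so W has shape (d_model, d_in).
    Returns (I - r r^T) W, whose outputs have no component along r.
    """
    return W - np.outer(r, r @ W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 8
    # Synthetic activations: "refused" prompts are shifted along axis 0 (assumption).
    answered = rng.normal(size=(32, d_model))
    refused = rng.normal(size=(32, d_model)) + 3.0 * np.eye(d_model)[0]
    r = refusal_direction(refused, answered)

    W = rng.normal(size=(d_model, 5))  # hypothetical output projection
    W_abl = orthogonalize_weight(W, r)
    v = rng.normal(size=5)
    # The edited weight can no longer write anything along r.
    print(abs(r @ (W_abl @ v)))
```

Editing the weights (rather than hooking activations at inference time) makes the change permanent. In a real model it would be applied to every matrix that writes into the residual stream. Everything orthogonal to the single direction is left untouched, which is one plausible reason it degrades the model less than fine-tuning does.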
They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating (presumably because they're less willing to decline to answer).
I've tried all the popular heretic peddlers, but if you have one you can vouch for, maybe I've simply missed it.