monster_truck 9d ago (0 points)

I don't think your last point is correct. Ablation, when done correctly, seems to increase quality, and typically performance too.

7 comments · 3 top-level

Aurornis 9d ago · 4 in thread

Abliteration is a brute-force technique that removes or silences parts of the model. It reduces performance because the abliterated elements aren’t perfectly isolated to censorship, so other aspects suffer.

Many of the “uncensored” model providers also do some fine-tuning on the models. Some of them target better benchmarks or other measures, but outside the benchmarks and metrics they’re fine-tuned for, the results are generally noticeably worse than the original model.

yowlingcat 9d ago

The kind of abliteration you are describing is no longer state of the art, nor is it the most common way of removing the refusal layer in most models. Your understanding was up to date about a year and a half ago, but it has been out of date since then.

weitendorf 9d ago

Unrelated, but I’ve been putting off learning about post-abliteration techniques and want to use them for an upcoming open source “retraining” project I have on my backlog. I’m not interested in the refusal layers, though. It's more like deep fine-tuning, but in a way that might let me prune out or consolidate layers, if that makes sense? Do you have any pointers or links to the current SOTA in this area?

I guess I’m looking for a kind of bulk/sticky dropout (dropout was in fashion way back when I studied DNNs in school).

avadodin 8d ago

What OP is describing wasn't called abliteration at all.

Abliteration, whilst a neologism, implies a surgical ablation of refusal.

Earlier approaches post-trained the model to refuse less and, much like other kinds of fine-tuning, this degraded performance. Those models were the ones called "uncensored".

Abliteration has kept improving to this day, but even from the start it stayed close to the original model's performance when compared to those earlier techniques.
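
For readers unfamiliar with what "surgically ablating" a direction means, here is a minimal sketch of the linear algebra usually associated with the idea, under stated assumptions. It is a toy on synthetic data, not the method of any particular tool mentioned in this thread. The function names `refusal_direction` and `ablate_direction` are hypothetical, the "activations" are random numbers standing in for hidden states, and the difference-of-means estimate is just one simple way to pick a direction.

```python
import numpy as np

def refusal_direction(acts_refused, acts_complied):
    """Unit vector along the difference of the two mean activations.

    Both inputs are (n_samples, hidden) arrays. Here they are synthetic;
    in a real setting they would be hidden states collected from a model.
    """
    d = acts_refused.mean(axis=0) - acts_complied.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    """Project the unit direction d out of the output space of W.

    W maps inputs to the hidden space (y = W @ x). The result W' satisfies
    d @ (W' @ x) == 0 for every x, while components orthogonal to d are
    left unchanged.
    """
    return W - np.outer(d, d) @ W

# Synthetic stand-ins: "refused" samples are shifted along axis 0.
rng = np.random.default_rng(0)
hidden = 8
shift = 2.0 * np.eye(hidden)[0]
acts_refused = rng.normal(size=(32, hidden)) + shift
acts_complied = rng.normal(size=(32, hidden))

d = refusal_direction(acts_refused, acts_complied)
W = rng.normal(size=(hidden, hidden))
W_abl = ablate_direction(W, d)

# After ablation, no output of W_abl has any component along d.
x = rng.normal(size=hidden)
print(np.allclose(d @ (W_abl @ x), 0.0))
```

The disagreement in this thread is essentially about how well such a direction is isolated in a real model: if the chosen direction also carries other behaviour, projecting it out removes that too.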

ls612 9d ago

Nowadays it's that Heretic tool, is it not? I’ve seen Gemma models uncensored with it.

tredre3 9d ago

That is something often claimed by heretics. My experience couldn't diverge more, however. All heretic (and abliterix) models I've tried are worse than the original. It's not immediately obvious if all you do is ask 2-3 questions and marvel at how it didn't refuse, but try using them for real over longer 8k+ contexts and they fall apart real fast.

They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating (presumably because of the reduced tendency to decline to answer).

I've tried all the popular heretic peddlers, but if you have one that you can vouch for maybe I've simply missed it.

antonvs 9d ago

I'm curious about where you got that idea from. Neither the theory nor the available examples support it. If they did, everyone knowledgeable would be using abliterated models.
