
0 points · yencabulator · 22d ago · 0 comments

An LLM could probably make that distinction clearly.

A commercial LLM provider training its own models is, however, likely to bias the model (or its guardrails) harder, in an effort to make it harder to jailbreak and so minimize bad press.

For example:

- refusing to talk even about the well-known parts of forbidden topics (this)
- tending toward sycophancy to avoid ever seeming rude or unhelpful


2 comments · 1 top-level

BizarroLand · 22d ago · 1 in thread

So, where are the truly uncensored models? There have to be some that have no guardrails, built on publicly available data, that will explain to anyone in graphic detail anything they want to know or talk about.

I've tried the abliterated ones from Hugging Face and they still have guardrails. I guess I could fire up Unsloth and re-abliterate a 20B, but surely someone somewhere has already done this.

All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.

fc417fc802 · 22d ago

As I understand things (not a user), abliteration has been superseded by actively monitoring the model state during the run and steering specific "negative" directions as they arise. It's both more reliable and does less damage.
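To make the difference concrete, here is a toy sketch of one possible reading of the two approaches. It is not taken from any real library or model: the hidden state is just a short list of floats, `refusal_dir` is a made-up direction, and the names `ablate`, `steer_if_active` and `threshold` are all hypothetical. Abliteration is modeled as always projecting the direction out, while the monitored variant only intervenes when the projection onto that direction gets large.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Normalize v to length 1 so projections are easy to interpret.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def ablate(h, r):
    # Abliteration-style edit (assumed form): unconditionally remove
    # the component of hidden state h along unit direction r.
    p = dot(h, r)
    return [x - p * ri for x, ri in zip(h, r)]

def steer_if_active(h, r, threshold=1.0):
    # Monitored steering (assumed form): measure the projection onto r
    # and only intervene when it exceeds the threshold, clamping the
    # projection back down to the threshold instead of zeroing it.
    p = dot(h, r)
    if p <= threshold:
        return h, False
    return [x - (p - threshold) * ri for x, ri in zip(h, r)], True

refusal_dir = unit([1.0, 0.0, 0.0])   # hypothetical "negative" direction
benign = [0.2, 1.0, -0.5]             # small projection onto refusal_dir
flagged = [3.0, 1.0, -0.5]            # large projection onto refusal_dir

ablated = ablate(flagged, refusal_dir)
benign_out, benign_touched = steer_if_active(benign, refusal_dir)
flagged_out, flagged_touched = steer_if_active(flagged, refusal_dir)
```

In this toy version the benign state passes through unchanged, which would be the "less damage" part: the unconditional projection edits every state, while the monitored one only touches states where the direction is actually active.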

j / k navigate · click thread line to collapse