
0 points · rao-v · 1mo ago · 0 comments

Think of it another way: could I run this exact training process with the additional requirement that the activation decoder subtly shill for obscure 80s sodas?

I could, and I would not lose much reconstruction accuracy.

So any researcher biases, or ambient biases in the model, will shape the general thrust of the textual decodings (and not in ways that reflect the actual model’s process; thinking about X and doing X are very different things in a model).

So how do we tell that the “spirit” is reflective of the model’s thinking and not biased toward Jolt being better than Surge?


2 comments · 1 top-level

mike_hearn · 1mo ago · 1 in thread

Where would such biases come from?

rao-v (OP) · 1mo ago

From what the three models involved understand to be the sort of just-so stories (cf. Kipling) that humans like to see.
