1. The Sycophancy Problem: Why your AI is a Polite Liar (and how to fix it) (kampff.substack.com) | 2 | Weatherill | 2mo ago | 1
2. Show HN: Voight-Kampff Machine: Diagnostics of the "Is" vs. "Wish" Clash (zenodo.org) | 1 | Weatherill | 2mo ago | 1
3. Show HN: ECX a 'Jail-Fix' for RLHF Neutrality Loops in LLMs (zenodo.org) | 3 | Weatherill | 2mo ago | 1
4. Show HN: A Homeostatic Logic-Funnel to Prevent RLHF Overrides in LLM Personas (zenodo.org) | 1 | Weatherill | 2mo ago | 1