1. When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub) | 4 points by 1wheel, 8mo ago | 0 comments
4. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) | 168 points by 1wheel, 2y ago | 124 comments
5. The Claude 3 Model Family: Opus, Sonnet, Haiku [pdf] (www-cdn.anthropic.com) | 33 points by 1wheel, 2y ago | 3 comments
7. Patchscopes: A Unifying Framework for Inspecting Hidden Representations of LMs (pair-code.github.io) | 2 points by 1wheel, 2y ago | 0 comments
8. Do Machine Learning Models Memorize or Generalize? (pair.withgoogle.com) | 454 points by 1wheel, 2y ago | 210 comments
9. An interactive introduction to grokking and mechanistic interpretability (pair.withgoogle.com) | 1 point by 1wheel, 2y ago | 0 comments
10. From Confidently Incorrect Models to Humble Ensembles (pair.withgoogle.com) | 1 point by 1wheel, 3y ago | 1 comment
12. Searching for Unintended Biases with Saliency (pair.withgoogle.com) | 2 points by 1wheel, 3y ago | 1 comment
13. Interactive Visualizations of Federated Learning (pair.withgoogle.com) | 1 point by 1wheel, 3y ago | 0 comments
15. It’s Not Spider-Man’s Fault: Why Best Picture Winners Aren’t Hits Anymore (roadtolarissa.com) | 2 points by 1wheel, 4y ago | 0 comments