The paper I read was this one from Catherine Olsson et al. at Anthropic:
https://transformer-circuits.pub/2022/in-context-learning-an...
There's also a useful article that expands on the types of head composition and provides some illustrations:
https://www.lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-...