thw20 | Better HN

thw20

1 karma | Joined April 24, 2024 | 9 submissions

Recent submissions

1

Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition

(jeffreywong20.github.io)

1 point | thw20 | 1mo ago | 0 comments

2

Towards understanding multiple attention sinks in LLMs

(github.com)

1 point | thw20 | 3mo ago | 2 comments

3

The Existence and Behavior of Secondary Attention Sinks

(arxiv.org)

1 point | thw20 | 4mo ago | 0 comments