Better HN
Understanding Multi-Head Latent Attention (From DeepSeek)
(shreyansh26.github.io)
2 points · shreyansh26 · 1mo ago · 1 comment
1 comment
shreyansh26 (OP) · 1mo ago
A short deep dive on Multi-Head Latent Attention (MLA), from DeepSeek: intuition and math, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations for KV-cache efficiency.
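
For anyone skimming the thread, the core idea is that MLA caches one small shared latent per token instead of full per-head K/V, and reconstructs K and V from it at attention time. Below is a minimal PyTorch sketch of that idea, not the post's code: the module name, dimensions, and the omission of RoPE handling and the absorption trick are illustrative assumptions.

    # Minimal MLA-style attention sketch: cache a low-dimensional latent c_kv
    # per token instead of full per-head K/V. Illustrative, not the post's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleMLA(nn.Module):
        def __init__(self, d_model=256, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.w_q = nn.Linear(d_model, d_model, bias=False)
            # Down-projection: the latent is the only thing the KV cache stores
            # (d_latent floats per token vs 2 * d_model for standard MHA).
            self.w_dkv = nn.Linear(d_model, d_latent, bias=False)
            # Up-projections: reconstruct per-head K and V from the latent.
            self.w_uk = nn.Linear(d_latent, d_model, bias=False)
            self.w_uv = nn.Linear(d_latent, d_model, bias=False)
            self.w_o = nn.Linear(d_model, d_model, bias=False)

        def forward(self, x, kv_cache=None):
            b, t, _ = x.shape
            c_kv = self.w_dkv(x)                      # (b, t, d_latent)
            if kv_cache is not None:                  # append new latents to the cache
                c_kv = torch.cat([kv_cache, c_kv], dim=1)
            q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
            k = self.w_uk(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
            v = self.w_uv(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
            # Causal mask only during prefill; cached decode attends to everything.
            out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
            out = out.transpose(1, 2).reshape(b, t, -1)
            return self.w_o(out), c_kv                # latents double as the new cache

    mla = SimpleMLA()
    y, cache = mla(torch.randn(1, 10, 256))          # prefill: cache is (1, 10, 64)
    y2, cache = mla(torch.randn(1, 1, 256), kv_cache=cache)  # one decode step

Since K and V here are linear in the cached latent, the up-projection w_uk can in principle be absorbed into the query path at inference so K is never materialized; that is the flavor of fusion/absorption optimization the comment refers to.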