Understanding Multi-Head Latent Attention (From DeepSeek)

(shreyansh26.github.io)

2 points · shreyansh26 · 3mo ago · 1 comment

A short deep dive into Multi-Head Latent Attention (MLA) from DeepSeek: the intuition and math first, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations that make the KV cache more efficient.
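A minimal sketch of two ideas the summary mentions. It uses plain Python rather than the PyTorch the post uses. The first part shows how much each scheme caches per token. The second shows why absorbing the key up-projection into the query lets MLA score directly against the cached latent. The function name `kv_cache_elems_per_token`, the `d_rope` parameter, and all sizes are illustrative assumptions, not taken from the linked post. `d_rope` stands for the small decoupled RoPE key that DeepSeek-V2 caches alongside the latent.

```python
import random

# Hypothetical helper: elements cached per token, per layer, for each scheme.
# Multiply by bytes per element and by the number of layers for total memory.
def kv_cache_elems_per_token(scheme, n_heads, d_head, n_groups=None,
                             d_latent=None, d_rope=0):
    if scheme == "MHA":  # separate K and V for every query head
        return 2 * n_heads * d_head
    if scheme == "GQA":  # one K/V pair shared by each group of query heads
        return 2 * n_groups * d_head
    if scheme == "MQA":  # a single K/V pair shared by all query heads
        return 2 * d_head
    if scheme == "MLA":  # one compressed latent c_KV plus an (assumed) shared RoPE key
        return d_latent + d_rope
    raise ValueError(f"unknown scheme: {scheme}")

# Illustrative sizes (assumed): 128 heads of dim 128, 8 GQA groups, latent 512, RoPE key 64.
for scheme, extra in [("MHA", {}), ("GQA", {"n_groups": 8}), ("MQA", {}),
                      ("MLA", {"d_latent": 512, "d_rope": 64})]:
    print(scheme, kv_cache_elems_per_token(scheme, n_heads=128, d_head=128, **extra))
# Prints, one per line: MHA 32768, GQA 2048, MQA 256, MLA 576

# Tiny pure-Python linear algebra helpers (stand-ins for torch ops).
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Key absorption for one head and one cached token, with random toy weights.
rng = random.Random(0)
d_head, d_latent = 4, 3
W_uk = [[rng.uniform(-1, 1) for _ in range(d_latent)] for _ in range(d_head)]  # latent -> key
c_kv = [rng.uniform(-1, 1) for _ in range(d_latent)]  # what MLA actually caches
q = [rng.uniform(-1, 1) for _ in range(d_head)]       # this head's query

# Naive: rebuild the full key k = W_uk @ c_kv, then score q . k.
score_naive = dot(q, matvec(W_uk, c_kv))
# Absorbed: fold W_uk into the query once (W_uk^T @ q), then score against c_kv directly.
q_absorbed = matvec(transpose(W_uk), q)
score_absorbed = dot(q_absorbed, c_kv)
assert abs(score_naive - score_absorbed) < 1e-9  # q^T (W c) == (W^T q)^T c
```

The sketch checks only the key side. The same associativity lets the value up-projection be folded into the output projection, so full per-head keys and values never need to be materialized from the cache. Under the assumed sizes, MLA caches more per token than MQA but far less than MHA or 8-group GQA.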

shreyansh26 (OP) · 3mo ago