1. Understanding Multi-Head Latent Attention (From DeepSeek) (shreyansh26.github.io)
   2 points by shreyansh26 1mo ago | 1 comment
2. Deriving the gradient for the backward pass of Layer Normalization (shreyansh26.github.io)
   3 points by shreyansh26 9mo ago | 0 comments
3. GTC'25 Notes: CUDA Techniques to Maximize Memory Bandwidth – Part 1 (shreyansh26.github.io)
   1 point by shreyansh26 1y ago | 0 comments
6. Ask HN: What are some good resources on Recommender Systems?
   I am looking for good technical resources to study recommender systems in depth. Anything from books, courses to blogs would be helpful. Would also be better if I could follow along with code and implement stuff.
   14 points by shreyansh26 3y ago | 3 comments