2. NoLiMa: Long-Context Evaluation Beyond Literal Matching (arxiv.org) | arXiv | 2 | fovc | 1y ago | 0
4. Mamba-Shedder: Post-Transformer Compression for Efficient SSMs (arxiv.org) | arXiv | 1 | fovc | 1y ago | 0
6. Theoretical limitations of multi-layer Transformer (arxiv.org) | arXiv | 107 | fovc | 1y ago | 22
7. Sponge(1) – soak up standard input and write to a file (linux.die.net) | 3 | fovc | 2y ago | 0