3. UCCL-EP: DeepEP-style expert parallelism on any NIC, no GPU-initiated comms (fergusfinn.com) | 8 points by kkm 10d ago | 0 comments
7. No Token Left Behind: Demystifying Token-in-Token-Out in Miles (lmsys.org) | 2 points by kkm 16d ago | 0 comments
8. MoE expert co-activations: Reordering inputs yields easy throughput gains (blog.doubleword.ai) | 2 points by kkm 17d ago | 0 comments
10. Speculative KV coding: losslessly compressing KV cache by up to ~4× (fergusfinn.com) | 155 points by kkm 21d ago | 48 comments
13. Brave AI privacy: LLMs on NEAR AI Nvidia-Backed Trusted Execution Environments (brave.com) | 1 point by kkm 7mo ago | 0 comments
15. FHE can be leveraged for LLMs such as ChatGPT in a privacy-preserving manner (huggingface.co) | 4 points by kkm 1y ago | 0 comments