1. A running list of reasons to move to open source (whyopensource.ai) | 5 | mezark | 3d ago | 0
2. MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai) | 3 | mezark | 1mo ago | 0
6. Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com) | 25 | mezark | 1mo ago | 0
8. QueueSpec – drafting speculation tokens while a request queues (blog.doubleword.ai) | 1 | mezark | 5mo ago | 0
9. ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com) | 1 | mezark | 5mo ago | 0
10. Parallel Primitives for Multi-Agent Workflows (fergusfinn.com) | 1 | mezark | 5mo ago | 0
11. New fastest AI Model Gateway – 450x less overhead than LiteLLM (github.com) [GitHub] | 2 | mezark | 8mo ago | 0
13. Controlled generation of OS LLMs – without impacting latency (youtube.com) [Video] | 7 | mezark | 2y ago | 1