1. Dynamic Persistent Tile Scheduling w/ Cluster Launch Control (CLC) on Blackwell (research.colfax-intl.com) | 2 | matt_d | 10h ago | 0
2. VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? (github.com) | 2 | matt_d | 11h ago | 0
3. CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure (arxiv.org) | 3 | matt_d | 12h ago | 0
4. Microbenchmark-Driven Analytical Performance Modeling Across Modern GPUs (arxiv.org) | 1 | matt_d | 23h ago | 0
6. VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU (arxiv.org) | 2 | matt_d | 1d ago | 0
7. Aurora: A Leverage-Aware Optimizer for Rectangular Matrices (blog.tilderesearch.com) | 1 | matt_d | 1d ago | 0
8. The Two Abstractions of System Design: Hide or Reduce (muratbuffalo.blogspot.com) | 1 | matt_d | 2d ago | 0
10. Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs (arxiv.org) | 2 | matt_d | 3d ago | 0
13. cuda-oxide: a custom rustc backend for compiling GPU kernels in pure Rust (github.com) | 3 | matt_d | 4d ago | 0