1. Generalizing Test-Time Compute-Optimal Scaling as an Optimizable Graph (huggingface.co) | 2 | veryluckyxyz | 4mo ago | 0
2. Hidden drivers of HRM's performance on ARC-AGI (arcprize.org) | 31 | veryluckyxyz | 5mo ago | 2
3. Set Block Decoding Is a Language Model Inference Accelerator (arxiv.org) | 4 | veryluckyxyz | 6mo ago | 0
5. A Batch Size and Token Number Agnostic Learning Rate Scheduler (arxiv.org) | 2 | veryluckyxyz | 9mo ago | 0
7. Model Merging in Pre-Training of Large Language Models (arxiv.org) | 2 | veryluckyxyz | 10mo ago | 0
8. Understanding Perception and Reasoning Through Model Merging (arxiv.org) | 2 | veryluckyxyz | 10mo ago | 0
9. Building and better understanding vision-language models (2024) (huggingface.co) | 2 | veryluckyxyz | 10mo ago | 0
11. Do Reasoning Models Show Better Verbalized Calibration? (arxiv.org) | 2 | veryluckyxyz | 11mo ago | 0
12. Robustly identifying concepts introduced during chat fine-tuning with crosscoder (arxiv.org) | 6 | veryluckyxyz | 11mo ago | 0