OsamaJaber on Hacker News

1

Compiles any HuggingFace model into a single persistent megakernel

(twitter.com)

2 | OsamaJaber | 10d ago | 0

2

Mega Kernels, Written by Agents

(arxiv.org)

2 | OsamaJaber | 15d ago | 0

3

AutoMegaKernel: Compiling a LLM into a single CUDA kernel

(arxiv.org)

3 | OsamaJaber | 18d ago | 0

4

AutoMegaKernel: Compile an LLM into one provably-correct CUDA megakernel

(github.com)

4 | OsamaJaber | 19d ago | 0

5

StreamIndex: Memory-bounded compressed sparse attention via streaming top-k

(arxiv.org)

4 | OsamaJaber | 1mo ago | 0

6

Show HN: AutoKernel, Auto GPU Kernel Optimization

(arxiv.org)

2 | OsamaJaber | 1mo ago | 0

7

DeepSeek V4's indexer dies at 65K. We got it to 1M on 6GB

(arxiv.org)

5 | OsamaJaber | 1mo ago | 0

8

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

(arxiv.org)

4 | OsamaJaber | 1mo ago | 0

9

DeepSeek V4's indexer OOMs at 65K context. We got it to 1M in 6G

(arxiv.org)

8 | OsamaJaber | 1mo ago | 0

10

Ouroboros: Dynamic Weight Generation for Recursive Transformers

(arxiv.org)

2 | OsamaJaber | 2mo ago | 0

11

Tide: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference

(arxiv.org)

3 | OsamaJaber | 2mo ago | 1

12

Own your AI. Optimized down to the kernel

(runinfra.ai)

1 | OsamaJaber | 2mo ago | 0

13

Agents with "Hands"

(openfang.sh)

7 | OsamaJaber | 4mo ago | 18

14

Open-Source Agent Operating System

(github.com)

11 | OsamaJaber | 4mo ago | 3

15

PicoLM: Run a 1B parameter LLM on a $10 board

(github.com)

4 | OsamaJaber | 4mo ago | 1
