kkm on Hacker News

1

InfiniBand, RoCE, and All That

(fergusfinn.com)

4 points by kkm 6d ago | 0 comments

2

2678x Faster Matrix Multiplication with a GPU

(0mean1sigma.com)

2 points by kkm 9d ago | 0 comments

3

UCCL-EP: DeepEP-style expert parallelism on any NIC, no GPU-initiated comms

(fergusfinn.com)

8 points by kkm 10d ago | 0 comments

4

Hacking Google with A.I. For $500k

(brutecat.com)

1 point by kkm 13d ago | 0 comments

5

How to setup a local coding agent on macOS

(ikyle.me)

507 points by kkm 13d ago | 127 comments

6

Anatomy of a high-performance EP kernel

(fergusfinn.com)

16 points by kkm 15d ago | 1 comment

7

No Token Left Behind: Demystifying Token-in-Token-Out in Miles

(lmsys.org)

2 points by kkm 16d ago | 0 comments

8

MoE expert co-activations: Reordering inputs yields easy throughput gains

(blog.doubleword.ai)

2 points by kkm 17d ago | 0 comments

9

The Economics of Speculative Decoding

(fergusfinn.com)

30 points by kkm 17d ago | 6 comments

10

Speculative KV coding: losslessly compressing KV cache by up to ~4×

(fergusfinn.com)

155 points by kkm 21d ago | 48 comments

11

70x faster cold(ish) starts for SGLang

(fergusfinn.com)

1 point by kkm 22d ago | 0 comments

12

Bringing Up DeepSeek-V4-Flash on AMD MI300X

(fergusfinn.com)

120 points by kkm 23d ago | 25 comments

13

Brave AI privacy: LLMs on NEAR AI Nvidia-Backed Trusted Execution Environments

(brave.com)

1 point by kkm 7mo ago | 0 comments

14

How fast can an LLM go?

(fergusfinn.com)

2 points by kkm 7mo ago | 0 comments

15

FHE can be leveraged for LLMs such as ChatGPT in a privacy-preserving manner

(huggingface.co)

4 points by kkm 1y ago | 0 comments
