Better HN
0 points · irthomasthomas · 10mo ago · 0 comments
Oh, I didn't know that. Weird!
3 comments · 1 top-level
reissbaker · 10mo ago · 2 in thread
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (fits on a single H100), and to allow better utilization of B200s (which are especially fast for FP4).
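The VRAM point is easiest to see with back-of-the-envelope numbers. The sketch below is illustrative only. It assumes a hypothetical 100B-parameter model and the 80 GB H100 variant, and it counts weights only, ignoring the KV cache, activations, and the small per-block scale factors that block-scaled FP4 formats add.

```python
# Rough weight-memory estimate at different precisions.
# Assumptions: weights only (no KV cache, activations, or FP4 block scales);
# the 100B parameter count below is hypothetical.

BITS_PER_PARAM = {"FP16/BF16": 16, "FP8": 8, "FP4": 4}
H100_MEMORY_GB = 80  # assumes the 80 GB H100 variant


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


def fits_on_one_gpu(num_params: float, bits: int, gpu_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone fit in a single GPU's memory."""
    return weight_memory_gb(num_params, bits) <= gpu_gb


if __name__ == "__main__":
    n = 100e9  # hypothetical model size
    for name, bits in BITS_PER_PARAM.items():
        gb = weight_memory_gb(n, bits)
        print(f"{name:>9}: {gb:6.1f} GB, fits on one H100: {fits_on_one_gpu(n, bits)}")
```

Under these assumptions the weights take 200 GB at 16-bit, 100 GB at FP8 (still too big for one 80 GB card), and 50 GB at FP4, which leaves headroom on a single H100.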
irthomasthomas (OP) · 10mo ago
Interesting, thanks. I didn't know you could even train at FP4 on H100s.
reissbaker · 10mo ago
It's impressive they got it to work; the lowest I'd heard of thus far was native FP8 training.
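To give a sense of how coarse FP4 is compared with FP8, here is a small sketch. It assumes "FP4" means the E2M1 element format (1 sign bit, 2 exponent bits, 1 mantissa bit) used by block-scaled formats such as OCP MXFP4 and NVFP4; the thread doesn't say which format was used. The function names are hypothetical, and the per-block scale is kept as a plain float, whereas real formats store it in a restricted encoding (MXFP4, for example, uses a power-of-two scale).

```python
# E2M1 ("FP4") can only represent these magnitudes, plus a sign bit.
# Assumption: plain-float per-block scale; real MXFP4/NVFP4 encode it more tightly.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def e2m1_values() -> list[float]:
    """All distinct E2M1 values (+0 and -0 collapsed into a single 0.0)."""
    negatives = {-m for m in E2M1_MAGNITUDES if m > 0}
    return sorted(set(E2M1_MAGNITUDES) | negatives)


def quantize_block(xs: list[float]) -> tuple[float, list[float]]:
    """Scale a block so its largest magnitude maps to 6.0 (the E2M1 max),
    round each element to the nearest E2M1 value, and return
    (scale, dequantized values)."""
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0
    grid = e2m1_values()
    deq = [min(grid, key=lambda v: abs(x / scale - v)) * scale for x in xs]
    return scale, deq


if __name__ == "__main__":
    print(e2m1_values())
    print(quantize_block([3.0, -1.0, 0.4]))
```

There are only 15 distinct values per block, so in the example the 0.4 entry comes back as 0.5. An FP8 format like E4M3 has a much finer grid, which helps explain why training natively at FP4 is a bigger step than FP8.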