leetharris
1y ago
They likely continue to train dense models because dense models are far easier to fine-tune, and fine-tuning is a huge use case for the Llama models.
whimsicalism
1y ago
It probably also has to do with their internal infra. If it were just about dense models being easier for the OSS community to use and build on, they should probably be training MoEs and then distilling them into dense models.