0 points
viraptor
11mo ago
Sure, routing is done per token, but the question is: how closely do knowledge domains match up with experts? I could not find hard data on this.
1 comment · 1 top-level
boroboro4
11mo ago
Check out the DeepSeek-V3 model paper. They changed the way they train experts (went from an aux loss to a different kind of expert-separation training). It did improve the experts' domain specialization, and they have neat graphics on it in the paper.
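If you want to check this kind of thing on your own routing logs, here is a minimal sketch of one way to do it: count which expert each token was routed to, grouped by the domain of the text, then compare against a perfectly uniform router. This is not the paper's code. The function names, the `(domain, expert_id)` record format, and the toy data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def expert_load_by_domain(records, num_experts):
    """Fraction of each domain's routed tokens that went to each expert.

    `records` is an iterable of (domain, expert_id) pairs, one per routed
    token slot. This record format is hypothetical, not taken from any
    real model's logging output.
    """
    counts = defaultdict(Counter)
    for domain, expert_id in records:
        counts[domain][expert_id] += 1

    loads = {}
    for domain, per_expert in counts.items():
        total = sum(per_expert.values())
        loads[domain] = [per_expert[e] / total for e in range(num_experts)]
    return loads


def relative_load(fractions):
    """Load relative to a uniform router: 1.0 means perfectly balanced."""
    n = len(fractions)
    return [f * n for f in fractions]


# Toy routing log, invented for illustration only.
records = [
    ("code", 0), ("code", 0), ("code", 0), ("code", 1),
    ("math", 2), ("math", 2), ("math", 3), ("math", 1),
]

loads = expert_load_by_domain(records, num_experts=4)
for domain, fractions in sorted(loads.items()):
    print(domain, relative_load(fractions))
```

If the per-domain relative loads all sit near 1.0, the experts are not specializing by domain; values far from 1.0 that differ between domains suggest they are.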