0 points
Oras
7mo ago
Hard time? What value do adult video descriptions, views, and comments add to small (7B, 32B) models?
3 comments · 2 top-level
andy99
7mo ago
It says it’s Common Crawl. I interpret that to mean this is a generic web-scrape dataset; presumably they filter out the stuff they don’t want before pretraining. You’d have to do some ablation testing to know what value it adds.
ccgreg
7mo ago
Common Crawl is a particular dataset. commoncrawl.org
khimaros
7mo ago
what if that's where they learned how to utilize the double entendre? hard times indeed.