1. Insights from Multilingual Curation for a 20T-Token Dataset (datologyai.com) | 1 | hurrycane | 4mo ago | 0
2. DatBench fixes VLM evals: 70% blindly solvable, 42% mislabeled, 35% prod gap (datologyai.com) | 5 | hurrycane | 5mo ago | 0
3. DatBench: Cut VLM eval compute by >10× while INCREASING signal (datologyai.com) | 4 | hurrycane | 5mo ago | 0
4. Luxical: Lexical-Dense Embeddings for Web-Scale Data Curation (3×–100× Faster) (datologyai.com) | 3 | hurrycane | 6mo ago | 0
6. BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-Scale Pretraining (blog.datologyai.com) | 1 | hurrycane | 10mo ago | 0
7. BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-Scale Pretraining (blog.datologyai.com) | 3 | hurrycane | 10mo ago | 0
8. Image-Text Curation for 1B+ Data: Faster, Better, Smaller CLIP Models (datologyai.com) | 12 | hurrycane | 1y ago | 0
10. Augmenting Segment customer data with behavioral signals using the Moonsense SDK (moonsense.io) | 1 | hurrycane | 4y ago | 0
11. Moonsense Recorder – Build live prototypes using mobile device sensor data (moonsense.io) | 2 | hurrycane | 5y ago | 0
12. From the Gym to a Jupyter Notebook – Building a Squats Counter App in a Day (urimerhav.medium.com) | 7 | hurrycane | 5y ago | 0
14. Reducing indexing latency of Twitter Search to one second (blog.twitter.com) | 3 | hurrycane | 6y ago | 0