hurrycane on Hacker News

1

Insights from Multilingual Curation for a 20T-Token Dataset

(datologyai.com)

1 point by hurrycane 4mo ago | 0 comments

2

DatBench fixes VLM evals: 70% blindly solvable, 42% mislabeled, 35% prod gap

(datologyai.com)

5 points by hurrycane 5mo ago | 0 comments

3

DatBench: Cut VLM eval compute by >10× while INCREASING signal

(datologyai.com)

4 points by hurrycane 5mo ago | 0 comments

4

Luxical: Lexical-Dense Embeddings for Web-Scale Data Curation (3×–100× Faster)

(datologyai.com)

3 points by hurrycane 6mo ago | 0 comments

5

Arcee Trinity Mini: US-Trained MoE Model

(arcee.ai)

70 points by hurrycane 6mo ago | 15 comments

6

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-Scale Pretraining

(blog.datologyai.com)

1 point by hurrycane 10mo ago | 0 comments

7

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-Scale Pretraining

(blog.datologyai.com)

3 points by hurrycane 10mo ago | 0 comments

8

Image-Text Curation for 1B+ Data: Faster, Better, Smaller CLIP Models

(datologyai.com)

12 points by hurrycane 1y ago | 0 comments

9

Zico Kolter Joins OpenAI's Board of Directors

(openai.com)

3 points by hurrycane 1y ago | 0 comments

10

Augmenting Segment customer data with behavioral signals using the Moonsense SDK

(moonsense.io)

1 point by hurrycane 4y ago | 0 comments

11

Moonsense Recorder – Build live prototypes using mobile device sensor data

(moonsense.io)

2 points by hurrycane 5y ago | 0 comments

12

From the Gym to a Jupyter Notebook – Building a Squats Counter App in a Day

(urimerhav.medium.com)

7 points by hurrycane 5y ago | 0 comments

13

WeWork to Become Publicly Traded via SPAC

(wework.com)

2 points by hurrycane 5y ago | 0 comments

14

Reducing indexing latency of Twitter Search to one second

(blog.twitter.com)

3 points by hurrycane 6y ago | 0 comments

15

Twitter meets TensorFlow

(blog.twitter.com)

206 points by hurrycane 8y ago | 68 comments
