undefined | Better HN

0 pointsvisarga4y ago0 comments

It's not the training set. CLIP stands for Contrastive Language-Image Pretraining. It's a model that tells when a text and an image match.

0 comments

2 comments · 1 top-level

jazzyjackson4y ago· 1 in thread

huh, thanks, I'll have to look over the website again

do they say anything about their training set?

visargaOP4y ago

They say:

> When training the encoder, we sample from the CLIP and DALL-E datasets (approximately 650M images in total) with equal probability. When training the decoder, upsamplers, and prior, we use only the DALL-E dataset (approximately 250M images).

j / k navigate · click thread line to collapse