undefined | Better HN

0 pointsravi-delia4y ago0 comments

Basically makes sense, no? DALLE-2 suffered from misunderstanding propositional logic, treating prompts as less structured then it should have. That's a text model issue! Compared to that, scaling up the image isn't as important (especially with a few passes).

0 comments

1 comments · 1 top-level

espadrine4y ago

Is there a way to confirm that this extra processing relates to the language structure, and not the processing of concepts?

I wouldn’t be surprised if the lack of video and 3D understanding in the image dataset training fails to understand things like the fear of heights, and the concept of gravity ends up being learned in the text processing weights.

1 more reply

j / k navigate · click thread line to collapse