
gwern · 3y ago · 0 points

Why do they need to have something in text2image? It in no way builds lock-in to the API or anything, especially given how gimped it is.

1. Yes, they are. Look at the constant iterative rollouts of GPTs.
2. Most of which is useless to them, not that they have made any use of it.
3. The fact that it would be so easy to improve, and they haven't, only emphasizes my point.
4. Sure, that could be useful. Except there's zero integration or mention. (They haven't even opened up the vision part of GPT-4 yet.)
5. The fact that it would be so easy to improve, and they haven't, only emphasizes my point.
6. Why wait for GPT-5, possibly years from now?



TeMPOraL · 3y ago

> Why do they need to have something in text2image?

So they're "on the list". Whenever journalists and bloggers write articles about text2image, they're listed as a player in the space. For the vast majority of such articles, neither the authors nor the audience will be able to tell that OpenAI's offering is far behind and that they're basically keeping a token presence there.

At least that's my hypothesis. I'm neither a domain expert nor a business expert; I just feel that, for OpenAI, having laymen view them as an industry leader in AI in general is worth the price of keeping Dall-E available. In fact, as more and more users realize there are better models available elsewhere, that price goes down, while the effect on the lay audience stays the same.

(Note: the term "laymen", as I use it here, specifically includes most entrepreneurs, managers and investors, in tech or otherwise. If I'm being honest with myself, I belong to that category too; it was in fact this conversation and some recent threads that made me realize just how weak OpenAI is in the image generation space.)

> Look at the constant iterative rollouts of GPTs

You mean some unannounced ones, or the pinned models? Because AFAIK GPT-3.5 had two updates after release (the turbo model and the current one), and GPT-4 had one. I mean public releases; for example, how often they updated GPT-4 back before it was public, e.g. when Microsoft was building Bing Chat, is not relevant in this context.

Also compare that with how, going by HN submissions alone, every other day someone releases some improved LLaMA-derived LLM.

> 2. Most of which is useless to them, not that they have made any use of it 3. the fact that it would be so easy to improve, and they haven't, only emphasizes my point. 4. sure, that could be useful. Except there's zero integration or mention. (...) 5. the fact that it would be so easy to improve, and they haven't, only emphasizes my point.

There's little for them to gain by openly using all that work now. At the moment, they can just keep an eye on what's posted to Civitai, paying particular attention to how different model derivatives respond to prompts (think e.g. CyberRealistic vs. Deliberate) and why, and build up a training corpus of prompts and settings, helpfully provided by the community, complete with quality ratings. They can do that using a small fraction of the resources they have available, so that when the time comes, they can use their full resources to quickly train and deploy a model that blows everyone else out of the water.

Also, as an organization, they can only focus on so many things at a time. GPT-4 is buying them some space, and I believe they're currently focusing primarily on their cooperation with Microsoft and/or other things involving LLMs. Given the relative usefulness and potential of LLMs vs. image generation, both short- and long-term, doing more than the bare minimum in image generation right now might be too much of a distraction for an organization of this size.

> (They haven't even opened up the vision part of GPT-4 yet.)

They're in the lead. They're not in a hurry. They're likely giving Microsoft a head start.

> 6. why wait for GPT-5 possibly years from now?

Why do it earlier? What could they possibly gain by jumping back into the text2image space now? At this point, compared to LLMs, text2image seems neither profitable nor particularly relevant for x-risk, so whichever way you cut it, I can't see why they would want to prioritize it.
