One that I can think of:
- replacing photography of people who may be unable to consent, or for whom revisiting photographs may be traumatic, when suitable models are not available, e.g. dementia patients, babies, examples of medical conditions.
Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society."
On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Commissioning high-quality diagrams from a designer is expensive, and I guess it's much cheaper now to essentially commission something from a model, but idk, "democratization" still feels like a weird word for just undercutting humans on price.
It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey, at least PowerPoint presentations have AI slop instead of stuff taken from Google Images!?
I am at the point where I would prefer a poorly drawn human diagram with terrible handwriting over AI slop.
Now, does that justify the harm? Not for me, but this issue is way out of my league.
The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
I will say, it can be emotionally resonant, though that's a borrowed property, inherited from the perception of human communication and effort that went into the art the models were trained on.
Got pretty wild with the Iranian propaganda that reportedly _resonated with Americans_ (I didn't verify that claim).
Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...
The advent of digital systems harmed artists with developed manual artistic skills.
The availability of cheap paper harmed paper mills hand-crafting paper.
The creation of paper harmed papyrus craftsmen.
The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
My point is that, in line with Jevons paradox, there is always a wave of destruction that comes with technological transformation, but we almost always end up with more jobs created by the technology in the medium and long term.
Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.
1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.
2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"
Important: do not skip the last step.
If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
I could come up with 10 3 word sentences myself of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me), and it also helps me think about a grander idea to turn this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce/focus on) and automatically generate things in a more principled way, add my kids into the stories with illustrations that look like them, etc.
Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
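The progress-tracking idea is mostly bookkeeping. Here's a minimal sketch with a made-up toy word-to-phoneme table (not a real phonics library; all names are hypothetical): track the set of mastered phonemes and surface words that introduce at most one new one.

```python
# Hypothetical sketch of a phonics progress tracker: pick words that use
# only mastered phonemes, plus at most one new phoneme to introduce.
WORD_PHONEMES = {            # toy word -> phoneme list (made up for illustration)
    "cat": ["k", "a", "t"],
    "sat": ["s", "a", "t"],
    "ship": ["sh", "i", "p"],
    "chat": ["ch", "a", "t"],
}

def next_words(mastered, max_new=1):
    """Return (word, new_phonemes) pairs needing at most max_new new phonemes."""
    picks = []
    for word, phonemes in WORD_PHONEMES.items():
        new = set(phonemes) - set(mastered)
        if len(new) <= max_new:
            picks.append((word, sorted(new)))
    return picks

# A child who has mastered k, a, t, s can read "cat" and "sat" now,
# and "chat" is a good next word since it adds only one new phoneme.
print(next_words({"k", "a", "t", "s"}))
```

From there it's a small step to feed the chosen words into story generation so each new reader reinforces exactly the sounds being practiced.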
AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems like a great way to destroy a child’s natural curiosity.
You overestimate how many there are. There are like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
Diagrams and maps. So much text-based communication begs for a diagram or a map.
- package design
- pictures for manuals and guides
- navigation and signs
- booklets, tickets and flyers
- logos of all sorts
- websites
- illustrations for books
And many, many others. Not every image is art, and very few illustrators are artists.
I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
It's not a particularly compelling argument.
It's a true state-change, which makes the argument pretty compelling IMO.
For example, take a picture of your garden. Ask chatgpt to give you ideas how to improve it and a step by visual guide.
Anything that can be expressed visually is effectively a target for this technology, and that covers pretty much everything.
Short kings on tinder no more!
/s
But yeah the quality is remarkable, and rather scary.
That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”
I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.
Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts, especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:
https://genai-showdown.specr.net/image-editing?models=nbp3,s...
And here’s the same comparison for generative performance:
https://genai-showdown.specr.net/?models=s4,nbp3,g15
UPDATES:
gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.
Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:
- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.
- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.
- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
All Models:
https://genai-showdown.specr.net
Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0
I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we mask and feather in only the parts we actually want to change. It doesn’t always work if the edit changes the overall color balance or filters out certain hues; it can be a real pain, but it does the job in some cases.
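The mask-and-feather step boils down to simple per-pixel math: blur (feather) a binary mask, then blend `out = mask*edited + (1-mask)*original`. Here's a toy pure-Python sketch on tiny grayscale grids, not how Krita actually implements it:

```python
# Toy sketch of mask-and-feather compositing (image editors do this per
# channel with optimized kernels; here, tiny grayscale grids of floats).

def box_blur(mask, radius=1):
    """Feather a 2D mask by averaging each cell with its neighbors."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def composite(original, edited, mask):
    """out = mask*edited + (1-mask)*original, pixel by pixel."""
    return [[mask[y][x] * edited[y][x] + (1 - mask[y][x]) * original[y][x]
             for x in range(len(original[0]))]
            for y in range(len(original))]

original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]          # black image
edited   = [[255, 255, 255], [255, 255, 255], [255, 255, 255]]  # white edit
hard     = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]          # only the center changes
soft     = box_blur(hard)                              # feathered edges blend
result   = composite(original, edited, soft)
```

The feathering is what avoids a hard visible seam at the edge of the edited region, which is exactly the color-balance/seam problem described above.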
The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.
Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
https://genai-showdown.specr.net/?models=hy2,g2,zt
Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc) because it's been deprecated for a while but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)
9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)
Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)
Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)
Above are one-shot attempts with seed 42.
The template prompt seen in each comparison gets adjusted by an LLM guided by fine-tuned system prompts for rewriting prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
As for your suggestion to post all the raw prompts: that's actually a great idea. Too bad I didn't think of it until you suggested it. If you multiply it out, there are 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts, so we’re talking about thousands of prompts, many of which are scattered across my hard drive. I might try to do this as a future follow-up.
GPT Image 2
Low:    1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005
Medium: 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041
High:   1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165

GPT Image 1
Low:    1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016
Medium: 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063
High:   1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25

You can create larger images by generating separate parts and recombining them, but the parts may not perfectly match at their borders.
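Back-of-envelope budgeting from the table above is easy to script. This is a toy calculator with my own made-up dictionary keys, not an official SDK:

```python
# Per-image prices (USD) transcribed from the gpt-image-2 table above.
# Keys like ("gpt-image-2", "high") are my own naming, not API identifiers.
PRICES = {
    ("gpt-image-2", "low"):    {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
    ("gpt-image-2", "medium"): {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
    ("gpt-image-2", "high"):   {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
}

def batch_cost(model, quality, size, n):
    """Total cost in USD for n images at the listed per-image price."""
    return round(PRICES[(model, quality)][size] * n, 3)

print(batch_cost("gpt-image-2", "high", "1024x1024", 100))  # 100 hi-res squares
```

Note how quickly "pennies per image" adds up at high quality: a hundred 1024×1024 images at the high tier already costs over $21.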
It is a Landau thing, not a trading thing. The idea of an LLM is to work on the unknown.
I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.
As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?
Direct PDF: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...
I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.
From the system card someone linked elsewhere in the discussion
Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.
Consistency? So it fails less often?
Based on the released images (especially the one "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").
Especially when it comes to detailed outputs or non-standard prompts.
I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing
...buuuuuuuuut the price per image has changed. For high-quality image generation, the 1024x1024 price has increased? It doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so I'm assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
The submitted page is annoyingly uninformative, but from the livestream it purports to have the same exact features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.
I think you meant more expensive, right? Because it would make sense for it to be cheaper, as there are fewer pixels.