One that I can think of:
- replacing photography of people who may be unable to consent, or for whom revisiting photographs may be traumatic, when suitable models are not available, e.g. dementia patients, babies, examples of medical conditions.
Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society."
On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Commissioning high-quality diagrams from a designer is expensive, and I guess it's much cheaper now to essentially commission something from a model, but idk, "democratization" still feels like a weird word for just undercutting humans on price.
It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey, at least PowerPoint presentations have AI slop instead of stuff taken from Google Images!?
I am at the point where I would prefer a poorly drawn human diagram with terrible handwriting over AI slop.
Now, does that justify the harm? Not for me, but this issue is way out of my league.
The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
I will say, it can be emotionally resonant, though that's a borrowed property, inherited from the perception of human communication and effort that went into the art the models were trained on.
Got pretty wild with the Iranian propaganda that reportedly _resonated with Americans_ (I didn't verify that claim).
Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...
The advent of digital systems harmed artists with developed manual artistic skills.
The availability of cheap paper harmed paper mills hand-crafting paper.
The creation of paper harmed papyrus craftsmen.
The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
My point is that, in line with Jevons paradox, there is always a wave of destruction that comes with technological transformation, but we almost always end up with more jobs created by the technology in the medium and long term.
Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.
1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.
2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"
Important: do not skip the last step.
If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
I could come up with 10 3 word sentences myself of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me), and it also helps me think about a grander idea to turn this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce/focus on) and automatically generate things in a more principled way, add my kids into the stories with illustrations that look like them, etc.
Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
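The progress-tracking idea is mostly bookkeeping. Here's a minimal sketch with a made-up toy word-to-phoneme table (not a real phonics library; all names are hypothetical): track the set of mastered phonemes and surface words that introduce at most one new one.

```python
# Hypothetical sketch of a phonics progress tracker: pick words that use
# only mastered phonemes, plus at most one new phoneme to introduce.
WORD_PHONEMES = {            # toy word -> phoneme list (made up for illustration)
    "cat": ["k", "a", "t"],
    "sat": ["s", "a", "t"],
    "ship": ["sh", "i", "p"],
    "chat": ["ch", "a", "t"],
}

def next_words(mastered, max_new=1):
    """Return (word, new_phonemes) pairs needing at most max_new new phonemes."""
    picks = []
    for word, phonemes in WORD_PHONEMES.items():
        new = set(phonemes) - set(mastered)
        if len(new) <= max_new:
            picks.append((word, sorted(new)))
    return picks

# A child who has mastered k, a, t, s can read "cat" and "sat" now,
# and "chat" is a good next word since it adds only one new phoneme.
print(next_words({"k", "a", "t", "s"}))
```

From there it's a small step to feed the chosen words into story generation so each new reader reinforces exactly the sounds being practiced.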
AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems like a great way to destroy a child’s natural curiosity.
You overestimate how many there are. There are like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
Diagrams and maps. So much text-based communication begs for a diagram or a map.
- package design
- pictures for manuals and guides
- navigation and signs
- booklets, tickets and flyers
- logos of all sorts
- websites
- illustrations for books
And many, many others. Not every image is art, and very few illustrators are artists.
I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
It's not a particularly compelling argument.
It's a true state-change, which makes the argument pretty compelling IMO.
For example, take a picture of your garden. Ask chatgpt to give you ideas how to improve it and a step by visual guide.
Anything that can be expressed visually is effectively a target for this technology, and that covers pretty much everything.
Short kings on tinder no more!
/s
But yeah the quality is remarkable, and rather scary.
That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”
I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.
Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts, especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:
https://genai-showdown.specr.net/image-editing?models=nbp3,s...
And here’s the same comparison for generative performance:
https://genai-showdown.specr.net/?models=s4,nbp3,g15
UPDATES:
gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.
Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:
- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.
- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.
- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
All Models:
https://genai-showdown.specr.net
Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0
I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we mask and feather in only the parts we actually want to change. It doesn’t always work if the edit changes the overall color balance or filters out certain hues; it can be a real pain, but it does the job in some cases.
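The mask-and-feather step boils down to simple per-pixel math: blur (feather) a binary mask, then blend `out = mask*edited + (1-mask)*original`. Here's a toy pure-Python sketch on tiny grayscale grids, not how Krita actually implements it:

```python
# Toy sketch of mask-and-feather compositing (image editors do this per
# channel with optimized kernels; here, tiny grayscale grids of floats).

def box_blur(mask, radius=1):
    """Feather a 2D mask by averaging each cell with its neighbors."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def composite(original, edited, mask):
    """out = mask*edited + (1-mask)*original, pixel by pixel."""
    return [[mask[y][x] * edited[y][x] + (1 - mask[y][x]) * original[y][x]
             for x in range(len(original[0]))]
            for y in range(len(original))]

original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]          # black image
edited   = [[255, 255, 255], [255, 255, 255], [255, 255, 255]]  # white edit
hard     = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]          # only the center changes
soft     = box_blur(hard)                              # feathered edges blend
result   = composite(original, edited, soft)
```

The feathering is what avoids a hard visible seam at the edge of the edited region, which is exactly the color-balance/seam problem described above.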
The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.
Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
https://genai-showdown.specr.net/?models=hy2,g2,zt
Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc) because it's been deprecated for a while but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)
9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)
Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)
Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)
Above are one-shot attempts with seed 42.
The template prompt seen in each comparison gets adjusted by an LLM guided by fine-tuned system prompts for rewriting prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
As for your suggestion to post all the raw prompts: that's actually a great idea. Too bad I didn't think of it until you suggested it. If you multiply it out, there are 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts, so we’re talking about thousands of prompts, many of which are scattered across my hard drive. I might try to do this as a future follow-up.
GPT Image 2
Low:    1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005
Medium: 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041
High:   1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165

GPT Image 1
Low:    1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016
Medium: 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063
High:   1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25

You can create larger images by generating separate parts and recombining them, but the parts may not perfectly match at their borders.
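Back-of-envelope budgeting from the table above is easy to script. This is a toy calculator with my own made-up dictionary keys, not an official SDK:

```python
# Per-image prices (USD) transcribed from the gpt-image-2 table above.
# Keys like ("gpt-image-2", "high") are my own naming, not API identifiers.
PRICES = {
    ("gpt-image-2", "low"):    {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
    ("gpt-image-2", "medium"): {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
    ("gpt-image-2", "high"):   {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
}

def batch_cost(model, quality, size, n):
    """Total cost in USD for n images at the listed per-image price."""
    return round(PRICES[(model, quality)][size] * n, 3)

print(batch_cost("gpt-image-2", "high", "1024x1024", 100))  # 100 hi-res squares
```

Note how quickly "pennies per image" adds up at high quality: a hundred 1024×1024 images at the high tier already costs over $21.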
It is a Landau thing, not a trading thing. The idea of an LLM is to work on the unknown.
I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.
As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?
Direct PDF: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...
I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.
From the system card someone linked elsewhere in the discussion
Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.
Consistency? So it fails less often?
Based on the released images (especially the one "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").
Especially when it comes to detailed outputs or non-standard prompts.
I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing
...buuuuuuuuut the price per image has changed. For high-quality image generation, the 1024x1024 price has increased? It doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so I'm assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
The submitted page is annoyingly uninformative, but from the livestream it purports to have the same exact features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.
I think you meant more expensive, right? Because it would make sense for it to be cheaper, as there are fewer pixels.