Howdy, HN. Authors here. We got tired of text-to-image leaderboards that only focus on aesthetics, so we built our own benchmarks to test what matters for real work: fidelity to complex prompts, safety, bias, and IP infringement.
We analyzed 18 models and found that no single model is good at everything. For example, GPT-4o has the best safety guardrails but also a 98% IP infringement rate on celebrity likenesses. Google's Imagen 4 Ultra actively counters bias (e.g., 90% of its "CEOs" are female) but struggles with generating crowds. X AI's Grok 2 blocks almost nothing.
Lots more detail in the post. We'll be here all day to answer questions.