Adobe is lying. They are relying on general ignorance about the technology to get away with it.
Adobe has not shown how they train the text encoders in Firefly, or what images were used for the text-based conditioning (i.e. "text to image") part of their image generation model. They are almost certainly using a CLIP-style encoder or T5. OpenAI's CLIP was trained on an undisclosed set of image-text pairs scraped from the web; the open CLIP reproductions are trained on LAION-2B, an image-text dataset with the very problems they are trying to address; and T5 is trained on C4 (a text dataset similarly encumbered).
bUt nO oNe eLsE hAs bRoUgHt tHiS uP. It's so arcane for non-practitioners. Talk about this directly with someone like Astropulse, who monetizes a Stable Diffusion model: no confusion, totally agrees with me. By comparison, I've pinged the Ars Technica journalist who just wrote about this issue: crickets. Posted to the Adobe forum: crickets. E-mailed Adobe at their dedicated address for this: crickets. I have no idea why something so obvious has slipped under everyone's radar!