It is still undecided and unclear what is legal and what is not. I am not an American, so I am not the one who should explain this.
But Microsoft trained their coding AI on GPL software and not on proprietary software, implying that they can risk screwing Open Source but not the paying GitHub customers. Otherwise I am not sure why they did not use their own internal proprietary code.
OpenAI trained on the entire internet and on books, and ignored any license or copyright concerns.
Stability.ai also trained on a lot of stuff.
Adobe trained only on stuff they have licenses for.
So as we can see, until a USA judge decides how to interpret the existing laws, things are not certain: some companies take the risk, others take less risk, others take no risk.
But in the USA some other judge in a higher court may decide something else, so IMO the USA also needs to make a decision and not let FUD (deserved or not) about AI spread.
I have seen that many people do not know that the issue ChatGPT had in Italy was about a data leak. There are laws on what you must do when this happens. There are probably similar laws in the USA; maybe in the USA it is harder to start an investigation, or maybe there was some lobbying involved.