I feel like it’s the opposite: the copy-paste issue is solvable, you just need to equip the model with the right tools and make sure they are trained on tasks where that’s unambiguously the right thing to do (for example, cases were copying code “by hand” would be extremely error prone -> leads to lower reward on average).
On the other hand, teaching the model to be unsure and ask questions, requires the training loop to break and bring a human input in, which appears more difficult to scale.