https://token-verse.github.io/results/multi_concepts/25.png
https://token-verse.github.io/results/multi_concepts/06.png
Both of these show a man's face in a source image being used in a newly generated image. I agree that it isn't complicated, but you seem to be drawing different conclusions to everyone else here.
If your point is that it can't perform face transfer, you seem to be wrong - that's what's happening here. If your point is that the blurring in the photos used for other parts of the input suggests the model may get confused by other faces, then that's a fair point. But it seems clear they have demonstrated face transfer, and having to blur irrelevant faces seems minor compared to successfully transferring the intended face. I'm not sure how that would really impact use cases.