I think we really need more development of format-specific diff and merge tools. A lot of binary formats could absolutely be diffed or merged, but you'd need algorithms and maybe UIs specific to that format - there is no "generic" algorithm like there is for text-based files. (And even there, generic line-wise diffing is often more "good enough" than really good.)
I think it would be awesome if we could get "diff/merge servers" analogous to the "language servers" for IDEs some day.
https://github.com/ewanmellor/git-diff-image/blob/master/REA...
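To make the "format-specific" point concrete, here is a rough sketch of what such a diff could look like for one simple format. It assumes the file is JSON and compares parsed values instead of lines; `structural_diff` and `MISSING` are made-up names for illustration, not part of any existing tool.

```python
import json

MISSING = "<missing>"  # placeholder for a key present on only one side


def structural_diff(old, new, path=""):
    """Return (path, old_value, new_value) tuples for values that differ."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys so additions and removals are both seen.
        for key in sorted(set(old) | set(new)):
            child = f"{path}/{key}"
            if key not in old:
                changes.append((child, MISSING, new[key]))
            elif key not in new:
                changes.append((child, old[key], MISSING))
            else:
                changes.extend(structural_diff(old[key], new[key], child))
    elif old != new:
        changes.append((path, old, new))
    return changes


# Same data with keys reordered, one value changed, one key added.
a = json.loads('{"name": "cube", "pos": {"x": 1, "y": 2}}')
b = json.loads('{"pos": {"y": 2, "x": 5}, "name": "cube", "tag": "new"}')
changes = structural_diff(a, b)
for change in changes:
    print(change)
```

A line-wise diff of the two raw strings would flag the whole file as changed because the keys moved; the structural version reports only the changed value and the added key.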
The alternative of preventing complex merge situations in the first place through file locking is low-tech, easy to implement, and automatically works on all current and future file formats.
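As a rough illustration of how low-tech locking can be, here is a sketch that uses an exclusive lock file next to the asset. It assumes everyone works on a shared filesystem; real version control systems track locks on a central server instead. `try_lock`, `unlock`, and the `.lock` suffix are hypothetical names chosen for this example.

```python
import os
import tempfile


def try_lock(path, owner):
    """Try to take the lock for `path`; return True on success."""
    lock_path = path + ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)  # record who holds the lock
    return True


def unlock(path):
    """Release the lock by deleting the lock file."""
    os.remove(path + ".lock")


workdir = tempfile.mkdtemp()
scene = os.path.join(workdir, "level1.scene")

first = try_lock(scene, "alice")   # lock is free, alice gets it
second = try_lock(scene, "bob")    # bob is refused while alice holds it
unlock(scene)
third = try_lock(scene, "bob")     # free again after the release
```

Nothing here looks at the contents of the file, which is why the approach works for any format.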
The problem was that the scene information was fundamentally visual (assets arranged in 3D space) so even a diffable text format wouldn't help you much. On the other hand, scenes are large enough that you often would want to work on them in parallel with other people.
I believe their first solution to that was the Asset Server that supported locking. But that still doesn't give two people the ability to work on a scene concurrently.
Eventually, some users went and developed a custom diff/merge tool to solve the problem.
https://discussions.unity.com/t/scene-diff-ease-your-sufferi...
I've never done exactly that but I have occasionally decided how information will be represented in a data file with merging in mind.
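One way to picture "representing data with merging in mind", as a sketch only: write one record per line in a stable, sorted order, so that independent edits touch different lines and a plain line-wise merge has a chance. The record layout and the name `dump_merge_friendly` are assumptions for illustration.

```python
import difflib
import json


def dump_merge_friendly(records):
    """Serialize records as one JSON object per line, sorted by id."""
    lines = []
    for rec_id in sorted(records):
        entry = {"id": rec_id, **records[rec_id]}
        # sort_keys keeps the field order stable between saves.
        lines.append(json.dumps(entry, sort_keys=True))
    return "\n".join(lines) + "\n"


base = {"tree": {"x": 1}, "rock": {"x": 4}}
# Same records in a different in-memory order, plus one new record.
edited = {"rock": {"x": 4}, "tree": {"x": 1}, "lamp": {"x": 9}}

before = dump_merge_friendly(base).splitlines()
after = dump_merge_friendly(edited).splitlines()

# Keep only the added/removed lines from a line-wise diff.
changed = [line for line in difflib.ndiff(before, after)
           if line.startswith(("+ ", "- "))]
print(changed)
```

Because the output order does not depend on insertion order, adding one record shows up as a single added line instead of a reshuffled file.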