Even quite minor OS updates can cause the tests to fail en masse, because of a global OS change to something like the system font or the button design. That is a shame, because that is exactly the point where you want to see what the OS change actually broke (like one of your windows now comes up offscreen, or your help screen is now always in Icelandic).
With app testing you can't restrict the detection to the window content area, as a bug could for example give the main window the wrong kind of title bar, or make it draw its default title in the wrong localized language, and you would want to detect that.
Then when making UI changes you need a mechanism for marking which comparisons are intended to fail after this change, and which will need to be automatically regenerated for the next build. I don't think I've ever worked on a project that got this entirely automated, and that resulted in a lot of work for QA. On a complicated app like a web browser it is a really valuable system though.
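To make the "mark it as intended to fail, then regenerate" idea concrete, here is a minimal sketch. It is not taken from any real tool: `BaselineStore`, `mark_expected_change`, and `check` are hypothetical names, and screenshots are treated as opaque bytes compared for exact equality.

```python
from dataclasses import dataclass, field


@dataclass
class BaselineStore:
    """Hypothetical in-memory store of baseline screenshots, keyed by test name."""

    baselines: dict = field(default_factory=dict)
    expected_changes: set = field(default_factory=set)

    def mark_expected_change(self, name: str) -> None:
        # Called by whoever makes the UI change: "this comparison is meant to fail".
        self.expected_changes.add(name)

    def check(self, name: str, screenshot: bytes) -> str:
        baseline = self.baselines.get(name)
        if baseline is None:
            # First run for this test: record the screenshot as the baseline.
            self.baselines[name] = screenshot
            return "new"
        if baseline == screenshot:
            return "pass"
        if name in self.expected_changes:
            # Intended change: regenerate the baseline and clear the mark,
            # so the next build compares against the new image.
            self.baselines[name] = screenshot
            self.expected_changes.discard(name)
            return "regenerated"
        return "fail"
```

In practice the hard part is the human step of deciding what goes into `expected_changes`, which is probably why this rarely ends up fully automated.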
I found it didn't even take that. Sometimes nondeterminism in the code you wrote, or in code you can't even control, will cause tests to fail en masse.
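One common way to soften that kind of noise is to compare with a tolerance instead of demanding an exact match. The sketch below is an illustration under assumptions: images are plain lists of `(r, g, b)` tuples, and the names `diff_ratio` and `images_match` and both threshold values are made up for the example, not recommendations.

```python
def diff_ratio(a, b, channel_tolerance=8):
    """Fraction of pixels where any channel differs by more than channel_tolerance."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if not a:
        return 0.0
    differing = sum(
        1
        for pixel_a, pixel_b in zip(a, b)
        if any(abs(x - y) > channel_tolerance for x, y in zip(pixel_a, pixel_b))
    )
    return differing / len(a)


def images_match(a, b, channel_tolerance=8, max_diff_ratio=0.01):
    # Pass if at most max_diff_ratio of the pixels differ noticeably.
    return diff_ratio(a, b, channel_tolerance) <= max_diff_ratio
```

The trade-off is that any tolerance loose enough to absorb rendering jitter can also hide a small real regression, so the thresholds need tuning per project.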
(disclaimer: I wrote this)
CSS is rife with potential to cause faraway effects. Catching these regressions is very satisfying.
It's also useful to get a survey of all of your UX. Being able to see everything at once has helped us improve the dark corners of our app/site and spot patterns that we can extract into a design system.
Running mobile width screenshots has been awesome. Designing/developing for mobile first doesn't always happen and this surfaces areas that need responsive work pretty effectively.
It worked, but it wasn't an ideal solution for a variety of reasons.
Having a tool that can do visual regression testing without having to be part of the build pipeline would be cool for designers who might already be doing this manually. For a great variety of reasons, these types of issues aren't being caught if the designers don't catch them (been there, done that, "we don't need QA!" they said...). I've seen automated screen capture services before, but having to manually check their output for regressions is a pain.
If there's already something like this out there, I'd love to hear about it.
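For what it's worth, the "only show me what changed" part can be roughed out with nothing but the standard library. This is a sketch under assumptions: two capture runs saved as PNG files in two directories with matching file names, exact byte-for-byte comparison via SHA-256, and hypothetical function names `snapshot_hashes` and `compare_runs`.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(directory):
    """Map each PNG's path (relative to directory) to the SHA-256 of its bytes."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.png"))
    }


def compare_runs(previous_dir, current_dir):
    """Report which screenshots changed, appeared, or disappeared between two runs."""
    old = snapshot_hashes(previous_dir)
    new = snapshot_hashes(current_dir)
    return {
        "changed": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }
```

Exact hashing flags any single-pixel difference, so it only cuts down the manual checking if the captures are reasonably deterministic.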