(And yes, I'm also driven to rage by slow-fade animations. A practice I can date back to Microsoft's Clippy, which, when you punched it in the face to go away, had one more gratuitous animation just to twist the knife that bit more.)
To reiterate: the primary goal seems to be slowing down bots.
Not necessarily. Contrast adds detail and mistakes are expensive, so bots too are incentivized to wait for the final picture (this assumes they aren't monitoring the network traffic to pull the incoming image straight off the wire).
Also, clicking on that image too early is a good signal that it's a bot.
Unless Google is literally streaming in the image frame by frame (I'll admit I haven't looked into the details, but that doesn't seem likely, as it's pretty complicated compared to just using an image)
... it really doesn't make it that much more expensive for bots, it's just a short delay. In fact, I doubt it makes a difference at all.
But it makes things really annoying for humans.
So I don't see any advantage in that trade-off.