Compare an ordinary pencil (no animations, movement is directly tied to your hand) to a pencil with a pompom on a spring attached to the end. Which is most fun for brief use? Which would you rather write a whole page of text with?
One of the ways to achieve this is to not actually transition between states, but simply animate the "end bounce" of an introduced element, as if it was eased into position. So not actually slid from the left, for example, but rebounding the last few pixels from an imaginary slide. Our eyes just draw their conclusions to inform us of a movement, and in exchange the component is readable and usable immediately.
[1] ~100ms represents optimal reflex time in recent research. [2] Anything that requires user attention to interact after the component appears is very comfortable with a 150ms transition. One important note is that for components you can navigate across (i.e. one key shortcut invokes a modal state, another key runs a command in that modal), experienced users will "type" consecutive shortcuts in one go, and you must have the second behaviour responsive from frame 1.
[2] Some athletes seem to train down to ~80ms on very specific reflexes, which recently lead to race-start controversies when block timers disqualify sub-100ms reactions for runners.
For unpredictable inputs. Intervals between a human own actions or discrepancies in delays between successive external events can be effected or perceived with significantly greater precision, especially for people with e.g. music training, especially for percussionists. I’d bet on somewhere between one and two orders of magnitude more precision, that is single-digit milliseconds, at higher skill levels. (Chopin’s Fantaisie-Impromptu is among the easier rhythm-based parlour tricks and already requires staying below ~30ms of error. Alternatively, a single frame at 60fps is 17ms, and speedrunners can hit single frames of a game pretty reliably.)
That's precisely why a GUI should not be animated. No matter how I operate it, the action I use already has its own animation, e.g. my finger moving on the mouse. If I add another animation in the GUI itself that's double animation. It's equivalent to pressing a button and triggering a servo to automatically press that same button.
Alternatively nothing in reality is animated and everything instantaneously snaps between states.
Consider a toolbar with a mix of enabled and disabled buttons. Hover effects (which I would consider animations) convey that something is clickable, and on-click effects confirm an action. These effects convey meaningful information to both beginner users and power users of any software, and are in no way inconvenient to either group.
I generally agree animations tend to get in the way when you want to get shit done, but the idea that animations are only applicable as artistic effects rings untrue to me.