User interactions would be modeled as a stream of events, and your functions would take that stream and return the rendered video player as it should look given those events.
There are two ways to do it: one requires keeping the full history of events, and the other only requires keeping the next event (or next few) along with the result of applying the previous ones.
I'll start with full history. You'd have something like:
defn button-icon(click-events):
    if is-even(count(click-events))
        return paused
    else
        return play
With this, when the program starts, click-events is empty, so calling button-icon with it returns paused. If the user clicks the button, we call button-icon again; now there is one click event in click-events, so it returns play. If the user clicks again, there are two click events, so it returns paused, and so on.
There is still state in the running program, since something is remembering all the clicks on the button, but your UI logic is pure.
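As a rough illustration, here is how the full-history version might look in Python. This is only a sketch: the string values "paused" and "play", the "click" marker, and the driver loop are assumptions made for the example, not part of any particular UI framework.

```python
from typing import List

PAUSED = "paused"
PLAY = "play"

def button_icon(click_events: List[str]) -> str:
    """Pure function: derive the icon from the full click history."""
    # An even number of clicks (including zero) means the button shows paused.
    return PAUSED if len(click_events) % 2 == 0 else PLAY

# The only stateful part lives outside the UI logic:
# something has to remember every click since program start.
click_events: List[str] = []
icons = [button_icon(click_events)]       # icon at startup
for _ in range(3):                        # simulate three user clicks
    click_events.append("click")
    icons.append(button_icon(click_events))
```

Calling button_icon with the same list always gives the same answer, which is what makes the UI logic easy to test in isolation.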
Ok, now this is inefficient and requires lots of memory. Basically, on every new user action we recompute everything from program start, and we have to remember all prior actions. That's why there's the second approach. Instead we will do:
defn button-icon(current-button click-event):
    if click-event
        if (current-button = paused)
            return play
        else
            return paused
    else
        return current-button
Now the running program won't remember the list of all click events since program start. Instead, it'll remember the result of the last call to button-icon and pass that result back to button-icon the next time the user takes an action. This can be bootstrapped recursively or with a fold.
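To make the fold idea concrete, here is a minimal Python sketch using functools.reduce, with itertools.accumulate to show the intermediate icons. The boolean event encoding (True for a click, False for some other event) and the sample event list are assumptions made for the example; accumulate's initial argument needs Python 3.8 or later.

```python
from functools import reduce
from itertools import accumulate

PAUSED = "paused"
PLAY = "play"

def button_icon(current_button: str, click_event: bool) -> str:
    """Pure step function: previous icon plus one event gives the next icon."""
    if click_event:
        return PLAY if current_button == PAUSED else PAUSED
    return current_button  # not a click, so the icon is unchanged

# Hypothetical event stream: True is a click, False is any other event.
events = [True, False, True, True]

# The fold threads the last result through each call, starting from PAUSED.
final_icon = reduce(button_icon, events, PAUSED)

# accumulate keeps every intermediate result, i.e. what was rendered after each event.
icon_after_each_event = list(accumulate(events, button_icon, initial=PAUSED))
```

In a live program the events arrive one at a time, so the runtime would hold only the latest icon and call button_icon once per event; the fold over a list just shows the same computation in one expression.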