M a
Then add a way to put things in the container. a -> M a
Then add a way to use the thing in the container. M a -> (a -> M b) -> M b

(A nearly-exact parallel can be seen in the Iterator interface. You can describe it as "a thing that walks through a container presenting the items in order"... and yeah, that's the majority use case and where the idea came from... but it's also wrong. What it really is is just "a thing that presents items in some order". It doesn't have to be from "a container". You can have an iterator that produces integers in order, or strings in lexicographic order, or yields bytes from a socket as they come in, or other things that have no "container" anywhere to be found. If you have "from a container" in your mental model then those things are confusing; if you understand it simply as "presenting items in order" then having an iterator that just yields integers makes perfect sense. A lot of the Monad confusion comes from adding extra clauses to what it is. Though by no means all of it.)
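For anyone who wants to see the three pieces at the top (M a, a -> M b, and M a -> (a -> M b) -> M b) as actual code, here is a rough sketch in Haskell. Box, unit, bind and halve are names I made up for illustration; they are not from any library, and this is only one possible container, not the definition of a monad.

```haskell
-- Step 1: a type "M a". Here M is a hypothetical Box that is either empty or full.
data Box a = Empty | Full a
  deriving (Show, Eq)

-- Step 2: a way to put things in the container.  a -> M a
unit :: a -> Box a
unit = Full

-- Step 3: a way to use the thing in the container.  M a -> (a -> M b) -> M b
bind :: Box a -> (a -> Box b) -> Box b
bind Empty    _ = Empty
bind (Full x) f = f x

-- A hypothetical step that can fail: only even numbers can be halved.
halve :: Int -> Box Int
halve n
  | even n    = Full (n `div` 2)
  | otherwise = Empty

-- 12 halves to 6, 6 halves to 3, so this is Full 3.
example :: Box Int
example = unit 12 `bind` halve `bind` halve
```

Chaining one more halve onto example gives Empty, because 3 is odd and bind never calls the next step once the box is empty.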
The "aha" realization that the "container" can be an ephemeral concept and not resident at run time can come later.
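One way to see the "ephemeral container" point, as a sketch: the standard Prelude already has a Monad instance for plain functions, where the "container" is nothing more than a function still waiting for its argument. The name addThenScale below is mine, purely for illustration.

```haskell
-- The "container" here is a function from Int; nothing is stored anywhere.
-- For functions, f >>= k means \r -> k (f r) r.
addThenScale :: Int -> Int
addThenScale = (+ 1) >>= \a -> (* a)

-- addThenScale 3 computes (3 + 1) first, then multiplies the original 3 by it.
result :: Int
result = addThenScale 3
```

Here result is 12. The same >>= shape applies even though there is no data structure holding an Int at any point.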
FWIW, I think of IO as a container: it contains the risk of side-effects within. All the examples you gave are containers in their own way.
Abstract computer science doesn't.
Part of why Haskell comes across as such an implacable curmudgeon is its community's predilection for believing that users must grasp type theory and logic to use it.
They don't.
Just like they don't need to have a mental model of their computer to write software for it.