`a` and `b` don't actually matter, because what matters is how the function is called. Given some `f :: Monad m => m a -> m b`, you can only instantiate `m` to IO from an already impure context. For example, you can't call `f` with IO from inside a pure function `g :: a -> b`, because the result would have type `IO b`, not `b`.
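A minimal sketch of that point, using a hypothetical monad-polymorphic function `pairUp` (an illustration, not code from the discussion). Pure code can instantiate `m` to a pure monad such as `Maybe` or the list monad, but instantiating it to IO produces an `IO` result, which only type-checks where we are already in IO, such as `main`.

```haskell
-- 'pairUp' is a hypothetical example: it works for any Monad m.
pairUp :: Monad m => m a -> m (a, a)
pairUp action = do
  x <- action
  y <- action
  return (x, y)

-- Inside a pure function we can pick m = Maybe...
pureUse :: Int -> Maybe (Int, Int)
pureUse n = pairUp (Just n)

-- ...or m = [] (a String is a list of Char); both are pure.
listUse :: [(Char, Char)]
listUse = pairUp "ab"

-- Picking m = IO inside a pure function is rejected by the type checker,
-- because the result would be IO (Int, Int), not (Int, Int):
-- notAllowed :: Int -> (Int, Int)
-- notAllowed n = pairUp (print n >> return n)

main :: IO ()
main = do
  print (pureUse 3)   -- Just (3,3)
  print listUse       -- [('a','a'),('a','b'),('b','a'),('b','b')]
  -- m = IO is fine here, because main is already in IO.
  (a, b) <- pairUp (putStrLn "running an IO action" >> return (42 :: Int))
  print (a + b)       -- 84
```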
The purpose of IO in Haskell is to explicitly mark side effects, because they cannot be arbitrarily composed the way pure functions can. IO is a one-way boundary: you can turn a pure value into an IO computation (`a -> IO a`), but there is no way to "extract" a result back out of IO (setting aside escape hatches such as GHC's `unsafePerformIO`, there is no `IO a -> a`). Using a monad for this is convenient because a monad provides exactly that `a -> IO a` (`return`/`pure`), plus a function for chaining computations, `(>>=) :: IO a -> (a -> IO b) -> IO b`.
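A small sketch of how `return` and `(>>=)` are used in practice. The names `shout`, `greeting`, `program` and `programDo` are hypothetical. The pure function is applied inside the chain rather than by pulling the string out of IO.

```haskell
import Data.Char (toUpper)

-- A pure function: no IO in its type.
shout :: String -> String
shout s = map toUpper s ++ "!"

-- return :: a -> IO a lifts a pure value into IO.
greeting :: IO String
greeting = return "hello"

-- (>>=) chains IO computations. The String never leaves IO;
-- instead the rest of the program is passed in as a continuation.
program :: IO ()
program = greeting >>= \s -> putStrLn (shout s)

-- The same chain in do-notation, which desugars to (>>=).
programDo :: IO ()
programDo = do
  s <- greeting
  putStrLn (shout s)

main :: IO ()
main = program >> programDo  -- prints HELLO! twice
```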
How IO is defined is up to the implementation and outside the scope of the language. Different implementations could represent IO differently; what matters is that it is defined in a way that one cannot write an `IO a -> a`.
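To illustrate that the representation is a design choice, here is a toy, hypothetical IO-like type, `Console`, that represents a program as a data structure of console instructions. This is not how GHC defines IO. Note that for this toy type we *can* write a pure interpreter (`runPure`), which shows that being a monad is not what makes IO one-way; what rules out an `IO a -> a` is that the real IO type exposes no such function.

```haskell
-- Hypothetical toy type: a description of console actions.
data Console a
  = Done a
  | PutLine String (Console a)
  | GetLine (String -> Console a)

instance Functor Console where
  fmap f (Done a)      = Done (f a)
  fmap f (PutLine s k) = PutLine s (fmap f k)
  fmap f (GetLine k)   = GetLine (fmap f . k)

instance Applicative Console where
  pure = Done
  cf <*> ca = cf >>= \f -> fmap f ca

instance Monad Console where
  Done a      >>= f = f a
  PutLine s k >>= f = PutLine s (k >>= f)
  GetLine k   >>= f = GetLine (\l -> k l >>= f)

putLine :: String -> Console ()
putLine s = PutLine s (Done ())

askLine :: Console String
askLine = GetLine Done

echo :: Console ()
echo = askLine >>= \l -> putLine ("you said: " ++ l)

-- A pure interpreter: feeds scripted input, collects output lines.
-- Nothing like this is offered for the real IO type.
runPure :: [String] -> Console a -> ([String], Maybe a)
runPure _        (Done a)      = ([], Just a)
runPure ins      (PutLine s k) = let (out, r) = runPure ins k in (s : out, r)
runPure (i : is) (GetLine k)   = runPure is (k i)
runPure []       (GetLine _)   = ([], Nothing)  -- ran out of input

-- An interpreter that maps the description onto the Prelude's IO.
runIO :: Console a -> IO a
runIO (Done a)      = return a
runIO (PutLine s k) = putStrLn s >> runIO k
runIO (GetLine k)   = getLine >>= runIO . k

main :: IO ()
main = do
  print (runPure ["hi"] echo)  -- (["you said: hi"],Just ())
  runIO (putLine "the same kind of description, run by real IO")
```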
On "a Haskell program is a giant expression that reduces to an object of type IO": this really has nothing to do with Haskell. It is a consequence of how we've built our operating systems, and of how the de facto meaning of "program" these days is "executable file". Traditionally "program" was a much more abstract term and could refer to any piece of code, such as a (pure) Haskell function; we can consider any Haskell function to be a "program" in itself. If we launched processes from an environment that passed all the command-line switches and environment variables in as arguments, we could easily omit the IO from our `main`, provided the rest of our code was pure. (On a side note, early versions of Haskell took a similar approach before monadic IO became practical: `main` was a pure function of type `Dialogue`, i.e. `[Response] -> [Request]`.)
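A sketch of that idea under today's conventions: the whole program is written as a pure function of its arguments and environment (`pureMain` is a hypothetical name), and `main` is only a thin IO shell that gathers the inputs the OS provides and writes the result.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (getArgs, getEnvironment)

-- The "program" as a pure function: arguments and environment in, text out.
pureMain :: [String] -> [(String, String)] -> String
pureMain args env =
  unlines
    [ "argument count: " ++ show (length args)
    , "USER is " ++ fromMaybe "unset" (lookup "USER" env)
    ]

-- A thin IO shell, needed only because the OS expects an executable
-- that performs effects; all the logic above stays pure.
main :: IO ()
main = do
  args <- getArgs
  env  <- getEnvironment
  putStr (pureMain args env)
```

If the launching environment handed `args` and `env` to `pureMain` directly and printed the returned string, the IO shell would be unnecessary.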