> I have to say that I think the GHC approach is a bit of a hack. Why? Because it relies for its correctness on the fact that the compiler never duplicates a redex.
It can be restated as: The world-token-state-monad approach to compiling IO depends on lazy operational semantics to never duplicate redexes.
One consequence is that WTSM makes it more difficult to optimize for space rather than speed. A program optimized for space might resort to repeating work. That is, it might duplicate redexes, rather than store the voluminous results of that work for later sharing.
(If you talk to Oleg Kiselyov he's got HPC war stories about how this form of DRY-ness isn't the right default. That accounts for his present anti-laziness bent.)
Since the Monads chapter in the book was pretty light, I thought I will check out this paper that was referenced. Re-learning the Monad laws from this paper inspired me to look at the implementation details of some basic transformers, which has been very helpful. That said, the paper quickly went over my head after he started discussing denotational semantics!
I sometimes feel learning Haskell is like learning algebra but just that the laws are not obvious to me. I wish there are more materials that walk you through more complex implementation (like 3-level stacked Monads) step by step on how certain things work. I guess once you understand how something works, it is hard to explain to a beginner who doesn't have much of an understanding -- much like riding a bike or swimming.