Haskell isn't necessarily that language, partly because it still requires centralized coordination of the development of these "extensions" to ensure they're interoperable. That is, there is only one parser for the language and its supported extensions, and many of them are built into the compiler rather than added as libraries. The exceptions are extensions done through quasi-quotation, such as MetaHaskell or some EDSLs. Even that has its own problems: a quasi-quote is written `[name| ... |]`, so if your quoted language happens to use delimiters that conflict with Haskell's (the closing `|]` in particular), you get source which can no longer be parsed the way you intended (perhaps very rare or unlikely though).
Perhaps the biggest hurdle to having a modular language is that we do not understand how to unambiguously parse the combination of two or more syntaxes. We know that composing two CFGs results in another CFG, but with no guarantee that the result is unambiguous (and deciding whether a CFG is ambiguous is undecidable in general). Other formalisms such as PEGs sidestep ambiguity with ordered choice, but then the parser simply takes the first alternative that matches, which may not be the one you really wanted.
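To make that concrete, here is a minimal Haskell sketch of the problem. Everything in it is made up for illustration: `arith` is a toy language where `-` means subtraction, `kebab` is a toy language where `-` is part of an identifier, and the "parsers" are just functions returning every complete parse. Each is unambiguous on its own; the trouble only appears when they are combined.

```haskell
import Data.Char (isAlpha)

-- Trees produced by the two toy languages (hypothetical, for illustration).
data Tree
  = Var String      -- variable in the arithmetic language
  | Sub Tree Tree   -- subtraction in the arithmetic language
  | Ident String    -- dashed identifier in the other language
  deriving (Eq, Show)

-- A parser returns every complete parse of the input; [] means failure.
type Parser = String -> [Tree]

isName :: String -> Bool
isName s = not (null s) && all isAlpha s

-- Language A: "name" or "name-name", where '-' is subtraction.
arith :: Parser
arith s = case break (== '-') s of
  (l, '-' : r) | isName l && isName r -> [Sub (Var l) (Var r)]
  (l, "")      | isName l             -> [Var l]
  _                                   -> []

-- Language B: identifiers made of letters and dashes, starting with a letter.
kebab :: Parser
kebab s@(c : _)
  | isAlpha c && all (\x -> isAlpha x || x == '-') s = [Ident s]
kebab _ = []

-- CFG-style union: keep every parse from both grammars.
unionChoice :: Parser -> Parser -> Parser
unionChoice p q s = p s ++ q s

-- PEG-style ordered choice: commit to the first grammar that succeeds.
orderedChoice :: Parser -> Parser -> Parser
orderedChoice p q s = case p s of
  [] -> q s
  rs -> take 1 rs
```

With these definitions, `unionChoice arith kebab "a-b"` yields two trees (a subtraction and an identifier), so the combined grammar is ambiguous. `orderedChoice arith kebab "a-b"` and `orderedChoice kebab arith "a-b"` each yield one tree, but different ones, so the order of composition silently decides the meaning.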
What makes Lisps great for composing languages (or "EDSLs" in market speak) is that they bypass the parsing problem by asking you to write your language directly in terms of the syntax tree which a parser would generate, perhaps using macros or other functions to simplify the use of that tree. Instead of a language being vocabulary plus syntax, we create new vocabulary for what would be done through syntax in other languages, and we can thus refer to it unambiguously. Something similar can be done in Haskell too, through regular functions and quotation.
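As a rough sketch of the "regular functions" route in Haskell, the little language below is written directly as a syntax tree, and new constructs are added as ordinary functions rather than new syntax. The names (`Expr`, `square`, `sumOf`, `eval`) are invented for this example and don't come from any particular library.

```haskell
-- The syntax tree of a tiny arithmetic language, written directly as data.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  | Neg Expr
  deriving (Eq, Show)

-- New vocabulary instead of new syntax: plain functions that build trees.
square :: Expr -> Expr
square e = Mul e e

sumOf :: [Expr] -> Expr
sumOf = foldr Add (Lit 0)

-- One possible interpretation of the tree.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b
eval (Neg a)   = negate (eval a)

-- A "program" in the embedded language; no parser is involved anywhere.
example :: Expr
example = sumOf [square (Lit 3), Neg (Lit 4)]
```

Here `eval example` is `5`. Because `square` and `sumOf` are just names in scope, two such vocabularies can be mixed freely, and any clash is an ordinary name clash that the module system already knows how to report.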
The parse problem is only really a problem because we're stuck with this silly model of "sequential text files" to describe code, and we have to limit our languages so that a parser can take one of these text files and make sense of it. When we break out of this model and use intelligent editors, syntaxes can be composed arbitrarily, because the editor can record where each new syntax begins and ends. Diekmann and Tratt have demonstrated how this can be done while still feeling much like traditional text editing, an approach they call Language Boxes.[1][2]
Language Boxes only provide the means to compose syntaxes; handling the semantic composition of languages is left to the authors of the languages being composed. Haskell is perhaps a good choice of language for providing the kind of glue needed here, since we can decide where languages may be composed based on the types returned by their parsers.
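A minimal sketch of what that type-directed glue might look like, under heavy assumptions: `Parser`, `Box`, and the three guest languages below are hypothetical stand-ins, not Diekmann and Tratt's implementation. The idea is only that a slot in the host language states the type it expects, and any guest language whose parser returns that type can fill it.

```haskell
import Text.Read (readMaybe)

-- A guest language is modelled as a parser producing a typed value.
newtype Parser a = Parser { runParser :: String -> Maybe a }

-- Guest language 1: decimal numerals.
decimal :: Parser Int
decimal = Parser readMaybe

-- Guest language 2: a tiny subset of Roman numerals.
roman :: Parser Int
roman = Parser (\s -> lookup s [("I", 1), ("II", 2), ("III", 3), ("IV", 4), ("V", 5)])

-- Guest language 3: yes/no answers.
yesNo :: Parser Bool
yesNo = Parser (\s -> lookup s [("yes", True), ("no", False)])

-- A box pairs a guest parser with the text the editor says belongs to it.
data Box a = Box (Parser a) String

parseBox :: Box a -> Maybe a
parseBox (Box p src) = runParser p src

-- A host-language construct with two slots, each requiring an Int-valued box.
addBoxes :: Box Int -> Box Int -> Maybe Int
addBoxes x y = (+) <$> parseBox x <*> parseBox y
```

`addBoxes (Box decimal "40") (Box roman "II")` gives `Just 42`, while `addBoxes (Box yesNo "yes") (Box decimal "1")` is rejected by the type checker, so the decision about where a language may be embedded is made before anything is parsed.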
[1]: https://www.youtube.com/watch?v=LMzrTb22Ot8
[2]: http://lukasdiekmann.com/pubs/diekmann_tratt__parsing_compos...