The convolution is approximated via a form of sampling with additional bookkeeping at each encountered branch. How well that scales for deeply branching programs depends on the branch probabilities and on how diverse the outputs along the resulting paths are; the worst case is a program where all branches are equally likely and each path produces an entirely different output (as in a hash function, for example). In practice, we've been dealing with problems involving up to tens of thousands of branches.
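A minimal sketch of what sampling with per-branch bookkeeping could look like — the two-branch program, its probabilities, and the dedup-by-path aggregation are all my assumptions, not the actual implementation. Each sampled run records which arm every branch took and multiplies in that arm's probability, so every path carries its exact weight; resampling a known path then adds nothing, and the estimate only improves as new paths are discovered:

```python
import random
from collections import defaultdict

def run_once():
    # One sampled execution of a hypothetical two-branch program.
    # Bookkeeping at each branch: remember which arm was taken and
    # multiply its probability into the path's exact weight.
    decisions = []
    path_prob = 1.0

    def branch(p):
        nonlocal path_prob
        taken = random.random() < p
        decisions.append(taken)
        path_prob *= p if taken else 1.0 - p
        return taken

    x = 0
    if branch(0.7):
        x += 1
    if branch(0.3):
        x += 2
    return tuple(decisions), x, path_prob

def approximate_distribution(n_samples=10_000):
    # Keep each distinct path once, with its exact probability,
    # then sum path weights per output value.
    paths = {}
    for _ in range(n_samples):
        key, out, prob = run_once()
        paths[key] = (out, prob)
    dist = defaultdict(float)
    for out, prob in paths.values():
        dist[out] += prob
    return dict(dist)

dist = approximate_distribution()
# once all four paths have been sampled, the estimate equals the
# exact distribution: P(0)=0.21, P(1)=0.49, P(2)=0.09, P(3)=0.21
```

The worst case mentioned above shows up here directly: if the paths were equally likely and each produced a unique output, the estimate would be missing mass until every path had been sampled at least once.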
We haven't done a direct comparison to MCMC approaches yet, but it's on the to-do list. My intuition is that MCMC will win out for problems where finding "just any" local optimum is not good enough.