The article talks about sampling from EBMs using Langevin dynamics, but this appears to be identical to Bayesian sampling with Langevin dynamics, which has been fairly popular for a few years. Some of the other material focuses only on minimizing the EBM's energy, but that is equivalent to MAP/frequentist estimation.
Also, they gloss over a lot of problems with Langevin dynamics. Contrary to what they claim, it is not at all good at finding modes separated by low-probability regions, since it has to take increasingly small steps to maintain asymptotic correctness.
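A minimal sketch of the unadjusted Langevin algorithm (ULA) update, x ← x − ε∇E(x) + √(2ε)·ξ with ξ standard normal, on a hypothetical one-dimensional double-well energy E(x) = h(x² − 1)² with barrier height h = 12. The energy, step size, and step count are illustrative assumptions, not taken from the article. The target exp(−E) puts equal mass on both wells, yet a single chain started in the right-hand well essentially never crosses the barrier in 20,000 steps. Turning the noise off turns the same update into gradient descent onto a mode, i.e. a MAP point. Because ULA has no Metropolis accept/reject step, its stationary distribution only approximates exp(−E), with a bias that shrinks as ε → 0.

```python
import numpy as np

def grad_energy(x, h=12.0):
    # Gradient of the hypothetical double-well energy E(x) = h * (x**2 - 1)**2,
    # which has modes at x = -1 and x = +1 separated by a barrier of height h.
    return 4.0 * h * x * (x**2 - 1.0)

def langevin_chain(x0, n_steps=20_000, eps=1e-3, noise=True, seed=0):
    """Unadjusted Langevin algorithm approximately targeting p(x) ∝ exp(-E(x)).

    With noise=False the update reduces to plain gradient descent on E,
    so it converges to a mode (a MAP point) instead of sampling.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_steps)
    for k in range(n_steps):
        x = x - eps * grad_energy(x)  # drift toward lower energy
        if noise:
            x += np.sqrt(2.0 * eps) * rng.standard_normal()  # injected Gaussian noise
        samples[k] = x
    return samples

# Single chain started in the right-hand well; the true fraction with x > 0 is 0.5.
chain = langevin_chain(x0=1.0)
print("fraction of samples with x > 0:", np.mean(chain > 0))

# Same update without noise: deterministic descent to the nearest mode.
map_path = langevin_chain(x0=0.3, noise=False)
print("gradient descent ends at:", map_path[-1])
```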
Yes, this is precisely what MCMC methods do as well. Every posterior distribution is a Boltzmann distribution whose energy function is the negative log of the unnormalized posterior.
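Spelled out, with θ the parameters and D the observed data (notation assumed here, not from the article), Bayes' rule already has Boltzmann form. The normalizing constant is the evidence, and the minimum of the energy is the MAP estimate:

```latex
\begin{aligned}
p(\theta \mid D) &= \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \frac{e^{-E(\theta)}}{Z},
\qquad E(\theta) = -\log p(D \mid \theta) - \log p(\theta), \\
Z &= p(D) = \int e^{-E(\theta)}\, d\theta,
\qquad \hat{\theta}_{\mathrm{MAP}} = \arg\min_{\theta} E(\theta).
\end{aligned}
```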
>However, since we initialize chains with a random prior distribution, each individual chain is individually likely to hit any mode so all modes are likely to be explored.
This is also a pretty standard technique in MCMC. But most high-dimensional Bayesian models have a huge number of modes that cannot be explored with a reasonable number of samples/chains.
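A rough sketch of why random initialization does not rescue mode coverage in high dimensions. It assumes a hypothetical separable energy E(x) = h Σᵢ (xᵢ² − 1)², which has 2^d modes (one per sign pattern), and a hypothetical "random prior" initialization uniform on [−2, 2]^d. It runs 100 independent ULA chains and counts the distinct modes the chains end up in. In one dimension both modes are found; in 20 dimensions the same 100 chains can reach at most 100 of 2²⁰ ≈ 10⁶ modes.

```python
import numpy as np

def grad_energy(x, h=12.0):
    # Separable multi-well energy E(x) = h * sum_i (x_i**2 - 1)**2 (hypothetical example);
    # each coordinate has two wells, so there are 2**d modes in d dimensions.
    return 4.0 * h * x * (x**2 - 1.0)

def run_chains(n_chains, dim, n_steps=2_000, eps=1e-3, seed=0):
    """Run independent unadjusted Langevin chains in parallel; return final states."""
    rng = np.random.default_rng(seed)
    # Assumed "random prior" initialisation: uniform on [-2, 2]^d.
    x = rng.uniform(-2.0, 2.0, size=(n_chains, dim))
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - eps * grad_energy(x) + np.sqrt(2.0 * eps) * noise
    return x

def modes_reached(final_states):
    # Identify each chain's final mode by the sign pattern of its coordinates.
    return {tuple(row.tolist()) for row in (final_states > 0)}

for dim in (1, 20):
    found = modes_reached(run_chains(n_chains=100, dim=dim))
    print(f"d={dim}: {len(found)} of {2**dim} modes reached by 100 chains")
```

Only the final state of each chain is checked here; counting modes visited along the way would not change the picture much, since crossings between wells are rare at this barrier height.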