Implicit Generation and Generalization Methods for Energy-Based Models

(openai.com)

10 points · gdb · 7y ago · 4 comments

lenticular · 7y ago · 2 in thread

I'm not really clear on the difference between Bayesian methods and EBMs. EBMs associate an "energy" to each point in the parameter space, but this energy is just the log-partition function of some distribution. Sampling schemes for such models are just MCMC methods, which are already a long-established genre of Bayesian techniques.

The article talks about sampling from EBMs using Langevin Dynamics, but it appears to be identical to Bayesian sampling with Langevin dynamics, which has been fairly popular for a few years. Some of the other stuff is just focused on minimizing the EBM, but then that's just identical to MAP/frequentist estimates.

Also, they gloss over a lot of problems that Langevin dynamics has. Unlike what they claim, it is not at all good at finding modes separated by low-probability regions, since it has to take increasingly small steps to maintain asymptotic correctness.
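
To make the concern about low-probability regions concrete, here is a minimal sketch, not taken from the article. It assumes a hypothetical 1-D double-well energy E(x) = (x^2 - 4)^2 and an assumed polynomially decaying step-size schedule; the names `grad_energy` and `single_chain` are illustrative only.

```python
import math
import random

def grad_energy(x):
    # Gradient of a hypothetical double-well energy E(x) = (x^2 - 4)^2,
    # with modes at x = -2 and x = +2 and a barrier of height 16 at x = 0.
    return 4.0 * x * (x * x - 4.0)

def single_chain(x0=-2.0, n_steps=2000, step0=0.01, decay=0.55, seed=0):
    # One Langevin chain started in the left mode.
    rng = random.Random(seed)
    x = x0
    crossed = False
    for t in range(n_steps):
        # Assumed schedule: the step size shrinks polynomially with t.
        step = step0 * (1.0 + t) ** (-decay)
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_energy(x) + math.sqrt(2.0 * step) * noise
        # Record whether the chain ever reached the right-hand side.
        crossed = crossed or x > 0.0
    return x, crossed

if __name__ == "__main__":
    final_x, crossed = single_chain()
    print("final position:", final_x, "crossed barrier:", crossed)
```

With a barrier this high relative to the noise scale, a single chain started at x = -2 should be expected to stay in the left well for the whole run, which is the failure mode described above.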

yilundu · 7y ago

EBMs actually associate an "energy" to each point of the input distribution which then defines a probability distribution through the Boltzmann Distribution. It's true that Langevin dynamics get stuck at low-probability modes and it would be worth trying an adaptive version of HMC. However, since we initialize chains with a random prior distribution, each individual chain is individually likely to hit any mode so all modes are likely to be explored.
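
The random-initialization argument can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's implementation: the double-well energy E(x) = (x^2 - 4)^2, the uniform initialization over [-4, 4], the fixed step size, and the names `grad_energy`, `langevin_chain`, and `run_chains` are all hypothetical.

```python
import math
import random

def grad_energy(x):
    # Gradient of a hypothetical double-well energy E(x) = (x^2 - 4)^2,
    # whose Boltzmann density exp(-E(x)) / Z has modes at x = -2 and x = +2.
    return 4.0 * x * (x * x - 4.0)

def langevin_chain(x0, rng, step=0.005, n_steps=500):
    # Unadjusted Langevin update:
    # x <- x - step * dE/dx + sqrt(2 * step) * standard normal noise.
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_energy(x) + math.sqrt(2.0 * step) * noise
    return x

def run_chains(n_chains=200, seed=0):
    rng = random.Random(seed)
    # Assumed "random prior": each chain starts from a uniform draw on [-4, 4].
    return [langevin_chain(rng.uniform(-4.0, 4.0), rng) for _ in range(n_chains)]

if __name__ == "__main__":
    finals = run_chains()
    left = sum(1 for x in finals if x < 0.0)
    print("chains ending near -2:", left)
    print("chains ending near +2:", len(finals) - left)
```

In this toy setting the chains cover both modes only because the initial draws already land on both sides of the barrier. Whether that carries over to high-dimensional models with many modes is exactly what the replies dispute.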

lenticular · 7y ago

>EBMs actually associate an "energy" to each point of the input distribution which then defines a probability distribution through the Boltzmann Distribution.

Yes, this is precisely what MCMC methods do as well. Every posterior distribution is a Boltzmann distribution for some energy function.
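
The correspondence claimed here can be written out explicitly. The symbols below (parameters \theta, data D, energy E, normalizer Z) are notation assumed for this sketch and do not come from the article; the energy is taken at unit temperature.

```latex
p(\theta \mid D)
  = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
  = \frac{\exp\bigl(-E(\theta)\bigr)}{Z},
\qquad
E(\theta) = -\log p(D \mid \theta) - \log p(\theta),
\qquad
Z = \int \exp\bigl(-E(\theta)\bigr)\, d\theta = p(D).
```

Read this way, the energy is the negative log of the unnormalized posterior, and the normalizing constant Z is the marginal likelihood.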

>However, since we initialize chains with a random prior distribution, each individual chain is individually likely to hit any mode so all modes are likely to be explored.

This is also a pretty standard technique in MCMC. But most high-dimensional Bayesian models have a huge number of modes that cannot be explored in a reasonable number of samples/chains.
