Learning the Integral of a Diffusion Model

(sander.ai)

161 points · benanne · 1mo ago · 23 comments

19 comments · 5 top-level

darshanmakwana · 1mo ago · 8 in thread

This is way outside of my expertise, can anyone give a TL;DR or ai;dr?

mxwsn · 1mo ago

Diffusion and flow matching models generate samples by iterative denoising. Iterative denoising means passing input to the neural network, running a forward pass, and taking the output back as input and rerunning the neural network. Often you do this 100 times, which is slow and expensive.

Flow maps / consistency models / shortcut models instead try to learn to compress this iterative work into 1 forward pass. This makes inference 100x faster as you'd only need to run the neural net forward pass once. Beyond speeding up inference, there are other advanced benefits to this, such as improved ability to perform inference-time steering.

Mathematically, learning a flow map corresponds to learning to solve an ordinary differential equation, i.e., learning the time integral of the velocity field. This mathematical foundation provides the basis for various training objectives for learning flow maps, which involve self-referential identities or identities such as the transport equation, which are discussed in the blog post.

Hope that helps! I'm an ML researcher currently researching flow maps.
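To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `velocity` is a toy field (dx/dt = -x) standing in for a trained network, and `flow_map` is its closed-form solution, standing in for what a flow map model would try to learn. It is not how any real model is trained.

```python
import math

def velocity(x, t):
    # Toy velocity field (assumption): stands in for one forward pass
    # of a trained diffusion / flow matching network.
    return -x

def sample_iterative(x0, t0=0.0, t1=1.0, num_steps=100):
    """Euler integration of dx/dt = velocity(x, t): one 'network call' per step."""
    x = x0
    dt = (t1 - t0) / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, t0 + i * dt)
    return x

def flow_map(x0, t0=0.0, t1=1.0):
    """Exact time integral of the toy field: jumps from t0 to t1 in one call."""
    return x0 * math.exp(-(t1 - t0))

x0 = 2.0
many_steps = sample_iterative(x0, num_steps=100)  # 100 calls to velocity
one_call = flow_map(x0)                           # 1 call
single_euler = sample_iterative(x0, num_steps=1)  # 1 call, but a crude step

print("100 Euler steps:", many_steps)
print("flow map       :", one_call)
print("1 Euler step   :", single_euler)
```

With 100 steps the Euler result lands close to the exact jump, while a single Euler step does not. In this toy picture, that gap is what a flow map has to learn to close.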

cshimmin · 1mo ago

Very helpful! Naïve question (I haven’t had a chance to read TFA at all and diffusion/flow models are not my area of expertise). Doesn’t learning the integral/solution of the diffusion process in a single pass just take us back to the OG generative CNNs that we had before diffusion models took over? Surely the answer is “no” but would love to hear your framing as to why.

richard___ · 1mo ago

Why is self-distillation necessary? Why can't they get the ground-truth for "skipping" steps?

darshanmakwana · 1mo ago

Thanks! this was very helpful

anvuong · 1mo ago

This provides a high-level overview of diffusion models, you know, the models behind Stable Diffusion, Gemini banana, etc.

I haven't read it carefully, but I think it's pretty comprehensive: it goes from the SDE to the flow matching formulation, and covers different perspectives on constructing flow maps, i.e. the x-formulation or the v-formulation. It also deals with distillation and consistency, which are used for fast sampling.

Overall, it's a good read if you are new to the field.

refulgentis · 1mo ago

Why not put it into an AI yourself? :) I'd rather we avoided a precedent of asking for it and N people replying with their own favorite AI version. The comments section would end up a ghost town.

Extreme TL;DR: Diffusion models are like getting f(x) by calculating and summing f'(0), f'(1)...f'(x). Flow models are like just calculating f(x).

_doctor_love · 1mo ago

HN is a place where it's legitimate to ask those kinds of questions. The site has a high concentration of advanced practitioners -- in my experience it is not uncommon for the creator of a technology or deep expert to reply. John Carmack has an account on the site for instance. :)

tekacs · 1mo ago

We've all seen that AI can give you plausible but incorrect answers. Having an expert read it or use AI on it and interpret and validate it before posting would be most welcome IMO.

oliverx0 · 1mo ago · 3 in thread

Does anyone have good resources on a more practical approach to building diffusion models? I found the book by Raschka, Build a Large Language Model (From Scratch), really helpful in understanding a lot of the concepts behind LLMs, and I am looking for a similar resource for diffusion models.

throwaway219450 · 1mo ago

MIT's OCW is usually pretty reliable:

https://www.practical-diffusion.org/lectures/

There is also a more math-heavy one: https://diffusion.csail.mit.edu/2026/index.html

oliverx0 · 1mo ago

Love this! Thank you

Ifkaluva · 1mo ago

I really liked Calvin Luo’s tutorial. It’s quite old at this point and probably not up to date, but I felt it was awesome for understanding the concepts.

programjames · 1mo ago · 2 in thread

It is a good post, but it is missing the connection to continuous normalizing flows. Diffusion models, flow matching, and consistency models are biased approximations of continuous normalizing flows (which themselves have some slight biases, but less). Adversarial losses can somewhat help with bias (e.g. RL, GANs), but training those has issues.

programjames · 1mo ago

As explanation, something I wrote previously:

The most common approach to modeling continuous distributions is to train a reversible model f that maps it to another continuous distribution P that is already known. The original image can be recovered by tracking the bits needed to encode its latent, as well as the reverse path:

  −log P(f(x)) − log|det ∂f/∂x (x)|

This technique is known as normalizing flows, as usually a normal distribution is chosen for the known distribution. The second term can be a little hard to compute, so diffusion models approximate it by using a stochastic differential equation for the mapping. When f is a solution to an ordinary differential equation,

  dx/dt = g(x)

then

  log|det ∂f/∂x (x)| = ∫ Tr(∂g(x)/∂x) dt = ∫ E_{ε∼N(0,I)} [εᵀ ∂g(x)/∂x ε] dt

The last equality is known as Hutchinson's estimator. Switching to a stochastic differential equation

  dx′ = g(x′)dt + ε(t)dW

and tracking the difference δx = x′ − x, the mean-squared error approximately satisfies

  d(δxᵀδx)/dt = 2δxᵀ ∂g(x)/∂x δx,

which is close to Hutchinson's estimator, but weighted a little strangely.
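A quick way to sanity-check the Hutchinson step is to try it on a fixed matrix. The sketch below is plain Python with made-up numbers: the matrix `A` is a hypothetical stand-in for the Jacobian ∂g(x)/∂x at one point, and `hutchinson_trace` is an illustrative helper name, not a library function. For a linear field g(x) = A x the Jacobian is A at every time, so the integrand Tr(∂g(x)/∂x) is just Tr(A).

```python
import random

def trace(A):
    # Exact trace: sum of the diagonal entries.
    return sum(A[i][i] for i in range(len(A)))

def quad_form(A, eps):
    # eps^T A eps for a square matrix A given as a list of rows.
    n = len(A)
    return sum(eps[i] * A[i][j] * eps[j] for i in range(n) for j in range(n))

def hutchinson_trace(A, num_probes=20000, seed=0):
    """Monte Carlo estimate of Tr(A) = E[eps^T A eps] with eps ~ N(0, I)."""
    rng = random.Random(seed)
    n = len(A)
    total = 0.0
    for _ in range(num_probes):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += quad_form(A, eps)
    return total / num_probes

# Hypothetical Jacobian of g at some point x (values chosen arbitrarily).
A = [[0.5, 0.2, 0.0],
     [-0.1, -0.3, 0.4],
     [0.3, 0.0, 0.1]]

exact = trace(A)
estimate = hutchinson_trace(A)
print("exact trace    :", exact)
print("Hutchinson est.:", estimate)
```

The estimate only needs products of the form εᵀ A ε, which is why it is attractive when the Jacobian is too large to write down but Jacobian-vector products are cheap.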

benanne · OP · 1mo ago

I briefly covered that connection in an earlier blog post: https://sander.ai/2023/07/20/perspectives.html#flow ... but it's definitely something that might deserve a longer-form treatment at some point :)

vivzkestrel · 1mo ago · 1 in thread

- just a heads-up

- your links to the deep learning slides here https://sander.ai/2014/05/29/slides-meetup.html are broken

benanne · OP · 1mo ago

Thanks for pointing this out! I'm not sure why, the files are still on my Dropbox, they must have changed the link format at some point? I've gone ahead and fixed them.

wwarner · 1mo ago

Haven't finished this but for me it's so refreshing to read some science on deep learning and not just weird predictions.
