Block Diffusion: Interpolating Autoregressive and Diffusion Language Models

(m-arriola.com)

72 points · t55 · 1y ago · 16 comments

13 comments · 4 top-level

holoduke · 1y ago · 7 in thread

Whenever I try to read and understand this paper, I feel extremely dumb. I have a degree in CS, but this is just too complex for me to understand.

evertedsphere · 1y ago

an undergraduate degree in a field is not enough to understand recent research in a specialised subfield of a subfield and you shouldn't beat yourself up over that

there's nothing wrong with you, you just need the right background and you can go get that. see e.g. the fast.ai course

smrtinsert · 1y ago

Do you mean the fast.ai stable diffusion lectures? The initial series doesn't get too deep at all from what I remember.

tippytippytango · 1y ago

I wouldn’t beat yourself up over it. Very few papers can be understood without reading a significant amount of the neighboring literature and the history of how that work came to be. There are norms and customs and a kind of academic language in every community that you won’t be able to see unless you’ve read a lot from that community. Even if you have the right math level it’s tricky.

A single paper is part of a conversation, not something that stands alone. Trying to read one random paper is like finding a 1000 page thread on an obscure topic that has been running for 10+ years and reading only the last page. It won’t make any sense without reading back a ways.

IncreasePosts · 1y ago

Might want to study some stats or other math.

nh23423fefe · 1y ago

depth first read the references until the leaves are obvious!

AlexCoventry · 1y ago

Ask ChatGPT o3 about anything you don't understand, then ask it about anything in its responses you don't understand. Keep drilling down until you do understand. It takes patience, but you can learn a lot very fast this way.

echelon · 1y ago

ChatGPT o3 understands the latest literature and isn't going to hallucinate weird details or make incorrect analogies or math?

I'd worry about learning the wrong things.

blurbleblurble · 1y ago · 2 in thread

Wow.

I can't wait to see ideas from the diffusion image generation world (like ControlNet) work their way into language models.

soulofmischief · 1y ago

I've built diffusion based text models, it's old hat and not necessarily the most performant way to generate text. However it does produce interesting results and I'd love to test some ideas at scale.

joejoo · 1y ago

There are already a few models that are diffusion based.

notrealyme123 · 1y ago

This was posted here already a few weeks ago.

gitroom · 1y ago

Yeah, I always end up lost in papers like this too, even with my CS degree; the research keeps leveling up nonstop.