Difficulty of scaling is not the only issue. Nobody is going to be particularly invested in scaling an architecture that has:
- consistently lagged behind its auto-regressive counterparts in quality. Look at the dgemma benchmarks: pretty steep dropoffs, and the more difficult the benchmark, the worse the dropoff. That's not a good look, and it's not like it's some artifact of Google's release. Every dllm is like this.
- inference benefits that are negated at scale. Auto-regressive models are still cheaper if you want to serve lots of users.
>"DiffusionGemma's speedup is designed for local and low-concurrency inference. In high-QPS cloud serving, autoregressive models can be deployed to saturate compute efficiently, so DiffusionGemma's parallel decoding offers diminishing returns and can result in higher serving costs"
Put yourself in the shoes of all the labs, even the open-source ones. Why would you put much effort into this?