
0 points · rndn · 11y ago · 0 comments

I don’t understand much of the paper but it looks awesome! I have two questions: Am I understanding it correctly that one would need to convert the internal representation to a textured triangle mesh in order to use ray tracing in the decoder stage? Is the encoder effectively similar to scene reconstruction via structure from motion?


3 comments · 1 top-level

tejask · 11y ago · 2 in thread

There are many ways to parametrize the decoder. One is to constrain it to output an explicit mesh or volumetric representation and to express the rendering pipeline so that it is differentiable. The encoder then effectively learns an "inference algorithm" that finds the best output. A feedforward neural network is not enough, and recurrent computation will eventually be necessary.
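To make "express the rendering pipeline so that it's differentiable" concrete, here is a minimal sketch, not the paper's model. It assumes a toy 1-D "scene" with a single parameter (the center of a soft blob), and it swaps the learned encoder for plain gradient descent on pixel error. The names `render`, `loss_and_grad`, `infer_center`, `WIDTH` and `SIGMA` are all hypothetical.

```python
import math

WIDTH = 32    # pixels in the toy 1-D image (assumed)
SIGMA = 3.0   # blob width (assumed)

def render(center):
    """Render a soft blob; each pixel is a smooth function of `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * SIGMA ** 2))
            for i in range(WIDTH)]

def loss_and_grad(center, target):
    """Squared pixel error and its analytic derivative w.r.t. `center`."""
    loss, grad = 0.0, 0.0
    for i, (pixel, wanted) in enumerate(zip(render(center), target)):
        diff = pixel - wanted
        loss += diff * diff
        # d(pixel)/d(center) = pixel * (i - center) / SIGMA^2
        grad += 2.0 * diff * pixel * (i - center) / SIGMA ** 2
    return loss, grad

def infer_center(target, guess, lr=0.5, steps=200):
    """Recover the scene parameter by descending the rendering loss."""
    for _ in range(steps):
        _, grad = loss_and_grad(guess, target)
        guess -= lr * grad
    return guess

target = render(20.0)                      # observed image
estimate = infer_center(target, guess=16.0)
print(round(estimate, 3))
```

Because the renderer is smooth in its parameter, the pixel error gives a usable gradient. A learned encoder would amortize this search into a single forward pass instead of iterating per image.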

jwp729 · 11y ago

Can you explain a bit more why the recurrent network structure becomes necessary at some point? Is that because reversing a CNN naturally means rendering by (de)convolution?

tejask · 11y ago

To approximately learn a "real" graphics engine with support for basic physics, feed-forward computation alone might not be sufficient. A more natural way to learn graphics/physics might be to learn the temporal structure more explicitly. On the other hand, it might also be interesting to simply add a temporal convolution-deconvolution structure to the existing model. This is work in progress though.
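A toy illustration of the temporal point, with hand-written rules rather than anything learned: a single frame does not reveal velocity, so a stateless predictor cannot extrapolate motion, while one that carries state across frames can. `feedforward_predict` and `RecurrentPredictor` are hypothetical names, and the constant-velocity "physics" is an assumption.

```python
def feedforward_predict(position):
    """Sees one frame only; with velocity unobservable it guesses 'no motion'."""
    return position

class RecurrentPredictor:
    """Keeps the previous position as state and extrapolates linearly."""

    def __init__(self):
        self.prev = None

    def step(self, position):
        velocity = 0.0 if self.prev is None else position - self.prev
        self.prev = position
        return position + velocity

frames = [0.0, 1.5, 3.0, 4.5, 6.0]   # object moving at constant velocity
recurrent = RecurrentPredictor()
ff_errors, rec_errors = [], []
for current, upcoming in zip(frames, frames[1:]):
    ff_errors.append(abs(feedforward_predict(current) - upcoming))
    rec_errors.append(abs(recurrent.step(current) - upcoming))

print(ff_errors)
print(rec_errors)
```

The stateless predictor is off by the full velocity on every frame. The stateful one is wrong only on the first frame, before it has seen any motion.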
