The primary issue is that building the computation graph is not imperative - you define the graph explicitly, up front, rather than having it emerge from the code you run. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].
TensorFlow is "Define-and-Run". Loops and conditionals must be defined and injected into the graph structure before it is run. This means there are special operations such as "tf.while_loop" - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult, since the process of defining the computation graph is separate from the process of executing it, and it also restricts the flexibility of the model.
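For a concrete sense of what this looks like, here is a minimal sketch in (1.x-era) graph-mode TensorFlow that sums the integers 0 through 9. Note that the loop condition and body are Python functions handed to tf.while_loop, which bakes a loop op into the graph; the Python code itself never iterates:

    import tensorflow as tf

    i = tf.constant(0)
    total = tf.constant(0)

    def cond(i, total):
        return i < 10

    def body(i, total):
        return [i + 1, total + i]

    # The loop is an op inside the graph, not Python control flow.
    i_final, total_final = tf.while_loop(cond, body, [i, total])

    with tf.Session() as sess:
        print(sess.run(total_final))  # 45

Nothing runs until sess.run(); if the body has a bug, you find out at execution time, far from where the graph was defined.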
In comparison, Chainer, PyTorch, and DyNet are all "Define-by-Run", meaning the graph structure is defined on the fly by the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.
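As a rough illustration (a minimal sketch in Chainer; the same idea applies in PyTorch and DyNet), an ordinary Python loop below directly becomes part of the graph recorded during that forward pass:

    import numpy as np
    import chainer.functions as F
    from chainer import Variable

    x = Variable(np.array([[1.0, 2.0]], dtype=np.float32))
    h = x
    # A plain Python for loop: each iteration appends ops to the
    # graph being recorded during this particular forward pass.
    for _ in range(3):
        h = F.tanh(h)
    loss = F.sum(h)
    loss.backward()   # backprop through exactly the ops that ran
    print(x.grad)     # gradients for this dynamically built graph

Change the loop bound to depend on the input, or drop a print() or a debugger breakpoint inside the loop, and everything still just works - there is no separate graph-construction phase to reason about.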
This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold"[2], still unreleased and unpublished, to handle dynamic computation graphs. In it they specifically tackle dynamic batching for tree-structured LSTM architectures.
If you compare the best example of recursive neural networks in TensorFlow[3] (quite complex and finicky in the details) to the example that ships with Chainer[4], which is perfectly Pythonic, standard code, it's pretty clear why one might prefer "Define-by-Run" ;)
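To make that concrete, here is a rough sketch (a hypothetical model with made-up sizes, loosely in the spirit of the Chainer example) of a recursive network over binary trees, where plain Python recursion determines each example's graph shape:

    import numpy as np
    import chainer
    import chainer.functions as F
    import chainer.links as L

    class RecursiveNet(chainer.Chain):
        # Hypothetical minimal tree RNN: merge two child vectors
        # into a single parent vector.
        def __init__(self, n_vocab, n_units):
            super().__init__()
            with self.init_scope():
                self.embed = L.EmbedID(n_vocab, n_units)
                self.combine = L.Linear(2 * n_units, n_units)

        def __call__(self, node):
            if isinstance(node, int):  # leaf: a word id
                return self.embed(np.array([node], dtype=np.int32))
            left, right = node          # internal node: (left, right)
            l, r = self(left), self(right)
            return F.tanh(self.combine(F.concat((l, r))))

    model = RecursiveNet(n_vocab=100, n_units=50)
    tree = ((1, 2), (3, (4, 5)))  # each sentence has its own shape
    h = model(tree)  # the graph is built by the recursion itself

Every input tree yields a differently shaped graph, and the code is just a recursive function - no tf.while_loop, no graph surgery.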
[1]: http://docs.chainer.org/en/stable/tutorial/basic.html
[2]: https://openreview.net/pdf?id=ryrGawqex
[3]: https://github.com/bogatyy/cs224d/tree/master/assignment3
[4]: https://github.com/pfnet/chainer/blob/master/examples/sentim...