Dynamic computation graphs arise whenever the amount of work to be done is variable. This may be when we're processing text, where one example is a few words and another is paragraphs of text, or when we're performing operations over a tree structure of variable size. The problem is especially prominent in certain subfields, such as natural language processing, where I spend most of my time.
PyTorch tackles this very well, as do Chainer[1] and DyNet[2]. Indeed, PyTorch's construction was directly informed by Chainer[3], though re-architected and designed to be faster still. I have seen all of these receive renewed interest in recent months, particularly amongst many researchers performing cutting-edge research in the domain. When you're working with new architectures, you want the most flexibility possible, and these frameworks allow for that.
As a counterpoint, TensorFlow does not handle these dynamic graph cases well at all. There are some primitive dynamic constructs, but they're not flexible and are usually quite limiting. There are plans to make TensorFlow more dynamic in the near future, but adding that in after the fact is going to be a challenge, especially to do efficiently.
Disclosure: My team at Salesforce Research use Chainer extensively and my colleague James Bradbury was a contributor to PyTorch whilst it was in stealth mode. We're planning to transition from Chainer to PyTorch for future work.
[1]: http://chainer.org/
[2]: https://github.com/clab/dynet
[3]: https://twitter.com/jekbradbury/status/821786330459836416
The primary issue is that the computation graph in TensorFlow is not built imperatively - you define it explicitly, ahead of time. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].
TensorFlow is "Define-and-Run". For loops and conditionals end up needing to be defined and injected into the graph structure before it's run. This means there are "tf.while_loop" operations for example - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult as the process of defining the computation graph is separate to the usage of it and also restricts the flexibility of the model.
In comparison, Chainer, PyTorch, and DyNet are all "Define-by-Run", meaning the graph structure is defined on-the-fly via the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.
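For contrast, here's a minimal define-by-run sketch in PyTorch (using the current requires_grad API rather than the Variable wrapper from this era). The ordinary Python loop is the graph construction:

    import torch

    x = torch.ones(3, requires_grad=True)
    h = x
    # An ordinary Python for loop: each iteration adds nodes to the graph as it runs,
    # so the number of steps can depend on the data (e.g. sentence length).
    steps = 4
    for _ in range(steps):
        h = h * 2
    h.sum().backward()
    print(x.grad)  # tensor([16., 16., 16.]) - the graph recorded all four steps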
This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold"[2], still unreleased and unpublished, which handles dynamic computation graphs. In it, they specifically tackle dynamic batching for tree-structured LSTM architectures.
If you compare the best example of recursive neural networks in TensorFlow[3] (quite complex and finicky in the details) to the example that comes with Chainer[4], which is perfectly Pythonic and standard code, it's pretty clear why one might prefer "Define-by-Run" ;) (a rough sketch of the define-by-run style for trees is below the links)
[1]: http://docs.chainer.org/en/stable/tutorial/basic.html
[2]: https://openreview.net/pdf?id=ryrGawqex
[3]: https://github.com/bogatyy/cs224d/tree/master/assignment3
[4]: https://github.com/pfnet/chainer/blob/master/examples/sentim...
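For a flavour of why recursion over trees is natural in a define-by-run framework, here is a rough sketch in PyTorch (not the code from either linked example; the layer sizes and tree encoding are made up for illustration):

    import torch

    # Hypothetical toy tree composition: leaves are word ids, internal nodes are pairs.
    dim = 8
    embed = torch.nn.Embedding(1000, dim)
    compose = torch.nn.Linear(2 * dim, dim)

    def encode(node):
        # Plain Python recursion: the graph grows to match each tree's shape.
        if isinstance(node, int):                       # leaf: a word id
            return torch.tanh(embed(torch.tensor([node])))
        left, right = node                              # internal node: (left, right) subtrees
        return torch.tanh(compose(torch.cat([encode(left), encode(right)], dim=1)))

    tree = ((1, 2), (3, (4, 5)))                        # structure can differ per example
    vec = encode(tree)
    vec.sum().backward()                                # gradients flow through the recursion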
In fact, with DyNet or PyTorch, you still need to do bookkeeping for the graph you traversed (a tape), because no one is doing forward-mode AD. If that's the case, why not have a good library for symbolic computation graphs and build the dynamic features on top of it? (I am not saying TensorFlow is a good symbolic computation graph library to build upon, just arguing that starting with a define-compile-run library doesn't necessarily hinder your ability to support dynamic graphs.)
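For what it's worth, the "tape" here just means recording, during the forward pass, how to push gradients back through each operation. A toy illustration of the idea (illustrative only; the names and structure are my own, not any framework's API):

    # A toy reverse-mode "tape": each forward op records how to propagate gradients back.
    class Value:
        def __init__(self, data, parents=()):
            self.data = data
            self.grad = 0.0
            self.parents = parents
            self.backward_fn = None  # closure that distributes grad to parents

        def __mul__(self, other):
            out = Value(self.data * other.data, parents=(self, other))
            def backward_fn():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out.backward_fn = backward_fn
            return out

        def backward(self):
            # Walk the recorded graph (the "tape") in reverse topological order.
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v.parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                if v.backward_fn:
                    v.backward_fn()

    x = Value(3.0)
    y = x * x          # the graph is recorded as the forward pass runs
    y.backward()
    print(x.grad)      # 6.0, i.e. d(x^2)/dx at x=3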
This won't be the same for TensorFlow as it was written with the concept of a static computation graph at its core. I'm certainly not saying it's impossible to re-architect - and many smart people in the community and at Google are devoting thinking and code to it - but simply that the process will be far more painful as it was not written with this as an intended purpose.
To note - there are many advantages to static computation graphs. Of particular interest to Google is that they distribute their computations very effectively over large amounts of hardware. Being able to do this with a dynamic computation graph would be far more problematic.
* Facebook * Twitter * NVIDIA * SalesForce * ParisTech * CMU * Digital Reasoning * INRIA * ENS
The maintainers work at Facebook AI Research
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Notably absent is the otherwise Facebook-typical PATENTS file, which I see as a good sign.
Also, it doesn't look like this has happened just now? PRs in the repo go back a couple months and the repo has 100+ contributors.
The C libraries are shared between the Lua and Python variants.
I am wondering if CUDA is mandatory for Torch installation? I use a MacBook Air, which doesn't have a graphics card, so I'm not sure if Torch can be installed and used on my machine.
I use it more and more for hobby projects. Combine it with LuaJIT (which Torch uses) and you have the fastest interpreted language around. Give it a try.
Pretty much no way to use neural networks (except for playing, like above) without writing code.
Lua is less used than Python in the scientific community, and many of the most innovative machine learning researchers already work with C++ and Python. Using yet another language with only marginal benefit increases cognitive load and drains the researcher's mental innovation budget, forcing them to learn the ins and outs of Lua rather than work on innovative machine learning solutions.
Lua is a nice language. Python 3 is a nice language too, and there are many exciting new features and development styles (hello, async programming?) in the making, which should prevent a monoculture from forming in the near term.
The Python that you write when using these frameworks is just glue code / scripts. All you're doing is calling the framework's functions. Most of it gets thrown away (as researchers, at least). The stuff that doesn't is self-contained and usually short. You're not writing 100k+ line codebases.
Lua may be faster for certain tasks (data processing), but the time those tasks take is usually a rounding error in deep learning. Not to mention you can still code in C/C++ with PyTorch.
If there is a monoculture in machine learning, it would be the deep learning monoculture.
If only Mike Pall had created a transpiler infrastructure layer on top of LuaJIT.
Just a personal (anti-)preference I guess