Can you post a link to your code with some synthetic data of the sizes you’re talking about to demonstrate this? I hear it as a criticism a lot, but have never found it to be true (full disclosure: I work on a large-scale production system that uses pymc for huge Bayesian logistic regression and huge hierarchical models, both in GPU mode out of necessity).
> “Both TF and Theano require static graph while PyTorch lets you use Python’s regular control flows (if, for, while, etc). This makes building modular model components much easier, since you can reason about execution mostly as if it’s normal numerical Python code.”
I can’t tell from this whether you’ve actually looked into pymc (or Keras, for that matter). In pymc, GPU mode is just a Theano setting: you don’t write any Theano code, manipulate any graphs or sessions directly, or anything else. You just call pm.sample with the appropriate mode settings and it is executed on the GPU.
Much like with Keras, where you can also easily use Python native control flow, context managers and so on, pymc doesn’t require low-level usage of underlying computation graph abstractions.
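For the Keras side of that point, here’s a sketch (assuming tensorflow/keras is installed) where an ordinary Python for loop and if statement decide the architecture. The layer widths are arbitrary illustrative values.

```python
# Native Python control flow while building a Keras model:
# a plain for loop picks the hidden-layer widths, no graph ops involved.
from tensorflow import keras

widths = [64, 32, 16]  # arbitrary example widths
model = keras.Sequential()
for i, w in enumerate(widths):
    if i == 0:
        # First layer declares the input shape (10 features, illustrative).
        model.add(keras.layers.Dense(w, activation="relu", input_shape=(10,)))
    else:
        model.add(keras.layers.Dense(w, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy")
print(model.count_params())
```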
Again, I really like PyTorch too. But a lot of people seem to have only ever tried PyTorch, liked one or two things about it, forgiven its rough edges (like needing to explicitly write the backward computation for custom autograd functions, which you don’t need to do in Keras, for example), and then generalized that into criticism of tools they haven’t used.
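For anyone who hasn’t hit this, here’s a minimal sketch (assuming torch is installed) of the explicit backward I mean: a custom `torch.autograd.Function` requires you to write the gradient by hand, even for something as simple as squaring.

```python
# A custom op in PyTorch: the backward pass must be written explicitly.
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x  # d/dx of x^2, hand-written

x = torch.tensor([3.0], requires_grad=True)
Square.apply(x).backward()
print(x.grad)  # tensor([6.])
```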