Why is it inference only? At least the operations are the same...just a bunch of linear algebra
Inference also prefers different IO patterns, because you don't need to keep the activations for every layer ready for backpropogation.