That's not strictly true. You also compute the gradient with respect to each layer's input; otherwise you couldn't backpropagate through more than one layer. Reverse mode is just useful because a single backward pass gives you both cheaply, whereas with forward mode, getting derivatives with respect to the weights is considerably more expensive, since it takes roughly one pass per weight.
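To make that concrete, here's a minimal sketch in plain Python. The two-layer linear network, the loss L = sum(y), and the names `linear_forward` and `linear_backward` are all made up for illustration; they aren't from any particular library. The thing to notice is that each backward step returns two things: the gradient for that layer's weights, and the gradient for its input, which is what the layer below needs.

```python
def linear_forward(W, x):
    """Compute y = W x for a weight matrix W (list of rows) and a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def linear_backward(W, x, grad_y):
    """Given dL/dy for y = W x, return (dL/dW, dL/dx)."""
    # dL/dW[i][j] = dL/dy[i] * x[j]
    grad_W = [[g_i * x_j for x_j in x] for g_i in grad_y]
    # dL/dx[j] = sum_i W[i][j] * dL/dy[i]  (this is what gets passed down)
    grad_x = [
        sum(W[i][j] * grad_y[i] for i in range(len(W)))
        for j in range(len(x))
    ]
    return grad_W, grad_x


# Hypothetical two-layer network with loss L = sum(y).
W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0]]
x = [1.0, -1.0]

h = linear_forward(W1, x)  # hidden activation
y = linear_forward(W2, h)  # network output

grad_y = [1.0 for _ in y]  # dL/dy for L = sum(y)

# Layer 2: weight gradient, plus the gradient w.r.t. its input h.
grad_W2, grad_h = linear_backward(W2, h, grad_y)

# Layer 1 can only be handled because grad_h was computed above.
grad_W1, grad_x = linear_backward(W1, x, grad_h)
```

Without `grad_h` there would be nothing to feed into the backward step for the first layer, so `grad_W1` couldn't be computed at all.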