Differentiable programming is not tied to functional programming. The Python library PyTorch supports it: you can write almost arbitrary Python code, including if-then control flow, and differentiate through it, which lets you use gradient descent to optimize the parameters of your model.
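To make the control-flow point concrete, here is a minimal sketch, assuming PyTorch is installed. The function `loss_fn`, the target value 3.0, the learning rate and the step count are all made up for illustration. The loss is written with a plain Python `if`, and autograd differentiates whichever branch actually ran on that call.

```python
import torch

def loss_fn(w):
    # Hypothetical Huber-style loss, written with an ordinary Python branch.
    err = w - 3.0
    if err.abs().item() < 1.0:
        return 0.5 * err ** 2      # quadratic near the target
    return err.abs() - 0.5         # linear far from the target

# The single parameter we optimize (starting value chosen arbitrarily).
w = torch.tensor(0.0, requires_grad=True)

for _ in range(200):
    loss = loss_fn(w)
    loss.backward()                # gradient of the branch taken this step
    with torch.no_grad():
        w -= 0.1 * w.grad          # plain gradient descent update
    w.grad.zero_()                 # clear the gradient before the next step

final_w = w.item()                 # should end up close to 3.0
```

Note that the gradient only reflects the branch that executed; the condition itself is not differentiated.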
Traditionally, deep learning just meant a sequence of functions applied compositionally (hence the "deep"), where each function (termed a layer) is a matrix multiply followed by some well-behaved non-linear function (the "activation function"). These models were differentiable by design and optimized using gradient descent.
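As a rough sketch of that traditional picture, assuming PyTorch: each layer below is a matrix multiply followed by `tanh`, and the model is just two layers composed. The shapes, the choice of `tanh`, and the names `W1`, `W2`, `layer` and `model` are illustrative assumptions, not anything prescribed above.

```python
import torch

torch.manual_seed(0)

# Weights of two layers (sizes picked arbitrarily for the example).
W1 = torch.randn(4, 3, requires_grad=True)
W2 = torch.randn(1, 4, requires_grad=True)

def layer(W, x):
    # One layer: matrix multiply, then a smooth non-linearity.
    return torch.tanh(W @ x)

def model(x):
    # "Deep" here just means layers applied one after another.
    return layer(W2, layer(W1, x))

x = torch.randn(3)
y = model(x)

# Backpropagate through the whole composition to get parameter gradients.
y.sum().backward()
```

After `backward()`, `W1.grad` and `W2.grad` hold the gradients a gradient descent step would use.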
But now the models we want to build are structurally more complex than this merely sequential composition of functions. We want to be able to use control flow, accept multiple inputs, return multiple outputs, etc., but we still want the model to be differentiable so we can use an iterative optimization procedure like gradient descent. This extension, from what deep learning traditionally meant (a fairly restrictive class of sequential function compositions) to complex, branching models, is what is now termed differentiable programming.
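For contrast with a purely sequential stack, here is a hedged sketch of a small "program", again assuming PyTorch. It takes two inputs, loops a caller-chosen number of times, branches on the data, and returns two outputs. The function `program`, its arguments, and the toy loss are hypothetical names invented for this example.

```python
import torch

torch.manual_seed(0)

# Parameters shared across the program (sizes chosen arbitrarily).
W = torch.randn(3, 3, requires_grad=True)
v = torch.randn(3, requires_grad=True)

def program(a, b, steps):
    h = a
    for _ in range(steps):         # loop: depth is decided at call time
        h = torch.tanh(W @ h)
    if b.sum().item() > 0:         # branch: depends on the second input
        h = h + b
    else:
        h = h - b
    score = v @ h                  # second output, a scalar
    return h, score                # multiple outputs

a = torch.randn(3)
b = torch.randn(3)
h, score = program(a, b, steps=3)

# A toy loss that combines both outputs; gradients flow through the
# loop iterations and the branch that actually executed.
loss = score ** 2 + h.pow(2).sum()
loss.backward()
```

The same `backward()` call works here as in the sequential case, which is the sense in which the whole program stays differentiable.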