There is, and people have trained purely optical neural networks:
https://arxiv.org/abs/2208.01623
The real issue is trying to backpropagate through those nonlinear optics: you need a second nonlinear optical element whose response matches the derivative of the first. In the paper above, they instead approximate the gradient by slightly perturbing the parameters, but that means the training time scales linearly with the number of parameters in each layer.
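For intuition, that perturbation-based gradient estimate looks something like finite differences (a minimal numpy sketch; `loss` here is just a smooth stand-in for the optical forward pass, not the paper's actual system). One extra forward pass per parameter is exactly where the linear scaling comes from.

```python
import numpy as np

# Hypothetical smooth loss over N parameters (stand-in for the optical forward pass).
def loss(theta):
    return np.sum(np.sin(theta) ** 2)

def finite_difference_grad(theta, eps=1e-6):
    """Estimate the gradient by perturbing each parameter in turn.

    Requires one extra forward pass per parameter, so the cost of a
    single gradient estimate scales linearly with N.
    """
    base = loss(theta)
    grad = np.empty_like(theta)
    for i in range(theta.size):
        bumped = theta.copy()
        bumped[i] += eps
        grad[i] = (loss(bumped) - base) / eps
    return grad

theta = np.linspace(0.1, 1.0, 8)
approx = finite_difference_grad(theta)
exact = np.sin(2 * theta)  # analytic gradient of sum(sin^2), for comparison
print(np.max(np.abs(approx - exact)))  # agreement up to finite-difference error
```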
Note: the authors claim it takes O(sqrt(N)) time, but they're forgetting that the learning rate must satisfy mu = o(1/sqrt(N)) if you want to converge to a minimum:
Loss(theta + dtheta) = Loss(theta) + dtheta . grad(Loss)(theta) + O(|dtheta|^2)
                     = Loss(theta) + mu * sqrt(N) * C    (each parameter moves by ~mu, so |dtheta| = mu * sqrt(N); C bounds the gradient, assuming Loss is Lipschitz continuous)
==> min(Loss) ~ mu * sqrt(N) * C/2
In other words, with a fixed mu the loss stalls at a floor of order mu * sqrt(N), so mu has to shrink like 1/sqrt(N) and the claimed speedup disappears.
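A toy illustration of that floor (my construction, not from the paper): with fixed-magnitude updates, every one of the N parameters moves by mu each step, and the loss plateaus at a level set by mu instead of going to zero.

```python
import numpy as np

def loss(theta):
    # Simple quadratic stand-in for the true training loss.
    return 0.5 * np.sum(theta ** 2)

def train(mu, n=64, steps=10_000, seed=0):
    """Sign-style descent: every parameter moves by exactly mu each step,
    so the step norm is mu * sqrt(n) no matter how close we are to the minimum."""
    theta = np.random.default_rng(seed).normal(size=n)
    for _ in range(steps):
        theta -= mu * np.sign(theta)
    return loss(theta)

# The final loss stalls at a floor that shrinks with mu rather than reaching zero.
for mu in (0.1, 0.01, 0.001):
    print(f"mu={mu}: final loss {train(mu):.2e}")
```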