One of the differences is a dynamic learning rate guided by an approximation of the local curvature.
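The text does not say which curvature approximation is meant, so the following is only a minimal sketch of the general idea and not the method referred to above. It assumes a secant (Barzilai-Borwein style) estimate: the change in gradient between two iterates, divided by the change in parameters, approximates the local curvature, and the learning rate is set to its inverse. The names `curvature_step_size`, `minimize`, `fallback`, and `initial_lr` are hypothetical and chosen for illustration.

```python
def curvature_step_size(x_prev, x_curr, g_prev, g_curr, fallback=0.1):
    """Return a learning rate equal to the inverse of a secant curvature estimate.

    Assumption: curvature is approximated from two consecutive iterates
    (Barzilai-Borwein style), not taken from any specific method in the text.
    """
    s = [b - a for a, b in zip(x_prev, x_curr)]  # change in parameters
    y = [b - a for a, b in zip(g_prev, g_curr)]  # change in gradients
    sy = sum(si * yi for si, yi in zip(s, y))    # s . y
    ss = sum(si * si for si in s)                # s . s
    # Fall back to a fixed rate when the estimate is unusable
    # (no movement, or non-positive estimated curvature).
    if ss == 0.0 or sy <= 0.0:
        return fallback
    return ss / sy  # 1 / (estimated curvature along s)


def minimize(grad, x0, steps=50, initial_lr=0.1):
    """Gradient descent whose learning rate is recomputed at every step."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # The first step has no history, so it uses the fixed initial rate.
    x_curr = [xi - initial_lr * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(steps):
        g_curr = grad(x_curr)
        lr = curvature_step_size(x_prev, x_curr, g_prev, g_curr,
                                 fallback=initial_lr)
        x_next = [xi - lr * gi for xi, gi in zip(x_curr, g_curr)]
        x_prev, g_prev, x_curr = x_curr, g_curr, x_next
    return x_curr


# Hypothetical test problem: f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)
def quadratic_grad(x):
    return [x[0], 10.0 * x[1]]


solution = minimize(quadratic_grad, [1.0, 1.0])
```

In flat regions the estimated curvature is small and the step grows; in sharply curved regions the step shrinks. A fixed learning rate would have to be tuned for the steepest direction alone.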