An example of a weird use case outside ML is AD allows you to differentiate through a raytracer/complex program. Maybe your raytracer has a couple parameters for lighting and you have a target image you can to create something as similar to as possible. You could use AD to optimize the lighting parameters. That's one problem that mathematica will be very unlikely to be able to do. For a large application if you want to AD the entire thing either you are using a framework that supports AD everywhere like tensorflow/pytorch or you need language level support like Julia. Pretty few languages have AD at the language level.