One thing to consider: since you're using a count+total model, all you can report is an average, but the most interesting timings are often the 90th or 99th percentile, so you might be missing useful information.
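To make the point concrete, here is a minimal sketch with made-up latencies (the numbers are purely illustrative, not measurements from any real system). It compares what a count+total model can report, the mean, against percentiles computed from the raw samples with the standard library's `statistics.quantiles`.

```python
import statistics

# Hypothetical latencies in milliseconds: 98 fast calls and 2 slow outliers.
timings_ms = [10.0] * 98 + [500.0, 900.0]

# A count+total model keeps only these two numbers...
count = len(timings_ms)
total = sum(timings_ms)
# ...so the mean is the only summary it can produce.
mean_ms = total / count

# With the raw samples kept, percentiles are available.
# n=100 yields 99 cut points: index 89 is the 90th percentile,
# index 98 is the 99th percentile.
cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
p90_ms = cuts[89]
p99_ms = cuts[98]

print(f"mean={mean_ms:.1f}ms p90={p90_ms:.1f}ms p99={p99_ms:.1f}ms")
```

The mean lands well above the typical call and far below the slow ones, so it describes neither. Keeping every sample costs memory, though, so a real implementation would more likely use a histogram or a sampling reservoir.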
I ran into some issues with the implementation after switching to an async framework, since the code was no longer a series of nested function calls. The current best practice is coroutines, where this will still work, so I think it's okay, but you should consider how someone using callbacks might time their code. In my case I was in a hurry, so I manually called the equivalent of your __enter__ and __exit__, but it was pretty ugly and left a lot of room for bugs.
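A rough sketch of what explicit start/stop timing for callback code could look like. Everything here (`ManualTimer`, `stats`, `fetch`) is a hypothetical name invented for illustration, not the library's actual API; the idea is only that the timer can be started in one function and stopped in a callback, instead of calling __enter__ and __exit__ by hand.

```python
import time
from collections import defaultdict

# Hypothetical registry mimicking a count+total model.
stats = defaultdict(lambda: {"count": 0, "total": 0.0})


class ManualTimer:
    """Timer usable as a context manager or via explicit start()/stop()."""

    def __init__(self, name):
        self.name = name
        self._started = None

    def start(self):
        self._started = time.perf_counter()
        return self

    def stop(self):
        if self._started is None:
            raise RuntimeError("stop() called before start()")
        elapsed = time.perf_counter() - self._started
        self._started = None
        entry = stats[self.name]
        entry["count"] += 1
        entry["total"] += elapsed
        return elapsed

    def __enter__(self):
        return self.start()

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # never swallow exceptions


def fetch(on_done):
    """Callback-style function: the timer outlives this call frame."""
    timer = ManualTimer("fetch").start()

    def finished(result):
        timer.stop()  # stopped inside the callback, not in fetch() itself
        on_done(result)

    # A real framework would schedule `finished` to run later;
    # it is invoked directly here to keep the sketch self-contained.
    finished("payload")


results = []
fetch(results.append)
print(stats["fetch"]["count"], results)
```

The guard in `stop()` catches one of the bugs that hand-rolled __enter__/__exit__ calls invite, namely stopping a timer that was never started. Error paths, where the callback never fires, would still need separate handling.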
I haven't thought about callback code at all... if you have any ideas or sample code to share, it would be great if you could create a GitHub issue for it.
Thanks!
The only downside I see is that it records the function name but not the module name (or, for class members, the class name). For example, it wouldn't be too useful to see "authorize" instead of "ppp.common.authorize" in RADIUS server profiling logs. :)
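For what it's worth, Python already exposes the pieces needed for a fuller label: `__module__` gives the defining module and `__qualname__` includes the enclosing class. A minimal sketch, where `qualified_name`, `RadiusHandler`, and `authorize` are hypothetical names for illustration and say nothing about how the library builds its labels:

```python
def qualified_name(func):
    """Build a 'module.Class.function' style label for a callable."""
    return f"{func.__module__}.{func.__qualname__}"


class RadiusHandler:
    def authorize(self, user):
        return True


def authorize(user):
    return True


# The module part depends on where the code lives; run as a script,
# it is "__main__" rather than a package path.
print(qualified_name(authorize))
print(qualified_name(RadiusHandler.authorize))
```

With this, a method and a module-level function that share the name "authorize" get distinct labels in the profiling output.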