In my opinion it is the number of requested features shipped minus the number of bugs introduced, each weighted by its importance as decided collectively by the team or the client.
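As a rough sketch of that arithmetic (the `Item` type, the `productivity_score` name, the sample items and the weight scale are all made up for illustration, not taken from any real tool):

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    weight: float  # importance agreed on by the team or client (hypothetical scale)


def productivity_score(features: list[Item], bugs: list[Item]) -> float:
    """Sum of weights of features shipped minus sum of weights of bugs introduced."""
    return sum(f.weight for f in features) - sum(b.weight for b in bugs)


# Hypothetical data for one developer over one review period.
features = [Item("CSV export", 3.0), Item("single sign-on", 5.0)]
bugs = [Item("crash on empty file", 2.0)]

score = productivity_score(features, bugs)
print(score)  # 8.0 - 2.0 = 6.0
```

The weights carry all the judgment here; the subtraction itself is trivial.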
Combine that with a code review process that surfaces commits with excessive or insufficient tests, and it was one of a couple of ways I validated my feel for the work of my direct reports when it came time to fill out reviews.
Profit? The trouble is that it is so far removed from the day-to-day work we do that it's almost impossible to draw any direct conclusions. So instead we start trying to use proxies like function points, bugs or lines of code. In the most dysfunctional organisations even worse metrics get used, like time spent keeping a seat warm or political skill.
If your users are confident that they can walk up to you and get any feature they need implemented, that's about the best metric of success I can think of.
Of course, it still has problems. The metric probably has to be calculated long before the number of serious bugs is even known. It also only counts requested features, but good developers anticipate needs; there may be features implemented that the users haven't even realized they wanted yet. And of course, it's hard to come up with numerical weights for the features and bugs.
But the worst problem with this metric is that it doesn't account for the maintainability of the code. That's an even harder thing to measure, of course.