It sounds like you already know this, but many people, including me at one time, think of tail call optimization as a trick for not blowing up the stack when writing recursive functions. However, it's much more general. Tail call optimization doesn't have to involve recursion; it can be applied any time the last thing a function does is call another function, because the caller's stack frame is no longer needed and can be discarded or reused for the callee. It's something like a special form of inlining. And as you say, it's actually possible to apply the optimization in limited cases where the tail-called function returns a value![2]
That general optimization is very useful for implementing threaded[1] or continuation-passing-style VMs, since it compiles a function call and all its associated baggage (pushing a return address, building and tearing down a stack frame) down to a jump and maybe a few assignments.
[1] Forth style, not multithreading.