Not sure what GP had in mind, but suppose a feature x appears in a dataset n times, pn of them with a positive label and (1-p)n with a negative label, and your classifier f(x) is trained with the cross-entropy cost. The total cost contributed by those examples is -n[p log f(x) + (1-p) log(1 - f(x))], and setting its derivative with respect to f(x) to zero gives the minimizer f(x) = p. In this sense, f(x) is the probability of a positive label given the feature.
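A quick numerical sanity check of that claim: evaluate the total cross-entropy for every constant output q on a fine grid in (0, 1) and pick the smallest. The function names and the counts (30 positives, 70 negatives, so p = 0.3) are made up purely for illustration.

```python
import math

def cross_entropy(q, n_pos, n_neg):
    """Total binary cross-entropy when the classifier outputs q for all
    n_pos positive and n_neg negative examples that share feature x."""
    return -(n_pos * math.log(q) + n_neg * math.log(1.0 - q))

def best_constant_output(n_pos, n_neg, steps=9999):
    """Grid-search q in (0, 1) for the output that minimizes the total cost."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: cross_entropy(q, n_pos, n_neg))

n_pos, n_neg = 30, 70          # hypothetical counts: n = 100, p = 0.3
q_star = best_constant_output(n_pos, n_neg)
print(q_star)                  # lands at (or very near) n_pos / (n_pos + n_neg)
```

Changing the counts moves the minimizer to the new positive fraction, which is the whole point: the cost is minimized by outputting the empirical p.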
Whether neural nets actually reach this optimum in practice, and how reliable their outputs are as probabilities, is another question. But that's the intent of the cross-entropy cost.