
cosmic_ape · 7y ago · 0 points

>>How do neural nets approximate a Bayesian posterior?

Not sure what GP had in mind, but suppose a feature x appears in a dataset n times, pn times with a positive label and (1-p)n times with a negative one, and your classifier f(x) is trained with the cross-entropy cost. Then the value that minimizes the cost is f(x) = p. In this sense, f(x) is the probability of a positive label given the feature.
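To make that concrete, here is a minimal sketch that checks the claim numerically. It treats f(x) for this one feature as a single free number f and does a brute-force grid search; the function names and the values n = 100, p = 0.3 are made up for illustration, and no actual neural net is involved.

```python
import math

def cross_entropy(f, n, p):
    """Total cross-entropy cost when the feature is seen n times:
    p*n times with a positive label, (1-p)*n times with a negative one."""
    pos = p * n
    neg = (1 - p) * n
    return -(pos * math.log(f) + neg * math.log(1 - f))

def minimize_on_grid(n, p, steps=999):
    """Brute-force search over f in (0, 1); illustrative, not efficient."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda f: cross_entropy(f, n, p))

# Hypothetical numbers: 100 occurrences, 30% of them positive.
best = minimize_on_grid(n=100, p=0.3)
print(best)
```

Setting the derivative of the cost to zero gives the same answer analytically: -pn/f + (1-p)n/(1-f) = 0 implies f = p.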

Whether neural nets actually achieve this, and how reliably, is another question. But that's the intent of the cross-entropy cost.

2 comments · 1 top-level

beta_binomial · 7y ago

This does not make any sense to me, and neither did OP's comment about NNs approximating the posterior. In fact, if p were the solution, it would simply be the maximum likelihood estimate, which does not include the prior p(theta) and hence would not be Bayesian.

cosmic_ape (OP) · 7y ago

Well, p definitely is the solution in the case I mentioned. It is indeed the maximum likelihood solution. You could incorporate prior information about theta via a regularization term, if so inclined. What about this does not make sense?
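As a rough sketch of the regularization point: one standard choice (an assumption here, not something specified in the thread) is to add the negative log-density of a Beta(a, b) prior to the cross-entropy cost. The minimizer then moves from the maximum likelihood value k/n to the MAP estimate (k + a - 1) / (n + a + b - 2). The counts k, n and the prior parameters a, b below are hypothetical.

```python
import math

def regularized_cost(f, k, n, a, b):
    """Cross-entropy for k positives out of n, plus a regularizer equal to
    the negative log-density of a Beta(a, b) prior (up to a constant)."""
    nll = -(k * math.log(f) + (n - k) * math.log(1 - f))
    neg_log_prior = -((a - 1) * math.log(f) + (b - 1) * math.log(1 - f))
    return nll + neg_log_prior

def map_closed_form(k, n, a, b):
    """Minimizer of regularized_cost, from setting its derivative to zero."""
    return (k + a - 1) / (n + a + b - 2)

def argmin_on_grid(cost, steps=9999):
    """Brute-force check over f in (0, 1); illustrative only."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=cost)

k, n = 3, 10      # hypothetical data: 3 positives in 10 occurrences
a, b = 2.0, 2.0   # hypothetical prior, mildly favoring f near 0.5

f_ml = k / n                          # maximum likelihood estimate
f_map = map_closed_form(k, n, a, b)   # pulled toward 0.5 by the prior
f_grid = argmin_on_grid(lambda f: regularized_cost(f, k, n, a, b))
print(f_ml, f_map, f_grid)
```

With a = b = 1 the regularizer vanishes and the estimate reduces to the plain maximum likelihood value k/n.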

Not sure what the OP meant, but I thought it might be useful to mention how estimators can be given a probabilistic interpretation at all. Often, arbitrary numbers between 0 and 1 are termed "probabilities", but in this case there actually is some proportion or probability to which f(x) should ideally correspond.
