In that case, please "educate" me on how to judge whether a prediction of a binary event (say, a presidential candidate winning an election) was accurate or not, when both outcomes are "probable" to a non-zero degree.
I'm eagerly looking forward to my "education".
More broadly, what you are asking about is called a "scoring rule". Wikipedia, as usual, provides an overview[2]. You can take the mean of a forecaster's scores across many predictions, which lets you compare forecasting methodologies.
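To make that concrete, the Brier score is probably the best-known scoring rule for binary forecasts: the mean squared difference between the forecast probability and the 0/1 outcome. A minimal sketch (mine, not from the comment above):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; a constant 50% forecast always earns exactly 0.25.
    """
    assert len(forecasts) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who said 80% on an event that happened and 20% on one that didn't:
print(brier_score([0.8, 0.2], [1, 0]))  # 0.04, vs 0.25 for always saying 50%
```

Averaging this score over a forecaster's whole track record is exactly the "mean of a forecaster's scores" mentioned above.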
I'm not sure why your comment needed to be so aggressive.
Note that this is the easy part: a sports model that predicted a 50% win for the first team, a weather model that predicted historical averages, or a language model that predicted letter frequencies, would have near-perfect calibration, but would at the same time be pretty useless. The other part is discrimination: how educated your guesses are. That is not so simple to quantify, although the 538 articles above mention some of the possible measures.
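Here's a toy illustration of that calibration/discrimination split (the dataset and numbers are invented for the example): two forecasters on the same 20 events, both perfectly calibrated, but only one of them separates the likely events from the unlikely ones.

```python
def brier(forecasts, outcomes):
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

# 20 events: 10 "favourable" ones of which 8 happen, 10 "unfavourable" of which 2 happen.
outcomes = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

# Forecaster A quotes the 50% base rate every time: perfectly calibrated
# (half of its "50%" predictions come true), but it discriminates nothing.
base_rate = [0.5] * 20

# Forecaster B says 80% for the favourable events and 20% for the rest:
# equally well calibrated (80% of its "80%" calls happen, 20% of its "20%"
# calls do), but it actually tells the two kinds of event apart.
sharp = [0.8] * 10 + [0.2] * 10

print(brier(base_rate, outcomes))  # 0.25
print(brier(sharp, outcomes))      # 0.16 -- better, purely from discrimination
```

Both forecasters would pass a calibration check; only the score averaged over events rewards the educated guesses.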
That's it for what an uncertain binary prediction means; but why do we want one? Well, if you're betting (literally or figuratively) on an outcome, it probably makes a difference to you whether the "losing" possibility will come up 1% of the time or 40% of the time; but that does not seem that easy to formalize and may feel unsatisfactory.
In that case, here's a formal result.
An always-certain prediction service is obviously equivalent to a deterministic decision rule, which churns some data about the situation and says yes or no based on that. (They are the same thing.) An uncertain prediction service is (less obviously) equivalent to a randomized decision rule, which churns some data about the situation, tosses some (known) coins, and says yes or no based on both. (Take the service's result, then output yes or no with the probabilities it gave.) Of course there's always a randomized decision rule that performs at least as well as any given deterministic one (run the deterministic rule, toss no coins, output its result).
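That correspondence can be sketched in a few lines (names are mine, purely illustrative):

```python
import random

def randomized_decision(p_yes, rng=random):
    """Turn a probability forecast into a yes/no decision by tossing a
    (known) coin with the forecast's bias."""
    return rng.random() < p_yes

def as_randomized(deterministic_rule):
    """Any deterministic rule is a degenerate randomized one: it forecasts
    probability 0 or 1, so the coin toss never actually matters."""
    return lambda data, rng=random: randomized_decision(
        1.0 if deterministic_rule(data) else 0.0, rng)

# Sanity check: a 70% forecast says yes about 70% of the time.
rng = random.Random(0)
draws = sum(randomized_decision(0.7, rng) for _ in range(10_000))
print(draws / 10_000)  # close to 0.7
```

Since `rng.random()` returns values in [0, 1), a forecast of 1.0 always says yes and 0.0 always says no, which is exactly the "choose not to toss any coins" embedding.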
It turns out (see e.g. the introduction[3] to Chentsov's monograph[4]) there can be randomized decision rules that are strictly better on average than any possible deterministic rule.
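Here's a toy, minimax-flavoured illustration of that last claim (not Chentsov's actual setting, just the usual intuition): when performance is judged against a worst case that gets to see your rule, every deterministic rule can be exploited, while a fair coin cannot.

```python
import random

def worst_case_score(rule, rounds=1000):
    """Play a matching-pennies-style game against an adversary who knows the
    rule: you score a point when your bit differs from the adversary's, and
    the adversary predicts you by simply running your rule itself."""
    score = 0
    for t in range(rounds):
        adversary = rule(t)  # the adversary simulates your rule...
        you = rule(t)        # ...so a deterministic rule never differs from it
        score += (you != adversary)
    return score / rounds

# Any deterministic rule (here: always 0) scores nothing in the worst case.
print(worst_case_score(lambda t: 0))  # 0.0

# A coin toss can be simulated but not predicted: the adversary runs the same
# rule yet gets its own coin outcome, so each round is still a fresh 50/50.
rng = random.Random(0)
print(worst_case_score(lambda t: rng.random() < 0.5))  # about 0.5
```

The randomized rule's expected worst-case score (0.5) strictly beats every deterministic rule's (0), which is the flavour of the result: against an adversarial or worst-case criterion, randomizing can be strictly better on average.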
[1] https://fivethirtyeight.com/features/when-we-say-70-percent-...
[2] https://projects.fivethirtyeight.com/checking-our-work/
[3] https://books.google.lv/books?id=iqMluWtSFdoC&pg=PA8
[4] https://openlibrary.org/books/OL26831120M/, https://bookstore.ams.org/mmono-53
Thank you. I thought I was going slightly insane having someone argue otherwise.
I understand you could evaluate quality over many deterministic predictions, but I'd also presume those would all have to be similar in nature (i.e. all election outcomes); otherwise there are too many confounding factors if you try to evaluate quality across vastly different kinds of predictions.