In that case, please "educate" me on how to judge whether a prediction of a binary event (say, a presidential candidate winning an election) was accurate or not, when both outcomes are "probable" to a non-zero degree.
I'm eagerly looking forward to my "education".
More broadly, what you are asking about is called a "scoring rule". Wikipedia, as usual, provides an overview[2]. You can take the mean of a forecaster's scores across many predictions, which lets you compare forecasting methodologies.
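To make that concrete, the Brier score is probably the best-known scoring rule for binary forecasts: the mean squared difference between the forecast probability and the 0/1 outcome. A minimal sketch (mine, not from the comment above):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; a constant 50% forecast always earns exactly 0.25.
    """
    assert len(forecasts) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who said 80% on an event that happened and 20% on one that didn't:
print(brier_score([0.8, 0.2], [1, 0]))  # 0.04, vs 0.25 for always saying 50%
```

Averaging this score over a forecaster's whole track record is exactly the "mean of a forecaster's scores" mentioned above.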
I'm not sure why your comment needed to be so aggressive.
Note that this is the easy part: a sports model that predicted a 50% win for the first team, a weather model that predicted historical averages, or a language model that predicted letter frequencies, would have near-perfect calibration, but would at the same time be pretty useless. The other part is discrimination: how educated your guesses are. That is not so simple to quantify, although the 538 articles above mention some of the possible measures.
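Here's a toy illustration of that calibration/discrimination split (the dataset and numbers are invented for the example): two forecasters on the same 20 events, both perfectly calibrated, but only one of them separates the likely events from the unlikely ones.

```python
def brier(forecasts, outcomes):
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

# 20 events: 10 "favourable" ones of which 8 happen, 10 "unfavourable" of which 2 happen.
outcomes = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

# Forecaster A quotes the 50% base rate every time: perfectly calibrated
# (half of its "50%" predictions come true), but it discriminates nothing.
base_rate = [0.5] * 20

# Forecaster B says 80% for the favourable events and 20% for the rest:
# equally well calibrated (80% of its "80%" calls happen, 20% of its "20%"
# calls do), but it actually tells the two kinds of event apart.
sharp = [0.8] * 10 + [0.2] * 10

print(brier(base_rate, outcomes))  # 0.25
print(brier(sharp, outcomes))      # 0.16 -- better, purely from discrimination
```

Both forecasters would pass a calibration check; only the score averaged over events rewards the educated guesses.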
That's it for what an uncertain binary prediction means; but why do we want one? Well, if you're betting (literally or figuratively) on an outcome, it probably makes a difference to you whether the "losing" possibility will come up 1% of the time or 40% of the time; but that does not seem that easy to formalize and may feel unsatisfactory.
In that case, here's a formal result.
An always-certain prediction service is obviously equivalent to a deterministic decision rule, which churns some data about the situation and says yes or no based on that. (They are the same thing.) An uncertain prediction service is (less obviously) equivalent to a randomized decision rule, which churns some data about the situation, tosses some (known) coins, and says yes or no based on both. (Take the service's result, then output yes or no with the probabilities it gave.) Of course there's always a randomized decision rule that performs at least as well as any given deterministic one (run the deterministic rule, toss no coins, output its result).
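That correspondence can be sketched in a few lines (names are mine, purely illustrative):

```python
import random

def randomized_decision(p_yes, rng=random):
    """Turn a probability forecast into a yes/no decision by tossing a
    (known) coin with the forecast's bias."""
    return rng.random() < p_yes

def as_randomized(deterministic_rule):
    """Any deterministic rule is a degenerate randomized one: it forecasts
    probability 0 or 1, so the coin toss never actually matters."""
    return lambda data, rng=random: randomized_decision(
        1.0 if deterministic_rule(data) else 0.0, rng)

# Sanity check: a 70% forecast says yes about 70% of the time.
rng = random.Random(0)
draws = sum(randomized_decision(0.7, rng) for _ in range(10_000))
print(draws / 10_000)  # close to 0.7
```

Since `rng.random()` returns values in [0, 1), a forecast of 1.0 always says yes and 0.0 always says no, which is exactly the "choose not to toss any coins" embedding.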
It turns out (see e.g. the introduction[3] to Chentsov's monograph[4]) there can be randomized decision rules that are strictly better on average than any possible deterministic rule.
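Here's a toy, minimax-flavoured illustration of that last claim (not Chentsov's actual setting, just the usual intuition): when performance is judged against a worst case that gets to see your rule, every deterministic rule can be exploited, while a fair coin cannot.

```python
import random

def worst_case_score(rule, rounds=1000):
    """Play a matching-pennies-style game against an adversary who knows the
    rule: you score a point when your bit differs from the adversary's, and
    the adversary predicts you by simply running your rule itself."""
    score = 0
    for t in range(rounds):
        adversary = rule(t)  # the adversary simulates your rule...
        you = rule(t)        # ...so a deterministic rule never differs from it
        score += (you != adversary)
    return score / rounds

# Any deterministic rule (here: always 0) scores nothing in the worst case.
print(worst_case_score(lambda t: 0))  # 0.0

# A coin toss can be simulated but not predicted: the adversary runs the same
# rule yet gets its own coin outcome, so each round is still a fresh 50/50.
rng = random.Random(0)
print(worst_case_score(lambda t: rng.random() < 0.5))  # about 0.5
```

The randomized rule's expected worst-case score (0.5) strictly beats every deterministic rule's (0), which is the flavour of the result: against an adversarial or worst-case criterion, randomizing can be strictly better on average.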
[1] https://fivethirtyeight.com/features/when-we-say-70-percent-...
[2] https://projects.fivethirtyeight.com/checking-our-work/
[3] https://books.google.lv/books?id=iqMluWtSFdoC&pg=PA8
[4] https://openlibrary.org/books/OL26831120M/, https://bookstore.ams.org/mmono-53
Thank you. I thought I was going slightly insane having someone argue otherwise.
I understand you could evaluate quality over many deterministic predictions, but I'd also presume those would all have to be similar in nature (i.e. all election outcomes); otherwise there are too many confounding factors if you try to evaluate quality across vastly different kinds of predictions.