> > the uncertainty in the number of trials
> Has no meaning to me.
What the author is trying to get at in the admittedly poorly worded question is that the trials are noisy measures of an underlying effect. Your job is to sort by effect size, while accounting for the random chance that a low sample size trial just got unlucky.
You might argue that the question is much harder than the author assumes, since your best guess at the actual effect size seems like it should still just be the success rate, even if the low sample size trials have wider error bars. You'd need to come up with some sort of heuristic that says why 7/9 deserves a lower rank than 50/70 using binomial confidence intervals.
Probably that heuristic is intended to be a Bayesian approach? Like, if you add just two successes and two failures to each scenario as a prior, that's enough to put the 50/70 option ahead: 7/9 becomes 9/13 (about 0.69) and 50/70 becomes 52/74 (about 0.70).
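To make that arithmetic concrete, here is a minimal sketch of the pseudo-count idea. The function name `smoothed_rate` and its parameters are made up for illustration, and the prior of two successes and two failures is just the one suggested above, not something the original question specifies.

```python
from fractions import Fraction

def smoothed_rate(successes, trials, prior_successes=2, prior_failures=2):
    """Success rate after adding pseudo-observations as a prior (hypothetical helper)."""
    return Fraction(successes + prior_successes,
                    trials + prior_successes + prior_failures)

# Raw success rates: the small trial looks better.
raw_small, raw_large = Fraction(7, 9), Fraction(50, 70)

# After adding two successes and two failures to each, the order flips.
adj_small, adj_large = smoothed_rate(7, 9), smoothed_rate(50, 70)

print(f"raw:      7/9 = {float(raw_small):.3f}, 50/70 = {float(raw_large):.3f}")
print(f"smoothed: 7/9 = {float(adj_small):.3f}, 50/70 = {float(adj_large):.3f}")
```

A stronger prior pulls small trials harder toward 50%, so the choice of pseudo-counts is itself a judgment call.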
The essence of my comment was that this text/test is not for me (one person of the general public) but more like a few leetcode-style questions for statisticians.
Your attempt to explain what I didn't understand just proves my point as I don't really understand what you are saying either.
And that's ok: this is just not for me! (And that's why I deleted my original comment)
> it is very important that the uncertainty in the number of trials is taken into account because over-estimating a fraction is a costly mistake.
This is not some precise jargon that is meaningless to the layman but completely clearly specified to a professional statistician. It's more like the specification written by your non-technical product manager for how some technical feature should work. A skilled data scientist will have the experience and the context to figure out what it's probably asking for, but he might write down a few more clarifying details before giving it to a junior on his team to implement.
If testing this kind of guess-what-the-stakeholder-probably-means skill is the point of this test, it's quite good at it. But that's not what leetcode is for.
If "binomial distribution" and "confidence interval" are unfamiliar terms then you probably are not prepared to pass OP's "statistical reasoning test" regardless. I think most engineers wouldn't, and I only understood the intent of question 1 because my pandemic lockdown project was reading a stats textbook cover to cover.
I don’t think this is “leetcode for statisticians.” This question and the other two are all examples of concrete, real-world problems that people across a variety of quantitative disciplines frequently encounter.
In fact, the first question is directly relevant to voting on this site. When sorting replies by fraction of upvotes, how should the forum software rank a new reply with 1 upvote/0 downvotes, versus an older reply with 4 upvotes/1 downvote? What about an older, more controversial reply with 20 upvotes/7 downvotes? 15 upvotes/2 downvotes?
Indeed, I use this technique to sort search results in Splunk, as an extension of TF-IDF. Consider a scenario where us-east-2 is broken but us-east-1 is fine (clearly just a hypothetical!). Split the logs along that good/bad dimension, and then break down by some other pattern: log class, punct, etc. Usually I use a prior of 50:50 to help sort out the "happened once in bad cluster" events.
> The lower bound of which can be used to order the fractions, and so control the risk of over-estimation.
It's not clear to me from the question whether the cost of a mistake is in over-estimating the underlying effect or in misranking the effects, and that seems like it would drive your heuristic selection.
“However, it is very important that the uncertainty in the number of trials is taken into account because over-estimating a fraction is a costly mistake.”
Seems fairly clear to me that you’re supposed to use a lower-bound estimate that accounts for the variance in the fraction due to the number of trials, in a way that bounds the chance of over-estimation.
Further, there is no need for a heuristic when there are several statistical models for this exact problem with clear properties. Some are given in the answer.
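As one illustration of ranking by a lower bound, here is a sketch using the Wilson score interval. That choice is an assumption on my part: the thread doesn't say which interval the answer uses, and `wilson_lower_bound` is a name made up for this example. The vote counts are the ones from the upvote example earlier in the thread.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower_bound(successes, trials, confidence=0.95):
    """Lower end of the two-sided Wilson score interval for a binomial fraction."""
    if trials == 0:
        return 0.0  # no data: be maximally pessimistic
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom

# (upvotes, total votes) for the replies in the upvote example
replies = {
    "1 up / 0 down": (1, 1),
    "4 up / 1 down": (4, 5),
    "20 up / 7 down": (20, 27),
    "15 up / 2 down": (15, 17),
}

# Sort by the pessimistic estimate instead of the raw fraction.
ranked = sorted(replies, key=lambda k: wilson_lower_bound(*replies[k]), reverse=True)
for name in ranked:
    print(f"{name}: lower bound {wilson_lower_bound(*replies[name]):.3f}")
```

Sorting on the lower bound means a reply with a perfect score from one vote no longer beats a well-tested reply with a few downvotes, and by the same logic 50/70 ranks above 7/9.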
In the context of the post this doesn't make sense, so the reader is left to hypothesize what the writer actually meant.
This is probably the formula to memorise and check against.
If you want a rough 95% confidence interval without complicated maths, the Agresti–Coull interval is useful. It's computed as if the distribution were normal, but pretending there were two more successes and two more failures than you actually observed.
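A minimal sketch of that recipe, in the simplified "add two successes and two failures" form described here (the general Agresti–Coull interval adds z²/2 of each, which is roughly 2 at 95%). The function name and the clamping to [0, 1] are my own choices for the example.

```python
from math import sqrt

def plus_four_interval(successes, trials, z=1.96):
    """Rough 95% interval: normal approximation after adding 2 successes and 2 failures."""
    n_adj = trials + 4
    p_adj = (successes + 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp, since the normal approximation can stray outside [0, 1].
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

for successes, trials in [(7, 9), (50, 70)]:
    low, high = plus_four_interval(successes, trials)
    print(f"{successes}/{trials}: [{low:.3f}, {high:.3f}]")
```

The lower end of the interval is the number to sort on if over-estimation is the costly mistake.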
If you have access to a machine or lookup tables, you might as well plug in the values for the distribution Beta(1+n, 1+m), which should correspond to the posterior density of the underlying fraction under a uniform prior.
(The formula above corresponds to the mean of this distribution, so it's probably right, but I haven't worked it through myself yet ...)
Neither involves Monte Carlo sampling. Both are general and principled.
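For the Beta route, here is a sketch that needs only the standard library. It assumes n and m are the counts of successes and failures (the comment doesn't define them), and it leans on the identity that, for integer parameters, the Beta CDF equals a binomial tail sum. The function names and the 5% quantile are illustrative choices, not anything from the original post.

```python
from math import comb

def beta_cdf(x, successes, failures):
    """CDF of Beta(1 + successes, 1 + failures) at x, for integer counts.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    total = successes + failures + 1
    return sum(comb(total, j) * x**j * (1 - x) ** (total - j)
               for j in range(successes + 1, total + 1))

def beta_quantile(q, successes, failures, tol=1e-10):
    """Invert the CDF by bisection (the CDF is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, successes, failures) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def beta_mean(successes, failures):
    """Mean of Beta(1 + successes, 1 + failures)."""
    return (1 + successes) / (2 + successes + failures)

for s, f in [(7, 2), (50, 20)]:
    print(f"{s}/{s + f}: mean {beta_mean(s, f):.3f}, "
          f"5% quantile {beta_quantile(0.05, s, f):.3f}")
```

Sorting by a low quantile rather than the mean is what penalises the small trial: the two means are close, but the 5% quantiles are not.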