Show HN: I built a 2-min quiz that shows you how bad you are at estimating

(convexly.app)

20 points · convexly · 2mo ago · 66 comments

I've gotten to the point in my career where I now make strategic decisions often (hiring, firing, choosing what equipment to go with, etc.), as well as in my personal life where I need to strongly weigh my options for a big purchase or investment. I found a not-so-surprising parallel between the two as these decisions "resolved." Am I making good decisions or am I getting lucky?

Did some research, read some books, and realized I should get in the habit of tracking my decision process. That quickly turned into the idea that formed Convexly.

The landing page is a 10-question calibration quiz where you assign a confidence level to statements drawn from a rotating pool of 100 (working on making the pool larger) and you get a Brier score back instantly. No signup required, and you can share your scores right away.

If you find it interesting, you can create a free account where you can track your decisions with probability estimates, resolve them over time, and get calibration curves that show if you are over/underconfident. From what I've seen so far, users are overconfident when they say they're between 70-90% sure about something.

For the math: Beta-PERT distributions for the payoff modeling, Kelly criterion for the position sizing, signal detection theory for separating skill from randomness.
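For readers unfamiliar with two of the terms above, here is a minimal sketch of the textbook forms of the Kelly criterion and the Beta-PERT distribution. It is not taken from Convexly's code; the function names, the binary-bet framing, and the standard PERT shape weight of 4 are all assumptions made for illustration.

```python
import random


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Textbook Kelly fraction for a binary bet: f* = p - (1 - p) / b.

    p_win is the estimated probability of winning, net_odds (b) is the
    profit per unit staked on a win. A negative result means "do not bet".
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if net_odds <= 0:
        raise ValueError("net_odds must be positive")
    return p_win - (1.0 - p_win) / net_odds


def pert_mean(low: float, mode: float, high: float) -> float:
    """Mean of a Beta-PERT distribution with the standard weight of 4."""
    return (low + 4.0 * mode + high) / 6.0


def pert_sample(low: float, mode: float, high: float) -> float:
    """Draw one Beta-PERT sample by rescaling a Beta(alpha, beta) draw."""
    span = high - low
    alpha = 1.0 + 4.0 * (mode - low) / span
    beta = 1.0 + 4.0 * (high - mode) / span
    return low + span * random.betavariate(alpha, beta)


# Hypothetical inputs: 60% chance of winning an even-money bet,
# and a payoff estimated as worst 0, most likely 5, best 10.
stake = kelly_fraction(0.6, 1.0)
payoff = pert_sample(0.0, 5.0, 10.0)
```

With these made-up inputs the Kelly fraction comes out at about 20% of bankroll, which is why overconfident probability estimates translate directly into oversized positions.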

On the coding side: FastAPI with NumPy/SciPy, frontend in Next.js and Supabase.

So far this has been a solo project of mine. If you want to see all the features use code SHOWHN for 30 days of full access, no credit card required.

Curious if anything about your score surprised you after taking the quiz.

63 comments · 26 top-level

addisonl2mo ago· 7 in thread

> Question: A fair die rolling a 6 twice in a row is more likely than rolling 1-2-3-4-5-6 in sequence

Two 6s in a row is 1/36 chance (1/6)^2

1-2-3-4-5-6 is a 1/46656 chance (1/6)^6

Website is claiming they are the same probability:

> Same probability: 1/46,656 — Both outcomes have exactly the same probability: (1/6)^6 = 1/46,656. This illustrates the representativeness heuristic — random-looking sequences feel more probable than ordered ones.

Website's "answer" is wrong: was the question supposed to be rolling a 6 six times in a row?
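The arithmetic in this comment can be checked with exact fractions. This is a small sketch; the helper name `sequence_probability` is made up for illustration.

```python
from fractions import Fraction


def sequence_probability(length: int, sides: int = 6) -> Fraction:
    """Probability of one specific ordered sequence of fair die rolls."""
    return Fraction(1, sides) ** length


two_sixes = sequence_probability(2)    # 6, 6
ordered_run = sequence_probability(6)  # 1, 2, 3, 4, 5, 6
six_sixes = sequence_probability(6)    # 6, 6, 6, 6, 6, 6

print(two_sixes, ordered_run, six_sixes)
print(two_sixes / ordered_run)  # how many times likelier two sixes are
```

Any specific sequence of six rolls has the same probability, so the quiz's explanation only holds if the question compares six 6s with 1-2-3-4-5-6.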

cyanydeez2mo ago

Yeah, most likely it was trying to identify a bias of human perception: that 1,2,3,4,5,6 would seem more probable than 6x6.

A better way to illustrate this bias is with coin flips. People will tell you that 6 heads in a row is rarer than 3 tails followed by 3 heads. The difficulty is understanding whether they mean "in order" or "as a group".

If it's in order, the odds are the same. Every order of H/T has the same probability, but humans will see "all heads" and think that's rarer. The important bit is whether there's a clear understanding of the ordering.

convexlyOP2mo ago

That's definitely better framing for this question. Much cleaner way to illustrate that point!

convexlyOP2mo ago

You're right, that's a mistake in how I phrased the question. It should say "six times in a row" not "twice in a row". Fixing it now! Thanks for pointing that out!

snarf21 2mo ago

If anyone is interested in why we are bad at estimating, please check out the amazing book Thinking, Fast and Slow: Daniel Kahneman.

convexlyOP2mo ago

Great recommendation. That was one of the biggest influences for starting to write my decisions down and then building this.

1qaboutecs2mo ago

came with the same complaint. the website then had the nerve to tell me i am overconfident.

convexlyOP2mo ago

Fair point! Bad question on my end. The overconfidence was based on all 10 questions though, not just that one!

convexlyOP2mo ago· 4 in thread

Update: 400+ quiz takers now... insane. Best Brier score so far is 0.007 (nearly perfect calibration). The worst came in at 0.600. Average is 0.230, still just better than a coin flip. Where did you land?

tommica2mo ago

Worst came in 0.600? Fuck, I got 0.550...

convexlyOP2mo ago

Just need practice! People have no idea how overconfident they actually are.

bovermyer2mo ago

I hit 0.012.

As a test of general knowledge it was interesting. The confidence angle was the most interesting part, though.

convexlyOP2mo ago

That's the second best score I've seen today out of 700+ quiz takers! Exceptional calibration. The confidence angle is the whole point, people don't know how far off they actually are until they see the hard data!

gcanyon2mo ago· 3 in thread

Wait, so roughly is it rewarding being confident when correct, and penalizing being confident when wrong? Meaning that the highest score is only achievable if you answer fully confident true or false, and get all 10 correct?

If so, isn't that conflating knowledge with over/under confidence?

convexlyOP2mo ago

Your point on scoring is correct: if you're 100% confident and right on everything, you would score a perfect 0. The calibration insight is in how you handle the questions where you don't know the answer. Say you're highly knowledgeable and 95% confident on everything, but get 2 wrong. You would score worse than someone who said they were only 70% confident on those same two questions. That would indicate that you are overconfident compared to the other person!
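To make the scoring concrete, here is a minimal sketch of the standard Brier score (mean squared difference between forecast probability and the 0/1 outcome) applied to the scenario described above. The ten-question setup and the variable names are assumptions for illustration, not the site's actual code.

```python
def brier_score(forecasts, outcomes):
    """Mean of (forecast - outcome)^2; 0 is perfect, lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty lists of equal length")
    total = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# Ten statements: the first eight are true (1), the last two are false (0).
outcomes = [1] * 8 + [0] * 2

# Person A says "true" at 95% on everything, so is confidently wrong twice.
always_95 = [0.95] * 10

# Person B is equally sure on the first eight but only 70% on the last two.
hedged = [0.95] * 8 + [0.70] * 2

print(brier_score(always_95, outcomes))
print(brier_score(hedged, outcomes))
print(brier_score([1.0] * 10, [1] * 10))  # fully confident and always right
```

Both people get the same two questions wrong; the lower score for the second person comes entirely from stating less confidence on the misses.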

gcanyon2mo ago

I think the 100% certain and always right scenario invalidates the calculation. In that outcome you know nothing about my (over) confidence level when I am wrong.

You should either return NA in that circumstance, or keep asking questions until you have actual data to work with.

macleginn2mo ago

How are they different? If you "know" something, you are 100% confident in it, which gives you an easy 0 for this question (or a surprising 1). Philosophically, the problem is more that there is no difference between confidently and modestly wrong in terms of consequences of binary decisions.

lorenzohess2mo ago· 2 in thread

Maybe I don't know enough about "calibration" in a technical sense, but it seems like this quiz can't really distinguish between factual knowledge and calibration skill?

Is this type of quiz reproducible for individuals and across various cross-sections of the population?

Are there studies on this? Is the quiz based on these studies?

convexlyOP2mo ago

Great question. Calibration specifically is about whether your confidence in an answer matches your accuracy, not whether you know the answer. Someone who knows a lot but is always 90% confident would still be miscalibrated if they're wrong 20% of the time, as an example.

In terms of research, Tetlock's Expert Political Judgment and Superforecasting were the foundation. He did a 20-year study that showed domain experts were barely better than chance at long-range predictions. The Brier score was the standard metric for that research.
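The "always 90% confident, wrong 20% of the time" example can be sketched as a one-row calibration table: group answers by stated confidence and compare with the observed hit rate. The function name and the grouping-by-exact-confidence approach are assumptions for illustration; a real calibration curve would typically use wider bins and far more answers.

```python
from collections import defaultdict


def calibration_table(confidences, correct):
    """Return (stated confidence, observed hit rate, count) per confidence level."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    buckets = defaultdict(list)
    for conf, was_right in zip(confidences, correct):
        buckets[round(conf, 2)].append(1 if was_right else 0)
    return [
        (conf, sum(hits) / len(hits), len(hits))
        for conf, hits in sorted(buckets.items())
    ]


# Ten answers, all given at 90% confidence, eight of them right.
confidences = [0.9] * 10
correct = [True] * 8 + [False] * 2

table = calibration_table(confidences, correct)
for conf, hit_rate, n in table:
    gap = conf - hit_rate  # positive gap = overconfident at this level
    print(f"stated {conf:.0%}  actual {hit_rate:.0%}  gap {gap:+.0%}  (n={n})")
```

A well-calibrated forecaster has a gap near zero at every confidence level, regardless of how many questions they actually know the answer to.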

lorenzohess2mo ago

I see, that makes a lot of sense. Maybe the UI should reflect this? Have one button for True or False or Uncertain, and then the slider for confidence in the answer?

iamtedd2mo ago· 2 in thread

Why do I need to sign up to get the results? Why couldn't it just be on the page?

convexlyOP2mo ago

The Brier score and "diagnosis" are shown immediately, no signup needed. The email is optional and only if you want to see the calibration curve and the question breakdown sent to you. I'll make that clearer!

iamtedd2mo ago

> The email is optional and only if you want to see the calibration curve and the question breakdown sent to you.

That still doesn't make sense. Why can't it just be shown on the page?

reltnek2mo ago· 2 in thread

I think this might be conflating confidence with accuracy. I tried leaving the slider in the middle (nominally the least confident position) and it gave a score of 0.25 and diagnosed it as 'overconfident'.

convexlyOP2mo ago

That is definitely a bug, thank you for pointing that out. Should have been neutral! I'll push a fix for this.

macleginn2mo ago

The Brier score is pathological when the guess is 0.5: regardless of the outcome, it will be equal to 0.25, so if you define "better than random" as having a score < 0.25, actually acting randomly makes you "overconfident".
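A quick sketch of that point: the per-question Brier term for a 0.5 forecast is 0.25 whether the statement turns out true or false, so an all-50% run always averages exactly 0.25. The outcomes below are made up for illustration.

```python
def brier_term(forecast: float, outcome: int) -> float:
    """Squared error of one probability forecast against a 0/1 outcome."""
    return (forecast - outcome) ** 2


if_false = brier_term(0.5, 0)
if_true = brier_term(0.5, 1)

# Any mix of outcomes gives the same average when every forecast is 0.5.
arbitrary_outcomes = [0, 1, 1, 0, 1]
terms = [brier_term(0.5, o) for o in arbitrary_outcomes]
mean_score = sum(terms) / len(terms)

print(if_false, if_true, mean_score)
```

This means a score of 0.25 carries no information about over- or underconfidence on its own, which is consistent with the "neutral" diagnosis the author says it should have produced.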

fred_is_fred2mo ago· 2 in thread

Is it down? The start and skip buttons both don't work and I see this error in my console.

Manifest fetch from https://www.convexly.app/manifest.json failed, code 403

convexlyOP2mo ago

Just checked and everything is up. That might just be a console warning, but shouldn't affect the quiz. Can you try a hard refresh (ctrl+shift+R)? If that still doesn't work, what browser are you on?

fred_is_fred2mo ago

I tried Chrome and Safari. It's working great on my phone, so probably Zscaler.

loloquwowndueo2mo ago· 2 in thread

“You averaged 97% confidence but were right 80% of the time.”

Heck yeah.

convexlyOP2mo ago

Actually means you have really strong knowledge. That 17% gap though is what gets people in trouble in high-stakes settings!

loloquwowndueo2mo ago

Yeah! I’m confident when I give an answer. In a real life scenario I would actually research the ones I’m not so sure about - but having a confident first take narrows down that research a lot.

Evgeniuz2mo ago· 1 in thread

There’s a bias, I think. When I saw the title about how bad I am at estimating, I leaned towards counterintuitive answers. This got me quite a high score. I think the test set should also include intuitive facts (or maybe I was just lucky).

convexlyOP2mo ago

As much as it is counterintuitive, that is actually a valid calibration strategy. If you notice the questions lean slightly towards counterintuitive and adjust for it, that IS better calibration! But you raise a fair point about framing bias from the title.

EForEndeavour2mo ago· 1 in thread

Apologies if this is off-topic, but having spent more time than I'd like to admit having to create and edit webapps that emerged entirely out of Claude Code, Cursor, Codex, etc. with minimal to no direct code-writing by their human subscribers, this website has strong AI smells:

- Inter font

- all caps section headers

- Lucide icons

- em dashes, of course the em dashes

- bubble status badges (of course with all-caps "IN PROGRESS" and "COMING SOON" that mean the same thing)

- Uncited claims like "Most founders are overconfident in the 70-90% range" and "Most people score between 0.20 and 0.30"

- No less than FOUR blog articles all published April 4

None of these points is by any means a dealbreaker. And after all, I suppose a product should be judged on its merits and the value it delivers to its users, not on the tools used to create it. But together, the frontend bears the unmistakeable generative AI "smell" that telegraphs that the human(s) directing the tools building this app might be optimizing for speed over rigor and quality (further supported by the volunteer QA/QC happening in the comments), and may only be as good and reliable as the uncritically accepted outputs of a $20/month coding assistant.

convexlyOP2mo ago

That's all true. I'm a solo founder and have been using Claude heavily to build this. It definitely shows in many places, and I'll make sure to clean those up. I did not expect to get this many visits from a Show HN (almost at 1600 quiz takers from the last few hours alone). The core math is sound, but I agree the presentation needs more care. Appreciate the honest feedback!

testycool2mo ago· 1 in thread

I thought it was interesting, but don't appreciate having to give you my email to see full results.

I unsubscribe from mails that aren't useful to me day-to-day because they're distracting.

Other than that it seems like a cool idea. I'd recommend slightly bigger fonts. I often have this issue with Gemini.

  Brier Score: 0.216 (lower is better)
  Diagnosis: Overconfident

convexlyOP2mo ago

Just pushed a fix for that! You should be able to see everything without inputting your email now. I've made a note about font size, thank you for the feedback.

slothsonaplane2mo ago· 1 in thread

Brier scoring works on questions with cheap, fast resolution; the strategic decisions you mention (hiring, equipment, big purchases) resolve over months or years, often ambiguously, and the counterfactual never resolves at all. Curious whether the calibration gains from the rapid-feedback quiz actually transfer to the slow-feedback domains the tool is designed to help with, or whether it ends up training a slightly different skill. A second thing: most of my strategic decisions weren't solo, and once one calibrated person sits in a room with two louder uncalibrated ones, the calibration math stops being load-bearing. Have you thought about a team variant?

convexlyOP2mo ago

Both really good points. The research suggests that the core skill does transfer, so the quiz can help with long-horizon predictions. The mechanism seems to be awareness of your own overconfidence rather than domain-specific knowledge. That said, the gap between the quiz and real-world application is real, and tracking both over time is part of why I built the decision logging side. For your question about teams, that's a built-in feature already! Submissions are "sealed", so you submit before seeing others. The team feature also has a believability-weighted aggregation based on each submitter's track record, and I also built an IC mode for investment committees. The problem you describe about one calibrated person in a room with two uncalibrated ones is what the sealed model prevents. Everyone draws their own conclusion, then they compare!

sonofhans2mo ago· 1 in thread

I’ve taken the quiz but not been compelled to sign up. The site feels manipulative, e.g., the “show me all the questions” link is tiny and hidden between two larger boxes, and even then it only shows 2 questions with a signup CTA. Maybe that’s best practice growth hacking these days, but to me it’s a manipulative turnoff. If you’d simply given me all the questions and answers then I would have signed up for more, especially with the discount code. Otherwise, how am I supposed to even know what I’m signing up for? Every interaction I’ve had with the site so far is a sales attempt, so mostly I expect more of those.

convexlyOP2mo ago

That's honest feedback, I appreciate it. The post quiz shouldn't feel like a sales funnel. I'll clean that up. Working on it!

pacificpendant2mo ago· 1 in thread

Having previously spent a reasonable amount of time on Metaculus I’m familiar with Brier scores and rating my confidence. I assume that’s how I was able to get better than average results. It’s an interesting app.

It’s something I’m interested in improving on, as well as predictions in general. I saw you suggested Thinking, Fast and Slow and I’ve skimmed through some of Superforecasting. Metaculus has a bunch of resources too.

https://www.metaculus.com/help/prediction-resources/

convexlyOP2mo ago

Thank you! Happy to hear how it compares!

convolvatron2mo ago· 1 in thread

I didn't find the questions very representative of estimation. Maybe if you happen to know many of the random root facts about the world on which they were based, then applying them might be a relevant test of ability to estimate. I really felt more like I was making uneducated guesses (0.155). I suppose I was expecting more ping pong balls in airplanes.

convexlyOP2mo ago

The point I was going for was more so how people handle questions they don't know the answer to. Someone that is "well-calibrated" would set things they are uncertain about at closer to 50% instead of guessing one way or the other (overconfident). That score is excellent, so it suggests you did exactly that!

Hnus2mo ago· 1 in thread

Why is it asking for email?

convexlyOP2mo ago

I just removed that, full results should be fully visible without email! A hard refresh should show the update.

rahimnathwani2mo ago· 1 in thread

This reminds me of:

https://taketest.xyz/confidence-calibration

The same site also has something with a fixed confidence level: https://taketest.xyz/ci-calibration

convexlyOP2mo ago

That's awesome, hadn't seen this one! I like the confidence interval approach.

unsnap_biceps2mo ago· 1 in thread

The slider disappearing when sliding between extremes is very confusing. I think the slider should be the only thing displayed and the buttons should be removed entirely.

convexlyOP2mo ago

The change to buttons was based on feedback I got today. The slider disappearing is a bug. Pushing a fix now!

Havoc2mo ago· 1 in thread

I'd consider removing some questions that are bound to be country specific. e.g. The one about time spent in front of a red light.

>0.188

Slightly above avg - yay

convexlyOP2mo ago

That's fair, I'll flag those or maybe even add regional context. Nice score, well above average!

suralind2mo ago· 1 in thread

Did it twice: once had 0.177, 2nd time got 0.280. Not sure what to make of this, I guess I should always leave it on 50/50?

convexlyOP2mo ago

The variance is normal, the questions pull from a pool of 138 questions so far. 0.177 is strong. Setting everything to 50% would just get you 0.25, so you did way better on the first attempt. The goal isn't 50/50 on everything, only on the occasions where you are not confident that you are right.

zupa-hu2mo ago· 1 in thread

It is very disappointing that you can't see what you got right or wrong without giving out your email. I'm not even sure if one would learn from the email or whatever the calibration result is.

I'm happy for you if it works but I sure feel cheated. I hope others also feel it's against the spirit of a Show HN. But maybe it's just me.

convexlyOP2mo ago

That's a good point, I might have gated it too hard. I'll open up the full results now. Appreciate the feedback.

convexlyOP2mo ago

Quick update: 1,934 quiz completions, 44.5% scored overconfident. Most interesting finding was that the quiz itself got more engagement than the product behind it. Added educational tooltips, a public roadmap with voting, and UTM tracking based on feedback here and from users that reached out directly!

convexlyOP2mo ago

Made a few changes based on feedback from this thread: full results now shown immediately with no email gate, changed the UX to include true/false/uncertain buttons + a confidence slider, I cleaned up the quiz result page, and fixed the die probability question. Thanks for all the honest feedback!

convexlyOP2mo ago

Update at 2 hours: 1350+ quiz takers! 50% overconfident, 40% well-calibrated, and 10% underconfident. The average score is around 0.228, with the best score still at 0.007 (nearly perfect). The pattern so far is people are most overconfident in the 70-90% range, but are right closer to ~55% of the time.

convexlyOP2mo ago

Interesting data from the quiz so far: 160+ quiz takers! The average is 0.239 (barely better than a coin flip at 0.25), but almost everyone indicates they are confident in their answers.

senectus1 2mo ago

heh.. nailed it first go:

Your Calibration Results 9/10 correct direction

Brier Score

0.131

Lower is better (0 = perfect)

Diagnosis

Well Calibrated Strong score. You were right more often than your confidence suggested. Trust your gut more.
