Did some research, read some books, and realized I should get in the habit of tracking my decision process. That quickly turned into the idea that formed Convexly.
The landing page is a 10-question calibration quiz where you assign a confidence level to statements drawn from a rotating pool of 100 (working on making the pool larger) and you get a Brier score back instantly. No signup required, and you can share your scores right away.
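For anyone unfamiliar with the Brier score mentioned above, here is a minimal sketch of the standard definition: the mean squared difference between each stated probability and the 0/1 outcome. The function name and the sample answers are made up for illustration, and the quiz's actual scoring code may differ.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical 10-question run: stated confidence that each statement is true,
# and whether it actually was true (1) or false (0).
confidences = [0.9, 0.8, 0.7, 0.6, 0.9, 0.3, 0.2, 0.8, 0.5, 0.7]
truths = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]

print(round(brier_score(confidences, truths), 3))

# Baseline: answering 50% on every question scores 0.25 regardless of the truths.
print(brier_score([0.5] * 10, truths))
```

Lower is better, 0 is perfect, and the always-50% baseline of 0.25 is a handy reference point when reading a score.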
If you find it interesting, you can create a free account where you can track your decisions with probability estimates, resolve them over time, and get calibration curves that show whether you are over- or underconfident. From what I've seen so far, users are overconfident when they say they're 70-90% sure about something.
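A calibration curve boils down to bucketing resolved forecasts by stated confidence and comparing each bucket's average confidence with how often those forecasts came true. The sketch below shows that idea only; the function name, the decile buckets, and the sample data are assumptions, not how Convexly necessarily does it.

```python
from collections import defaultdict


def calibration_table(confidence_pcts, outcomes):
    """Bucket forecasts by decile and compare stated confidence with hit rate.

    confidence_pcts: integer confidences from 0 to 100.
    outcomes: 1 if the forecast event happened, else 0.
    Returns a list of (bucket_label, mean_confidence, hit_rate, count).
    """
    buckets = defaultdict(list)
    for pct, outcome in zip(confidence_pcts, outcomes):
        decile = min(pct // 10, 9)  # 100% is folded into the 90-100% bucket
        buckets[decile].append((pct, outcome))

    rows = []
    for decile in sorted(buckets):
        pairs = buckets[decile]
        mean_conf = sum(p for p, _ in pairs) / len(pairs) / 100
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        label = f"{decile * 10}-{decile * 10 + 10}%"
        rows.append((label, mean_conf, hit_rate, len(pairs)))
    return rows


# Made-up resolved decisions, overconfident in the 70s and 80s.
pcts = [75, 75, 80, 85, 85, 70, 55, 55, 95, 95]
hits = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

for label, conf, rate, n in calibration_table(pcts, hits):
    gap = conf - rate  # positive gap means overconfident in that bucket
    print(f"{label}: said {conf:.2f}, got {rate:.2f}, gap {gap:+.2f}, n={n}")
```

With only a handful of forecasts per bucket the hit rates are very noisy, so a gap in one bucket is weak evidence until more decisions have resolved.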
For the math: Beta-PERT distributions for the payoff modeling, Kelly criterion for the position sizing, signal detection theory for separating skill from randomness.
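To make two of those terms concrete, here is a rough sketch of the textbook formulas: the Kelly fraction for a simple binary bet, and the shape parameters and mean of a Beta-PERT distribution built from a low/most-likely/high estimate. The function names and example numbers are hypothetical, and this is not necessarily how the app combines them.

```python
def kelly_fraction(p_win, net_odds):
    """Kelly fraction of bankroll for a binary bet.

    p_win: probability of winning.
    net_odds: profit per unit staked on a win (the stake is lost otherwise).
    A negative result means the bet has negative edge and should be skipped.
    """
    return p_win - (1 - p_win) / net_odds


def pert_params(low, mode, high, lam=4.0):
    """Return (alpha, beta, mean) of a Beta-PERT distribution on [low, high]."""
    alpha = 1 + lam * (mode - low) / (high - low)
    beta = 1 + lam * (high - mode) / (high - low)
    mean = (low + lam * mode + high) / (lam + 2)
    return alpha, beta, mean


# 60% chance to win an even-money bet.
print(kelly_fraction(0.6, 1.0))

# Payoff estimate: worst case 0, most likely 10, best case 40.
print(pert_params(0, 10, 40))
```

The PERT mean weights the most-likely value four times as heavily as the extremes, which is why a long right tail pulls the mean above the mode in the second example.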
On the coding side: FastAPI with NumPy/SciPy, frontend in Next.js and Supabase.
So far this has been a solo project of mine. If you want to see all the features, use code SHOWHN for 30 days of full access, no credit card required.
Curious if anything about your score surprised you after taking the quiz.
Two 6s in a row is a 1/36 chance: (1/6)^2
1-2-3-4-5-6 is a 1/46656 chance: (1/6)^6
Website is claiming they are the same probability:
> Same probability: 1/46,656 — Both outcomes have exactly the same probability: (1/6)^6 = 1/46,656. This illustrates the representativeness heuristic — random-looking sequences feel more probable than ordered ones.
Website's "answer" is wrong: was the question supposed to be rolling a 6 six times in a row?
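The arithmetic above is easy to check with exact fractions, assuming a fair six-sided die and independent rolls. The variable names are just for illustration; the brute-force count at the end enumerates all six-roll sequences as a sanity check.

```python
from fractions import Fraction
from itertools import product

p_face = Fraction(1, 6)  # chance of any one specific face on a fair die

p_two_sixes = p_face ** 2        # 6, 6
p_exact_sequence = p_face ** 6   # 1, 2, 3, 4, 5, 6 in that order
p_six_sixes = p_face ** 6        # 6, 6, 6, 6, 6, 6

print(p_two_sixes, p_exact_sequence, p_six_sixes)

# Brute force: enumerate every possible sequence of six rolls.
rolls = list(product(range(1, 7), repeat=6))
hits = sum(1 for r in rolls if r == (1, 2, 3, 4, 5, 6))
print(hits, "of", len(rolls))
```

So 1-2-3-4-5-6 only matches "sixes in a row" if the comparison is against six 6s, not two.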
A better way to illustrate this bias is with coin flips. People will tell you that 6 heads in a row is rarer than 3 tails followed by 3 heads. The difficulty is understanding whether they mean "in order" or "as a group".
If it's in order, the odds are the same. Every order of H/T has the same probability, but humans will see "all heads" and think that's rarer. The important bit is whether there's a clear understanding of the ordering.
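The "in order" versus "as a group" distinction can be checked by enumerating all 64 sequences of six fair coin flips. This is a small sketch with made-up variable names, assuming a fair coin and independent flips.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 2**6 equally likely sequences, e.g. "HHTHTT".
flips = ["".join(s) for s in product("HT", repeat=6)]
total = len(flips)

# In order: each specific sequence appears exactly once.
p_all_heads = Fraction(flips.count("HHHHHH"), total)
p_ttthhh = Fraction(flips.count("TTTHHH"), total)

# As a group: any arrangement of 3 heads and 3 tails.
three_each = sum(1 for s in flips if s.count("H") == 3)
p_three_each = Fraction(three_each, total)

print(p_all_heads, p_ttthhh, p_three_each)
print(three_each == comb(6, 3))
```

In order, both sequences come out to 1/64. As a group, "3 heads and 3 tails" covers 20 of the 64 sequences, so it really is much more likely than all heads, which is where the intuition comes from.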
As a test of general knowledge it was interesting. The confidence angle was the most interesting part, though.
If so, isn't that conflating knowledge with over/under confidence?
You should either return NA in that circumstance, or keep asking questions until you have actual data to work with.
Is this type of quiz reproducible for individuals and across various cross-sections of the population?
Are there studies on this? Is the quiz based on these studies?
In terms of research, Tetlock's Expert Political Judgment and Superforecasting were the foundation. He did a 20-year study that showed domain experts were barely better than chance at long-range predictions. The Brier score was the standard metric for that research.
That still doesn't make sense. Why can't it just be shown on the page?
Manifest fetch from https://www.convexly.app/manifest.json failed, code 403
Heck yeah.
- Inter font
- all caps section headers
- Lucide icons
- em dashes, of course the em dashes
- bubble status badges (of course with all-caps "IN PROGRESS" and "COMING SOON" that mean the same thing)
- Uncited claims like "Most founders are overconfident in the 70-90% range" and "Most people score between 0.20 and 0.30"
- No less than FOUR blog articles all published April 4
None of these points is by any means a dealbreaker. And after all, I suppose a product should be judged on its merits and the value it delivers to its users, not on the tools used to create it. But together, the frontend bears the unmistakeable generative AI "smell" that telegraphs that the human(s) directing the tools building this app might be optimizing for speed over rigor and quality (further supported by the volunteer QA/QC happening in the comments), and may only be as good and reliable as the uncritically accepted outputs of a $20/month coding assistant.
I unsubscribe from mails that aren't useful to me day-to-day because they're distracting.
Other than that it seems like a cool idea. I'd recommend slightly bigger fonts. I often have this issue with Gemini.
Brier Score: 0.216 (lower is better)
Diagnosis: Overconfident
It’s something I’m interested in improving on, as well as predictions in general. I saw you suggested Thinking, Fast and Slow and I’ve skimmed through some of Superforecasting. Metaculus have a bunch of resources too.
https://taketest.xyz/confidence-calibration
The same site also has something with a fixed confidence level: https://taketest.xyz/ci-calibration
>0.188
Slightly above avg - yay
I'm happy for you if it works but I sure feel cheated. I hope others also feel it's against the spirit of a Show HN. But maybe it's just me.
Your Calibration Results: 9/10 correct direction
Brier Score: 0.131 (lower is better, 0 = perfect)
Diagnosis: Well Calibrated. Strong score. You were right more often than your confidence suggested. Trust your gut more.