Cynddl on Hacker News

1

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

(aisi.gov.uk)

2 points by Cynddl, 1mo ago, 0 comments

2

Making AI chatbots friendly leads to mistakes and support of conspiracy theories

(theguardian.com)

93 points by Cynddl, 1mo ago, 80 comments

3

UK Biobank health data keeps ending up on GitHub

(biobank.rocher.lc)

197 points by Cynddl, 2mo ago, 57 comments

4

ChatGPT Edu feature reveals researchers' project metadata across universities

(fastcompany.com)

2 points by Cynddl, 3mo ago, 0 comments

5

AI no better than other methods for patients seeking medical advice, study shows

(reuters.com)

3 points by Cynddl, 4mo ago, 0 comments

6

AI chatbots pose 'dangerous' risk when giving medical advice, study suggests

(bbc.co.uk)

4 points by Cynddl, 4mo ago, 2 comments

7

Show HN: Small, anonymous app for teams to do retrospective sessions

(retrospective.rocher.lc)

1 point by Cynddl, 4mo ago, 0 comments

8

Measuring What Matters: Construct Validity in Large Language Model Benchmarks

(arxiv.org)

1 point by Cynddl, 7mo ago, 0 comments

9

AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds

(gizmodo.com)

43 points by Cynddl, 7mo ago, 17 comments

10

AI's capabilities may be exaggerated by flawed tests, according to new study

(nbcnews.com)

3 points by Cynddl, 7mo ago, 0 comments

11

Experts find flaws in tests that check AI safety and effectiveness

(theguardian.com)

3 points by Cynddl, 7mo ago, 0 comments

12

Measuring What Matters: Construct Validity in Large Language Model Benchmarks

(oxrml.com)

3 points by Cynddl, 7mo ago, 2 comments

13

The quiet software tooling Renaissance

(pdx.su)

3 points by Cynddl, 9mo ago, 0 comments

14

Facial recognition works better in the lab than on the street, researchers show

(theregister.com)

4 points by Cynddl, 10mo ago, 1 comment

15

We Shouldn't Trust Facial Recognition's Glowing Test Scores

(techpolicy.press)

2 points by Cynddl, 10mo ago, 0 comments
