I went back to the BixBench benchmark which they mentioned. I couldn't find official results for Anthropic models, but I found a project taking Opus 4.6 from 65.3% to 92.0% (which would be above GPT-Rosalind) with nearly 200 carefully crafted skills [1]. There also appear to be competitor models with scores on par with this tuned GPT.
It’s kind of gross to make money off her name (if that’s what’s happening) posthumously. It’s a complicated story anyway. IIRC her sister referred to it as “the Cult of Rosalind” when people were cashing in on books about her.
Sam Altman, August 2025
For me too, it was around that time last year, with GPT-5, Claude Sonnet 4.5, and then Gemini 3, that I started feeling these models are clearly becoming great at reasoning. I'm not at all opposed to saying they are around PhD-level in at least some domains.
Earlier this year I tried to do this for a much simpler target than bioscience: a Farnsworth fusor. Even though I started off with roughly "which open source physics libraries do you recommend we use for this?" and it gave me a list, instead of actually bothering to use any of the libraries it suggested, it decided to roll its own simulation code, and the code it wrote very obviously didn't work.
It may *assist* with coding, but I don't think it could code for them yet.
[1] https://github.com/openai/plugins/tree/main/plugins/life-sci...
Why? AI's reputation would be greatly improved by saving a few tens of millions of lives (per year, I might add). And either of those advances would do just that.
Oh, and another reason: do either of these things and you'll have very rich businesses coming out of every hole, screaming to become your customer. Guaranteed.
I'm absolutely OK with a legitimate lab scientist conducting biochemical research getting suggestions about substances that are generally considered dangerous but might be appropriate for their study; it's up to the scientist to discern whether a substance is indeed appropriate to use.
At the moment, it feels like releases like this overpromise on "PhD-level reasoning", which I wouldn't say is the actual bottleneck in clinical research.