
0 points · utopiah · 5mo ago · 0 comments

If I understood correctly, you are giving an example of a "success" of using the technology. That addresses whether the technology is useful or not, powerful or not, but it does not address what it actually does (maybe there is a gnome inside ChatGPT who solved it; I'm just being provocative here to make the point) or, more importantly, whether it does something it couldn't do a year ago or 5 years ago because it is doing something new.

For example, if somebody had used GPT2 with the input dataset of GPT5.2 (assuming that's the one used for the Erdos problems) rather than the input dataset it had then, could it have solved those same problems? Without doing such tests it's hard to say whether it moved fast, or at all. Solving something new does not by itself show that the technology is new. Yes, it's a reasonable assumption, but it's just that. So going from that to assuming "it" is "moving fast" is just a belief, IMHO.


2 comments · 2 top-level

utopiah · OP · 5mo ago

Also, something that makes the whole process very hard to verify is what I tried to address in a much older comment: whenever LLMs are used (regardless of the input dataset) by someone who is an expert in the domain (rather than a novice), how can one evaluate what's been done by whom or what? Sure, again there can be a positive result, e.g. a solution to a problem unsolved until now, but what does it say about the tool itself versus a user who is, by definition if they are an expert, up to date on the state of the art?

utopiah · OP · 5mo ago

Also, the very fact that https://github.com/teorth/erdosproblems/wiki/AI-contribution... exists totally changes the landscape. Because it's public, it's safe to assume it's part of the input dataset, so from now on, how does one evaluate the pace of progress, in particular for non-open-source models?
