
AlphaProof's Greatest Hits

(rishimehta.xyz)

250 points · rishicomplex · 1y ago · 133 comments

72 comments · 9 top-level

sega_sai · 1y ago · 21 in thread

I think the interface between LLMs and formalized languages is really the future, because there you can formally verify every statement and deal with hallucinations.

thesz · 1y ago

So you're looking for Cyc [1], practically.

[1] https://en.wikipedia.org/wiki/Cyc

trenchgun · 1y ago

>The Cyc knowledge base involving ontological terms was largely created by hand axiom-writing

>An inference engine is a computer program that tries to derive answers from a knowledge base. The Cyc inference engine performs general logical deduction.[8] It also performs inductive reasoning, statistical machine learning and symbolic machine learning, and abductive reasoning. The Cyc inference engine separates the epistemological problem from the heuristic problem. For the latter, Cyc used a community-of-agents architecture in which specialized modules, each with its own algorithm, became prioritized if they could make progress on the sub-problem.

Not really.

notTooFarGone · 1y ago

You reminded me of my pains with OpenCyc in my student job. Thanks :D

auggierose · 1y ago

Does Cyc have proofs?

I am building Memelang (memelang.net) to help with this as well. I'd love your thoughts if you have a moment!

thesz · 1y ago

You are building an SQL in disguise.

First, you need to encode "memes" and the relations between them at scale. This is not a language problem; it is a data-handling problem.

Second, at some point you will need to query memes and the relations between them, again at scale. While expressing queries is a language problem, an implementation will lean heavily on the same techniques SQL engines use.

And you need to look at Cyc: https://en.wikipedia.org/wiki/Cyc

It has been doing what you are trying to do for 40 (forty) years now.
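A minimal sketch of thesz's point that storing and querying relations at scale is what relational engines already do, using Python's built-in `sqlite3`. The triple schema, the table and index names, and the facts are hypothetical illustrations, not Memelang's actual data model or syntax.

```python
import sqlite3


def build_store(triples):
    """Store (subject, predicate, object) triples in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel (subject TEXT, predicate TEXT, object TEXT)")
    # An index on (predicate, object) is what keeps lookups fast as data grows.
    conn.execute("CREATE INDEX rel_pred_obj ON rel (predicate, object)")
    conn.executemany("INSERT INTO rel VALUES (?, ?, ?)", triples)
    return conn


def instances_of(conn, category):
    """Two-hop query (X is_a Y, Y is_a category) expressed as a self-join."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.subject
        FROM rel AS a
        JOIN rel AS b ON a.object = b.subject
        WHERE a.predicate = 'is_a' AND b.predicate = 'is_a' AND b.object = ?
        ORDER BY a.subject
        """,
        (category,),
    )
    return [row[0] for row in rows]


# Hypothetical example facts.
TRIPLES = [
    ("Lean", "is_a", "proof_assistant"),
    ("Coq", "is_a", "proof_assistant"),
    ("Cyc", "is_a", "knowledge_base"),
    ("proof_assistant", "is_a", "software"),
    ("knowledge_base", "is_a", "software"),
]

conn = build_store(TRIPLES)
print(instances_of(conn, "software"))  # ['Coq', 'Cyc', 'Lean']
```

The query language here is plain SQL; the indexing, join planning, and storage are what a dedicated "meme" language would otherwise have to reimplement.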

chvid · 1y ago

Maybe include a section titled 'How is this different from, and how is it related to, relational algebra?'

meindnoch · 1y ago

Looks like an incomplete Prolog.

est · 1y ago

> formalized languages is really the future

Hmm, maybe it's time for symbolism to shine?

If this were the case, I don't see why we'd need to wait for an AI company to make a breakthrough in math research. The key issue instead is how to encode 'real-life' statements in a formal language - which to me seems like a ludicrous problem, just complete magical thinking.

For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a Lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?

Or have I totally misunderstood what people mean when they say that developments in automatic theorem proving will solve LLMs' hallucination problem?

nine_k · 1y ago

You can't talk about formally verifiable truthiness until you solve epistemology. This can be achieved formally in mathematics, with known fundamental limitations. Here strict theorem proving, Lean-style, is viable.

It can also be achieved informally and in a fragmentary way in barely-mathematical disciplines, like biology, linguistics, and even history. We have chains of logical conclusions that do not follow strictly, but with various probabilistic limitations, and under a modal logic of sorts. Several contradictory chains follow under different (modal) assumptions / hypotheses, and often all of them should be considered. This is where probabilistic models like LLMs could work together with formal logic tools and huge databases of facts and observations, being the proverbial astute reader.

In some more relaxed semi-disciplines, like sociology, psychology, or philosophy, we have a hodgepodge of contradictory, poorly defined notions and hand-wavy reasoning (I don't mean Wittgenstein here, but rather Freud, Foucault, Derrida, etc.). Here, I think, the current crop of LLMs is most directly applicable, with few augmentations. Still, a much, much wider context window might be required to make it actually productive by the standards of the field.

benlivengood · 1y ago

Probabilistic reasoning is possible in a formal setting; it produces a probability distribution over answers. As for grounding probabilistic logic itself, I'm not aware of much progress beyond the initial idea of logical induction [0].

[0] https://arxiv.org/abs/1609.03543
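A minimal sketch of "a probability distribution over answers" in a formal setting, assuming independent probabilistic facts in the style of ProbLog's possible-world semantics: enumerate every truth assignment to the uncertain facts, weight each one, and sum the weights of the worlds where the query holds. The facts, probabilities, and rules are hypothetical. This does not implement logical induction [0], which addresses the harder problem of assigning probabilities to mathematical statements that are not yet proven or refuted.

```python
from itertools import product

# Hypothetical independent probabilistic facts.
FACTS = {"rain": 0.3, "sprinkler": 0.5}


def wet(world):
    """Deterministic rules: wet if rain, wet if sprinkler."""
    return world["rain"] or world["sprinkler"]


def query_probability(facts, query):
    """Sum the weights of all possible worlds in which `query` holds."""
    names = list(facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # By independence, a world's weight is the product of each fact's
        # probability (if true) or its complement (if false).
        weight = 1.0
        for name, is_true in world.items():
            weight *= facts[name] if is_true else 1.0 - facts[name]
        if query(world):
            total += weight
    return total


print(query_probability(FACTS, wet))  # approximately 0.65
```

Exhaustive enumeration is exponential in the number of uncertain facts; it is shown only because it makes the semantics explicit.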

lukeschlather · 1y ago

People talk about how LLMs need "more data", but data is really sort of a blunt instrument for trying to impart a lot of experience to an LLM. Unfortunately, data isn't actually experience; it's a record of an experience.

Now, what we've seen with, e.g., chess and Go is that when you can give a tensor model "real experience" at the speed of however many GPUs you have, the model quickly becomes capable of superhuman performance. Automatic theorem proving means you can give the model "real experience" without armies of humans writing down proofs: you can generate and validate proofs as fast as a computer will let you, and the LLM gets "real experience" with every new proof it generates, along with an objective cost function: was the proof correct?
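A toy sketch of the "generate and validate" loop with an objective reward, assuming a deliberately trivial task (guess an integer root of (x + 1)(x - 5) = 0) and a hypothetical uniform-random "policy" standing in for the model. The point is only that the reward comes from a mechanical check rather than a human label; a real system would also update the policy from these rewards, which is omitted here.

```python
import random


def verifier(x):
    """Objective reward: 1.0 if x really solves (x + 1)(x - 5) = 0, else 0.0."""
    return 1.0 if (x + 1) * (x - 5) == 0 else 0.0


def self_play(num_attempts=1000, seed=0):
    """Generate candidates, score each with the verifier, keep the verified ones."""
    rng = random.Random(seed)
    verified = set()
    total_reward = 0.0
    for _ in range(num_attempts):
        # Hypothetical policy: uniform guessing in [-10, 10].
        candidate = rng.randint(-10, 10)
        reward = verifier(candidate)
        total_reward += reward
        if reward:
            verified.add(candidate)
    return verified, total_reward
```

No human ever labels a candidate; every reward is computed, so the loop can run as fast as the hardware allows.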

Now, this might not let us give the LLMs real experience with being a teacher or dealing with Marxist revolutionaries, but it would be a sea change in the ability of LLMs to do math, and it probably would let us give LLMs real experience in writing software.

This is a thing I'm working on, so I have some potentially useful thoughts. tl;dr, it doesn't have to be about encoding arbitrary real life statements to be super duper useful today.

> how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language?

Totally out of scope for the foreseeable future, for me at least. But that doesn't prevent it from being super useful in a narrower scope. For example:

- How might we take a statement like "(x + 1) (x - 5) = 0" and encode it formally?

- Or "(X X^T)^-1 X Y = B"?

- Or "6 Fe_2 + 3 H_2 O -> ?"?

We can't really do this for a huge swath of pretty narrow applied problems. In the first, what kind of thing is x? Is it an element of an algebraically closed field? In the second, are those matrices of real numbers? In the third, is that 6 times F times e_2, or 6 molecules of two-atom iron?

We can't formally interpret those as written, but you and I can easily tell what's meant. Meanwhile, the current ways of formally writing those things up are a massive pain in the ass. Anything with a decent amount of common sense can tell you what is probably meant formally. We know that we can't have an algorithm that's right 100% of the time for a lot of relevant things, but 99.99% is pretty useful. If Clippy says 'these look like matrices, right?' and is almost always right, then it's almost always saving you lots of time and letting many more people benefit from formal methods with a much lower barrier to entry.

From there, it's easy to imagine coverage and accuracy of formalizable statements going up and up and up until so much is automated that it looks kinda like 'chatting about real-life statements' again. I doubt that's the path, but from a 'make existing useful formal methods super accessible' lens it doesn't have to be.
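To make the "what kind of thing is x?" question concrete, here is a sketch in Lean 4, assuming Mathlib is available. The same-looking equation has different solution sets depending on the type chosen for `x`: over ℤ the only solutions are -1 and 5, while over ℕ, where subtraction truncates at zero, every `x ≤ 5` satisfies it.

```lean
import Mathlib

-- Over the integers, (x + 1)(x - 5) = 0 forces x = -1 or x = 5.
example (x : ℤ) (h : (x + 1) * (x - 5) = 0) : x = -1 ∨ x = 5 := by
  rcases mul_eq_zero.mp h with h1 | h2
  · left; omega
  · right; omega

-- Over the naturals, `x - 5` truncates to 0 whenever x ≤ 5,
-- so the "same" equation holds for x = 0, 1, ..., 5.
example (x : ℕ) (hx : x ≤ 5) : (x + 1) * (x - 5) = 0 := by
  rw [Nat.sub_eq_zero_of_le hx, Nat.mul_zero]
```

A formalization assistant would have to pick the first reading from context, which is exactly the kind of guess the comment describes.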

tucnak · 1y ago

Grammar sampling has been around for months and has remained largely unexplored. Don't fall into the common trap of thinking in a fixed language; rather, think about a superset of possible languages (grammars) and how they could evolve from one another. I bet that if there's a breakthrough, it's probably in "differential grammars", or whatever it would be called: plug that into the backtracking sampler, and you have your System 2.
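A minimal sketch of what grammar-constrained sampling does at each step, assuming a toy grammar (a digit, then any number of `+`/`-` digit pairs) and a hypothetical `toy_scores` function standing in for an LLM's logits. Tokens the grammar forbids are masked out before picking the best remaining one (greedy here, for determinism). The backtracking and "differential grammar" ideas from the comment are not shown.

```python
DIGITS = [str(d) for d in range(10)]
OPS = ["+", "-"]
EOS = "<eos>"
VOCAB = DIGITS + OPS + [EOS]


def allowed_tokens(state):
    """Tokens the toy grammar  expr := digit (op digit)*  permits next."""
    if state in ("start", "after_op"):
        return set(DIGITS)
    if state == "after_digit":
        return set(OPS) | {EOS}
    return set()  # "done": nothing may follow <eos>


def next_state(token):
    if token in DIGITS:
        return "after_digit"
    if token in OPS:
        return "after_op"
    return "done"


def toy_scores(prefix):
    """Hypothetical stand-in for LLM logits: always prefers '+', then '7'."""
    return {t: 3.0 if t == "+" else 2.0 if t == "7" else 1.0 for t in VOCAB}


def constrained_greedy(score_fn, max_tokens=5):
    """Greedy decoding in which grammar-illegal tokens are masked at every step."""
    state, out = "start", []
    while state != "done":
        legal = allowed_tokens(state)
        if len(out) >= max_tokens and state == "after_digit":
            legal = {EOS}  # budget spent: close the expression
        scores = score_fn(out)
        # Iterate in VOCAB order so ties break deterministically.
        token = max((t for t in VOCAB if t in legal), key=lambda t: scores[t])
        out.append(token)
        state = next_state(token)
    return out


print("".join(constrained_greedy(toy_scores)[:-1]))  # 7+7+7
```

Unconstrained, this "model" would start with `+`, which is not a valid expression; the mask guarantees every output parses.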

Ok so Claude says that the Riemann hypothesis is proven and gives you 1200 pages of math symbols backing it up. Now what?

You probably say "now a human can verify it" but what if the humans are wrong? What is the source of truth?

Turneyboy · 1y ago

The nice thing about formal verification is exactly that. You have a separate tool that's very much like a compiler that can check those 1200 pages and tell you that it's true.

The source of truth here is the code we wrote for the formal verification system.
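To show how small and mechanical the trusted part can be, here is a sketch of a checker for a toy calculus whose only inference rule is modus ponens (an assumption for illustration; Lean's kernel checks a far richer dependent type theory). Formulas are strings or `("->", A, B)` tuples, and `implies` is a hypothetical helper.

```python
def implies(a, b):
    """Build the formula a -> b."""
    return ("->", a, b)


def check_proof(hypotheses, proof, goal):
    """Return True iff every line is a hypothesis or follows by modus ponens
    from earlier lines, and the last line is the goal."""
    derived = []
    for line in proof:
        by_hypothesis = line in hypotheses
        # Modus ponens: some earlier A together with an earlier A -> line.
        by_modus_ponens = any(implies(a, line) in derived for a in derived)
        if not (by_hypothesis or by_modus_ponens):
            return False  # reject the whole proof at the first bad step
        derived.append(line)
    return bool(proof) and proof[-1] == goal


H = {"p", implies("p", "q"), implies("q", "r")}
print(check_proof(H, ["p", implies("p", "q"), "q", implies("q", "r"), "r"], "r"))  # True
```

However long the proof, trust reduces to these few lines, not to whoever (or whatever) produced the proof.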

It's obviously not the future (outside of mathematics research). The whole LLM boom we've seen in the past two years comes from one single fact: people don't need to learn a new language to use it.

seizethecheese · 1y ago

Both comments can be right. People don’t need to know HTML to use the internet.

nickpsecurity · 1y ago

Natural language -> Formal Language with LLM-assisted tactics/functions -> traditional tools (eg provers/planners) -> expert-readable outputs -> layperson-readable results.

I can imagine many uses for flows where LLMs can implement the outer layers above.

Groxx · 1y ago

The difficulty then will be figuring out if the proof is relevant to what you want, or simply a proof of 1=1 in disguise.

throwaway713 · 1y ago · 18 in thread

Anyone else feel like mathematics is sort of the endgame? I.e., once ML can do it better than humans, that’s basically it?

awanderingmind · 1y ago

The end of an era perhaps, but not 'the end' - another commenter has already mentioned Moravec's paradox: https://en.wikipedia.org/wiki/Moravec%27s_paradox

It will be interesting if/when these models start proving major open problems, e.g. the Riemann Hypothesis. The sociological impact on the mathematical community would certainly be acute, and likely lead to a seismic shift in the understanding of what research-level mathematics is 'for'. This discussion already appears to be in progress. As an outsider I have no idea what the timeline is for such things (2 years? 10? 100?).

On the plus side, AlphaProof has an advantage over ordinary LLMs in their current form: it does not pollute our common epistemological well, and its output is eminently interrogable (if you know Lean, at least).

jdietrich · 1y ago

Humans are terrible at anything you learn at university and incredibly good at most things you learn at trade school. In absolute terms, mathematics is much easier than laying bricks or cutting hair.

https://en.wikipedia.org/wiki/Moravec%27s_paradox

youoy · 1y ago

I would say that "narrow" mathematics (finding a proof, in a formal language, of a given statement that we suspect has a proof) is much easier than "generally" laying bricks or cutting hair.

But I cannot see how consistently doing general mathematics (as in finding interesting and useful statements to prove, and then finding the proofs) is easier than consistently cutting hair or driving a car.

We might get LLM-level mathematics, but not human-level mathematics, in the same way that we can get LLM-level films (something like Avengers, or the final season of GoT) but are not going to get human-level films.

I suspect that there is no general-level mathematics without the geometric experience of humans, so for general-level mathematics one has to go through perception of and interaction with reality first. In that case, general mathematics is one level above "laying bricks or cutting hair", and therefore more complex. And the paradox is only a paradox under superficial reasoning.

looofooo0 · 1y ago

Sure, but what about checking that everything is correctly wired, plugged in, cut, etc.? That everything needed has been thought of? There are plenty of things an AI could do to help a tradesman.

GuB-42 · 1y ago

Not the endgame by far. Maybe the endgame for LLMs, and I am not even convinced.

Maths is detached from reality. An AI capable of doing math better than humans may not be able to drive a car, as driving a car requires a good understanding of the world: it has to recognize objects and understand their behavior, for example understanding that a tree won't move but a person might, though more slowly than another car. It has to understand the physics of the car: inertia, grip, control, ... It may even have to understand human psychology and make ethical choices.

Fully autonomous robots would be the endgame.

ykonstant · 1y ago

>An AI capable of doing math better than humans may not be able to drive a car,

Noo, but my excuse for being unable to drive a car is precisely that I am a quirky mathematician focused on research!

If "better than humans" means when you give it a real world problem, it gives you a mathematical model to describe it (and does it better than human experts), then yes, it's the end game.

If it just solves a few formalized problems with formalized theorems, not so much. You can write a program that solves ALL the problems under formalized theorems already. It just runs very slowly.

titanomachy · 1y ago

I don’t think you can gloss over the importance of computational tractability here. A human could also start enumerating every possible statement in ZFC, but clearly that doesn’t make them a mathematician.

margorczynski · 1y ago

I doubt it. Math has the property that you have a way to 100% verify that what you're doing is correct with little cost (as it is done with Lean). Most problems don't have anything close to that.

AlotOfReading · 1y ago

Math doesn't have a property that you can verify everything you're doing is correct with little cost. Humans simply tend to prefer theorems and proofs that are simpler.

exe34 · 1y ago

to be fair, humans also have to run experiments to discover whether their models fit nature - AI will do it too.

Computers have been better than us at calculation since about a week after computers were invented.

If a computer proves the Riemann Hypothesis, someone will say "Oh of course, writing a proof doesn't require intelligence, it's merely the dumb application of logical rules, but only a human could have thought of the conjecture to begin with."

AlexCoventry · 1y ago

There are important forms of discernment and judgement which aren't captured by mathematics.

technotony · 1y ago

Yes, because if AI can do maths, then it can use that to improve the efficiency/quality of its algorithms to self-improve...

The quality of AI algorithms is not based on formal mathematics at all. (For example, I'm unaware of even one theorem relevant to going from GPT-1 to GPT-4.) Possibly in the future it'll be otherwise though.

SkiFire13 · 1y ago

... or it might prove that it's impossible to self-improve given the current constraints

roncesvalles · 1y ago

I still think humans understand less than half of all the physics there is to understand.

abrookewood · 1y ago

I mean ... calculators can do better at mathematics than most of us. I don't think they are going to threaten us anytime soon.

wslh · 1y ago · 9 in thread

If you were to bet on solving problems like "P versus NP" using these technologies combined with human augmentation (or vice versa), what would be the provable time horizon for achieving such a solution? I think we should assume that the solution is also expressible in the current language of math/logic.

Probably a bad example, P vs NP is the most likely of the millennium problems to be unsolvable, so the answer may be "never".

I'll bet the most technical open problems will be the ones to fall first. What AIs lack in creativity they make up for in ability to absorb a large quantity of technical concepts.

wslh · 1y ago

Thank you for the response. I have a follow-up question: Could these AIs contribute to advancements in resolving the P vs NP problem? I recall that the solution to Fermat’s Last Theorem relied on significant progress in elliptic curves. Could we now say that these AI systems might play a similar role in advancing our understanding of P vs NP?

meindnoch · 1y ago

Ok, then the AI should formally prove that it's "unsolvable" (however you meant it).

uptownfunk · 1y ago

The hard part is in the creation of new math to solve these problems, not in the use of existing mathematics. So new objects (groups, rings, fields, etc.) have to be theorized, their properties understood, and then that new machinery used to crack the existing problems. I think we will get to a place (around 5 years) where AI will be able to solve these problems and create these new objects. I don't think it's a technology problem; I think it's more a financial one. Meaning, there isn't much money to be made doing this (try to justify it for yourself), hence the lack of focus here. I think this is a red herring and there is a gold mine in there somewhere, but it will likely take someone with a lot of cash to fund it out of passion (Vlad Tenev / Harmonic, or Zuck and Meta AI, or the Google / AlphaProof guys). But in the big tech world, they are just a minnow project in a sea of competing initiatives. And so that leaves us at the mercy of open research, which, if it is a compute-bound problem, may take 10-20 years to crack. I hope I see a solution to RH in my lifetime (and in language that I can understand).

wslh · 1y ago

I understand that a group of motivated individuals, even without significant financial resources, could attempt to tackle these challenges, much like the way free and open-source software (FOSS) is developed. The key ingredients would be motivation and intelligence, as well as a shared passion for advancing mathematics and solving foundational problems.

> I think we will get to a place (around 5 years) where AI will be able to solve these problems and create these new objects.

For all we know, buried deep in AlphaProof's attempts to solve these toy problems, it already tried and discarded several new ideas.

Davidzheng · 1y ago

I think there's significant financial incentives for big tech given the scarcity of benchmarks for intelligence which are not saturated

hiddencost · 1y ago

No one is focused on those. They're much more focused on more rote problems.

You might find them used to accelerate research math by helping them with lemmas and checking for errors, and formalizing proofs. That seems realistic in the next couple of years.

There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.

nybsjytm · 1y ago · 5 in thread

Why have they still not released a paper beyond a press release? I have to admit I still don't know how auspicious it is that running Google hardware for three days apiece was able to find half-page-long solutions, given that the promise has always been to solve the Riemann hypothesis at the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automatic theorem proving.

whatshisface · 1y ago

I don't know why so few people realize this, but by solving any of the problems their performance is superhuman for most reasonable definitions of human.

Talking about things like solving the Riemann hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above-average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

zeroonetwothree · 1y ago

Well, multiplying two large numbers instantly is a superhuman feat a calculator can do. I would hope we are going for a higher bar, like "useful". Let's see if this can provide proofs of novel results.

GregarianChild · 1y ago

We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.
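A toy illustration of the "patient enumeration" point, and of why tractability matters. The sketch swaps in Hofstadter's MIU puzzle (axiom MI, four string-rewriting rules) as a stand-in for a real proof calculus such as ZFC or Lean's type theory. Breadth-first search over derivations is guaranteed to find a shortest proof of any derivable string eventually, but the frontier grows quickly with depth, and it can never report that an underivable string like MU has no proof; it can only give up at a depth limit.

```python
from collections import deque


def successors(s):
    """Yield every string derivable from s by one application of an MIU rule."""
    if s.endswith("I"):  # Rule 1: xI -> xIU
        yield s + "U"
    yield "M" + s[1:] * 2  # Rule 2: Mx -> Mxx
    for i in range(len(s) - 2):  # Rule 3: replace any III with U
        if s[i:i + 3] == "III":
            yield s[:i] + "U" + s[i + 3:]
    for i in range(len(s) - 1):  # Rule 4: delete any UU
        if s[i:i + 2] == "UU":
            yield s[:i] + s[i + 2:]


def find_proof(target, axiom="MI", max_steps=6):
    """Breadth-first enumeration of derivations. Returns a shortest derivation
    (list of strings) or None if none exists within max_steps."""
    queue = deque([[axiom]])
    seen = {axiom}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == max_steps:
            continue  # depth limit reached on this branch
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_proof("MUI"))  # ['MI', 'MII', 'MIIII', 'MUI']
print(find_proof("MU"))   # None
```

The interesting question about a real prover is how far it beats this baseline, which is why the compute-per-problem figure matters.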

> A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

What does this have to do with a hypothetical automatic theorem prover?

GregarianChild · 1y ago

Google stopped publishing interesting AI work after OpenAI took away their AI lead, mostly with tech that Google had pioneered but not monetised, like transformers.

I imagine they are under pressure not to make this mistake again.

chompychop · 1y ago · 3 in thread

Is it currently possible to reliably limit the cut-off knowledge of an LLM (either during training or inference)? An interesting experiment would be to feed an LLM mathematical knowledge only up to the year of proving a theorem, and then see if it can actually come up with the novel techniques used in the proof. For example, having only access to papers prior to 1993, can an LLM come up with Wiles' proof of FLT?

That should be doable, e.g. by semi-automated curation of the pre-training dataset. However, since curating such large datasets and running pre-training runs is so expensive, I doubt that anybody will run such an experiment, especially since one would have to trust that the curation process was correct enough for the end result to be meaningful. Checking that the curation process is not flawed is probably as expensive as running it in the first place.

n4r9 · 1y ago

There's the FrontierMath benchmark [0], demonstrating that AI is currently quite far from human performance at research-level mathematics.

[0] https://arxiv.org/abs/2411.04872

data_maan · 1y ago

They didn't demonstrate anything. They haven't even released their dataset, nor mentioned how big it is.

It's just hot air, just like the AlphaProof announcement, where very little is known about their system.

Robotenomics · 1y ago · 3 in thread

“Only 5/509 participants solved P6”

This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.
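For scale, the arithmetic behind that asterisk, assuming the IMO format of two 4.5-hour sessions with three problems each and taking the three days quoted above at face value:

```latex
\begin{aligned}
\text{human time per problem} &\approx \frac{2 \times 4.5\ \text{h}}{6\ \text{problems}} = 1.5\ \text{h} = 90\ \text{min} \\
\text{AlphaProof time per solved problem} &\approx 3 \times 24 \times 60\ \text{min} = 4320\ \text{min} \\
\text{ratio} &\approx \frac{4320\ \text{min}}{90\ \text{min}} = 48
\end{aligned}
```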

(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)

letitgo12345 · 1y ago

I think more is made of this asterisk than necessary. It's quite possible that adding 10x more GPUs would have allowed it to solve it within the time limit.

auggierose · 1y ago

It's certainly interesting information that AlphaProof needed three days. But does it matter for evaluating the importance of this result? No.

chvid · 1y ago · 2 in thread

Mathematicians have been using computers, programming languages, and proof engines for over half a century; however breakthroughs in mathematics are still made by humans in any meaningful sense, even though the tools they use and make are increasingly complex.

But as things look now, I would be willing to bet that the next major breakthrough in maths will be touted as coming from AI/LLMs at one of the big US tech companies rather than from some German university.

Why? Simply, the money is much bigger. Such an event would pop the market value of the company involved by a hundred billion - plenty of incentive right there to paint whatever as AI and hire whoever.

But, these AI solutions are trying to solve math problems to prove their AI capabilities, not because they care about mathematics.

Sure. Why do you say "but"? Solving such a math problem (while perhaps massively overstating the role AI actually played in the solution) would be great PR for everyone involved.

sincerely · 1y ago · 2 in thread

in the first question, why do they even specify ⌊n⌋ (and ⌊2n⌋ and so on) when n is an integer?

rishicomplex (OP) · 1y ago

Alpha need not be an integer; we have to prove that it is.
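Assuming "the first question" is IMO 2024 Problem 1 (find all real α such that n divides ⌊α⌋ + ⌊2α⌋ + ... + ⌊nα⌋ for every positive integer n; the known answer is that α must be an even integer), a quick numerical check shows why the floors matter once α is not assumed to be an integer. It can only test finitely many n, so it is evidence, not a proof; the "for every n" part is exactly what has to be proved formally.

```python
import math
from fractions import Fraction


def satisfies_up_to(alpha, n_max=50):
    """Check that n divides floor(alpha) + floor(2*alpha) + ... + floor(n*alpha)
    for every n from 1 to n_max. Fraction keeps the floors exact (no float error)."""
    alpha = Fraction(alpha)
    total = 0
    for n in range(1, n_max + 1):
        total += math.floor(n * alpha)
        if total % n != 0:
            return False
    return True


print(satisfies_up_to(2), satisfies_up_to(3), satisfies_up_to(Fraction(5, 2)))  # True False False
```

For α = 3, n = 2 already fails (3 + 6 = 9 is odd); for α = 5/2, n = 2 gives 2 + 5 = 7, also odd.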

sincerely · 1y ago

Should have read more carefully, thank you!

sbierwagen · 1y ago

More information about the language used in the proofs: https://en.wikipedia.org/wiki/Lean_(proof_assistant)
