If you do naive matrix multiplication, you get a sense that you're doing similar work multiple times, but it's hard to quantify just what that duplicated work entails. Compare it to, for example, calculating the size of the union of two sets:
Total size = size(A) + size(B) - size(intersection(A, B))
You have to take out that extra intersection amount because you've counted it twice. What if you could avoid counting it twice in the first place? That's easy, you just iterate over each set once, keeping track of the elements you've already seen.
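In code, the "count it once in the first place" version is just a single pass with a seen-set (a trivial sketch):

    def union_size(a, b):
        """Count |A union B| in one pass, never counting an element twice."""
        seen = set()
        for x in list(a) + list(b):
            seen.add(x)  # duplicates are simply absorbed
        return len(seen)

    assert union_size({1, 2, 3}, {2, 3, 4}) == 4  # = 3 + 3 - 2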
Strassen's algorithm keeps track of calculations that are needed later on. It's all reminiscent of dynamic programming.
What I find interesting is that it seems the extra savings requires complex values. There must be something going on in the complex plane that is again over-counting with the naive approach.
More details: the Winograd scheme computes (x1+y2)(x2+y1) + (x3+y4)(x4+y3) - Ai - Bj, and relies on the y2y1 term (which comes from expanding the first brackets) cancelling with the y1y2 term in Bj = y1y2 + y3y4. This is fine when working with numbers, but if you want to apply the algorithm recursively to large matrices, at the highest level of recursion you're going to work with 4x4 block matrices (where each block is a big matrix itself), and Y2Y1 != Y1Y2 for general matrices.
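To see the cancellation concretely, here is a quick numeric check for scalars (a sketch; Ai and Bj follow the comment's notation, and the whole point is that the check only passes because scalars commute):

    import random

    def winograd_dot4(x, y):
        # (x1+y2)(x2+y1) + (x3+y4)(x4+y3) - Ai - Bj
        x1, x2, x3, x4 = x
        y1, y2, y3, y4 = y
        Ai = x1*x2 + x3*x4  # reusable across a whole row of the matrix
        Bj = y1*y2 + y3*y4  # reusable across a whole column
        # Expanding leaves stray terms (y2y1 - y1y2) and (y4y3 - y3y4),
        # which vanish for commuting entries but not for matrix blocks.
        return (x1+y2)*(x2+y1) + (x3+y4)*(x4+y3) - Ai - Bj

    x = [random.random() for _ in range(4)]
    y = [random.random() for _ in range(4)]
    assert abs(winograd_dot4(x, y) - sum(a*b for a, b in zip(x, y))) < 1e-9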
Here is a website that tracks fastest (recursively applicable) matrix multiplication algorithms for different matrix sizes, and it stands at 49: https://fmm.univ-lille.fr/4x4x4.html
UPD: s/fields/rings/ and fixed equation rendering
On the other hand, it's claimed here that an algorithm using only 46 multiplications has been known since 1970: https://mathstodon.xyz/@fredrikj/114508287537669113
1. It is a standard example of the divide and conquer approach to algorithm design, not the dynamic programming approach. (I'm not even sure how you'd squint at it to convert it into a dynamic programming problem.)
2. Strassen's does not require complex valued matrices. Everything can be done in the real numbers.
In other words, the power of Strassen's algorithm comes from a strategy that's similar to / reminiscent of dynamic programming.
"The rank of a tensor depends on the field over which the tensor is decomposed. It is known that some real tensors may admit a complex decomposition whose rank is strictly less than the rank of a real decomposition of the same tensor."
https://en.wikipedia.org/wiki/Tensor_rank_decomposition#Fiel...
However, when researchers (and systems like AlphaEvolve in this context) analyze fast matrix multiplication algorithms like Strassen's, the primary goal is usually to improve the asymptotic complexity (and understand the space of these algorithms better). This complexity is determined by the number of multiplications in the field over which the matrices are defined.
* For real matrices, we count real scalar multiplications.
* For complex-valued matrices (as in the 4x4 example where AlphaEvolve found a solution with 48 scalar multiplications), "scalar multiplication" refers to a complex scalar multiplication.
The key is that these are the operations you recurse on. Additions, or the constant factor cost of implementing one field's multiplication, don't change the exponent in the `N^(log_base(multiplications))` complexity. They are constant factors.
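For instance (a quick sketch; the 2x2/7 and 4x4/48 counts are the ones discussed in this thread):

    from math import log

    # Exponent for recursively applying a base case that multiplies
    # b x b matrices using m scalar multiplications: N^(log_b(m)).
    def exponent(b, m):
        return log(m, b)

    print(exponent(2, 8))   # naive 2x2 block recursion: 3.0
    print(exponent(2, 7))   # Strassen: ~2.807
    print(exponent(4, 48))  # the 48-multiplication 4x4 scheme: ~2.7925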
Of course, for practical performance on a specific 4x4 matrix, one would absolutely dive into the real operations, additions, memory layout, etc., but making 4x4 matrix multiplication practically faster on some particular hardware was not the focus in this section. (We do improve practical implementation of large matrix multiplications on the target hardware in the “Enhancing AI training and inference” section of the blog post.)
(Disclaimer: one of the authors.)
> In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.
> And in 20% of cases, AlphaEvolve improved the previously best known solutions
These sound like incredible results. I'd be curious what kind of improvements were made.
Like, was that "up to a 32.5% speedup" on some weird edge case, with negligible speedup otherwise? Would love to see the benchmarks.
It's a very impressive result: not magic, but also not cheating!
FA achieving a 32.5% speedup? Cool.
Why not submit it as a PR to the Flash Attention repo then? Can I read about it more in detail?
""" Yesterday's news about Sakana AI Labs provided an important lesson for all of us working with AI agents. Their announcement of an AI system that could supposedly optimize CUDA kernels to run 100x faster initially seemed like exactly the kind of use cases we've been hoping for in AI-assisted development.
Like many others, I was excited about it. After all, isn't this exactly what we want AI to do - help us optimize and improve our technical systems?
However, careful investigation by the community (on Twitter) revealed a different story. What really happened? The AI-generated CUDA kernel appeared to achieve incredible speedups, but the code was inadvertently reusing memory buffers containing previous results, essentially bypassing the actual computation. When properly evaluated, the kernel actually runs about 3x slower than the baseline. """
[1] https://www.linkedin.com/posts/ravid-shwartz-ziv-8bb18761_ye...
But how incremental are these advancements?
I picked one at random (B.2 -- the second autocorrelation inequality). Then, I looked up the paper that produced the previous state of the art (https://arxiv.org/pdf/0907.1379). It turns out that the authors had themselves found the upper bound by performing a numerical search using "Mathematica 6" (p.4). Not only did the authors consider this as a secondary contribution (p.2), but they also argued that finding something better was very doable, but not worth the pain:
"We remark that all this could be done rigorously, but one needs to control the error arising from the discretization, and the sheer documentation of it is simply not worth the effort, in view of the minimal gain." (p.5)
So at least in this case it looks like the advancement produced by AlphaEvolve was quite incremental (still cool!).
This is complex automation, which by definition compresses the solution into a computable process that works more efficiently than the non-automated process.
That, in fact, is the revolutionary part - you’re changing how energy is used to solve the problem.
Like even just for programming. I just had an AI instrument my app for tracing, something I wanted to do for a while, but I didn't know how to do and didn't feel like figuring out how to do it. That's not work we were likely to hire someone to do or that would ever get done if the AI wasn't there. It's a small thing, but small things add up.
LLMs are undoubtedly useful at tasks like code "optimisation" and detecting patterns or redundancies that humans might overlook, but this announcement feels like another polished, hypey blog post from Google.
What's also becoming increasingly confusing is their use of the "Alpha" branding. Originally, it was for breakthroughs like AlphaGo or AlphaFold, where there was a clear leap in performance and methodology. Now it's being applied to systems that, while sophisticated, don't really rise to the same level of impact.
edit: I missed the evaluator in my description, but an evaluation method is also applied in Co-Scientist:
"The AI co-scientist leverages test-time compute scaling to iteratively reason, evolve, and improve outputs. Key reasoning steps include self-play–based scientific debate for novel hypothesis generation, ranking tournaments for hypothesis comparison, and an "evolution" process for quality improvement."[0]
[0]: https://research.google/blog/accelerating-scientific-breakth...
"While AI Co-Scientist represents scientific hypotheses and their evaluation criteria in natural language, AlphaEvolve focuses on evolving code, and directs evolution using programmatic evaluation functions. This choice enables us to substantially sidestep LLM hallucinations, which allows AlphaEvolve to carry on the evolution process for a large number of time steps."
I don't know if I would call this the fabled "self-improving feedback loop", but it seems to have some degree of it. It also raises the question of whether AlphaEvolve was being developed for a year, or has been in production for a year. By now it makes sense to hold back on sharing what AI research gems you have discovered.
In the specific context of improving our AI hardware, for example, it's not as simple as coming up with a good idea -- hardware companies hire thousands of people to improve their designs. Prototypes need to be implemented, verified, quantified, compared thoroughly with the alternatives, then the idea is approved for production, which again leads to a cascade of implementation, verification, etc. until they can reach consumers. In order to make these improvements reach the consumer significantly faster you need to accelerate all of the steps of the very simplified pipeline mentioned earlier.
More generally, an argument can be made that we have been in that take off feedback loop for hundreds of years; it's just that the rate of improvement hasn't been as spectacular as we may have hoped for because each incremental step simply isn't that big of a deal and it takes quite a bit of time to reach the next one.
"DeepMind AI Reduces Google Data Centre Cooling Bill by 40%"
https://deepmind.google/discover/blog/deepmind-ai-reduces-go...
Not to sound metaphysical or anything, but dependency on artificial intelligence seems to be something you would find at the peak of Mount Stupid (where the Darwin Awards are kept).
I am late for a chess game, l8r sk8rs.
The use of synthetic data from prior models to create both superior models and distilled models has been going on since at least OpenAI's introduction of RLHF, and probably before that too.
That’s distinct from those prior models providing actual code to improve the next model.
>In AlphaEvolve, the evolutionary database implements an algorithm that is inspired by a combination of the MAP elites algorithm [71] and island-based population models [80, 94].
"inspired by" is doing a lot of heavy lifting in this sentence. How do you choose dimensions of variation to do MAP-elites? How do you combine these two algorithms? How loose is the inspiration? It feels like a lot of the secret sauce is in the answers to these questions, and we get a single paragraph on how the evolution procedure works, which is so vague as to tell us almost nothing.
Agreed, the dimensions/features are key. These white papers are an insult to science...
> By suggesting modifications in the standard language of chip designers, AlphaEvolve promotes a collaborative approach between AI and hardware engineers to accelerate the design of future specialized chips.
> AlphaEvolve was able to find a simple code rewrite (within an arithmetic unit within the matmul unit) that removed unnecessary bits, a change validated by TPU designers for correctness.
I speculate this could refer to the upper bits in the output of a MAC circuit being unused in a downstream connection (perhaps to an accumulation register). It could also involve unused bits in a specialized MAC circuit for a non-standard datatype.
> While this specific improvement was also independently caught by downstream synthesis tools, AlphaEvolve’s contribution at the RTL stage demonstrates its capability to refine source RTL and provide optimizations early in the design flow.
As the authors admit, this bit-level optimization was automatically performed by the synthesis tool (the software-world equivalent is dead code elimination performed by a compiler). They seem to claim it is better to perform this bit-truncation explicitly in the source RTL rather than letting synthesis handle it. I find this dubious: synthesis guarantees that the optimizations it performs do not change the semantics of the circuit, while making a change in the source RTL could change the semantics (vs the original source RTL) and requires human intervention to check semantic equivalence. The exception is when certain optimizations rely on assumptions about the values seen within the circuit at runtime: synthesis will assume the most conservative situation, where all circuit inputs are arbitrary.
I do agree that this reveals a deficiency in existing synthesis flows: they are unable to back-annotate the source RTL with the specific lines/bits that were stripped out of the final netlist, so humans can check whether synthesis did indeed perform an expected optimization.
> This early exploration demonstrates a novel approach where LLM-powered code evolution assists in hardware design, potentially reducing time to market.
I think they are vastly overselling what AlphaEvolve was able to achieve. That isn't to say anything about the potential utility of LLMs for RTL design or optimization.
You can't write an evaluation function for general "intelligence"...
> AlphaEvolve enhanced the efficiency of Google's data centers, chip design and AI training processes — *including training the large language models underlying AlphaEvolve itself*.
Singularity people have been talking for decades about AI improving itself better than humans could, and how that results in runaway compounding growth of superintelligence, and now it's here.
If it takes you a week to find a 1% speedup, and the next 0.7% speedup takes you 2 weeks to find ... well, by using the 1% speedup the next one only takes you 13.86 days. This kind of small optimization doesn't lead to exponential gains.
That doesn't mean it's not worthwhile - it's great to save power & money and reduce iteration time by a small amount. And it combines with other optimizations over time. But this is in no way an example of the kind of thing that the singularity folks envisioned, regardless of the realism of their vision or not.
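To make that concrete, here's a toy calculation with made-up numbers extending the 1%/0.7% example (each successive speedup is smaller and takes longer to find):

    days, speed = 0.0, 1.0
    gain, search_time = 0.01, 7.0
    for _ in range(50):
        days += search_time / speed  # searching goes faster as speed compounds
        speed *= 1 + gain
        gain *= 0.7                  # each successive speedup is smaller...
        search_time *= 2             # ...and takes longer to find
    print(days, speed)  # days grows without bound; speed converges near 1.034

The geometrically shrinking gains multiply out to a modest constant factor, while the search time diverges: reinvesting the speedups never produces a blow-up.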
Basically, singularity assumes that you can take information about the real-world "state", compress it into some form, and predict the state change faster than reality happens. For a subset of the world, this is definitely possible. But for reality as a whole, it seems that there is a whole bunch of processes that are computationally irreducible, so an AI would never be able to "stay ahead", so to speak. There is also computational irreversibility: for example, observing human behavior is seeing the output of a one-way hashing function of the neural processes in our brain, which hides a lot of detail and doesn't let you predict them accurately in all cases.
Also, optimization algorithms are nothing new. Even before AI, you could run a genetic algorithm or PSO on code, and given enough compute it would optimize the algorithm, including itself. The hard part that nobody has solved is abstracting this to a low enough level that it's applicable across the multiple layers that correspond to any task.
For example, say you have a model (or rather an algorithm) that has only a single interface, the ability to send ethernet packets, and it hasn't been trained on any real-world data at all. If you task it with building you a website that makes money, the same algorithm that iterates over figuring out how to send IP packets, then TCP packets, then HTTP packets should also be able to figure out what the modern world wide web looks like and what concepts like "website" and "money" are, building its knowledge graph, searching it, and interpolating on it to figure out how to solve the problem.
Some related work from a different company: https://sakana.ai/ai-cuda-engineer/
And some academic papers kind of in this space: https://arxiv.org/abs/2206.08896, https://arxiv.org/abs/2302.12170, https://arxiv.org/abs/2401.07102
Making a significant improvement to the state of the art of one particular algorithm is one thing, but I've seen new tools do that since the 80s.
I'll be convinced when LLMs start making valuable pull requests to mature FOSS projects that fix non-obvious corner cases or non-trivial bugs.
On the other side, I see excitement that the singularity is here.
If the latter were the case surely we wouldn't be reading about it in a published paper, we would already know.
[1] Blog: https://deepmind.google/discover/blog/discovering-novel-algo...
[2] Paper: https://www.nature.com/articles/s41586-022-05172-4
[3] https://arxiv.org/pdf/2210.04045
[4] https://arxiv.org/abs/2212.01175 Flip graphs for matrix multiplication
(Reposted from here, where I made a mini deep-dive into this: https://x.com/friederrrr/status/1922846803420119410?t=7jZ34P...)
As a corollary, once you add in self-play with random variation, the synthetic data problem is solved for coding, math, and some classes of scientific reasoning. No more model collapse, no more massive teams of PhDs needed for human labeling, as long as you have a reliable metric for answer quality.
This isn't just neat, it's important - as we run out of useful human-generated data, RL scaling is the best candidate to take over where pretraining left off.
I guess that's now becoming true with LLMs.
Faster LLMs -> More intelligence
Couldn't you say that, if you squint hard enough, a GA looks like a category of RL? There are certainly a lot of similarities, the main difference being how each new population of solutions is generated. I would not at all be surprised if they're using a GA/RL hybrid.
If variety is sought, why not use beam search with a nice population statistic?
They are having some success in making it work internally. Maybe only the team that built it can get it to work? But it does seem promising.
As far as I can read, the weights of the LLM are not modified. They do some kind of candidate selection via evolutionary algorithms for the LLM prompt, which the LLM then remixes. This process then iterates like a typical evolutionary algorithm.
I suppose you could consider that last part (optimizing some metric) "RL".
However, it's missing a key concept of RL which is the exploration/exploitation tradeoff.
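A rough sketch of the loop as described above (my reading, not the paper's pseudocode; `llm_remix` and `score` are placeholders, and the epsilon-greedy knob is a crude stand-in for the missing exploration/exploitation tradeoff):

    import random

    def evolve(seed_programs, llm_remix, score, steps=1000, epsilon=0.1):
        population = [(score(p), p) for p in seed_programs]
        for _ in range(steps):
            if random.random() < epsilon:                  # explore: random parent
                parent = random.choice(population)[1]
            else:                                          # exploit: best so far
                parent = max(population, key=lambda t: t[0])[1]
            child = llm_remix(parent)                      # LLM weights never change
            population.append((score(child), child))
            population.sort(key=lambda t: t[0], reverse=True)
            population = population[:100]                  # survivor selection
        return max(population, key=lambda t: t[0])[1]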
There are monopolies on the coolest sets of data in almost all industries, all the RL in the world won't do us any good if those companies doing the data hoarding are only using it to forecast outcomes that will make them more money, not what can be done to better society.
Philosophically, let's say you talk an old LLM through a new discovery. Thanks to your instruction, the LLM now has access to "new" information not in its training data. It is certainly capable of this. The problem in is that this is just laundered human intelligence.
Not saying it is that egregious, but it's a slippery slope from "well, it didn't do all these different things out of the box, unsupervised".
For this reason, hype-driven/novelty-driven sites like HN usually overestimate initial developments, because they overestimate the merit term, and then underestimate later developments - because they now overestimate the error term from their earlier experience.
Deep learning systems have exceeded the hype. In 2016 we saw potential with models like AlphaGo Zero but no one could foresee the capability of LLMs (a type of deep learning model).
They have other models with different names used for different purposes.
It optimized `initializers.normal(0.0` to `initializers.normal(0 + 1j * 0,`
I thought the results were being reviewed? Anyway, impressive results. That's why OpenAI and Elon were so frightened of Hassabis.
> Most of these discoveries are on open problems suggested to us by external mathematicians Javier Gomez Serrano and Terence Tao, who also advised on how to best formulate them as inputs to AlphaEvolve. This highlights the potential for synergistic partnerships between AI-driven discovery engines like AlphaEvolve and human mathematical expertise.
In 1996, they optimized an FPGA using a genetic algorithm. It evolved gates disconnected from the circuit that were nonetheless required.
The circuit exploited the minuscule magnetic fields from the disconnected gates rather than the logical connections.
If it goes well, I could open source it.
What are the things you would want to optimize with such a framework? (So far I've been focusing on optimizing ML training and architecture search itself). Hearing other ideas would help motivate me to open source if there's real demand for something like this.
In my case, I'd mainly be interested in mathematics: I'd provide a mathematical problem and a baseline algorithm for it and would want an open source framework to be able to improve on that.
Had mentioned the same on X: https://x.com/friederrrr/status/1922850981181784152?t=usXpK1...
...but Waksman's algorithm from 1970 [1] multiplies two 4 x 4 complex-valued matrices using only 46 multiplications (indeed, it works in any ring admitting division by 2).
Sloppy by DeepMind and by Nature to publish such a claim - did they not ask someone knowledgeable about matrix multiplication to review the work?
1. Waksman's algorithm works in any commutative ring admitting division by 2.
2. In particular, it won't work when the matrix entries are themselves matrices, which means you can't use it recursively to get an algorithm for n-by-n matrices with large n with a better exponent than you get from Strassen's algorithm.
3. The DeepMind paper is annoyingly inexplicit about whether the algorithm it reports has that property or not.
4. What they say about tensors suggests that their algorithm can be used recursively to do better than Strassen (note, though, that there are other algorithms that are substantially better for very large n, and using their algorithm recursively would very much not outperform those), but it's possible I've misunderstood.
5. They explicitly talk about complex-valued matrices, but I think they don't mean "complex numbers as opposed to matrices, so you can't do this recursively" but "complex numbers as opposed to real numbers, so our algorithm doesn't get you a 4x4 matmul using 48 real multiplications".
I am not certain about points 4 and 5. The language in the paper is a bit vague. There may be supporting material with more details but I haven't looked.
2. Correct; however, you can use Waksman as a base case and always beat Strassen (though it is not asymptotically better, of course).
5. Possible, but even so, there is already an algorithm that will work with 46 real multiplications (and some divisions by 2). The real numbers are commutative and admit division by 2.
Is this basically a merge of LLMs with genetic algorithm iteration?
The following algebraic point of view could be utter hogwash, so I might embarrass myself... but if you think about it, the "merge" operation is isomorphic to the product in a free commutative monoid (over a large number of generators, otherwise you can use Counting Sort). So sorting is all about computing a bunch of products (merges) in the optimal order. Now consider mergesort, insertion sort, Timsort.
- One lesson DeepMind drew from AlphaCode, AlphaTensor, and AlphaChip is that large‑scale pre‑training, combined with carefully chosen inductive biases, enables models to solve specialized problems at—or above—human performance.
- These systems still require curated datasets and experts who can hand‑design task‑specific pipelines.
- Conceptually, this work is an improved version of FunSearch (https://github.com/google-deepmind/funsearch/).
- In broad terms, FunSearch (and AlphaEvolve) follow three core design principles:
- Off‑the‑shelf LLMs can both generate code and recall domain knowledge. The “knowledge retrieval” stage may hallucinate, but—because the knowledge is expressed as code—we can execute it and validate the result against a custom evaluation function.
- Gradient descent is not an option for discrete code; a zeroth‑order optimizer—specifically evolutionary search—is required.
- During evolution we bias toward (1) _succinct_ programs and (2) _novel_ programs. Succinctness is approximated by program length; novelty is encouraged via a MAP‑Elites–style “novelty bias,” yielding a three‑dimensional Pareto frontier whose axes are _performance, simplicity,_ and _novelty_ (see e.g. OE‑Dreamer: https://claireaoi.github.io/OE-Dreamer/). A sketch of the Pareto filter follows below.
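(A minimal sketch of such a Pareto filter, with hypothetical 3-tuple scores; not the actual FunSearch/AlphaEvolve code:)

    def pareto_front(candidates):
        # candidates: list of (program, (performance, simplicity, novelty)),
        # where higher is better on every axis; keep the non-dominated ones.
        def dominates(a, b):
            return all(x >= y for x, y in zip(a, b)) and a != b
        return [(p, s) for p, s in candidates
                if not any(dominates(s2, s) for _, s2 in candidates)]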
Pros
- Any general‑purpose foundation model can be coupled with evolutionary search.
- A domain expert merely supplies a Python evaluation function (with a docstring explaining domain‑specific details). Most scientists I've talked with - astronomers, seismologists, neuroscientists, etc. - already maintain such evaluation functions for their own code.
- The output is an interpretable program; even if it overfits or ignores a corner case, it often provides valuable insight into the regimes where it succeeds.
Cons
- Evolutionary search is compute‑heavy and LLM calls are slow unless heavily optimized. In my projects we need ≈60k LLM calls per iteration to support a reasonable number of islands and populations. In equation discovery we offset cost by making ~99% of mutations purely random; every extra 1% of LLM‑generated mutations yields roughly a 10% increase in high‑performing programs across the population.
- Evaluation functions typically undergo many refinement cycles; without careful curation the search may converge to a useless program that exploits loopholes in the metric.
Additional heuristics make the search practical. If your evaluator is slow, overlap it with LLM calls (see the sketch below). To foster diversity, try dissimilar training: run models trained on different data subsets and let them compete. Interestingly, a smaller model (e.g., Llama‑3 8B) often outperforms a larger one (Llama‑3 70B) simply because it emits shorter programs.
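For the overlap trick, something like this thread-pool sketch works (`call_llm` and `evaluate` are placeholders, not names from the paper):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def generate_and_evaluate(prompts, call_llm, evaluate, workers=8):
        # Evaluate each program as soon as its LLM call returns, so slow
        # evaluations overlap with the LLM calls still in flight.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            gen = [pool.submit(call_llm, p) for p in prompts]
            evals = [pool.submit(evaluate, f.result()) for f in as_completed(gen)]
            return [f.result() for f in evals]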
1. Why does it need a zeroth order optimizer?
2. Most GAs I've seen use thousands of solutions. Sometimes ten thousand or more. What leads you to use 60,000 calls per iteration?
3. How do you use populations and "islands?" I never studied using islands.
4. You said the smaller models are often better for "shorter" code. That makes sense. I've seen people extend the context of model with training passes. You think it would help to similarly shrink a larger model to a smaller context instead of using the small models?
1. Because we only have blackbox access to the LLM and the evaluation function might not be differentiable.
2. We're trying to search over the space of all programs in a programming language. To cover enough of this huge search space, we need to instantiate (1) a large number of programs in each population, (2) a large number of populations, and (3) a large number of update steps for each population.
3. I have a couple of graphics motivating, conceptually, what an island/population looks like: https://trishullab.github.io/lasr-web/ . This whitepaper might also be useful: https://arxiv.org/abs/2305.01582 (a toy sketch of the island model follows after question 4).
4. This is an interesting question. I believe so. However, my observations were derived from a non-Turing-complete language (mathematical equations). There might be other ways of enforcing succinctness pressure.
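For question 3, a toy picture of the island model in code (my illustration, not LaSR's or AlphaEvolve's actual implementation; `mutate` and `fitness` are placeholders):

    import random

    def island_step(islands, mutate, fitness, step, migrate_every=10):
        for pop in islands:                   # each island evolves in isolation,
            pop.sort(key=fitness, reverse=True)  # preserving diversity
            half = len(pop) // 2
            # replace the bottom half with mutated copies of the top half
            pop[half:] = [mutate(random.choice(pop[:half]))
                          for _ in range(len(pop) - half)]
        if step % migrate_every == 0:         # occasional migration spreads
            for i, pop in enumerate(islands): # good solutions between islands
                islands[(i + 1) % len(islands)].append(max(pop, key=fitness))
        return islands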
Can you give a concrete example of this? It's hard for me to conceptualize.
Let's say the model hallucinates in two directions:
1. "There is a trigonometric relationship between variable F and x". It expresses this as ``F = -C_1*sin(x)``. You fit the constant C_1 w.r.t the dataset, execute the program, and your best fit has a high error. You can discard the program.
2. "There is an inverse linear relationship between variable F and x". Now it expresses this as ``F = -C_1*x``. You fit the constant C_1 w.r.t the dataset, execute the program, and your best fit has extremely low error. You now know for sure that you're on the right track.
Does anybody know how they can guarantee uniqueness of the searched snippet within a code block, or whether that's even possible?
Nowadays it makes much more sense to share less.
I honestly have no idea how AlphaEvolve works - does it work purely at the text level? Meaning, might I be able to come up with something like AlphaEvolve with some EC2 instances and Gemini API access?
I'm reading descriptions of agents and it just seems like the same tech deployed with the authority to write, plus a scheduler.
Packing problems are hard, and it is fun to see new interest in the area given these show up in weird places. =3
Just wait until the MBAs and politicians learn about this Adam Smith guy. A pipedream now, but maybe in the future schools will be inspired to teach about dialectical reasoning and rediscover Socrates.
[end of snark]
Sorry, I'm getting tired of ad-fueled corporations trying to get me to outsource critical thinking.
The problems in math and CS are more suitable for training LLMs?
Some of these questions change slightly, since we might end up with "unlimited resources" (i.e. instead of having e.g. 5 engineers on a team who can only get X done per sprint, we effectively have near-limitless compute to use instead) so maybe the answer is "build everything on the wish-list in 1 day" to the "what should we prioritize" type questions?
Interesting times.
My gut is that software engineers will end up as glorified test engineers, coming up with test cases (even if not actually writing the code) and asking the AI to write code until it passes.
That isn't every problem in software engineering.
> about: I believe in the creation of a machine god
Sounds about right.
The guy's making a prediction. Classifying it as some kind of religious zealotry isn't fair to his point or him.
If the cost of developing the software is 0, you can just build both.
The key ingredient for an intelligence explosion is AI accelerating development of AI.
This is it. It’s happening.
Remember, this approach only works for optimizing an already-defined behavior of a function that has a correspondingly well-defined evaluation metric.
You can't write an evaluation function for each individual piece of "intelligence", let alone for general "intelligence"...
Either way, it's pretty wild.
I'm now no longer surprised at just how consistently all the Gemini models overcomplicate coding challenges or just plain get them wrong.
Claude is just consistently spot on. A few salient comments for tricky code, instead of incessantly telling me what it's changed and what I might want to do, making incorrect assumptions when it already has the code or it's something we've discussed, or changing large amounts of unrelated code (e.g. styles). I could go on.
Shame I'm too tight to pay for Claude RN though...
You can try asking it to not do that, but I would bet it would slightly degrade code quality.
You can take the code and give it to another LLM instance and ask it to strip all comments.