Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models

(dani2442.github.io)

171 points · sebzuddas · 2mo ago · 53 comments

measurablefunc2mo ago· 10 in thread

It's not clear or obvious why continuous semantics should be applicable on a digital computer. This might seem like nitpicking but it's not: there is a fundamental issue that is always swept under the rug in these kinds of analyses, which is about reconciling finitary arithmetic over bit strings & the analytical equations which only work w/ infinite precision over the real or complex numbers as they are usually defined (equivalence classes of Cauchy sequences or Dedekind cuts).

There are no Dedekind cuts or Cauchy sequences on digital computers, so the fact that the analytical equations map to algorithms at all is very non-obvious.
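A minimal Python sketch of the gap being described (not taken from the linked article; the variable names are illustrative): addition over the reals is associative, but IEEE 754 double-precision addition is not, so an identity that holds analytically can fail once it is evaluated on floats.

```python
# Real-number addition is associative; IEEE 754 double addition is not.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # rounds after a + b, then again after adding c
right = a + (b + c)   # same terms, rounded in a different order

print(left == right)  # False: the two groupings round differently
print(a + b == c)     # False: 0.1 + 0.2 is not the double nearest to 0.3
```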

jampekka2mo ago

Continuous formulations are used with digital computers all the time. Limited precision of floats sometimes causes numerical instability for some algorithms, but usually these are fixable with different (sometimes less efficient) implementations.
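As one illustration of an instability that is fixed by a different implementation, here is a sketch using log-sum-exp (chosen as an example here, not something the comment names). The naive form overflows in double precision, while an algebraically identical rewrite does not.

```python
import math

def logsumexp_naive(xs):
    # Direct translation of log(sum(exp(x))); exp overflows for large x.
    return math.log(sum(math.exp(x) for x in xs))

def logsumexp_stable(xs):
    # Same quantity written as m + log(sum(exp(x - m))) with m = max(xs),
    # so every exponent is <= 0 and exp cannot overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [1000.0, 1000.0]
try:
    logsumexp_naive(xs)
    naive_ok = True
except OverflowError:
    naive_ok = False   # math.exp(1000.0) exceeds the double range

print(naive_ok, logsumexp_stable(xs))
```

Both functions compute the same real-valued expression; only the order of operations differs.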

Discretizing e.g. time or space is perhaps a bigger issue, but the issues are usually well understood and mitigated by e.g. advanced numerical integration schemes, discrete-continuous formulations or just cranking up the discretization resolution.

Analytical tools for discrete formulations are usually a lot less developed and don't as easily admit closed-form solutions.

shiandow2mo ago

It is definitely not obvious, but I wouldn't say it is completely unclear.

For instance we know that algorithms like the leapfrog integrator not only approximate a physical system quite well but even conserve the energy, or rather a quantity that approximates the true energy.

There are plenty of theorems about the accuracy and other properties of numerical algorithms.
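The leapfrog remark can be checked on a toy problem. The sketch below assumes a unit harmonic oscillator with H = (p² + q²)/2 (an illustrative choice, not from the article) and compares explicit Euler with a kick-drift-kick leapfrog step.

```python
def energy(q, p):
    # True energy of the unit harmonic oscillator.
    return 0.5 * (p * p + q * q)

def euler_step(q, p, dt):
    # Explicit Euler: both updates use the old state.
    return q + dt * p, p - dt * q

def leapfrog_step(q, p, dt):
    # Kick-drift-kick leapfrog: half kick, full drift, half kick.
    p_half = p - 0.5 * dt * q
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * q_new
    return q_new, p_new

def simulate(step, steps=1000, dt=0.1):
    q, p = 1.0, 0.0              # initial energy is 0.5
    for _ in range(steps):
        q, p = step(q, p, dt)
    return energy(q, p)

energy_euler = simulate(euler_step)        # grows by (1 + dt^2) every step
energy_leapfrog = simulate(leapfrog_step)  # stays within O(dt^2) of 0.5
print(energy_euler, energy_leapfrog)
```

With these settings the Euler energy grows by orders of magnitude, while the leapfrog energy only oscillates slightly around its initial value.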

measurablefunc2mo ago

How do they apply in this case?

sfpotter2mo ago

This is what the field of numerical analysis exists for. These details definitely have been treated, but this was done mainly early in the field's history; for example, by people like Wilkinson and Kahan...

magicalhippo2mo ago

I just took some basic numerical courses at uni, but every time we discretized a problem with the aim of implementing it on a computer, we had to show what the discretization error would lead to, e.g. numerical dispersion[1], and do stability analysis and such, e.g. ensure the CFL[2] condition held.

So I guess one might want to do a similar exercise to deriving numerical dispersion for example in order to see just how discretizing the diffusion process affects it and the relation to optimal control theory.

[1]: https://en.wikipedia.org/wiki/Numerical_dispersion

[2]: https://en.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E2%...
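A small sketch of the kind of stability check described above, assuming a first-order upwind scheme for the advection equation u_t + c u_x = 0 on a periodic grid (the scheme, grid size, and initial pulse are illustrative choices). For this scheme the CFL condition is that the Courant number c·dt/dx stays at or below 1.

```python
import math

def advect_upwind(courant, steps=200, n=100):
    """First-order upwind scheme for u_t + c u_x = 0 (c > 0), periodic grid.

    `courant` is the Courant number c * dt / dx. Returns max |u| at the end.
    """
    # Gaussian pulse with peak value 1 at the centre of the domain.
    u = [math.exp(-((i / n - 0.5) / 0.1) ** 2) for i in range(n)]
    for _ in range(steps):
        # For i = 0, u[i - 1] is u[-1], which gives the periodic boundary.
        u = [u[i] - courant * (u[i] - u[i - 1]) for i in range(n)]
    return max(abs(v) for v in u)

stable = advect_upwind(0.8)    # CFL satisfied: the pulse is smeared but bounded
unstable = advect_upwind(1.5)  # CFL violated: high-frequency modes are amplified
print(stable, unstable)
```

With the condition satisfied each update is a convex combination of neighbouring values, so the maximum cannot grow; once it is violated, round-off-sized high-frequency components are amplified at every step.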

phreeza2mo ago

Doesn't continuous time basically mean "this is what we expect for sufficiently small time steps"? Very similar to how one would for example take the first order Taylor dynamics and use them for "sufficiently small" perturbations from equilibrium. Is there any other magic to continuous time systems that one would not expect to be solved by sufficiently small time steps?
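For well-behaved problems this intuition can be tested directly. A minimal sketch, assuming the test equation y' = -y with y(0) = 1 (an illustrative choice): the explicit Euler error at t = 1 roughly halves each time the step is halved, which is what first-order convergence means.

```python
import math

def euler_decay(n_steps, t_end=1.0):
    """Explicit Euler for y' = -y, y(0) = 1, using n_steps equal steps."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y

exact = math.exp(-1.0)
errors = {}
for n in (100, 200, 400):
    errors[n] = abs(euler_decay(n) - exact)

ratio = errors[100] / errors[200]   # close to 2 for a first-order method
print(errors, ratio)
```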

measurablefunc2mo ago

You should look into condition numbers & how that applies to numerical stability of discretized optimization. If you take a continuous formulation & naively discretize you might get lucky & get a convergent & stable implementation but more often than not you will end up w/ subtle bugs & instabilities for ill-conditioned initial conditions.
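A simple way to see a naive discretization go wrong is a stiff test equation. This is a stiffness example rather than a condition-number computation, and the rate `lam` and the step sizes are illustrative assumptions. Explicit Euler on y' = -lam·y multiplies y by (1 - lam·dt) each step, so it diverges once dt exceeds 2/lam even though the true solution decays.

```python
def explicit_euler(lam, dt, steps):
    y = 1.0
    for _ in range(steps):
        y = y + dt * (-lam * y)      # y_{k+1} = (1 - lam*dt) * y_k
    return y

def implicit_euler(lam, dt, steps):
    y = 1.0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)     # solves y_{k+1} = y_k - dt*lam*y_{k+1}
    return y

lam = 50.0
blown_up = explicit_euler(lam, dt=0.05, steps=50)     # lam*dt = 2.5: unstable
small_step = explicit_euler(lam, dt=0.01, steps=250)  # lam*dt = 0.5: stable
implicit = implicit_euler(lam, dt=0.05, steps=50)     # stable at the large step
print(blown_up, small_step, implicit)
```

The implicit variant stays stable at the larger step, at the cost of solving an equation per step (trivial here, more expensive in general).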

tsimionescu2mo ago

Infinity has properties that finite approximations of it just don't have, and this can lead to serious problems for certain theorems. In the general case, the integral of a continuous function can be arbitrarily different from the sum of a finite sequence of points sampled from that function, regardless of how many points you sample - and it's even possible that the discrete version is divergent even if the continuous one is convergent.

I'm not saying that this is the case here, but there generally needs to be some justification to say that a certain result that is proven for a continuous function also holds for some discrete version of it.

For a somewhat famous real-world example, it's not currently known how to produce a version of QM/QFT that works with discrete spacetime coordinates; the attempted discretizations fail to maintain the properties of the continuous equations.

cubefox2mo ago

Real numbers mostly appear in calculus (e.g. the chain rule in gradient descent/backpropagation), but "discrete calculus" is then used as an approximation of infinitesimal calculus. It uses "finite differences" rather than derivatives, which doesn't require real numbers:

https://en.wikipedia.org/wiki/Finite_difference

I'm not sure about applications of real numbers outside of calculus, and how to replace them there.
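A short sketch of the finite-difference idea, using sin as an illustrative test function: the forward difference has error proportional to h, and the central difference has error proportional to h².

```python
import math

def forward_diff(f, x, h):
    # One-sided difference quotient, first-order accurate.
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Symmetric difference quotient, second-order accurate.
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
exact = math.cos(x)   # the derivative of sin at x
for h in (1e-1, 1e-2, 1e-3):
    err_fwd = abs(forward_diff(math.sin, x, h) - exact)
    err_ctr = abs(central_diff(math.sin, x, h) - exact)
    print(h, err_fwd, err_ctr)
```

In floating point, h cannot shrink indefinitely: for very small h the subtraction in the numerator loses precision and the error starts growing again.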

imtringued2mo ago

I can't tell if this is a troll attempt or not.

If your definition of "algorithm" is "list of instructions", then there is nothing surprising. It's very obvious. The "algorithm" isn't perfect, but a mapping with an error exists.

If your definition of "algorithm" is "error free equivalent of the equations", then the analytical equations do not map to "algorithms". "Algorithms" do not exist.

I mean, your objection is kind of like questioning how a construction material could hold up a building when it is inevitably bound to decay and therefore result in structural collapse. Is it actually holding the entire time or is it slowly collapsing the entire time?

lain982mo ago· 9 in thread

I find myself completely outclassed by mathematicians in my own field. I tried to learn a little math on the side after my regular software engineer gig but I'm completely outclassed by PhDs.

I am unsure of the next course of action, whether software will survive another 5 years, and what my career will look like in the future. Seems like I am engaged in the ice trade and they are about to invent the refrigerator.

numbers_guy2mo ago

I guess I have the opposite experience. I have a post-graduate level of mathematical education and I am dismayed at how little there is to be gained from it, when it comes to AI/ML. Diffusion Models and Geometric Deep Learning are the only two fields where there's any math at all. Many math grads are struggling to find a job at all. They aren't outclassing programmers with their leet math skillz.

porridgeraisin2mo ago

The real use is in actually seeing connections. Every field has their own maths and their own terminologies, their own assumptions for theorems, etc.

More often than not this is duplicated work (mathematically speaking) and there is a lot to be gained by sharing advances in either field by running it through a "translation". This has happened many times historically - a lot of the "we met at a cafe and worked it out on a napkin" inventions are exactly that.

Math proficiency helps a lot at that. The level of abstraction you deal with is naturally high.

Recently, the problem of actually knowing every field well enough, just cursorily, to make connections has become easier with AI. Modern LLMs do approximate retrieval and still need a planner + verifier; the mathematician can be that.

This is somewhat adjacent to what Terry Tao spoke about, and the setup is sort of what AlphaEvolve does.

You get that impression because such advances are high impact and rare (because they are difficult). Most advances come as a sequence of field-specific assumption, field-specific empirical observation, field-specific theorem, and so on. We only see the advances that are actually made, leading to an observation bias.

srean2mo ago

Don't worry: when stochastic grads get stuck, math grads get going.

(One of) The value(s) that a math grad brings is debugging and fixing these ML models when training fails. Many would not have an idea about how to even begin debugging why the trained model is not working so well, let alone how to explore fixes.

ecshafer2mo ago

IMO Computer Science doesn't have enough mathematics in the core curriculum. I think more CS students should be double majoring or minoring in Physics and/or Math. The skills you gain in analyzing problems and constructing models in Physics, finding truth/false values and analyzing problems in math, and the algorithmic skills in CS really complement each other.

Instead of people "hacking" university education to turn universities into purely fotm job training centers, the real hack would be something that really drills down on the fundamentals. CS, Math, Physics, and Philosophy, to get an all-around education in approaching problems from fundamentals, would I think be the optimal school experience.

AndrewKemendo2mo ago

The big thing that made it all click for mathematics was that I stopped thinking about mathematics the way that it was taught to me and started thinking about it the way that naturally felt correct to me.

So in my specific case I stopped thinking about mathematics as "how to interpret a sequence of symbols".

Instead I decided to start thinking about it as "the symbols tell me about the multidimensional topological coordinate space that I need to inhabit".

So now when I look at an equation (or whatever) my first step is "OK, how do I turn this into a topology so that I can explore the topological space the way that a number would?"

Kind of like if you were to extend Nagel's "What Is It Like to Be a Bat?" but instead of being a bat you're a number.

dsign2mo ago

> Seems like I am engaged in the ice trade and they are about to invent the refrigerator.

The way I like to look at it is that I'm engaged in the ice trade and they are about to invent everything else that will end mine and every other current trade. Which leaves me with two practical options: a) deep despair, or b) to become a Jack of all trades, master of none, but oftentimes better than a master of one. The Jacks can, for now, capitalize on the thing that the Machines currently lack, which is agency.

rsp19842mo ago

Don't despair. The key to becoming proficient in advanced subjects like this one is to first try to understand the fundamentals in plain language and pictures in your mind. Ignore the equations. Ask AI to explain the topic at hand at the most fundamental level.

Once the fundamental concepts are understood (what problem is being solved and where the key difficulties are), only then will the equations start to make sense. If you start out with the math, you're making your life unnecessarily hard.

Also, not universally true but directionally true as a rule of thumb: the more equations a text contains, the less likely it is that the author has truly grasped the subject. People who really grasp a subject can usually explain it well in plain language.

griffzhowl2mo ago

> People who really grasp a subject can usually explain it well in plain language.

That's very much a matter of style. An equation is often the plainest way of expressing something.

RA_Fisher2mo ago

AI makes it easier to catch up. :)

lukko2mo ago· 8 in thread

I've just started to try and learn the basics of RL and the Bellman Equation - are there any good books or resources I should look at? I think this post is beyond my current level.

I'm most interested in how the equation can be implemented step by step in an ML library - worked examples would be very helpful.

Thank you!

brandonb2mo ago

OpenAI's Spinning Up in Deep RL is free and pretty good: https://spinningup.openai.com/en/latest/

It includes both mathematical formulas and PyTorch code.

I found it a bit more practical than the Sutton & Barto book, which is a classic but doesn't cover some of the more modern methods used in deep reinforcement learning.

jmalicki2mo ago

Cool!

It's also nice that Sutton & Barto belabors a lot of old stuff that is no longer obsessed over, and this skims through that and gets to the stuff that is much more relevant today.

empiricus2mo ago

Even this OpenAI course is from 2020? Are there no useful recent updates on the subject, especially now with everyone working and using RL?

ActivePattern2mo ago

Reinforcement Learning by Sutton & Barto is an excellent introduction by two of the founders of the field.

Read here: http://incompleteideas.net/book/the-book-2nd.html

sardukardboard2mo ago

I worked through David Silver’s RL course a while back; it’s got great explanations as he builds up the equations. It’s light on implementation, but the intuitive side really complements more code-heavy examples that lack the “why” behind the equations.

https://davidstarsilver.wordpress.com/teaching/

porridgeraisin2mo ago

The Bellman equations (exactly as written above) are not found in ML libraries.

This is because they work assuming you know a model of the data. Most real world RL is model-free RL. Or, like in LLMs, "model is known but too big to practically use" RL.

Apart from the resources you use (good ones in other comments already), try to get the initial mental model of the whole field right, that is important since everything you read can then fit in the right place of that mental model. I will try to give one below.

- the absolute core raison d'etre of RL as a separate field: the quality of data you train on only improves as your algorithm improves. As opposed to other ML where you have all your data beforehand.

- first, basic Bellman equation solving (code-wise, this is just solving a system of linear equations)

- an algo you will come across called policy iteration (code wise, a bunch of for loops..)

- here you will be able to see how different parts of the algo become impossible in different setups, and what approximations can be done for each of them (and this is where the first neural network - called "function approximator" in RL literature - comes into play). Here you can recognise approximate versions of the Bellman equation.

- here you learn DDPG, SAC algos. Crucial. Called "actor critic" in parlance.

- you also notice problems with this approach that arise because a) you don't have much high-quality data and b) learning recursively with neural networks is very unstable; this motivates stuff like PPO.

- then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, techniques like TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.

- at this point you should get into the weeds of actually engineering real RL algos at scale. Stuff like Atari benchmarks. You will find that in reality, the algos as learnt are more or less a template and you need lots of problem-specific detailing to actually make them work. And you will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on GPU etc - but correctly! without changing the model assumptions)

- learn goal-conditioned RL, imitation learning, some model-based RL like AlphaZero/Dreamer after all of the above. You will be able to easily understand it in the overall context at this point. The first two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.

- learn stuff like HRL, offline RL as extras since they are not that practically relevant yet.
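The first two steps of the list above (solving the Bellman equation for a fixed policy as a linear system, then policy iteration) can be sketched in plain Python. Everything concrete here is a hypothetical illustration: the 2-state MDP, the tables `P` and `R`, and the helper names are made up for the example. For a fixed policy the Bellman equation reads v = r_pi + gamma·P_pi·v, which rearranges to the linear system (I - gamma·P_pi) v = r_pi.

```python
GAMMA = 0.9
# Hypothetical 2-state, 2-action MDP. P[s][a] lists (probability, next_state);
# R[s][a] is the expected immediate reward for taking action a in state s.
P = [
    [[(1.0, 0)], [(0.1, 0), (0.9, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def evaluate(policy):
    """Policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    n = len(P)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = []
    for s in range(n):
        a = policy[s]
        for prob, nxt in P[s][a]:
            A[s][nxt] -= GAMMA * prob
        b.append(R[s][a])
    return solve(A, b)

def q_value(s, a, v):
    # One-step lookahead using the known model.
    return R[s][a] + GAMMA * sum(prob * v[nxt] for prob, nxt in P[s][a])

def policy_iteration():
    policy = [0] * len(P)
    while True:
        v = evaluate(policy)
        improved = [max(range(len(P[s])), key=lambda a: q_value(s, a, v))
                    for s in range(len(P))]
        if improved == policy:   # greedy policy unchanged: done
            return policy, v
        policy = improved

policy, v = policy_iteration()
print(policy, v)
```

Note that both steps read `P` and `R` directly, which is the "you know a model" assumption mentioned at the top of the comment.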

in-silico2mo ago

> The bellman equations (exactly as written above) are not found in ML libraries. This is because they work assuming you know a model of the data. Most real world RL is model-free RL.

Q-learning (the usual application of the Bellman equation) is generally model-free. It is also commonly found in reinforcement learning libraries.
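A minimal tabular Q-learning sketch that shows the model-free point. The chain environment, constants, and function names are hypothetical, chosen only for illustration: the agent only ever calls `step` to sample a transition and never reads transition probabilities. The behaviour policy is uniformly random, which is allowed because Q-learning is off-policy.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Hypothetical chain environment: reward 1 only on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)          # uniformly random behaviour policy
            s2, r, done = step(s, a)
            # Sampled Bellman optimality backup: r + gamma * max_a' Q(s', a').
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy, q[0])
```

The update target is a sampled version of the Bellman optimality equation, which is the sense in which Q-learning applies it without a model.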

srean2mo ago

I would recommend that you start with one of the classics (not much of deep RL)

https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutto...

This will have a gentler learning curve. After this you can move on to more advanced material.

The other resource I will recommend is everything by Bertsekas. In this context, his books on dynamic programming and neuro-dynamic programming.

Happy reading.

Cloudly2mo ago· 1 in thread

Ever since the control bug bit me in my EE undergrad years I have been happy to see how useful the knowledge remains. Of course the underlying math of optimization remains general, but the direct applications of control theory made it much more appetizing for me to struggle through.

sebzuddasOP2mo ago

My favourite subject!

jesuslop2mo ago

Nice summary, saving it. If the author is around: the Bellman equation label ended up overlapping the equation, and paragraph quote marks got into the displayed HJB one. The "Suggest changes" link is a 404 (not found). Liked the presentation overall, thank you!
