https://youtube.com/playlist?list=PLqYmG7hTraZBKeNJ-JE_eyJHZ...
Thanks
When the iPhone App Store launched, many people started building apps in the ecosystem.
While it might be a bit too early to compare RL to those advances in technology, I personally feel there is huge potential. I might be wrong though, and I am fine with that.
It seems like there is a lot of emphasis on "direct RL", where they don't really think about the model much; I guess the model ends up implicit in the policy or something?
But it seems to me, as someone who has just started learning about robotics, that I absolutely need to first verify that I have an accurate model of the environment which I can inspect. It seems like a lot of RL approaches might not even be able to supply that.
I mean, what I am stuck on as far as creating a robot (or a virtual robot) is having a vision system that does all of the hard things I want. I feel like if I can detect edges, surfaces, and shapes in 3D, parts of objects and whole objects, with orientation etc., in a way I can display and manipulate, that level of understanding will give me a firm base to build the rest of the learning and planning on.
I know all of that is very hard. It seems like they must have tried that for a while and then kind of gave up to head down the current RL direction? Or just decided it wasn't important. I still think it's important.
In case you haven't seen or read the following: https://bair.berkeley.edu/blog/2019/12/12/mbpo/
I think the main problem with RL is deciding whether a utility function, as precise as it may be, can fully capture all the nuances of an environment. Another problem is adapting to the environment: having new actions added dynamically to your model and having it converge as quickly as possible.
I've read a bit about genetic algorithms and evolutionary computation at some point. Apparently they achieve good results because they can find discrete solutions to complex, well-defined problems.
Reinforcement learning is something I know even less about, but from what I gathered it is also most successful in well-defined problems and systems (such as games).
So my question is: how do they relate? Is there overlap, and what are the most significant conceptual differences?
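For concreteness, here is a minimal sketch of a genetic algorithm on the kind of complex but well-defined discrete problem mentioned above. The task (OneMax: evolve a bit-string to maximize its number of ones) is a standard toy example, and all names and parameters here are illustrative choices, not anything from the thread:

```python
import random

# Minimal genetic algorithm on OneMax: maximize the number of 1s in a bit-string.
random.seed(1)
L, POP, GENS = 20, 30, 60  # string length, population size, generations

def fitness(bits):
    return sum(bits)

def mutate(bits, p=0.05):
    # flip each bit independently with probability p
    return [b ^ (random.random() < p) for b in bits]

def crossover(a, b):
    # single-point crossover
    cut = random.randrange(1, L)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]  # truncation selection; keeping parents is elitist
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(fitness(best))  # typically at or near the optimum of L = 20
```

The point of the sketch: the algorithm only ever sees fitness scores, never any structure of the problem, which is exactly the contrast with value-based RL discussed below.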
Well-known methods like Q-learning are basically just iterative, approximate methods for finding solutions to the Bellman equation, i.e. a measure of value for every state of the world such that the Bellman equation is satisfied.
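That iterative procedure can be sketched with tabular Q-learning on a toy problem. The two-state environment below is invented purely for illustration; the one line that matters is the update, which nudges Q(s, a) toward the Bellman target r + γ · max_a' Q(s', a'):

```python
import random

# Toy MDP (made up for illustration): states {0, 1}, actions {0, 1}.
# Taking action a moves you to state a; action 1 in state 1 pays reward 1.
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    s_next = a
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, r

random.seed(0)
s = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda i: Q[s][i])
    s_next, r = step(s, a)
    # the Q-learning update: move toward the Bellman target
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    s = s_next

print(Q)  # approaches the fixed point of the Bellman optimality equation
```

At convergence Q approximately satisfies the Bellman equation; for this toy chain the fixed point of Q(1,1) is 1/(1 − γ) = 10, and the greedy policy with respect to Q heads to state 1 and stays there.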
Policy optimization methods don't do this, but there are still mathematical connections back to the Bellman equation (there is a duality relationship between value functions and policies).
I would say this focus is a big part of what makes the field of RL unique.
RL is an optimization domain. It's the name of the problem, not the solution. You can straightforwardly use evolutionary algorithms on RL problems. However, a lot of the recent success in RL has come from using deep learning to try to solve various RL problems, not from trying evolutionary computation.
Deep learning is used for function approximation and is not in opposition to evolutionary computation. You can train a neural network policy (mapping states to actions) with an evolutionary algorithm, but most of the success has come from methods that exploit the internal structure of the problem, as mentioned earlier. Evolutionary algorithms do not, which is what makes these strategies both weak (they ignore exploitable information) and powerful (they apply to almost any problem).
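To make the "train a policy with an evolutionary algorithm" point concrete, here is a sketch of a (1+1) evolution strategy. The "policy" is just a weight vector and the made-up objective stands in for an episode's return; everything here is an illustrative assumption, and the key observation is that no gradients and no Bellman structure are used, only perturb-and-compare:

```python
import random

random.seed(0)
target = [0.5, -1.0, 2.0]  # hidden optimum of the toy objective

def ret(w):
    # stand-in for an RL return: higher when weights are closer to the target
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

w = [0.0, 0.0, 0.0]  # initial policy parameters
sigma = 0.3          # fixed mutation strength

for _ in range(2000):
    # (1+1)-ES: sample one Gaussian perturbation, keep it only if it scores better
    cand = [wi + random.gauss(0, sigma) for wi in w]
    if ret(cand) > ret(w):
        w = cand

print(w)  # drifts toward the hidden target
```

Real evolution strategies for RL use populations, rank-based fitness shaping, and adaptive step sizes, but the black-box character is the same: the algorithm never looks inside an episode, only at its total score.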