While I'm primarily experienced in genetic algorithms and NNs (so not so much reinforcement learning), 50 steps (or generations, in a non-steady-state GA) is a very short amount of time, so learning to coordinate multiple degrees of freedom into a successful activity in only 50 steps is, to me, pretty impressive.
We also have the advantage of some 'physics simulation' software, if you will, that lets us do a few runs in our head before doing it physically.
I work in construction and I regularly throw and catch objects like rolls of tape (duct-tape-sized rolls), 18V drills, levels, etc. It's only when we throw something light (i.e. something air resistance has a real effect on) that we start getting screwed up. We don't expect things to slow down dramatically when they're thrown or fall.
(I think reinforcement learning was more the point than 'can we teach a robot to flip pancakes'.)
My preferred neural network structure/design is NEAT (developed by Dr. Kenneth Stanley) and its various derivatives. If I were to work more with neural networks, I would work to expand that design. (My work interests have expanded to include other stochastic algorithms as well, such as Monte Carlo Localization.)
* I was a doctoral student, but I couldn't get the advisor I wanted (funding), so I dropped down to the master's level
It seems there is always some non-negligible probability that, among the factors a learning robot credits for its success, at least one is actually totally irrelevant. That's probably somewhat how our own brains operate as well.
It is generally a hard problem with adaptive learning algorithms that they only produce results. You can't tell the superstition from the intuition.
Evolution works in the same way: there's no analysis from the phenotypical world back to the genes ("oh, we should be taller; let's just tweak this gene", i.e. Lamarckism). Instead, it's just massively scalable trial and error.
In fact, the logical disconnect between effect and cause is probably a strength rather than a weakness.
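To make the "trial and error without analysis" point concrete, here is a minimal mutate-and-select sketch in Python. Everything in it (the `fitness` function, the hidden `target`, the parameter values) is made up for illustration and is not taken from the pancake experiment or any GA library. Note that the loop never asks why a child scored better; it only keeps whichever one did.

```python
import random


def fitness(genome):
    # Hypothetical "phenotype" score: closer to a hidden target is better.
    # The search loop below never looks inside this function.
    target = [0.5, -1.0, 2.0]
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def evolve(generations=50, offspring=20, sigma=0.3, seed=0):
    rng = random.Random(seed)
    parent = [0.0, 0.0, 0.0]
    history = [fitness(parent)]
    for _ in range(generations):
        # Blind variation: random tweaks with no knowledge of the target.
        children = [
            [g + rng.gauss(0.0, sigma) for g in parent]
            for _ in range(offspring)
        ]
        # Selection: keep the best child only if it beats the parent.
        best = max(children, key=fitness)
        if fitness(best) > fitness(parent):
            parent = best
        history.append(fitness(parent))
    return parent, history


if __name__ == "__main__":
    genome, history = evolve()
    print("start score:", history[0])
    print("final score:", history[-1])
```

Because the parent is only replaced by a strictly better child, the best score in this sketch can never get worse from one generation to the next.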
Absolutely. I remember reading about an artificial life program which had evolving creatures try to survive in a very harsh artificial world. The strategies developed seemed less than optimal on the face of it, but when the writers of the program hand-coded their own supposedly "perfect" strategy, the increased efficiency of their strategy actually led to a lower overall survival rate. Only after that could they see why.
A deformable pancake would make the experiment batter.
It's interesting to combine the idea of trial-and-error learning with "save that strategy in permanent memory."
1. In the failure trials the robot does not move its arm toward the object as it falls. This is distinctly inhuman! When we initiate a motor program we are constantly checking our prediction for the behavior of both our body and external objects against reality and then updating that program in real time. This bot apparently updates its program only after each trial.
2. The robot moves back to its standard starting position after each successful trial. This demonstrates the specificity of the motor pattern it has developed. Given the time pressure and the variety of situations we deal with, it is usually advantageous for us to learn a generalized pattern rather than a single pattern that only works for a constrained set of starting conditions.
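The contrast in point 1, between updating only after a trial and correcting during the motion, can be sketched with a toy one-dimensional "catch". This is an assumption-laden illustration, not the robot's actual controller: `catch_error`, `drift`, and `gain` are hypothetical names and values. The object drifts sideways in a way the plan did not anticipate; the open-loop hand sticks to the plan, while the closed-loop hand re-aims at the observed position on every step.

```python
def catch_error(closed_loop, drift=0.02, steps=50, gain=0.3):
    obj = 0.0      # horizontal position of the falling object
    hand = 0.0     # horizontal position of the hand
    planned = 0.0  # where the pre-planned motion expects the object to land
    for _ in range(steps):
        obj += drift  # unmodelled disturbance pushes the object sideways
        # Closed loop re-aims at what is observed; open loop trusts the plan.
        target = obj if closed_loop else planned
        hand += gain * (target - hand)
    # Miss distance at the moment of the catch.
    return abs(obj - hand)


if __name__ == "__main__":
    print("open-loop miss:  ", catch_error(closed_loop=False))
    print("closed-loop miss:", catch_error(closed_loop=True))
```

The closed-loop hand still lags the object a little, since a purely proportional correction trails a steadily moving target, but it ends up far closer than the hand that only learns from the miss afterwards.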
As a side note on the difficulty of this task, I agree with sukuriant that the paucity of information the robot has, especially the lack of fine-grained touch, is a huge impediment. Second, recall that in the human brain well over half of the neurons (roughly 80% by some counts) live in the cerebellum, which is strongly implicated in storing and updating fine-grained motor patterns (gross patterns and intentions being initiated in the motor cortex).
> "After 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be compliant in order to catch the pancake without having it bounced off the pan."
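A rough way to picture "stiff, then compliant" is a hand modelled as a mass held in place by a spring-damper whose stiffness changes partway through the motion. The sketch below is only that: the names (`stiffness_at`, `peak_deflection`), the switch time, and every number are assumptions for illustration, not the controller from the article. It shows that under the same push, a low-stiffness hand gives way much more than a high-stiffness one, which is what lets it absorb the landing instead of bouncing the pancake.

```python
import math


def stiffness_at(t, switch_time=0.5, stiff=2000.0, compliant=50.0):
    # Hypothetical schedule: stiff during the throw, compliant for the catch.
    return stiff if t < switch_time else compliant


def peak_deflection(stiffness, push=5.0, mass=1.0, dt=0.001, steps=500):
    # Critically damped spring-damper holding the hand at x = 0.
    damping = 2.0 * math.sqrt(stiffness * mass)
    x, v, peak = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Constant external push (the landing object) against the spring.
        force = push - stiffness * x - damping * v
        v += force / mass * dt  # semi-implicit Euler integration
        x += v * dt
        peak = max(peak, abs(x))
    return peak


if __name__ == "__main__":
    for label, t in (("throw phase", 0.1), ("catch phase", 0.9)):
        k = stiffness_at(t)
        print(label, "stiffness:", k, "peak deflection:", peak_deflection(k))
```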
http://video.google.com/videoplay?docid=3757897210640719617#
The controller exploits the robot's own dynamics to optimize a trajectory that increases the weight the robot can lift.
Sarcasm aside, I'm sure stressed-out mothers and financiers alike rejoice...