A child knows that if it says "Mama food," it is likely to get attention, and if it gets attention, it is likely to minimize its hunger. Right now, a neural network can be trained to know that "Mama" occurs often in human dialogue, what words occur around it, even its dictionary definition and images of mothers. But it's not making the deeper connection to a strategy that minimizes hunger.
When I think about this, I wonder if insights from the world of gaming "AI" would be useful in developing the training datasets for real AI. Because you can't be a mother to a billion virtual babies, but you might be able to program a set of heuristics to be a mother to a billion virtual babies. Then you have some system that trains on their life experiences...? All speculation, but very interesting stuff.