I disagree. What he is saying is that languages follow a special rule that he doesn't think you could arrive at without an enormous amount of data. So a passive learning algorithm wouldn't uncover this structure with a reasonable amount of time or data (I guess poor sample efficiency is what he is worried about). By contrast, a learning algorithm that maintains a distribution over its own internal model of language would be able to ask questions that minimize the model's uncertainty.
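
To make the passive vs. active distinction concrete, here is a rough toy sketch, not a model of language. I'm assuming the hidden "rule" is just a threshold on integers, and the names `learn`, `p_yes`, `entropy` and `n_hyp` are made up for illustration. The learner keeps a uniform distribution over the rules still consistent with the answers so far, and either asks random questions (passive) or asks the question whose answer it is most unsure about (active).

```python
import math
import random


def entropy(p):
    """Binary entropy in bits of a yes/no answer with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def p_yes(x, candidates):
    """Probability of a 'yes' to question x under a uniform posterior."""
    return sum(x >= t for t in candidates) / len(candidates)


def learn(true_rule, n_hyp=256, active=True, seed=0):
    """Return (learned_rule, number_of_queries) for a hidden threshold rule.

    Toy assumption: hypothesis t answers 'yes' to question x iff x >= t,
    with t and x both drawn from range(n_hyp).
    """
    rng = random.Random(seed)
    candidates = list(range(n_hyp))  # support of the learner's posterior
    queries = 0
    while len(candidates) > 1:
        if active:
            # Ask the question whose answer is most uncertain right now.
            x = max(range(n_hyp), key=lambda q: entropy(p_yes(q, candidates)))
        else:
            # Passive stand-in: take whatever example happens to come along.
            x = rng.randrange(n_hyp)
        answer = x >= true_rule
        # Drop every hypothesis that disagrees with the observed answer.
        candidates = [t for t in candidates if (x >= t) == answer]
        queries += 1
    return candidates[0], queries


if __name__ == "__main__":
    print("active: ", learn(100, active=True))
    print("passive:", learn(100, active=False))
```

With 256 candidate rules the active learner halves its uncertainty on every question and pins the rule down in 8 queries (log2 of 256). The passive learner has to wait until the two informative examples happen to show up, which in this toy setup typically takes on the order of hundreds of draws. That gap is the sample-efficiency point, assuming of course that the learner is allowed to ask questions at all.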