This is pretty neat! It's great to see relatively recent advancements in machine learning put to use for child education.
A few questions for @Cherian:
1. I see the ASR usage, but where does computer vision come into play?
2. Are you training and/or fine-tuning ASR models to deal with the speech characteristics of children and new speakers?
3. Is the ASR all cloud-side, or do you have it running locally in some fashion?