> I am thinking of making a scaled down version of [1] so I can "talk" to long videos, like conference speeches or university lectures. Should take an hour or two to cook it up since it will reuse most of the code from [1].
This would be useful to me as well. If you do end up making it, could you reference it on your existing GitHub