Yes, solid ideas, I was thinking along similar lines.
There are two possible meta approaches for this:
a) hard-code the whole progression, which is tedious and not very adaptive to the learner's level
b) use algorithmic exercise/question selection. Then the degrees of freedom suddenly explode: you've had 5 exercises in a row with 6 images, so maybe you'd want something easier. It's been a long time since we practiced anything with "apple". But "cutting" was just practiced, so it would be boring to bring it up again. So here is a set with all possible "apple" questions in the database. Some include words not yet practiced. Which one do we pick? And so on. It's a fascinating problem, far more complex than I anticipated. And if you use simple shortcuts, you quickly end up in very boring loops :)
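To make approach (b) a bit more concrete, here is a minimal sketch of score-based selection in Python. Everything in it is an assumption for illustration: the `Exercise` fields, the `score` and `pick` helpers, and especially the weights are made up, not taken from any real app. It rewards words that have not been seen for a while, penalizes words practiced in the previous step or not yet practiced at all, and prefers exercises close to a target image count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    # Hypothetical record; a real database row would carry more fields.
    name: str
    words: frozenset
    num_images: int


def score(ex, last_seen, now, target_images):
    """Higher is better. All weights are arbitrary illustrative guesses."""
    total = 0.0
    for word in ex.words:
        if word not in last_seen:
            total -= 5.0              # word not practiced yet: risky to include
        elif now - last_seen[word] <= 1:
            total -= 3.0              # practiced in the previous step: boring
        else:
            total += min(now - last_seen[word], 10)  # stale word: worth revisiting
    # Penalize distance from the difficulty we currently want.
    total -= 2.0 * abs(ex.num_images - target_images)
    return total


def pick(candidates, last_seen, now, target_images):
    """Return the best-scoring candidate."""
    return max(candidates, key=lambda ex: score(ex, last_seen, now, target_images))


# Step counter at which each word was last practiced (made-up data).
last_seen = {"apple": 0, "cutting": 9, "eating": 4}
candidates = [
    Exercise("cut_apple", frozenset({"apple", "cutting"}), 6),
    Exercise("eat_apple", frozenset({"apple", "eating"}), 4),
    Exercise("peel_apple", frozenset({"apple", "peeling"}), 4),
]

# After several 6-image exercises we aim lower, at 4 images.
chosen = pick(candidates, last_seen, now=10, target_images=4)
print(chosen.name)  # eat_apple
```

Note that always taking the strict maximum is itself one of those simple shortcuts: with fixed weights it will tend to repeat the same patterns, so a real version would probably need some randomness among the top candidates and a lot of tuning.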