Thinking about it a bit more, I guess you could have an algorithm that detects the start of a fragment from speech (e.g. you say "fragment start!") and lets you score each fragment when it ends (e.g. "fragment stop score 8!"), so the fragments can be ranked afterwards. I suppose you could do that with open source speech-to-text tools, and use ffmpeg to cut and stitch everything together.
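Roughly, the idea could look like the sketch below. It is only an illustration under a few assumptions: the speech-to-text tool hands you timestamped segments as (start, end, text) tuples, the score comes out as digits ("8" rather than "eight"), and all function and file names here are made up. It only builds the ffmpeg command lines; it does not run them.

```python
import re

# Spoken markers, matched case-insensitively in the transcript text.
START_RE = re.compile(r"fragment start", re.IGNORECASE)
STOP_RE = re.compile(r"fragment stop score (\d+)", re.IGNORECASE)


def find_fragments(segments):
    """Turn (start_sec, end_sec, text) transcript segments into scored fragments.

    A fragment runs from the end of the "fragment start" phrase to the
    beginning of the "fragment stop score N" phrase.
    """
    fragments = []
    open_at = None  # time where the currently open fragment begins
    for seg_start, seg_end, text in segments:
        stop = STOP_RE.search(text)
        if stop and open_at is not None:
            fragments.append(
                {"start": open_at, "end": seg_start, "score": int(stop.group(1))}
            )
            open_at = None
        elif START_RE.search(text):
            open_at = seg_end
    return fragments


def top_fragments(fragments, n):
    """Keep the n highest-scored fragments (ties stay in recording order)."""
    return sorted(fragments, key=lambda f: f["score"], reverse=True)[:n]


def ffmpeg_commands(source, fragments, list_file="clips.txt", output="highlights.mp4"):
    """Build one cut command per fragment, plus the concat list and stitch command."""
    cuts, names = [], []
    for i, frag in enumerate(fragments):
        name = f"clip_{i:03d}.mp4"  # hypothetical clip file name
        names.append(name)
        duration = frag["end"] - frag["start"]
        cuts.append([
            "ffmpeg", "-ss", f"{frag['start']:.3f}", "-i", source,
            "-t", f"{duration:.3f}", "-c", "copy", name,
        ])
    # Contents for the concat demuxer's list file (one "file" line per clip).
    concat_list = "".join(f"file '{name}'\n" for name in names)
    stitch = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
              "-c", "copy", output]
    return cuts, concat_list, stitch
```

You would write `concat_list` to `clips.txt` and run each command with something like `subprocess.run(cmd, check=True)`. Since `-c copy` avoids re-encoding, cuts snap to keyframes and can be slightly off; re-encoding instead would give more exact cut points at the cost of speed.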