That sounds expensive and inefficient. People's interpretations of music (and abstract art more generally) can be shockingly different; I suspect the model would not get a clear signal from the result.
But that makes me wonder to what extent labeling can be programmed -- extracting chord changes, dynamics changes, tempo, gross timbral characteristics, etc.
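
To make that a bit more concrete, here is a rough sketch of what programmatic labeling might look like for the easier features. Everything in it is my own assumption rather than an established method: the function names are made up, the input is taken to be mono samples as a plain list of floats, dynamics are approximated by RMS level over non-overlapping frames, zero-crossing rate stands in as a very crude timbre (brightness) proxy, and tempo is guessed by autocorrelating the loudness curve within an assumed 60-180 BPM range. Chord changes are a harder problem and are left out.

```python
import math

def frame_rms(samples, frame_size):
    """RMS level of each non-overlapping frame: a crude dynamics curve."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign (rough brightness proxy)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def estimate_tempo(envelope, frames_per_second, min_bpm=60, max_bpm=180):
    """Return the BPM whose beat period maximizes the envelope's autocorrelation.

    Assumes the envelope is longer than one beat at min_bpm.
    """
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    min_lag = int(round(frames_per_second * 60 / max_bpm))
    max_lag = int(round(frames_per_second * 60 / min_bpm))

    def autocorr(lag):
        # zip stops at the shorter sequence, so no explicit bounds check is needed
        return sum(a * b for a, b in zip(centered, centered[lag:]))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return 60 * frames_per_second / best_lag

def click_track(bpm, seconds, sample_rate=8000, burst_seconds=0.05, freq=440.0):
    """Synthetic test signal (hypothetical helper): a short sine burst on every beat."""
    samples = [0.0] * int(seconds * sample_rate)
    period = int(round(sample_rate * 60 / bpm))
    burst = int(burst_seconds * sample_rate)
    for start in range(0, len(samples), period):
        for n in range(min(burst, len(samples) - start)):
            samples[start + n] = math.sin(2 * math.pi * freq * n / sample_rate)
    return samples

if __name__ == "__main__":
    audio = click_track(bpm=120, seconds=8)
    loudness = frame_rms(audio, frame_size=80)  # 80 samples at 8 kHz = 100 frames/s
    print("tempo estimate (BPM):", estimate_tempo(loudness, frames_per_second=100))
    print("zero-crossing rate:", zero_crossing_rate(audio))
```

Real recordings are far messier than a click track, so this only shows that the low-level features are cheap to compute; whether labels like these give a model a more useful signal than human descriptions is a separate question.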