Ok so that's interesting in itself. Apologies if you go into this in the paper, not had time to read it yet, but does this tell us something about the models themselves? Is there a benchmark lurking here? It feels like this is revealing something about the training, but I'm not sure exactly what.