The intuition pump surely works just fine as an intuition pump. But from a purely scientific view:
if you were to try to stage it as an actual scientific experiment it fails to hold up. No control (there's just the one room), no single or double blind (the researcher self-reports) , and badly defined elements (the contents of the notebook are not specified). Of course you reach the conclusion that room doesn't understand.
Compare to Turings imitation game experiment: Two participants (principal and control if you will), Double blind (they're in closed rooms so the judges can't see them), and you can have multiple people doing the scoring. We can conduct this IRL, and in fact if you've used IRC or discord, it's almost a natural experiment there.
Looking at it as a scientific experiment isn't so strange, Searle is responding to an executable experiment with an intuition pump. Why should the latter win?
Note that there are many ways to cut this, but this is one of mine.