I mean, we're under the assumption here that the app that's listening isn't owned by the attacker. If it were, the attacker would already know the person's identity (and location, and probably a lot more) because they control the app on the device. It would be a lot of work to target and infect someone's phone with malware just to confirm that they did in fact visit a page on Tor. It would probably be about the same amount of work to just infect their computer with malware and take a screenshot.
If we assume it's a third-party ad network, which is the only plausible explanation for why there's an app listening for ultrasonic cues on a user's device, it would need to be listening all the time. That is what the article describes.