Well, it's trivial for the app to simply add an unencrypted flag indicating that there is a photo or video in the encrypted content. End-to-end encryption doesn't mean the app doesn't know what you're sending; it means the company can't decrypt the contents.
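To make the idea concrete, here is a minimal sketch of a message envelope with a cleartext header next to an encrypted body. Everything in it is an assumption for illustration: the `has_media` field, the envelope layout, and the function names are hypothetical, and the XOR "encryption" is only a toy stand-in for a real end-to-end scheme. It is not a description of how any actual app works.

```python
import json
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for real end-to-end encryption (NOT secure)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(b ^ k for b, k in zip(data, key))


def build_envelope(payload: bytes, has_media: bool, key: bytes) -> bytes:
    """Sender side: the app knows what it is sending, so it sets the flag."""
    # Hypothetical cleartext header; only the body is encrypted.
    header = json.dumps({"has_media": has_media}).encode()
    body = xor_bytes(payload, key)
    # Layout: 2-byte header length, then header, then encrypted body.
    return len(header).to_bytes(2, "big") + header + body


def read_header(envelope: bytes) -> dict:
    """Billing side: reads the flag without the key or the plaintext."""
    header_len = int.from_bytes(envelope[:2], "big")
    return json.loads(envelope[2:2 + header_len])


photo = b"\xff\xd8\xff fake jpeg bytes"
key = secrets.token_bytes(len(photo))
envelope = build_envelope(photo, has_media=True, key=key)
print(read_header(envelope)["has_media"])
```

Whoever handles billing only ever calls `read_header`, so they can tell a media message from a text message without being able to read either one.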
But that does not explain how the WhatSim guys can detect multimedia content and charge accordingly. They can't decrypt the traffic and can't modify the app itself.
If the app itself takes care of monitoring when pictures and videos are sent, then the SIM doesn't have to. And since the SIM is only for use with the app, your problem is taken care of.