They wouldn't necessarily need to serve different data to each client. When they control the whole playback stack, they could get clever by including duplicate frame data with subtle differences and making each device key able to decrypt only one of the variants. Repeat that throughout a show to add bits to the signature until it's uniquely identifiable.
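To make the idea concrete, here is a rough Python sketch. It does no real encryption: "can only decrypt one variant" is modelled as simply picking one of two strings per marked segment. The function names, device names, and segment labels are all made up for illustration, and this isn't claimed to be how any actual DRM system does it.

```python
import itertools


def assign_signatures(device_ids, num_segments):
    """Give each device a distinct bit pattern (hypothetical scheme).

    Bit i says which of the two variants of marked segment i that
    device's key would be able to decrypt.
    """
    if len(device_ids) > 2 ** num_segments:
        raise ValueError("not enough marked segments to separate all devices")
    patterns = itertools.product((0, 1), repeat=num_segments)
    return dict(zip(device_ids, patterns))


def play(signature, segments):
    """Return what a device decodes: one variant per marked segment."""
    return [variants[bit] for bit, variants in zip(signature, segments)]


def identify(leaked, segments, signatures):
    """Read the variant choices back out of a leaked copy and look up the device."""
    recovered = tuple(
        variants.index(frame) for frame, variants in zip(leaked, segments)
    )
    for device, signature in signatures.items():
        if signature == recovered:
            return device
    return None


# Everyone receives the same stream: both variants of every marked segment.
segments = [(f"seg{i}-A", f"seg{i}-B") for i in range(3)]
devices = ["tv-1", "phone-2", "stick-3", "tablet-4", "console-5"]
signatures = assign_signatures(devices, num_segments=3)

leak = play(signatures["stick-3"], segments)
print(identify(leak, segments, signatures))  # prints "stick-3"
```

Each marked segment adds one bit, so n segments can tell apart up to 2^n devices. That's why repeating the trick across a whole show is enough to single one out.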
But they don't control the playback stack once the attacker has the keys. The attacker brings their own stack and decrypts the data with their own software.