In the draft[1], it seems the content decryption module (CDM) is where the action takes place. Naturally, the keys would be requested from some other key server and passed to the CDM, but according to this draft the keys will be passed via the client media stack.
I'm a bit bemused by this because:
a) CDMs are apparently just proprietary plugins by another name, defeating the purpose of the <video> tag and poisoning it.
b) How can you protect anything if you hand over the safe and the keys at the same time?
[1] http://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-medi...