Furthermore, the inbox/outbox terminology is inherently tied to asynchronous message passing with a message pump or queue, and so on.
You may not be sure it was conceptualised this way, but whether it was is irrelevant. The spec as written describes an architecture that someone with experience in the field would most reasonably interpret that way, both because that interpretation is inherent in the vocabulary used and because reading it that way produces a significantly more cohesive architecture. You're free to interpret it otherwise, but it has now been pointed out to you that treating it this way makes it simpler to understand; if you choose to ignore that advice, then so be it.
I might be sympathetic to the argument that this may not be spelled out clearly enough for someone without experience in this area or unfamiliar with the actor model, but there's a question of to what extent a W3C working group should assume that readers are unfamiliar with these kinds of models.
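As a rough illustration of the inbox-as-message-queue reading argued for above (a sketch of the general actor-model pattern, not anything the spec itself defines), here is a minimal Python version: delivering a message only enqueues it in the recipient's inbox, and a separate message pump later drains the inbox in arrival order. The `Actor` class, its `receive`/`send`/`pump` methods and the dict-shaped messages are hypothetical names chosen for this example.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Actor:
    """A minimal actor: other parties only ever deposit messages in its inbox."""
    name: str
    inbox: "queue.Queue[dict]" = field(default_factory=queue.Queue)
    outbox: list = field(default_factory=list)  # messages this actor has sent
    seen: list = field(default_factory=list)    # messages processed by the pump

    def receive(self, message: dict) -> None:
        # Delivery is just enqueueing; nothing is processed at this point.
        self.inbox.put(message)

    def send(self, recipient: "Actor", message: dict) -> None:
        # Record the message in this actor's outbox, then deposit it in the
        # recipient's inbox; the recipient decides when to process it.
        self.outbox.append(message)
        recipient.receive(message)

    def pump(self) -> int:
        """Drain the inbox, handling queued messages one at a time in order."""
        handled = 0
        while True:
            try:
                message = self.inbox.get_nowait()
            except queue.Empty:
                return handled
            # A real actor would dispatch on the message here; this sketch
            # only records that it was handled.
            self.seen.append(message)
            handled += 1
```

The point of the design is the decoupling: after `alice.send(bob, msg)`, `bob.seen` is still empty until `bob.pump()` runs, which is exactly the asynchronous hand-off the inbox/outbox vocabulary implies.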