Good thinking, but I think that just shifts the issue: each inter-thread message uses an atomic compare-and-swap to create the message. I assume there'd be a similar bottleneck on the actor that generates the transactionid, since it's limited by the number of messages it can send and receive.
Instead, a friend and I have been thinking about how to modify MVCC to work with distinct transactionids per partition. I'm already generating what I call "subtransactionids" for each partition involved in a transaction, and those must be ordered for synchronous replication, so I think most of what's needed to implement a variation on MVCC may already be there.
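Roughly, the idea could look like the Python sketch below. It's only an illustration under assumptions, not the actual design: the names `Partition`, `begin`, `commit_write` and `read` are made up for this example, each partition hands out its own subtransactionids from a local counter, and commits on a partition are assumed to be applied in subtransactionid order (the same ordering synchronous replication needs). Visibility is then decided per partition, with no global transactionid generator involved.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical partition with its own id sequence and version store."""
    name: str
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))
    last_committed: int = 0
    # key -> list of (subtransactionid, value), appended in commit order
    versions: dict = field(default_factory=dict)

    def next_subtransaction_id(self) -> int:
        # Partition-local counter: no shared, global id generator is needed.
        return next(self._ids)

    def commit_write(self, subtxid: int, key, value) -> None:
        # Assumption: commits arrive in subtransactionid order per partition.
        self.versions.setdefault(key, []).append((subtxid, value))
        self.last_committed = max(self.last_committed, subtxid)

    def read(self, key, snapshot: int):
        # MVCC-style visibility: newest version whose id is <= the snapshot.
        visible = [(i, v) for i, v in self.versions.get(key, []) if i <= snapshot]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None


def begin(partitions):
    """Assign one subtransactionid per partition involved in a transaction."""
    return {p.name: p.next_subtransaction_id() for p in partitions}
```

A reader would take its snapshot per partition (for example `snapshot = p.last_committed`) and pass it to `p.read(key, snapshot)`. What this sketch leaves out is how to keep snapshots consistent across partitions, which is the hard part of the variation.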
I know I still owe you an architectural doc...fixin' ta, ya know.