Right.
I was tempted to say that our pace was somewhat reduced (though not to 50%) and our code was greatly improved, because that was our impression. We did not adjust old estimates to account for pairing in any way, and we saw a trend toward smaller |actual - estimated| costs.
Then again, every feature, bug, and programmer is different. We didn't have many old estimates for unfinished work. We didn't consciously account for pairing in new estimates, but we didn't have any sort of experimental rigor either. The non-pairing programmers definitely wrote worse code, but they were the weaker programmers in the first place. Maybe our estimates got more accurate simply because our product was maturing.
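For anyone who wants to track the same thing, here is a minimal sketch of the |actual - estimated| comparison I mean. The numbers are made up for illustration (they are not our data), and `mean_abs_error` is just a name I picked. It also does nothing about the confounds above; it only shows the arithmetic.

```python
from statistics import mean


def mean_abs_error(tasks):
    """Average of |actual - estimated| over (estimated, actual) pairs."""
    return mean(abs(actual - estimated) for estimated, actual in tasks)


# Hypothetical (estimated, actual) costs in days, invented for this example.
before_pairing = [(3, 6), (5, 9), (2, 2), (8, 13)]
after_pairing = [(3, 4), (5, 6), (2, 3), (8, 9)]

error_before = mean_abs_error(before_pairing)
error_after = mean_abs_error(after_pairing)

print("mean |actual - estimated| before pairing:", error_before)
print("mean |actual - estimated| after pairing:", error_after)
```

A falling number here only says the estimates got closer to reality, not why.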
There are some peer-reviewed papers covering experiments on pairing. Laurie Williams [http://collaboration.csc.ncsu.edu/laurie/] is an author of many of them. Here's one not by her: http://portal.acm.org/citation.cfm?doid=1145293.