It seems to have more detail about the actual training (it's the paper before the linked one), but I'm only skimming it and this isn't my field, so it may be too light as well.
Not really, that's for the binary version of the algorithm. The ternary version can propagate a lot more information in the backward pass, using the fact that each output is one of -1, 0, or 1.
But I imagine they are using the same thing since a bunch of the authors are the same.
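For anyone trying to picture the binary vs. ternary distinction, here's a rough sketch of just the value sets involved. It is not taken from either paper: the function names, the `threshold` value, and the sample inputs are all made up for illustration, and it says nothing about how the authors actually train or compute gradients.

```python
def binarize(x):
    """Map a real value to {-1, +1} by sign.
    Sending 0 to +1 is an arbitrary choice for this sketch."""
    return 1 if x >= 0 else -1


def ternarize(x, threshold=0.5):
    """Map a real value to {-1, 0, +1}.
    Values with |x| <= threshold become 0; the threshold is hypothetical."""
    if x > threshold:
        return 1
    if x < -threshold:
        return -1
    return 0


# Hypothetical sample values, chosen only to show the difference.
values = [-1.2, -0.3, 0.0, 0.1, 0.9]
binary = [binarize(v) for v in values]
ternary = [ternarize(v) for v in values]

print(binary)
print(ternary)
```

The only point is that the ternary version has an extra "0" state, so small values can be told apart from clearly positive or negative ones instead of being forced to a sign.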