IanCal 2y ago

Does this help? https://arxiv.org/abs/2310.11453

It seems to have more details about the actual training (it's the paper before the linked one), but I'm only skimming it and this isn't my field, so it may be too light as well.

llm_trw 2y ago

Not really. That's for the binary version of the algorithm; the ternary version can propagate a lot more information in the backward pass by using the fact that the outputs are one of -1, 0, or 1.

But I imagine they are using the same thing since a bunch of the authors are the same.
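
For anyone unsure what the binary/ternary difference looks like, here is a rough sketch of the forward quantization step only. It assumes the binary version takes the sign of each weight after subtracting the mean, and that the ternary version scales by the mean absolute value and then rounds and clips. Neither is confirmed in this thread, the function names are made up for illustration, and it does not cover the backward pass at all.

```python
def binarize(weights):
    """Map each weight to -1 or +1 by the sign of its offset from the mean (assumed scheme)."""
    mean = sum(weights) / len(weights)
    return [1 if w - mean > 0 else -1 for w in weights]


def ternarize(weights, eps=1e-8):
    """Scale by the mean absolute value, then round and clip to -1, 0, or +1 (assumed scheme)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    return [max(-1, min(1, round(w / scale))) for w in weights]


weights = [0.9, -0.05, 0.02, -1.2, 0.4]
print(binarize(weights))   # [1, -1, 1, -1, 1]
print(ternarize(weights))  # [1, 0, 0, -1, 1]
```

The small weights (-0.05 and 0.02) are forced to -1 or +1 in the binary case but become 0 in the ternary case, so the ternary form has a way to say "this weight is effectively off".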
