I've used RFC 1751 [0], which is word-based rather than sentence-based, but it's pretty convenient. I use it in my password sharing tool [1], which shows a confirmation prompt like this:
=== secrets.vm ===
common name: secrets.vm
fingerprint: b957e10c998faa9909cff3ba4ec35485d04708c3ecc7481fe14d7f07bc0229cd
public key: c15e697e4807793ef8a9461a7b2c6cf2266d1ec1480a594e83b54e7b75e07702
public sign: f1db594eb55fe97657c57f2aa01afd1210a46d42d80d5552ac4d548162d4968e
mnemonic: AM ROBE KIT OMEN BATE ICY TROY RON WHAT HIP OMIT SUP LID CLAY AVER LEAR CAVE REEL CAN PAM FAN LUND RIFT ACME
does that look right? [y/n]
Here, "mnemonic" is the RFC 1751 encoding of the SHA-256 hash of the other fields, and it is designed to be shouted across a room.

I'd definitely be interested in a standardised, sentence-based fingerprinting system akin to RFC 1751.
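For anyone curious how the mnemonic falls out of the hash, here is a minimal sketch of the RFC 1751 bit packing. Each 64-bit block gets two parity bits (the sum of its 2-bit groups, mod 4), and the resulting 66 bits are split into six 11-bit indices into the RFC's 2048-word dictionary. A 32-byte SHA-256 digest is four blocks, hence the 24 words above. The way the fields are serialized before hashing (`name: value` lines) is my assumption, not how the tool necessarily does it, and the word table itself is not reproduced here, so the sketch stops at the indices.

```python
import hashlib


def rfc1751_indices(block: bytes) -> list[int]:
    """Return the six 11-bit dictionary indices RFC 1751 assigns to one 64-bit block."""
    if len(block) != 8:
        raise ValueError("RFC 1751 encodes 64-bit (8-byte) blocks")
    n = int.from_bytes(block, "big")
    # Two parity bits: sum of the 32 two-bit groups of the block, modulo 4.
    parity = sum((n >> (2 * k)) & 0b11 for k in range(32)) & 0b11
    bits = (n << 2) | parity  # 64 data bits followed by 2 parity bits = 66 bits
    # Split into six 11-bit indices, most significant first.
    return [(bits >> (11 * (5 - i))) & 0x7FF for i in range(6)]


def fingerprint_indices(fields: dict[str, str]) -> list[int]:
    # ASSUMPTION: fields are hashed as "name: value" lines joined by newlines;
    # the real tool's serialization is not specified.
    data = "\n".join(f"{k}: {v}" for k, v in fields.items()).encode()
    digest = hashlib.sha256(data).digest()  # 32 bytes -> four 64-bit blocks
    indices = []
    for i in range(0, len(digest), 8):
        indices.extend(rfc1751_indices(digest[i:i + 8]))
    return indices  # 24 indices; map each through the RFC's 2048-word table
```

With the RFC's dictionary loaded as a list `WORDS` (not included here), the mnemonic would be `" ".join(WORDS[i] for i in fingerprint_indices(fields))`.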
I'm not sure whether you read my writeup, but I tried to address the problem that "users only glance at one or two characters" by suggesting that the client tell the user which characters to compare. It's a little kludgy in a text UI, however.
The idea is that the fingerprint has enough characters that comparing only a few of them is fine, as long as the positions are chosen in a way the attacker can't predict.
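A rough sketch of what "the client shows which characters to compare" could look like in a text UI, assuming the positions are drawn with a CSPRNG at verification time (after the key is fixed), so an attacker grinding keys can't know in advance which characters will be checked. The function names, the default of four positions, and the caret display are all hypothetical, not taken from the writeup.

```python
import secrets


def pick_positions(fingerprint: str, k: int = 4) -> list[int]:
    """Choose k distinct positions using the OS CSPRNG so they are unpredictable."""
    rng = secrets.SystemRandom()
    return sorted(rng.sample(range(len(fingerprint)), k))


def marker_line(length: int, positions: list[int]) -> str:
    """Build a line of spaces with '^' under each position the user should compare."""
    chosen = set(positions)
    return "".join("^" if i in chosen else " " for i in range(length))


fp = "b957e10c998faa9909cff3ba4ec35485d04708c3ecc7481fe14d7f07bc0229cd"
pos = pick_positions(fp)
print(fp)
print(marker_line(len(fp), pos))
# 1-based positions are easier to read aloud to the other party.
print("compare:", " ".join(f"#{i + 1}={fp[i]}" for i in pos))
```

The caret line is the kludgy part: it only lines up in a monospaced terminal, which is why the last line also spells out each position and expected character.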