This sounds very cool, and I can imagine one way they might have done it: instrument a program, probably including use of the 286's in-circuit-emulator-like features, then link its code and data for maximum locality in 64 KiB chunks to minimize the segment-switching hit.
And/or make better use of the extra segment, and maybe the stack segment, which is a matter of compiler technology. I'd imagine other compiler hacks would be useful too. Compare today's bigger systems, where the premium is on keeping as much as possible in the most local caches: cleverness once you're in them is overwhelmed by the penalty of going to a more distant cache or to memory, or of taking a TLB miss. "Cache is the new RAM, RAM is the new disk, disk is the new tape," and all that.
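
To make the profile-then-link idea concrete, here is a rough sketch of how I'd guess such a tool could group functions. I have no idea what they actually did: the names `pack_segments` and `far_calls`, the function sizes, and the call counts are all made up for illustration, and it only handles code, not data. It greedily merges the functions on the hottest call edges into the same segment, as long as the result still fits in 64 KiB.

```python
SEGMENT_LIMIT = 64 * 1024  # 64 KiB, the maximum size of a 286 segment


def pack_segments(sizes, call_counts, limit=SEGMENT_LIMIT):
    """Group functions into segments, merging along the hottest call edges first.

    sizes:       dict of function name -> size in bytes (hypothetical profiler output)
    call_counts: dict of (caller, callee) -> observed number of calls
    Returns a list of sets of function names, one set per segment.
    """
    # Start with every function alone in its own group.
    group_of = {name: name for name in sizes}
    members = {name: {name} for name in sizes}
    group_size = dict(sizes)

    # Visit call edges from most to least frequent.
    for (caller, callee), _count in sorted(call_counts.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[caller], group_of[callee]
        if ga == gb:
            continue  # already in the same segment
        if group_size[ga] + group_size[gb] > limit:
            continue  # merging would overflow a segment
        # Merge group gb into group ga.
        for name in members[gb]:
            group_of[name] = ga
        members[ga] |= members.pop(gb)
        group_size[ga] += group_size.pop(gb)

    return list(members.values())


def far_calls(segments, call_counts):
    """Count the observed calls that cross a segment boundary."""
    seg_index = {name: i for i, seg in enumerate(segments) for name in seg}
    return sum(n for (a, b), n in call_counts.items() if seg_index[a] != seg_index[b])


if __name__ == "__main__":
    # Invented numbers, purely for illustration.
    sizes = {"parse": 30000, "lex": 30000, "emit": 30000, "opt": 30000}
    calls = {("parse", "lex"): 1000, ("opt", "emit"): 800, ("parse", "opt"): 10}
    segments = pack_segments(sizes, calls)
    print(segments)
    print("far calls:", far_calls(segments, calls))
```

With those invented numbers the two hot pairs each land in their own segment and only the rare parse-to-opt calls pay the segment-switching cost. A real tool would presumably want something smarter than greedy merging, but the goal is the same one as with caches today: keep what's used together close together.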