Yes, the far jump was never necessary on any processor; it was only a convention. You can stay in the same segment as in real mode and it will continue to work. But some kind of control transfer to flush the queue must be done shortly after the LMSW / MOV CR0, or things may break in ways that I'm not entirely clear on.
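For illustration, here is a minimal sketch of that sequence with a near jump in place of the far jump. It is not tested on hardware. Gdtr and Idtr are the same operands as in my test code, flush_queue is just a label I made up for this example, and it assumes the GDT has a valid data descriptor at selector 0x0010.

```
        cli                     ;disable interrupts
        lgdt [Gdtr]             ;Gdtr / Idtr as in the test code
        lidt [Idtr]
        mov ax,1                ;new MSW with PE set
        lmsw ax                 ;set PE bit
        jmp short flush_queue   ;near jump, CS unchanged (hypothetical label)
flush_queue:
        mov ax,0x0010           ;assumed: valid data descriptor at this selector
        mov ds,ax               ;fetched after the flush, so a protected-mode load
```

The point is only that the jump comes right after the LMSW, so nothing that was decoded in real mode gets executed after the PE bit changes.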
My test code looked like this:
        mov ax,1                ;new MSW
        mov bx,TestSel          ;pointer to selector value into BX
        mov dx,[bx]             ;and load into DX
        mov cl,31               ;shift count for delay
        cli                     ;disable interrupts
        lgdt [Gdtr]
        lidt [Idtr]
        jmp enter_pm            ;flush queue now
        align 2
enter_pm:                       ;go!
        rol cl,cl               ;delay while following instructions decode
        lmsw ax                 ;set PE bit
        mov es,[bx]             ;should load selector 0x0010 into ES
        mov ds,dx               ;should set DS base to 0x00100 [NOPE]
        str ax                  ;should trap because not allowed in real mode
        ud2                     ;trap anyway in case it didn't
On the 286, this always caused the processor to reset. Replacing one of the two segment load instructions with a same-length "mov ax,ax" didn't change that, but removing one of them did.
In that case the "str ax" acted as the control transfer that flushed the queue (it was still decoded in real mode, so it went to the "invalid opcode" entry point). I have no clue what exactly causes the reset when three instructions are run from the queue. Maybe some timing issue related to when the PE bit actually changes vs. what the decoder is doing at that point?