I think this whole efficiency thing is a common misconception. Only the public API appears "immediate mode"; the internal implementation doesn't need to be, and usually isn't (e.g. it keeps mutating state between frames instead of building everything from scratch again).
The user-side code basically describes what the UI should look like in the current frame. Those "instructions" are recorded, and this recording is reasonably cheap.
The UI backend can then figure out how to "diff" the new instruction stream against its current internal state and render the result with the fewest changes to the screen.
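To make the record-then-diff idea concrete, here is a rough sketch in Python. It is not how any particular library does it; `Backend`, `Command`, `label`, `button` and `end_frame` are all names made up for illustration, and "rendering" is reduced to returning a list of create/update/destroy operations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    """One recorded UI instruction (hypothetical shape)."""
    widget_id: str
    kind: str
    text: str


@dataclass
class Backend:
    """Hypothetical backend: retains last frame's widgets, touches only what changed."""
    retained: dict = field(default_factory=dict)  # widget_id -> Command from last frame
    recorded: list = field(default_factory=list)  # commands recorded for this frame

    # Public API: looks "immediate", but only records and draws nothing.
    def label(self, widget_id, text):
        self.recorded.append(Command(widget_id, "label", text))

    def button(self, widget_id, text):
        self.recorded.append(Command(widget_id, "button", text))

    # Internal implementation: diff the recording against the retained state.
    def end_frame(self):
        new_state = {cmd.widget_id: cmd for cmd in self.recorded}
        ops = []
        for widget_id, cmd in new_state.items():
            if widget_id not in self.retained:
                ops.append(("create", widget_id))
            elif self.retained[widget_id] != cmd:
                ops.append(("update", widget_id))
        for widget_id in self.retained:
            if widget_id not in new_state:
                ops.append(("destroy", widget_id))
        self.retained = new_state
        self.recorded = []
        return ops


def build_ui(ui, counter):
    """User code: describes the whole UI every frame, with no widget objects to manage."""
    ui.label("title", "Demo")
    ui.label("count", f"Clicks: {counter}")
    if counter < 2:
        ui.button("inc", "Click me")


ui = Backend()
build_ui(ui, 0)
frame1 = ui.end_frame()  # everything is new
build_ui(ui, 0)
frame2 = ui.end_frame()  # identical description, nothing to do
build_ui(ui, 1)
frame3 = ui.end_frame()  # only the counter label changed
build_ui(ui, 2)
frame4 = ui.end_frame()  # counter changed, button no longer described
```

The user code re-describes the entire UI on every frame, yet the second frame produces no work at all and the third only touches the one label whose text changed. A real backend would diff draw commands or vertex data rather than widget records, but the split between API and implementation is the same.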
However, some immediate mode UI systems came to the conclusion that it might actually be cheaper to just render most things from scratch instead of spending lots of processing resources to figure out what needs to be updated.
In conclusion: "Immediate Mode UI" doesn't say anything about how the UI is actually rendered or, more generally, how the internals are implemented. It only describes how the public API works.