The point of functional data structures is that if you modify, you don't copy the whole structure, you only copy the path to the things that changed.
A simple example would be parsing a sequence of characters and storing that in a linked list. When you get an additional character you don't destructively modify the list, instead you create a new list node that points to the existing list.
def process(char, state):
return Node(char, state)
The process function takes a character and a state, and returns a new state. The old state is still usable, as nothing was mutated. On the other hand no large data structure had to be copied.
Your method is probably more efficient. Because you don't need access to all historical versions simultaneously, you only have to keep a stack of undo operations. So you only have O(1) slowdown per operation. With functional data structures you have in the worst case a O(log n) slowdown per operation.
For example if you implement a bit vector, then update and lookup can both be O(log n), e.g. a red-black tree. If you want update to be O(1) then lookup becomes O(n), e.g. a linked list. If you want lookup to be O(1) then update becomes O(n), e.g. copying the entire vector on update.
(please correct me if I'm wrong and if you can do better than O(n) for those cases)