
travem · 12y ago

Live migration doesn't have to require shared storage. VMware vSphere, for example, includes Storage vMotion, which removes that requirement.



pcl · 12y ago

How does vMotion pull this off? When I hear the phrase "live migration", my assumption is that the instance keeps serving traffic during the migration. If the instance is using a local disk, then I would expect that there must be some shared state in the system, or alternatively a brief outage. The latter would not be a truly live migration IMO.

regularfry · 12y ago

Very few things would qualify as a "truly live migration" under those criteria. The only systems I can think of that would count are those which sync CPU operations across different hosts.

I don't know precisely how vMotion does it, but doing a live disc migration is basically:

    - copy a snapshot of the disc image across
    - pause IO in the vm
    - sync any writes that have happened since taking the snapshot
    - reconnect IO to the new remote
    - unpause IO in the vm
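The five steps above can be sketched as a toy in-memory simulation. This is not how vMotion is implemented; the `Disk` class, its `dirty` set, and the `guest_writes_during_copy` callback are hypothetical stand-ins for a block device, a write-tracking mechanism, and a guest that keeps writing while the bulk copy runs. The point it illustrates is that only the blocks written after the snapshot have to be copied while IO is paused.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Disk:
    """Hypothetical block device: fixed-size blocks plus a set of dirtied indices."""
    blocks: list[bytes]
    dirty: set[int] = field(default_factory=set)
    io_paused: bool = False

    def write(self, index: int, data: bytes) -> None:
        if self.io_paused:
            raise RuntimeError("guest IO is paused")
        self.blocks[index] = data
        self.dirty.add(index)  # remember writes made after the snapshot


def migrate(source: Disk, guest_writes_during_copy: Callable[[Disk], None]) -> tuple[Disk, int]:
    """Return the new disk and how many blocks had to be synced while IO was paused."""
    # 1. snapshot the disc image and start tracking writes from this point
    source.dirty.clear()
    target = Disk(blocks=list(source.blocks))
    # the guest keeps running (and writing) while the bulk copy is in flight
    guest_writes_during_copy(source)
    # 2. pause IO in the VM
    source.io_paused = True
    # 3. sync only the blocks written since the snapshot
    synced = len(source.dirty)
    for i in sorted(source.dirty):
        target.blocks[i] = source.blocks[i]
    source.dirty.clear()
    # 4. "reconnect" IO: the caller switches the guest over to the returned disk
    # 5. unpause IO, now against the target
    target.io_paused = False
    return target, synced
```

For example, if the guest writes two blocks during the bulk copy, only those two are copied inside the pause window; everything else moved while the VM was still running.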

Obviously you want the delay between the pause and unpause to be as short as possible, and there are many tricks to achieving that, but this hits all the fundamentals.
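One well-known trick for shortening that pause is iterative pre-copy: keep re-copying the blocks dirtied during the previous pass while the guest runs, and only pause once the remaining dirty set is small. The sketch below assumes that technique (it is not a description of VMware's implementation); `guest_round`, `stop_threshold`, and `max_rounds` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from typing import Callable


def iterative_precopy(
    source: list[bytes],
    guest_round: Callable[[list[bytes], set[int], int], None],
    stop_threshold: int = 2,
    max_rounds: int = 10,
) -> tuple[list[bytes], int, int]:
    """Return (target, blocks copied while paused, pre-copy passes run)."""
    target = list(source)  # pass 0: full bulk copy while the guest runs
    dirty: set[int] = set()
    rounds = 0
    guest_round(source, dirty, rounds)  # guest dirties blocks during the bulk copy
    # keep copying deltas until few enough remain, or give up after max_rounds
    while len(dirty) > stop_threshold and rounds < max_rounds:
        rounds += 1
        to_copy, dirty = dirty, set()  # fresh dirty set for this pass
        for i in to_copy:
            target[i] = source[i]
        guest_round(source, dirty, rounds)  # guest keeps writing during the pass
    # pause IO: only the small remaining dirty set is copied during the outage
    paused_copy = len(dirty)
    for i in dirty:
        target[i] = source[i]
    return target, paused_copy, rounds


def shrinking_writes(disk: list[bytes], dirty: set[int], r: int) -> None:
    # hypothetical workload: each pass is shorter, so fewer blocks get dirtied
    for i in range(max(0, 8 - 3 * r)):
        disk[i] = bytes([r, i])
        dirty.add(i)
```

With `shrinking_writes` on a 16-block disk, the dirty set goes 8 → 5 → 2, so only 2 blocks are copied while paused. The `max_rounds` cap matters because a guest that dirties blocks faster than the link can copy them never converges; at that point the loop stops and accepts a longer pause.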

pcl · 12y ago

Agreed re: your steps. My point is just that this doesn't sound "live" to me, for non-marketing definitions of the word "live".

Looking at VMware's marketing literature [1], they claim "less than two seconds on a gigabit Ethernet network." But it sounds like that's just for the memory/CPU migration. The disk migration section of their literature doesn't have any readily visible timing claims.

My experience with zero-downtime upgrades has always involved either bringing new stateless servers online that talk to shared storage, or adding storage nodes to an existing cluster. In both cases, this involves multiple VMs and shared state.

What does the downtime typically look like for vMotion storage migration? Do they do anything intelligent to allow checkpointing and then fast replay of just the deltas during the outage, or does "migration" really just mean "copy"? And if the former, do they impose any filesystem requirements?

[1] http://www.vmware.com/products/vsphere/features-vmotion

