What happens when a physical machine fails in a virtual environment?

Question

I'm getting started with virtualization so bear with me.

In virtual environments, applications run on top of a hypervisor layer. So a single physical machine could host many virtual machines, each running multiple applications.

So far so good?

So what happens when a physical machine fails? Wouldn't that take down many applications at once, all because of a single machine?

I'm looking into building a private cloud with OpenStack, but I want to fully understand virtualization first.

Joel Coel · Accepted Answer · 2015-08-24T21:31:31.517

The specifics depend on which exact virtualization solution you use, but the idea is that you have a virtual farm, where there are a number of physical hosts with several virtual machines each. You then spend some of the efficiency you gained by not needing a physical host for every VM on spare capacity, so that the remaining hosts have enough headroom to absorb the load when a physical machine goes down.
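To make the headroom idea concrete, here is a minimal sketch of an "N+1" check: for each host in the farm, pretend it has failed and see whether its VMs fit into the free RAM of the survivors. Everything here is an assumption for illustration (the function names, the host names, the use of RAM as the only resource, and the first-fit-decreasing heuristic); real hypervisors also account for CPU, reservations, and their own admission-control policies.

```python
from typing import Dict, List, Optional


def place_vms(vm_sizes: List[int], free: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Place VMs (sizes in GB) onto hosts using first-fit-decreasing.

    Returns the remaining free RAM per host, or None if some VM does not fit.
    This is a heuristic: it can report a failure even when a cleverer
    packing would have worked.
    """
    remaining = dict(free)
    for size in sorted(vm_sizes, reverse=True):
        target = next((h for h, cap in remaining.items() if cap >= size), None)
        if target is None:
            return None
        remaining[target] -= size
    return remaining


def survives_any_single_host_failure(
    capacity_gb: Dict[str, int], vms_by_host: Dict[str, List[int]]
) -> bool:
    """True if the VMs of any one failed host fit on the surviving hosts."""
    for failed in capacity_gb:
        free = {
            host: capacity_gb[host] - sum(vms_by_host.get(host, []))
            for host in capacity_gb
            if host != failed
        }
        if place_vms(vms_by_host.get(failed, []), free) is None:
            return False
    return True


# Hypothetical farm: three hosts with 64 GB of RAM each.
farm = {"host-a": 64, "host-b": 64, "host-c": 64}

# 40 GB used per host leaves 24 GB of headroom on each survivor.
relaxed = {host: [16, 16, 8] for host in farm}
print(survives_any_single_host_failure(farm, relaxed))  # True

# 56 GB used per host leaves only 8 GB free, so a 16 GB VM has nowhere to go.
packed = {host: [16, 16, 16, 8] for host in farm}
print(survives_any_single_host_failure(farm, packed))  # False
```

The second case is the one to avoid: the farm looks fine while everything is up, but it has no room to restart anything after a failure.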

Additionally, you can locate the VHDs for each VM on a common (redundant) SAN. The hypervisors on each physical host can be set to talk with each other and share memory from different VMs. There is some latency, and much of the memory will be backed by disk, but if one of the physical hosts goes down you're not even waiting for the VMs from that host to boot back up. Instead, those VMs will be automatically distributed among the remaining hosts. The ultimate goal is that these machines will pick up from almost where they left off, with little to no downtime at all. In a sense, all of your VMs are already running on at least two physical hosts. In practice, right now hypervisors can only do this kind of migration one machine at a time, and only when they know it's coming before the host fails... but make no mistake: instant migration on hardware failure is the ultimate goal for all of the major hypervisors.

This is why you sometimes see a server virtualized to a single physical host in a farm. You may not gain any hardware efficiency (you may even lose some performance), but you make up for it in terms of management consistency and built-in high-availability.

score 13 · Answer 2 · answered Aug 22 '15 at 21:47

All virtual servers running on a physical host will go offline if the host has any sort of failure.

That said, most platforms offer a high-availability solution for a single VM. Other times a system is built with multiple nodes to prevent service disruption in the event that one node goes down.

If two VM nodes make up a highly available service, it is possible to configure the hypervisor to ensure that the two nodes are not reliant on the same physical infrastructure (fault tolerance). This can go beyond physical server fault tolerance to include different network paths, all the way up to geographically disparate locations.
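As a rough illustration of that "not on the same physical infrastructure" rule, the sketch below audits a placement and reports any service whose nodes share a host. It is not any hypervisor's API; the function name, the VM names, and the host names are all made up for the example, and it only looks at hosts, not network paths or sites.

```python
from collections import defaultdict
from typing import Dict, List


def shared_failure_domains(
    placement: Dict[str, str], services: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return, per service, the hosts that carry more than one of its nodes.

    placement maps VM name -> physical host name.
    services maps service name -> list of VM names that back it.
    An empty result means every service is spread across distinct hosts.
    """
    violations: Dict[str, List[str]] = {}
    for service, nodes in services.items():
        nodes_per_host = defaultdict(list)
        for node in nodes:
            nodes_per_host[placement[node]].append(node)
        crowded = sorted(h for h, ns in nodes_per_host.items() if len(ns) > 1)
        if crowded:
            violations[service] = crowded
    return violations


# Hypothetical placement: both database nodes ended up on host-c.
placement = {
    "web-1": "host-a",
    "web-2": "host-b",
    "db-1": "host-c",
    "db-2": "host-c",
}
services = {"web": ["web-1", "web-2"], "db": ["db-1", "db-2"]}

print(shared_failure_domains(placement, services))  # {'db': ['host-c']}
```

Here the "web" service would survive the loss of any one host, while "db" would lose both nodes if host-c failed.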

Henrik Pingel · Answer 3 · 2015-08-22T21:49:03.827

score 5

You are right in assuming that if the physical machine fails, the VMs on it become unavailable as well.

But OpenStack can take care of that and start the VMs of the failed physical server on another server, or you can use a hypervisor system that is already distributed; I think vSphere can do that.

You should read the OpenStack documentation on HA for more information.

edited Aug 22 '15 at 21:49

answered Aug 22 '15 at 21:39


Dzmitry Savinkou · Answer 4 · 2015-08-26T00:53:22.310

Regarding your question: yes, you will lose access to all machines on that physical host. Of course, it depends on which component failed. If it is a disk, that is a real problem; if it is the motherboard, recovery is much easier. In general, hardware recovery is easier because the hypervisor layer is hardware-agnostic. At this point there are many vendor-specific technologies you can use to build highly available services.

Resource Pools (VMware): despite what was suggested above, these are NOT able to aggregate the resources (CPU, memory, etc.) of multiple physical hosts. If you have 2 physical hosts (let's say one quad-core CPU without hyperthreading and 8 GB of RAM each), it will NOT be possible to run a VM with 5 vCPUs and 12 GB of RAM there. Resource pools are logical constructs; they cannot create supercomputing systems. Right now, they are a way of controlling resource utilization.
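The arithmetic behind that example can be sketched as follows. This is a simplified model and not VMware's actual admission logic: the `Host` class, the function name, and the host names are assumptions, and hypervisor overhead and memory overcommit are ignored. The point is only that a VM has to fit on one host, so the cluster totals do not matter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    cores: int
    ram_gb: int


def hosts_that_fit(hosts: List[Host], vcpus: int, ram_gb: int) -> List[str]:
    """Names of the hosts that could run the VM entirely on their own."""
    return [h.name for h in hosts if h.cores >= vcpus and h.ram_gb >= ram_gb]


# The two hosts from the example: one quad-core CPU and 8 GB of RAM each.
cluster = [Host("esx-1", cores=4, ram_gb=8), Host("esx-2", cores=4, ram_gb=8)]

# Summed up, the cluster has 8 cores and 16 GB, which looks like enough...
print(sum(h.cores for h in cluster), sum(h.ram_gb for h in cluster))  # 8 16

# ...but no single host can take a 5 vCPU / 12 GB VM.
print(hosts_that_fit(cluster, vcpus=5, ram_gb=12))  # []

# A 4 vCPU / 8 GB VM fits on either host.
print(hosts_that_fit(cluster, vcpus=4, ram_gb=8))  # ['esx-1', 'esx-2']
```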

Availability (VMware): technologies like High Availability (HA) give you automated recovery of all VMs in a cluster (within 1-2 minutes, in my experience), IF you are using a storage array (NAS, iSCSI, FC) and keep all VM files there. Moreover, HA only helps with CPU, RAM, or motherboard failures; obviously it will not work if the storage array itself goes down. To protect against RAID or controller failures, people use replication, storage LUN mirroring, etc.

If recovery within 1-2 minutes is not acceptable, there are technologies like Fault Tolerance (FT), which achieve ZERO VM downtime in case of a failure by keeping a running shadow copy of the configured VM. But this technology also has a lot of restrictions: the problem of making VMs with multiple vCPUs fault tolerant is not fully solved.

Overall, each solution depends on your goal.
