Thursday, April 21, 2011

Replacing UCS Hardware on VMware with No Capacity Loss

The other day a DIMM went bad in one of our B200 blades, and that blade happened to be one of our production vSphere hosts. The normal procedure for replacing faulty hardware on a vSphere host in a DRS cluster is pretty simple: put the host into maintenance mode, watch your VMs migrate to other hosts in the cluster, replace the faulty hardware, and then take the host out of maintenance mode.

The only problem with this common scenario is what happens to your cluster's capacity while that host is down. Depending on how heavily utilized your hosts are, you might be a bit nervous with one of your systems unable to take its share of the workload. What would happen if you lost another host while one was down for maintenance? What if the technician working on the hardware pulls the wrong blade? Would your cluster be OK, or would your systems and apps suffer because they can't get the CPU and memory they need?
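To put a number on that worry, here is a minimal Python sketch of a capacity check. Everything in it is an assumption made for illustration: the `Host` class, the `survives_failures` function, and the host sizes and demand figures are invented, and it does not reflect how DRS or HA admission control actually calculates headroom. It simply asks whether total CPU and memory demand would still fit if a given number of hosts were unavailable, pessimistically assuming the largest hosts are the ones lost.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one cluster host's usable capacity."""
    name: str
    cpu_ghz: float
    mem_gb: float


def survives_failures(hosts: List[Host], cpu_demand_ghz: float,
                      mem_demand_gb: float, hosts_down: int) -> bool:
    """Return True if demand still fits with `hosts_down` hosts unavailable.

    Pessimistic on purpose: CPU and memory are each checked as if the
    largest hosts for that resource were the ones that went away.
    """
    keep = len(hosts) - hosts_down
    if keep <= 0:
        return False
    # Sort ascending and keep the smallest `keep` hosts for each resource.
    cpu_left = sum(sorted(h.cpu_ghz for h in hosts)[:keep])
    mem_left = sum(sorted(h.mem_gb for h in hosts)[:keep])
    return cpu_left >= cpu_demand_ghz and mem_left >= mem_demand_gb


# Made-up example: four identical hosts and a fairly busy cluster.
cluster = [Host(f"esx{i}", cpu_ghz=48.0, mem_gb=96.0) for i in range(1, 5)]
cpu_demand, mem_demand = 120.0, 250.0

print("one host in maintenance:",
      survives_failures(cluster, cpu_demand, mem_demand, hosts_down=1))
print("maintenance plus one failure:",
      survives_failures(cluster, cpu_demand, mem_demand, hosts_down=2))
```

With these invented figures the cluster copes with one host out but not with a second loss on top of it, which is exactly the window you would like to keep as short as possible.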

Lucky for us, we keep a few blades powered off in our UCS chassis. We like to reserve at least one server in case we ever need to spin up a bare-metal system (this is becoming far less common, btw). In this case, here is how we went about replacing the bad DIMM.

  1. Remove the offending server from the UCS server pool
  2. Turn on the locator LED on the offending server
  3. Put the host in maintenance mode, and then power it down
  4. Destroy the service profile for the offending server
  5. Provision another server using the same service profile template
  6. Take the host out of maintenance mode

It takes about 3-5 minutes to spin up the new server in UCS. Since our servers are stateless, the vSphere host we put into maintenance mode and shut down is the exact same host we powered back up, only on different physical hardware this time. Having this level of simplicity and flexibility in our server environment is one of the reasons we chose UCS, and cases like this only help validate that decision.
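For readers new to the idea of stateless servers, the following is a toy Python model of why the host comes back as "the same" host. It is not the UCS Manager API: the `Identity`, `ServiceProfile`, and `Blade` classes, the `move_to_spare` function, and every identifier value are made up for illustration. It also simplifies our actual steps (destroying the profile and provisioning from the template) into moving a single profile object, and simply assumes the identity values end up unchanged, which is what we observed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical identity values that belong to the profile, not the blade."""
    uuid: str
    mac: str
    wwpn: str


@dataclass
class ServiceProfile:
    name: str
    identity: Identity


@dataclass
class Blade:
    slot: str
    healthy: bool = True
    profile: Optional[ServiceProfile] = None


def associate(profile: ServiceProfile, blade: Blade) -> None:
    """Bind a profile to a blade; the blade takes on the profile's identity."""
    if not blade.healthy:
        raise ValueError(f"blade {blade.slot} is not healthy")
    if blade.profile is not None:
        raise ValueError(f"blade {blade.slot} already has a profile")
    blade.profile = profile


def move_to_spare(faulty: Blade, spare: Blade) -> Identity:
    """Detach the profile from the faulty blade and bind it to the spare."""
    profile = faulty.profile
    if profile is None:
        raise ValueError(f"blade {faulty.slot} has no profile to move")
    faulty.profile = None
    associate(profile, spare)
    return profile.identity


# Made-up placeholder values, not real addresses from our environment.
esx_identity = Identity(uuid="00000000-0000-0000-0000-000000000001",
                        mac="00:25:B5:00:00:01",
                        wwpn="20:00:00:25:B5:00:00:01")
faulty = Blade(slot="chassis-1/blade-3", healthy=False)
faulty.profile = ServiceProfile(name="esx-prod-01", identity=esx_identity)
spare = Blade(slot="chassis-2/blade-8")

identity_after = move_to_spare(faulty, spare)
print("same identity on new hardware:", identity_after == esx_identity)
```

The point of the model is only that vSphere sees the identity carried by the profile, so which physical blade sits underneath does not matter to the cluster.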

