Apr 28 2010

I was assisting a customer this week in upgrading to vSphere and installing and running vReplicator from Vizioncore.  vReplicator is not a complex product, but it works well for what it does: replicate VMs.  During the install of vReplicator, we set up replication for a few VMs.  The product has a few options for determining what to replicate.  Since both source and target were now on ESX 4, I suggested we use Changed Block Tracking (CBT) mode for replication.

When I suggested CBT to the customer, they asked, “Why that one?” and how it worked.  So I explained: when we replicate from source to target, the first copy is a full copy of the data (often called the “seed”).  The next time we replicate, we don’t want to copy the whole thing again, just what has changed since the last replication (often called a “differential”).  The replication software needs to determine what has changed.  Prior to ESX 4, there was no built-in method to do this.  The software had to find another way, such as comparing snapshot information to determine which blocks are new.  That uses CPU cycles on the ESX hosts and takes time (differential mode in vReplicator takes roughly 1 minute per GB of VM data).  CBT, on the other hand, is a feature in ESX 4 that tracks the block changes that have occurred since a point in time.  It does not keep a copy of the changed data in a separate location, just a log noting that the blocks in question have changed.  This is a huge help to backup and replication products, which typically have to determine what has changed on the disks via their own methods.  Now ESX can tell them directly what has changed, and they can get right to copying those changed blocks.  This makes the overall replication and backup jobs much quicker.
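To make the difference concrete, here is a toy sketch in Python.  This is not any real vSphere or vReplicator API; all names and the block-device model are made up purely to illustrate why reading a change log beats rescanning the whole disk:

```python
# Toy model of the two change-detection approaches described above.
# Neither function is a real API -- just the shape of the idea.

import hashlib


def differential_changed_blocks(source_disk, last_replica):
    """Differential mode: scan and compare every block.
    Cost grows with total disk size, even if little has changed."""
    changed = []
    for i, (src, dst) in enumerate(zip(source_disk, last_replica)):
        if hashlib.md5(src).digest() != hashlib.md5(dst).digest():
            changed.append(i)
    return changed


def cbt_changed_blocks(change_log):
    """CBT mode: the hypervisor already logged which blocks changed
    since the last replication; we just read the log."""
    return sorted(change_log)


# Two tiny 8-block "disks"; only block 3 differs since the last run.
source = [b"A"] * 8
replica = [b"A"] * 8
source[3] = b"B"

print(differential_changed_blocks(source, replica))  # scanned all 8 blocks
print(cbt_changed_blocks({3}))                       # read the log only
```

Both approaches find the same changed block; the point is that the differential scan touched every block to find it, while the CBT-style lookup did no scanning at all.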

Now for a few lessons learned in using it.  First, it requires ESX 4 and hardware version 7 VMs (HW7).  VMs need to have their VMware Tools upgraded to the latest version; then, while they are powered off, you can upgrade them to HW7 by right-clicking them (this updates the virtual hardware presented to the VMs and will require another reboot in Windows after powering on, when the OS discovers the new virtual hardware and loads the drivers – thanks Microsoft!).

Second, CBT is not on by default.  It is set per VM and is an advanced option in the VM’s configuration.  Some software can change the CBT setting for you.  In our case, vReplicator has this option on its CBT options page.  There, it checks every VM it can see to determine whether it is HW7; HW7 VMs show as supported.  On that screen you will also see a checkbox for the “enabled” field.  When you check the enabled box on your HW7 VMs, vReplicator makes the change in the VM’s configuration for you.  However, as mentioned earlier, you must completely power the VM down and power it back on.  The reason is that, to start using CBT, ESX needs to create the tracking log for each disk (the log is about 0.5 MB for every GB of VMDK or virtual-mode RDM, and it’s stored with the VM), and ESX only does this setup at VM boot time.  So make note: a restart won’t work.  It has to be a VM power-down and power-back-on.  There is a great article by Eric Siebert that taught me a few things on CBT and goes into a little more technical detail; you can find it here.
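For reference, when software like vReplicator flips that switch, it is setting per-VM advanced options that land in the VM’s .vmx file.  If you ever need to enable CBT by hand, the entries look roughly like this (the `scsi0:0` disk key is just an example; add one line per disk you want tracked):

```
ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
```

As described above, the settings only take effect after a full power cycle of the VM, at which point ESX creates a -ctk.vmdk tracking file alongside each disk.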

Once we completed this process, my customer’s replication jobs ran MUCH faster.  The data being copied from source to target was the same, but the time it took vReplicator to determine what to replicate went from minutes to seconds.  The great news was that we were able to change the replication method on the fly from Differential to CBT (if you’re using Hybrid, I think you need to re-seed).

My final advice is to make sure you understand whether your backup/replication software can use CBT and what you need to do to enable it.  It does take a bit of work to upgrade the tools and virtual hardware (use Update Manager!), but it’s well worth it in the long run.

Apr 22 2010

I was talking to another great customer today who was excited to upgrade from two standalone ESX hosts to a cluster of three with vCenter.  We were talking back and forth about the storage, and it turned out his current datastores were a bit unique.  The customer had migrated from physical slowly, perhaps a few physical servers a week.  Each time a new server was converted, the customer created a new LUN and datastore and P2V’d the physical drives to that single LUN/datastore on their EVA SAN.  That LUN was also unmasked to just one of the hosts (remember, two standalone hosts – no vMotion yet).  As I talked through their current configuration with them, you can imagine the look on my face.  I was perplexed; surely there must be something completely wrong with this design.  My years at EMC and NetApp were failing me: I knew this was not a good idea, but no good reason came to mind.

Then it hit me: a single ESX host can currently see up to 256 LUNs.  Initially I thought, “But they’re never going to run more than 256 VMs on a host.”  No, but they did want to start using vMotion, which means the LUNs will need to be presented to all hosts.  The 256-LUN limit no longer applies to a single host but to the cluster as a whole.  With all LUNs presented to all hosts, as long as they keep provisioning one LUN per VM, they will be limited to 255 VMs for the entire cluster (one of the LUNs is used for booting ESX).  This was a limit they were most certainly going to hit (and at an accelerated pace, now that they have vMotion).
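The arithmetic is easy to sketch.  This back-of-the-envelope calculation uses the 256-LUN limit mentioned above and the customer’s three-host cluster:

```python
# Back-of-the-envelope math for the one-LUN-per-VM design described above.

MAX_LUNS_PER_HOST = 256   # maximum LUNs a single ESX host can see
BOOT_LUNS = 1             # one LUN reserved for booting ESX

hosts = 3                 # the customer's new cluster size

# Standalone hosts: each host sees only its own LUNs, so each one can
# hold up to 255 one-LUN VMs, and capacity grows with host count.
standalone_capacity = hosts * (MAX_LUNS_PER_HOST - BOOT_LUNS)

# vMotion cluster: every LUN must be presented to every host, so the
# whole cluster shares one 255-LUN budget no matter how many hosts
# get added.
cluster_capacity = MAX_LUNS_PER_HOST - BOOT_LUNS

print(standalone_capacity, cluster_capacity)  # 765 255
```

Adding hosts grows the standalone number but never the clustered one, which is exactly why the one-LUN-per-VM design stops scaling the moment vMotion enters the picture.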

This made sense to the customer quickly.  The story has a happy ending: next week we’re upgrading them to vSphere and using Storage vMotion to move those VMs to a better design.  One thing I’ve learned about storage and virtualization is that there are no wrong designs.  However, there are ones that limit functionality.

The moral of the story: know thy VMware maximums!  Make sure to check whether a single host’s limit could affect the design of an entire cloud.

Happy Earth Day!