Avamar and vSphere Change Block Tracking

Storage, Tips and Tricks

Background: Our oldest focus at RoundTower Technologies is backup. Because of this, we are very familiar with backup systems, and since my background is in VMware, I specialize in backing up virtualized environments. As you know, Changed Block Tracking (CBT) in vSphere allows your backup and replication processes to be much more efficient. CBT basically sets a marker when a backup or replication occurs and then tracks which disk blocks change after that point. When the next backup or replication occurs, CBT tells the app exactly which blocks have changed. This is a huge benefit to backup and replication, as those apps used to have to figure out which blocks changed by comparing snapshots, which can take a long time and use a lot of CPU.

You may know a little about Avamar. It’s a backup solution that uses source-based deduplication to perform backups. It basically always takes full backups, but it only stores pieces of files that it has not seen before anywhere in the environment. Everything that it has seen across your organization is tracked, including which client each piece was seen on and when, but only one copy of each file piece is stored on disk. This makes for extremely efficient and rapid backups. For VMware environments, Avamar can take file-level backups by running a client in the guest OS, or VM image-level backups by running a proxy VM in the infrastructure.
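To make the "store each piece only once" idea concrete, here is a minimal sketch in Python. It is not Avamar's actual algorithm: the `DedupStore` class, the fixed 4-byte chunk size, and the client names are all made up for illustration, and a real product chunks and indexes data very differently.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once (illustrative only)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # fixed-size chunks, a simplification for the example
        self.chunks = {}              # digest -> chunk bytes, stored a single time
        self.seen_on = {}             # digest -> set of clients the chunk was seen on

    def backup(self, client, data):
        """Back up `data` for `client`; return (recipe, bytes newly stored)."""
        recipe = []
        new_bytes = 0
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks never seen before consume storage.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            self.seen_on.setdefault(digest, set()).add(client)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild the original data from its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = DedupStore()
recipe_a, new_a = store.backup("vm-a", b"AAAABBBBCCCC")
recipe_b, new_b = store.backup("vm-b", b"AAAABBBBDDDD")
print(new_a, new_b)  # 12 4
```

Every backup is logically a full one (each recipe can restore the complete data), yet the second client only adds the one chunk the store had never seen.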

When you combine these two technologies, the result is the best of both worlds. I’m specifically referring to image-level backups with CBT enabled. Avamar only backs up the pieces of the vmdk files that it has not seen before, and with CBT it only scans the blocks that have changed since the last backup when looking for pieces to deduplicate. Very efficient and very optimized: we’re talking hundreds of GB in just minutes.

Here’s the issue I ran into: I added a client to Avamar and set up a policy to do image-level backups of the VM. I kicked one off. Avamar starts by creating a snapshot of the VM and mounting the snap to the Avamar proxy VM. Avamar then queries CBT on vSphere and gets the list of blocks that changed since the last backup. The proxy then scans thru only the blocks that changed and sends only the file segments within those blocks that it has not seen before to the actual disks for backup. When finished, Avamar unmounts the snap from the proxy and deletes the snap.

When I ran thru this procedure at the customer site, the first backup took about 15 minutes for 100GB on their system. This is expected: there is no CBT information yet, so the proxy must read thru the entire 100GB to determine which file pieces it has and has not seen before, and that takes the majority of the 15 minutes. On the second backup, however, I expected CBT to show the proxy only the blocks that changed. Avamar would then dedupe only those and fill in all of the other blocks from the inventory of blocks it already has (as CBT said those blocks have not changed). When I ran the second backup, it took 15 minutes. It should have taken only a minute or two. What’s the deal?

The solution: I did some hard digging on the net for a solution. I was sent this article on the EMC support site by one of our other engineers (thanks Judson!). Basically it said that VMware has an issue (documented here) with CBT and VMware snapshots. In a very specific scenario, a customer could revert a VM to a snapshot from vCenter and its CBT information would be inaccurate. When the backup or replication looked to CBT for the blocks that changed, CBT could provide incorrect information. The job would then back up or replicate incomplete data without showing an error of any kind. That’s bad.

Avamar knows about this issue and protects against it. It does this by checking whether CBT is in use and whether there are any VMware snapshots older than the oldest backup of the data retained in Avamar. If there is an older snap, Avamar assumes that a customer could revert to it at any time (or already did) and that the CBT data could be invalid, so it ignores the CBT info and reads thru the entire VM. This is why my backups above took 15 minutes each time: I had snapshots on that VM older than the oldest Avamar backup retained. When I removed the snaps, the next backup still took 15 minutes (I later found that this was to reset the CBT information). The backup after that took 47 seconds. Now we’re in business.
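The safety check described above can be sketched roughly as follows. This is only my reading of the behavior, not Avamar's code: the function name `should_trust_cbt`, its parameters, and the choice to compare against the oldest retained backup are assumptions made for illustration.

```python
from datetime import datetime


def should_trust_cbt(cbt_enabled, snapshot_times, retained_backup_times):
    """Return True if changed-block data may be used, False to force a full read.

    Hypothetical helper: all names and the exact rule are assumptions.
    """
    if not cbt_enabled:
        return False
    if not retained_backup_times:
        # No earlier backup to diff against, so the whole disk must be read.
        return False
    oldest_backup = min(retained_backup_times)
    # Any snapshot older than the oldest retained backup could be reverted to,
    # which would make the tracking data unreliable.
    return all(snap >= oldest_backup for snap in snapshot_times)


backups = [datetime(2010, 6, 1), datetime(2010, 6, 2)]
stale_snapshots = [datetime(2010, 5, 1)]

print(should_trust_cbt(True, stale_snapshots, backups))  # False
print(should_trust_cbt(True, [], backups))               # True
```

With the stale snapshot present the check fails and a full read is forced every time, which matches the repeated 15-minute backups. Once the snapshots are gone, the check passes.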

If you see these kinds of performance issues on image-level backups in Avamar, try cleaning out the VMware snapshots. This issue does not affect file-level backups, only image-level ones. I hope this helps out the users who are trying to run image-level backups with Avamar. Now you’ll know what to try when backup performance slows down for no apparent reason.

Thanks and good computing.

Separating the Windows Page File for Site Recovery Manager replication

Administration, Disaster Recovery, Storage

I had a very interesting discussion with a customer about optimizing their storage replication for use with Site Recovery Manager. We discussed the best practice of separating the VMware ESX VM swap files, as per the SRM Best Practices Guide. He was aware of that design suggestion and had already taken the initial steps to implement it. He then went on to ask me if it would be beneficial to separate out the Windows page file onto a non-replicated datastore. I had never heard of that suggestion before, but it seemed logical. If we shouldn’t replicate the VM swap file, why replicate the Windows page file? They both perform similar functions at different layers of the software stack. I powered up my web browser and headed over to Google for some searching.

I found a few references here and there. Most customers keep the paging file inside the standard VM disks to avoid making the environment too complex. I was about to give up and suggest he not separate the paging file, until I came across this discussion in the VMware communities.

Upgrading to View 4.5 with existing user-data disk issue

Desktop Virtualization, Storage

I found an interesting issue in the lab today and I think it’s very important that users who have View deployed recognize it.  In researching the issue, I came across this note in the View 4.5 documentation: “View Manager can manage persistent disks from linked-clone pools that were created in View Manager 4.5.  Persistent disks that were created in earlier versions of View Manager cannot be managed and do not appear on the Persistent Disks page in View Administrator.”

Before today, I did not know that. What this means is that if you have an existing View 4.0 pool of desktops that use user data disks, 4.5 will not recognize them as persistent disks when you upgrade.

The vPaper Report for June

Administration, Desktop Virtualization, Network, Storage, vPaper Report

In the past, I have reviewed all of the technical papers on the VMware site.  I’ve decided to change direction a little and I only plan on reviewing papers that would apply to the everyday VM Admin.  I’m also going to throw in my own ranking on each article (*****, 1 to 5 stars).  You will also notice a “vKeeper” reference in some of the papers.  This award is for the papers that I keep a local copy of on my computer for reference when I need them.  They are the docs that all admins should read thru and use as a reference as needed.  I have also added a section to my admin bookmark page just for the vKeeper docs.

PCoIP Display Protocol: Information and Scenario-Based Network Sizing Guide – (12 pages) A good paper with very good insight on the PCoIP protocol used in VMware View. It gives some good suggestions and the bandwidth needed to keep end users satisfied with their desktop experience. A must-have for View deployments. (****, 4 of 5 stars)

Application Presentation to VMware View Desktops with Citrix XenApp – (3 pages) This is a whitepaper to show how to deploy applications in VMware View desktops from XenApp.  While I can see this being useful for View admins who use XenApp, the description and instructions are very minimal.  Probably something better suited for a KB article. (**, 2 of 5 stars)

Timekeeping in VMware Virtual Machines – (26 pages) This is a very important topic for all VM Admins to know. Time is relevant to everything in a VM. Whether you are trying to authenticate to Active Directory or troubleshooting using event logs, accurate time is very important. This paper goes into some really great detail on how VMware maintains accurate time in VMs. If you are a VMware admin, this should be a standard read. (*****, 5 of 5 stars, vKeeper)

SAN System Design and Deployment Guide – (244 pages of storage goodness)  I have a storage background so I specifically enjoy this one.  If you are running ESX on SAN shared storage (you should be on some type of shared storage) then this is a must read.  This whitepaper is also very helpful if you are studying for the VCP or one of the new VCAP exams.  This is another paper I keep local and definitely one all VM admins with SAN should review.  (*****, 5 of 5 stars, vKeeper)

Best Practices for Running vSphere on NFS Storage – (14 pages) On the heels of the SAN design and deployment guide, this paper describes the best practices for running vSphere on NFS. I like the fact that this article references outdated best practices that have changed and explains why they have changed. This is a HUGE help to admins who google a topic only to find conflicting information. My only regret on this paper is that I would like to see more detail on the advanced options and how they affect the performance of NFS. Still an important doc for VM Admins using NFS storage. All of them should review it to make sure their deployment follows current NFS best practices. (****, 4 of 5 stars)

Location Awareness in VMware View 4 – (8 pages) Good information for View Admins on how to find out where their clients are connecting from. This is a common request from hospitals that want printers to “follow the user” as they float from terminal to terminal. There are some advanced topics in this article, and some Active Directory knowledge is definitely required, especially when using loopback mode in group policy processing. Good info, and hopefully View will include some native GUI-based features in the future to assist with this. (***, 3 of 5 stars)

VMware vSphere 4.0 Security Hardening Guide – (70 pages) This is an outstanding reference for any VM Admin. Security affects everyone’s environment, from the 3-man shop to the largest infrastructure. Setting the precedent of a secure environment from the ground up will provide you with an infrastructure that is solid as a rock. I recommend reviewing this paper often and keeping it handy. (*****, 5 of 5 stars, vKeeper)

VMware vStorage Virtual Machine File System – Technical Overview and Best Practices – (13 pages) This is an entry-level paper on some of the very basics of VMFS and how they relate to RDMs. This should be a good introduction to VMFS for new VM Admins. I hoped with “Best Practices” in the title that there would be more technical references (advanced options for VMFS and how tweaking them affects storage performance, for instance). I was also disappointed to see the LUN size question answered vaguely, with a suggestion to refer to the storage vendor to size your LUNs appropriately. I prefer Duncan’s approach to LUN sizing and it’s what I recommend to all of my customers. (***, 3 of 5 stars)

Look for the vPaper Report again next quarter (hopefully with some new releases in between). Until then, happy reading!

Change Block Tracking and why you care

Disaster Recovery, Performance, Storage, Tips and Tricks

I was assisting a customer this week in upgrading to vSphere and installing and running vReplicator from Vizioncore. vReplicator is not a complex product but works well for what it does: replicate VMs. During the install of vReplicator, we set up replication for a few VMs. The product has a few options for how to determine what to replicate. Since we were now on ESX4 on source and target, I suggested we use Changed Block Tracking (CBT) mode for replication.

When I suggested CBT to the customer, they asked “Why that one?” and how it worked. So I explained: When we replicate from source to target, the first copy is a full copy of the data (often called the “seed”). When we go to replicate the next time, we don’t want to replicate the whole thing again, just what has changed since the last time we replicated (often called a “differential”). The replication software needs to determine what’s changed. Prior to ESX 4, there was no built-in method to do this. The software had to find another method, such as comparing snapshot information to determine which blocks are new. That uses CPU cycles on the ESX hosts and takes time (differential mode in vReplicator takes roughly 1 minute per GB of VM data). CBT, on the other hand, is a feature in ESX4 that tracks the block changes that have occurred since a point in time. It does not keep a copy of the changed data in a separate location, just a log that the blocks in question have changed. This is a huge help to backup and replication technologies that would otherwise have to determine what has changed on the disks via their own methods. Now ESX can tell them directly what has changed and they can get right to copying those changed blocks. This makes the overall replication and backup jobs much quicker.
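If it helps to picture it, here is a toy model of the seed-then-changed-blocks flow in Python. It only illustrates the idea: `TrackedDisk`, `replicate`, and the reset-on-query behavior are invented for this sketch, and the real vSphere interface is more involved.

```python
class TrackedDisk:
    """Toy disk that logs WHICH blocks changed, not the changed data itself."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set()  # the "tracking log": block indices only

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.changed.add(index)

    def query_changed_and_reset(self):
        """Return the changed block indices and start a new tracking period."""
        changed = sorted(self.changed)
        self.changed.clear()
        return changed


def replicate(source, target_blocks, full=False):
    """Copy blocks to the target; return how many blocks were copied."""
    changed = source.query_changed_and_reset()
    indices = list(range(len(source.blocks))) if full else changed
    for i in indices:
        target_blocks[i] = source.blocks[i]
    return len(indices)


source = TrackedDisk(num_blocks=8)
target = [None] * 8

seed_count = replicate(source, target, full=True)  # the initial "seed"
source.write(2, b"abcd")
source.write(5, b"wxyz")
delta_count = replicate(source, target)            # changed blocks only

print(seed_count, delta_count)  # 8 2
```

The second pass never scans the six untouched blocks. It asks the tracker what changed and copies just those two.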

Now for a few lessons learned in using it. First, it requires hardware version 7 VMs (HW7) and ESX4. VMs need to have their VMware Tools upgraded to the latest version, and then you can upgrade the VMs to HW7 while they are powered off by right-clicking them (this updates the virtual hardware presented to the VMs and will require another reboot in Windows after powering on, when the OS discovers the new virtual HW and loads the drivers. Thanks, Microsoft!).

Second, CBT is not on by default. It is set per VM and is an advanced option you can set in the VM’s config. Some software can change the CBT setting for you. In our case, vReplicator has this option on the CBT options page. On that page, it checks every VM that it can see to determine whether it is HW7. VMs that are HW7 show as supported. On that screen, you will also see a checkbox for the “enabled” field. When you click the enabled box on your HW7 VMs, vReplicator makes the change for you in the VM’s configuration. However, you must then completely power down that VM and power it back on. The reason is that, to start using CBT, ESX needs to create the tracking log for each disk (the log is about 0.5 MB for every GB of VMDK or virtual-mode RDM, and it’s stored with the VM), and ESX only does this setup at VM power-on. So make note, a restart won’t work. It has to be a VM power down and VM power back on. There is a great article by Eric Siebert that taught me a few things on CBT and goes into a little more technical detail; you can find it here.
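As a quick back-of-the-envelope check on that tracking log overhead, here is a small Python sketch. The 0.5 MB per GB ratio is just the rough figure quoted above, not an official constant, and the function name and disk sizes are made up for the example.

```python
# Rough ratio quoted in this post (an approximation, not an official constant).
TRACKING_MB_PER_GB = 0.5


def tracking_log_mb(disk_sizes_gb):
    """Estimate the total tracking log size in MB for a VM's tracked disks."""
    return sum(size_gb * TRACKING_MB_PER_GB for size_gb in disk_sizes_gb)


# Hypothetical VM with a 100 GB disk and a 40 GB disk.
print(tracking_log_mb([100, 40]))  # 70.0
```

So even a VM with a fair amount of disk only needs tens of MB of extra space on the datastore for tracking.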

Once we got this process completed, my customer’s replication jobs ran MUCH faster. The data being copied from the source to the target was the same, but the time it took vReplicator to determine what to replicate went from minutes to seconds. More great news: we were able to change the replication method on the fly from Differential to CBT (if you’re using hybrid, I think you need to re-seed).

My final advice: make sure you understand whether your backup/replication software can use CBT and what you need to do to enable it. It does take a bit of work to upgrade the tools and virtual hardware (use Update Manager!). However, it’s well worth it in the long run.
