Jul 22, 2013

Horizon Mirage is a part of the Horizon Suite from VMware and it is generating a lot of buzz.  I’m not going to go into the benefits here; you can read the link I’ve provided for that.  However, one of the most amazing things about Mirage is that it uses a technology called source-based deduplication to back up all of the desktop endpoints.  Let’s talk about that technology, how it works and when it works best.

Source-based deduplication works by having a server in the datacenter with a lot of capacity attached to it.  We’ll refer to this server as the “repository.”  Then there are the endpoints (which, in the case of Mirage, are Windows-based desktops and laptops).  The client will begin by taking backups of the endpoints (Mirage calls them snapshots) and copying them to the repository.  It’s this process and how it works that is so amazing.  You would immediately think that when I take a backup of an endpoint that is 10GB on disk, the system will send 10GB over the network.  For the FIRST machine that you back up, it typically does: it sends practically the whole image of the endpoint to the server.  It’s when you go to back up the second endpoint that the magic starts to happen.  Once the first endpoint has been “ingested”, the repository will use the data it has already seen to make up the backups of any additional endpoints.  I know this can be somewhat confusing; you can look at this article for some comparisons of different deduplication technologies.  For our example, let’s go a little deeper into exactly what happens during this process.

We will begin with the first Windows desktop, which is 10GB on disk in total, and back it up.  The repository will “ingest” the files from the endpoint.  When it does this, a hashing algorithm is run against each file to give it a hash code.  Once that is done for every file, the client also breaks each file into “blocks” or “chunks” and runs a hashing algorithm against those chunks.  After all this, the backup is stored on disk in the repository.

Now, for the next (and every subsequent) client we want to back up or capture, the client will ask the server for its hash table of files.  This is a small amount of data sent from the server to the client because the hash table is a list of the hash codes for all of the files in the repository, not the actual data in the files.  The client then takes this data and analyzes each file on the second endpoint’s file system.  It develops a list of files that the repository has never seen before (and tells the repository which files on this endpoint it has seen before).  Typically we see about 90-95% common files between images.

This is where it starts to get even more crazy efficient.  So the client has figured out which files the server already has in the repository and has given the server a list of the files on Endpoint #2 that the server has seen before.  Now the client looks at the files that the server has not seen before.  Let’s suppose there are 100 files in that list.  The client will separate those files into blocks at the client (this is why it’s called source-based: the majority of the processing and checking for duplicate data happens at the endpoint, not the server) and run the same hashing algorithm on the blocks.  Now the client compares those blocks to the blocks the server has in the repository and develops a list of blocks that the server has not seen before.

Let’s say the client finds 10 blocks that the server has never seen before.  It tells the server to mark down all of the blocks that are on this endpoint as being part of this endpoint’s backup.  Note: to this point in the process, the client has not sent any of the actual backup data to the server yet.  The last step is to take the blocks that are unique to this endpoint, compress them and send them to the server for storage.  That completes the backup: all of the common data has been inventoried and only the unique data has been sent.
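To make those steps a little more concrete, here is a minimal Python sketch of the idea.  It is not how Mirage is implemented: SHA-256, fixed-size blocks, zlib compression and every name in it (Repository, backup, the sample file names) are my own assumptions for illustration, and the block size is tiny so the example is easy to follow.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # tiny, hypothetical block size so the example is easy to follow


def digest(data: bytes) -> str:
    """Return a hex digest used as the hash code for a file or a block."""
    return hashlib.sha256(data).hexdigest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Repository:
    """Server side: remembers which file and block hashes it has seen."""

    def __init__(self) -> None:
        self.file_hashes: set[str] = set()
        self.blocks: dict[str, bytes] = {}  # block hash -> compressed block

    def store(self, file_hashes, new_blocks) -> None:
        self.file_hashes.update(file_hashes)
        self.blocks.update(new_blocks)


def backup(files: dict[str, bytes], repo: Repository) -> int:
    """Client side: dedupe at the source, return the block bytes sent."""
    file_hashes = {name: digest(data) for name, data in files.items()}
    # Step 1: file-level dedupe against the repository's table of file hashes.
    unseen = [n for n, h in file_hashes.items() if h not in repo.file_hashes]
    # Step 2: block-level dedupe, only on files the repository has not seen.
    new_blocks: dict[str, bytes] = {}
    for name in unseen:
        for block in split_blocks(files[name]):
            h = digest(block)
            if h not in repo.blocks and h not in new_blocks:
                new_blocks[h] = zlib.compress(block)  # Step 3: compress
    # Step 4: only the compressed unique blocks travel over the network
    # (the hash lists are metadata and are not counted here).
    repo.store(file_hashes.values(), new_blocks)
    return sum(len(b) for b in new_blocks.values())


repo = Repository()
first_endpoint = {"a.dll": b"AAAABBBB", "b.dll": b"CCCCDDDD"}
second_endpoint = {"a.dll": b"AAAABBBB", "c.dll": b"CCCCEEEE"}
sent_first = backup(first_endpoint, repo)
sent_second = backup(second_endpoint, repo)
print(sent_first, sent_second)
```

In this toy run the second endpoint shares one whole file and one block with the first, so only a single new block is sent for it.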

Whew!  What does all this look like in reality?  Let’s take a look at this log entry from a Proof-of-concept we are running for a customer right now:

Screen Shot 2013-07-21 at 10.17.42 AM

This is an initial upload from a client to the Mirage repository.  This endpoint is running a Windows 7 base image.  It is about 7,634 MB on disk (listed by the total change size).  Since this is the first time this endpoint has been backed up, all of the data on the endpoint is listed in the total change size.  On all subsequent backups, this capacity will be the size of the files that have changed since the last backup.  The next statistic is the killer number: Data Transferred is 29MB!  Mirage took a full backup of this system’s 7,634 MB and only sent 29MB (the unique data) over the network to the repository!

Here’s how it got there: Mirage inventoried 36,436 files on the endpoint that had changed since the last backup (all the files on the endpoint had “changed” since there was no previous backup of this endpoint).  Mirage ran the hash on all of those files and found that there were 2,875 files that it had not seen before in the repository (the Unique Files number).  These 2,875 files totaled 221MB (the Size after file dedupe number).  Then Mirage pulled those files apart and looked for the blocks of those 2,875 files that it had not seen before.  Once Mirage found those unique blocks, it had whittled the 221MB of unique files down to 95MB of unique blocks (the Size after Block Dedupe number).  Mirage then takes the 95MB of unique blocks (which is the real uniqueness of this endpoint) and compresses it.  Every single step in processing to this point has happened at the client.  The last step is to send the unique data to the Mirage Server (repository).  The data sent is 29MB of actual data for a full backup (the Size after compression number)!  This whole process took 5 minutes and 11 seconds on the client.  This first backup of the endpoint will take longer because the hashing has to happen on all of the changed files (36,436 files for this backup).  However, all subsequent backups from this machine will only look at the files that have changed since the last backup because we already have a copy of the files that have not changed.
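If you want to see how much each stage shaves off, the numbers from that log entry can be run through a few lines of Python.  The figures are the ones quoted above; the variable names are just mine.

```python
# Sizes in MB, taken from the log entry discussed above.
stages = [
    ("Total change size", 7634),
    ("Size after file dedupe", 221),
    ("Size after block dedupe", 95),
    ("Size after compression", 29),
]

total_mb = stages[0][1]
for name, size_mb in stages:
    pct_of_total = 100 * size_mb / total_mb
    print(f"{name:<26}{size_mb:>6} MB  {pct_of_total:6.2f}% of the total")

# Overall reduction between what is on disk and what crossed the network.
reduction_pct = 100 * (1 - stages[-1][1] / total_mb)

# Share of files the repository had already seen (36,436 scanned, 2,875 unique).
common_files_pct = 100 * (1 - 2875 / 36436)

print(f"Reduction: {reduction_pct:.2f}%  Common files: {common_files_pct:.1f}%")
```

The overall reduction works out to roughly 99.6%, and about 92% of the files were already known to the repository, which lines up with the 90-95% common files we typically see.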

Where source-based dedupe works and where it does not

Source-based dedupe works best when we have tons of endpoints with very similar OSes, apps and data (this is why it’s perfect for desktops and laptops).  Where source-based dedupe has its challenges is when the files are big and really unique.  Audio and video files are like this.  Unless the files are copies, no two video files are alike, at all.  Not all is lost if your users perform video or audio editing or just work with a lot of these files.  There are ways to accommodate that as well.  We would typically recommend using folder redirection or persona management to move those files off the endpoints to a network drive, where we would back them up with the typical methods.  We can also exclude certain file types from being backed up at all by Mirage.

Screen Shot 2013-07-21 at 11.14.29 AM

As shown above, Mirage includes an upload policy which allows you to set rules on file types you do not want to protect from the endpoints.  Some standard ones included already are media files (however as you see in rule exceptions, media files in the c:\windows directory will be backed up).
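The rule-plus-exception logic can be sketched like this.  This is purely an illustration of the idea and not Mirage’s policy format: the extension list, the function name and the sample paths are all hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical rule: media file types we do not want to protect.
EXCLUDED_EXTENSIONS = {".mp3", ".wav", ".avi", ".wmv"}
# Hypothetical rule exception: media files under this directory are still protected.
EXCEPTION_DIRS = [PureWindowsPath(r"c:\windows")]


def is_protected(path: str) -> bool:
    """Return True if the file would still be backed up under these rules."""
    p = PureWindowsPath(path)
    if p.suffix.lower() not in EXCLUDED_EXTENSIONS:
        return True  # not matched by the exclusion rule
    # Excluded type: protected only if it lives under an exception directory.
    # PureWindowsPath comparisons ignore case, like Windows itself.
    return any(d in p.parents for d in EXCEPTION_DIRS)


print(is_protected(r"C:\Users\bob\Music\song.mp3"))
print(is_protected(r"C:\Windows\Media\chimes.wav"))
print(is_protected(r"C:\Users\bob\Documents\report.docx"))
```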

Mirage is definitely the way to go for any mobile endpoints or branch office endpoints where bandwidth limits and connectivity reliability make VDI a less-than-optimal choice for the management and recoverability of these endpoints.  I don’t recommend products that don’t work as advertised.  Once the light bulb kicks on and customers understand this technology, the real value of it shines through.  Make no mistake, Mirage is not a mirage, it’s a reality and a really good one at that.

Jul 16, 2013

I do a lot of work with customers who want to share files between all of their users’ devices.  There are a number of commercial solutions available on the market like DropBox, Box, SkyDrive, iCloud, or Google Drive which utilize the public cloud to provide this data storage.  Unfortunately for them, the latest revelation from Edward Snowden was that allegedly, Microsoft was working closely with the NSA to provide direct access to Office 365, Skype and SkyDrive (which Microsoft has since denied).  Whether true or not, this does not create a good public relations experience for the world of public cloud storage.

Customers that I work with are always concerned with public cloud data leakage.  Data leakage is the possible release of company information caused by the unavoidable release of control over the security of the company’s data when stored in the public cloud.  The fear is that once this data is stored in the public cloud, the customer has no control over where it is stored or who has access to it.  As Edward Snowden revealed last week, it is possible that the NSA has access to files you store in the public cloud.  The problem is not that the NSA has this access; the problem is that the NSA is not impervious to data leakage itself, as Mr. Snowden has shown.  Even though public cloud storage companies state that your data is protected, they are required to comply with Foreign Intelligence Surveillance Act court orders.  Not exactly instilling me with a load of confidence.

So what’s a customer to do?  Enter: Horizon Workspace Data and Citrix ShareFile.  Horizon Workspace Data from VMware is private cloud only and does not contain any public cloud components.  It allows customers to share files between all of their user devices (tablets, desktops, laptops, smartphones, etc.) while storing the main copy of the data on private cloud servers in your datacenter.  Citrix ShareFile can store your data in the public cloud or in on-premise storage zones.  However, even if you do use your own on-premise storage zones, ShareFile does house a directory inventory on the control plane in the public cloud.  So while the data can be stored in the private cloud, the directory listing gets shared with the public cloud.  Either way, the data itself is in your datacenter and not in the public cloud.

These two solutions (as well as a host of others) are looking more and more enticing to customers looking to provide access to their data for their users while still maintaining as much control as possible.  In the meantime, the public cloud alternatives will need to bandage their image for a while.  The bottom line is that there is no guarantee that our data is 100% private when it traverses the internet.  Maybe we should follow Russia and go back to using typewriters.  Or maybe we learn to accept the fact that this is the world we live in and that our data is never 100% secure.

Oct 31, 2012

I was doing a little research for a customer yesterday and they asked a great question: “Now that View 5.1 supports vSphere 5.1, can we use 32-node clusters with View 5.1?”

First a little background.  Today you can only create 8-node clusters for View desktops in vSphere if you are using iSCSI or Fibre Channel storage.  The reason is that the number of hosts that can share a file on VMFS in vSphere 5.0 and prior is limited to 8.  If you are using NFS with View 5.1 and vSphere 5.0 or higher, you can have 32-node clusters for your View desktops.  vSphere 5.1 changed this limitation by changing the file locking mechanism and now allows the hosts in a 32-node cluster to share a read-only file.  Secondly, vSphere 5.1 introduced Sparse Virtual Disks.  These Sparse Disks are virtual disks presented to the desktops that can shrink as easily as they expand.  This is a real benefit and may one day rival linked clones as the most efficient way to deliver a pool of desktops on storage.  You can read more about these two technologies from Cormac Hogan, Sr. Technical Marketing Architect at VMware.

As I’m sure you know by now, as of last week, View 5.1 and 5.1.1 are now supported with vSphere 5.1.0a.  So the question was asked: now that they are supported together, do we get those cool new features that Cormac was referring to?  Unfortunately, the answer is no, not yet.  I searched for a long time trying to find an answer.  When I came up empty, I just emailed Cormac directly.  He responded this morning, “…both features are waiting on a future release of View.”

Bummer.  Hopefully we’ll get the chance to get these two great features very soon.

Have a great and safe Halloween.

Dec 09, 2011

vSphere Replication and Site Recovery Manager make it very easy to replicate your VMs to your DR site (ahem, once they are set up).  Some customers asked me if there is any way to throttle the bandwidth used for replication.  The good news is that there is a way in VMware software, but it cannot be found in SRM.  Unfortunately, it can only be found in the Enterprise Plus Edition of vSphere 5.  It’s Network I/O Control in the Distributed vSwitch (DvS) in v5.  I’m not going to go into a deep dive on Network I/O Control but I will recommend that you read the Network I/O Control best practices doc here.

To enable Network I/O Control we need to have a DvS in place.  If we select the distributed switch and then select the Resource Allocation tab on the right, this gives us the “Properties” option on the far right.  By selecting the Properties option, you can enable Network I/O Control on the DvS.  Once enabled, you can see all of the System network resource pools.  There is one at the bottom of the list labeled “vSphere Replication (VR) Traffic”.  Selecting it and then clicking the “Edit Settings” link just below it opens up the settings window.

From here, you can edit the adapter shares.  The shares balance the bandwidth so that network flows can use the bandwidth that’s available from a given dvUplink.  Note that the shares apply per dvUplink.
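As a rough illustration of how shares translate into bandwidth when a dvUplink is fully contended, here is a small Python sketch.  The pool names, share values and the 10 Gbit uplink are hypothetical examples, and the model (each active pool gets its shares divided by the total shares of the active pools) is a simplification, not a readout of what Network I/O Control does internally.

```python
def bandwidth_under_contention(shares: dict[str, int], uplink_mbps: float) -> dict[str, float]:
    """Split one dvUplink's bandwidth proportionally to the shares of the active pools."""
    total_shares = sum(shares.values())
    return {pool: uplink_mbps * s / total_shares for pool, s in shares.items()}


# Hypothetical share values for three pools competing on a 10,000 Mbit/s uplink.
example_shares = {"virtual machine": 100, "vMotion": 50, "vSphere Replication": 50}
allocation = bandwidth_under_contention(example_shares, 10_000)
for pool, mbps in allocation.items():
    print(f"{pool:<22}{mbps:>8.0f} Mbit/s")
```

Shares only matter when the uplink is busy; if the other pools are idle, replication traffic can use whatever bandwidth is free.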

Alternatively, you can uncheck the Unlimited checkbox and set a host limit.  Keep in mind that this is in megabits per second, not megabytes.  The limit also applies to the combined set of dvUplinks on a given host.
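Since the megabit/megabyte mix-up is easy to make, here is the arithmetic as a quick Python sketch.  The 100 Mbit/s limit and the 1,000 MB of changed data are made-up example values, and the estimate ignores protocol overhead.

```python
def mbit_to_mbyte_per_sec(mbit_per_sec: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8


def transfer_seconds(size_mbytes: float, limit_mbit_per_sec: float) -> float:
    """Best-case time to move size_mbytes at the given host limit."""
    return size_mbytes / mbit_to_mbyte_per_sec(limit_mbit_per_sec)


host_limit = 100  # hypothetical host limit, in Mbit/s
print(mbit_to_mbyte_per_sec(host_limit))   # throughput in MB/s
print(transfer_seconds(1000, host_limit))  # seconds to replicate 1,000 MB
```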

Lastly, a QoS priority tag can be used.  The traffic will have an 802.1p tag applied to it.  The IEEE does not standardize or mandate the use of the priority tag applied to the packets, but the switches should treat higher tags with higher priority.  The choices are None or 1-7.

While these are not the granular controls that we may wish for, say per-VM bandwidth controls or per-site replication limits, these settings and options are a start.  Hopefully in the future in vSphere Replication v2 we will have more granular controls for bandwidth throttling but until then, these are what we can use.  Happy computing.


Jul 15, 2011

VMware saw an issue with SMB customers: some were not adopting the higher editions of its software because most of the features required shared storage, and some SMBs might not have been ready to bite off the cost of that storage.  So VMware decided to get creative and create a redundant shared storage solution using local storage: the vSphere Storage Appliance (VSA).

Here are some of the features:

  • Deploys as an appliance, very easy to install
  • Must be deployed on a new ESXi 5.0 installation
  • Deploys a VSA Cluster Service on the vCenter server
  • The VSA Cluster Service can deploy the VSA “Agent VMs” to each of the ESXi 5.0 hosts
  • The appliance will use the local space available and present the storage on the network as an NFS datastore
  • Replicates the local storage to the local storage on another host in the cluster for redundancy.
  • If a host fails, the appliance storing the replica will immediately take over the failed “Agent VM’s” IP address and share the storage from the replica
  • v1.0 supports 2 or 3 ESXi hosts in a cluster (Typically for the essentials kits)
  • Sold as a separate SKU with one price with no license capacity restrictions (no technical size limits that I could find)
  • Supports 25 VMs (configured on 2 ESXi hosts) or 35 VMs (configured on 3 ESXi hosts)
  • It is the only scenario where VMware recommends running vCenter on a physical or standalone ESXi hypervisor (to protect you from running into a Catch-22, as vCenter is managing the VSAs)
  • Recommended to use RAID10 on the hardware RAID controllers in the hosts (to protect from a single drive failure)
  • Uses RAID 1 (Mirroring) between hosts for redundancy
  • Supports Storage vMotion for when you are ready to migrate to hardware shared storage
  • Can put the whole VSA cluster in maintenance mode or just a single node.  Can also replace a node and have the VSA rebuild onto it for redundancy or for rolling upgrades.

Here’s how it works: Imagine I have 3 hosts numbered 1, 2 and 3.  Once the VSA gets installed, it creates two volumes on the available local storage on each host.  So host 1 will have volumes 1A and 1B, host 2 has 2A and 2B, host 3 has 3A and 3B.  Once the VSAs are configured, they will be redundant so that 1A (which stores VMs) mirrors to 2B, 2A mirrors to 3B and 3A mirrors to 1B.  If any VSA gets dropped, the VSA running the mirror copy takes the IP address of the failed VSA and keeps right on chugging.
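That mirroring pattern is just a ring, which a few lines of Python can show.  This is only a model of the layout described above; the function names are mine and it says nothing about how the VSA is actually coded.

```python
def mirror_layout(hosts: list[int]) -> dict[str, str]:
    """Map each host's primary volume (A) to its replica (B) on the next host in the ring."""
    layout = {}
    for i, host in enumerate(hosts):
        next_host = hosts[(i + 1) % len(hosts)]
        layout[f"{host}A"] = f"{next_host}B"
    return layout


def serving_host(volume: str, failed_host: int, layout: dict[str, str]) -> int:
    """Return the host that serves a primary volume when one host has failed."""
    primary_host = int(volume[:-1])
    if primary_host != failed_host:
        return primary_host
    # The host holding the replica takes over for the failed host.
    return int(layout[volume][:-1])


layout = mirror_layout([1, 2, 3])
print(layout)
print(serving_host("1A", failed_host=1, layout=layout))
```

With hosts 1, 2 and 3 this gives 1A to 2B, 2A to 3B and 3A to 1B, and if host 1 fails, host 2 serves the data from its replica.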

My Take

The Pros: Great solution for SMBs without shared storage to take advantage of HA, vMotion, etc.  I also think this is an outstanding solution for companies with remote offices who want to have redundancy in 2 or 3 ESXi hosts but don’t want to put shared storage in each site.

The Cons:  Way too much overhead.  VMware is recommending hardware RAID10 from the local drives if possible.  Say I have 4 x 1TB drives in a server (4TB RAW disk capacity) and I use RAID10 as per VMware’s recommendation; this means 2TB gets presented to the ESXi host.  Now the VSA uses half of that storage for VMs and half as a target to mirror the VSA from one of the other hosts.  So out of 4TB of RAW disk, I get <1TB of capacity to store VMs on (don’t forget, I need room to store ESXi itself).  That’s a 75% reduction from RAW capacity = too much overhead.
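Here is that capacity math as a short Python sketch, using the 4 x 1TB example above.  It ignores the space ESXi itself needs, so the real number comes out a little under what this prints.

```python
def vsa_capacity(drives: int, drive_tb: float) -> tuple[float, float, float]:
    """Return (raw, after RAID10, usable for VMs) in TB for one host."""
    raw_tb = drives * drive_tb
    after_raid10_tb = raw_tb / 2       # RAID10 mirrors every drive
    usable_tb = after_raid10_tb / 2    # half is reserved for another host's replica
    return raw_tb, after_raid10_tb, usable_tb


raw_tb, after_raid10_tb, usable_tb = vsa_capacity(drives=4, drive_tb=1.0)
overhead_pct = 100 * (1 - usable_tb / raw_tb)
print(f"RAW {raw_tb} TB -> RAID10 {after_raid10_tb} TB -> usable {usable_tb} TB")
print(f"Overhead: {overhead_pct:.0f}%")
```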

Overall I still think it’s worth it.  It’s still going to be less expensive than a shared storage frame (even with the overhead loss).  I think for remote sites, you can’t beat it.  I can’t wait to see what they add to it in v2.0.