Jan 24 2012

Infrastructure Navigator has been released and can be found here.  Rather than go through the details, I’ll repost the features from the release notes:

VMware vCenter™ Infrastructure Navigator is an application awareness plug-in to vCenter Server, and provides continuous dependency mapping of applications. Infrastructure Navigator offers application context to the virtual infrastructure administrators to monitor and manage the virtual infrastructure inventory objects and actions. Administrators can use Infrastructure Navigator to understand the impact of the change on the virtual environment in their application infrastructure. Infrastructure Navigator helps virtual infrastructure administrators perform the following tasks:

  • Make accurate first-level triage to help either eliminate the problem or associate the problem with the virtual infrastructure when business service users report problems.
  • Assess change impact, manage, and communicate virtual infrastructure issues for critical applications.
  • Understand the application and business impact of changes to the virtual infrastructure on applications.
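To make the "impact of changes" idea concrete, here is a minimal Python sketch of how a dependency map can answer "what is affected if this component changes?". It assumes a plain dictionary of hypothetical component names mapped to what they depend on. It is not Infrastructure Navigator's API or its discovery method, just a breadth-first walk over reversed dependencies.

```python
from collections import deque


def impacted_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly depends on `changed`.

    depends_on maps each component to the components it needs, e.g.
    {"web01": ["app01"]} means web01 depends on app01 (hypothetical names).
    """
    # Invert the map so we can ask "who depends on X?"
    dependents: dict[str, list[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk outward from the changed component.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(changed)  # a dependency cycle could loop back to the start
    return seen


if __name__ == "__main__":
    app_map = {
        "web01": ["app01"],
        "app01": ["db01"],
        "report01": ["db01"],
        "db01": [],
    }
    print(sorted(impacted_by("db01", app_map)))  # ['app01', 'report01', 'web01']
```

In this toy map, a change to db01 flags app01, report01 and web01 for review, while a change to web01 affects nothing else.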

The Open Source Licenses (OSL) file for the virtual appliance is available at /root/open_source_licenses.txt. You can retrieve the file by running the command scp root@<appliance IP>:open_source_licenses.txt . (the trailing dot copies the file into your current directory).

Infrastructure Navigator is supported on vCenter Server 5.0 with the vSphere Web Client. The supported ESX versions include ESX/ESXi 3.5 (build 425420), ESX/ESXi 4.0 (build 398348), ESX/ESXi 4.1 (build 433742), and all builds of ESXi 5.x.


This section describes the key features for the Infrastructure Navigator 1.0.0 release.

Simplifies and automates the deployment and discovery process and keeps the Application Component Knowledge Base (KB) current

  • Eliminates physical switch spanning or credential based discovery.
  • Discovers and maps the application components and dependencies using KBs and presents this knowledge through maps or search for relevant use cases.

Provides Infrastructure Navigator data for vCenter Server and related solutions

  • Ensures that the application and dependency data is available to the rest of the vCenter Server entities and its various solutions through the vCenter extensibility APIs.
  • Supports SRM integration to set up more focused and accurate site recovery and backup plans.


Dec 09 2011

vSphere Replication and Site Recovery Manager make it very easy to replicate your VMs to your DR site (ahem, once they are set up).  Some customers have asked me if there is any way to throttle the bandwidth used for replication.  The good news is that there is a way in VMware software, but it cannot be found in SRM.  Unfortunately, it is only available in the Enterprise Plus Edition of vSphere 5: Network I/O Control on the vSphere 5 Distributed vSwitch (DvS).  I’m not going to do a deep dive on Network I/O Control, but I recommend that you read the Network I/O Control best practices doc here.

To enable Network I/O Control, we need to have a DvS in place.  If we select the distributed switch and then select the Resource Allocation tab on the right, we get the “Properties” option on the far right.  By selecting Properties, you can enable Network I/O Control on the DvS.  Once it is enabled, you can see all of the system network resource pools.  At the bottom of the list is one labeled “vSphere Replication (VR) Traffic”.  Selecting it and then clicking the “Edit Settings” link just below it opens the settings window.

From here, you can edit the adapter shares.  Shares balance the bandwidth available on a given dvUplink among the network flows that use it, so they are applied per dvUplink.
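As a rough illustration of how shares divide a congested dvUplink, the Python sketch below splits a link's capacity in proportion to each pool's shares. The pool names and share values are made-up examples, not vSphere defaults, and the sketch assumes every pool is actively contending for the link. When the link is not saturated, a pool can use more than its proportional slice.

```python
def allocate_by_shares(link_mbps: float, shares: dict[str, int]) -> dict[str, float]:
    """Split link_mbps among contending pools in proportion to their shares.

    Assumes every pool in `shares` is busy and the uplink is saturated;
    pool names and share values are hypothetical examples.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {pool: link_mbps * s / total for pool, s in shares.items()}


if __name__ == "__main__":
    # A 10 Gbps (10,000 Mbps) dvUplink with three contending pools.
    example = {"vm_traffic": 100, "vmotion": 50, "vsphere_replication": 50}
    for pool, mbps in allocate_by_shares(10_000, example).items():
        print(f"{pool}: {mbps:.0f} Mbps")
```

With these example values, replication traffic would get 2,500 Mbps of the 10,000 Mbps uplink during contention.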

Alternatively, you can uncheck the Unlimited checkbox and set a host limit.  Keep in mind that this value is in megabits per second (Mbps), not megabytes.  The limit also applies to the combined set of dvUplinks on a given host.
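Because the host limit is entered in megabits per second, it helps to convert before choosing a value. These hypothetical helpers convert Mbps to MB/s and estimate how long a batch of changed data would take to cross the wire at that limit, assuming decimal units and ignoring protocol overhead and compression.

```python
def mbps_to_megabytes_per_sec(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def transfer_hours(changed_gigabytes: float, limit_mbps: float) -> float:
    """Estimate hours to send changed_gigabytes at a steady host limit.

    The limit covers all dvUplinks on the host combined. Uses decimal units
    (1 GB = 1,000 MB) and ignores overhead, so real transfers take longer.
    """
    megabytes = changed_gigabytes * 1000
    seconds = megabytes / mbps_to_megabytes_per_sec(limit_mbps)
    return seconds / 3600


if __name__ == "__main__":
    print(mbps_to_megabytes_per_sec(100))  # 12.5
    print(transfer_hours(45, 100))         # 1.0
```

For example, a 100 Mbps limit is only 12.5 MB/s, so 45 GB of changed blocks needs about an hour.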

Lastly, a QoS priority tag can be used, which applies an 802.1p tag to the traffic.  The IEEE standard does not mandate how switches must act on the priority tag, but switches should generally treat higher tags with higher priority.  The choices are None and 1-7.
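The vSphere setting is just a dropdown, but it can help to see where the value ends up. This Python sketch builds a 4-byte IEEE 802.1Q tag, whose 3-bit PCP field carries the 802.1p priority, followed by a 1-bit DEI and a 12-bit VLAN ID. It illustrates the frame format only; the VLAN ID is a hypothetical example, and the sketch accepts the full 0-7 range of the field rather than modeling vSphere's "None" option.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag: TPID followed by the TCI.

    TCI layout (16 bits): PCP (3 bits, the 802.1p priority) | DEI (1 bit) | VID (12 bits)
    """
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority must be 0-7")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def priority_of(tag: bytes) -> int:
    """Extract the 802.1p priority (PCP) from a 4-byte 802.1Q tag."""
    _tpid, tci = struct.unpack("!HH", tag)
    return tci >> 13


if __name__ == "__main__":
    tag = build_vlan_tag(priority=5, vlan_id=100)
    print(tag.hex())         # 8100a064
    print(priority_of(tag))  # 5
```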

While these are not the granular controls we may wish for, such as per-VM bandwidth controls or per-site replication limits, these settings and options are a start.  Hopefully vSphere Replication v2 will bring more granular controls for bandwidth throttling, but until then, these are what we can use.  Happy computing.


Sep 14 2011

VMworld keeps rolling on and on.  VMware has made Site Recovery Manager 5.0 available for download here.  As I’ve mentioned time and time again, SRM is my favorite non-vSphere product from VMware, and this release does not disappoint.  Here’s the What’s New section from the release notes:

VMware vCenter Site Recovery Manager 5.0 enhances your ability to build, manage and execute reliable disaster recovery plans for your virtual environment. With the release of version 5.0, VMware has expanded the capabilities of Site Recovery Manager to provide unprecedented levels of protection. New use cases have been made possible through the addition of the following capabilities:

  • vSphere Replication. When used in conjunction with VMware vSphere 5.0, Site Recovery Manager 5.0 introduces a new capability to utilize the vSphere 5.0 host to perform replication of powered-on virtual machines over the network to another vSphere 5.0 host, without the requirement for storage array-based replication. As virtual machines change with use, the changed blocks are replicated to a shadow copy of the virtual machine resident at the recovery site, in accordance with a Recovery Point Objective set as a property of the virtual machine itself.
  • Planned Migration. A new workflow designed to deliver migration while minimizing the risk of data loss. Planned migration will stop the workflow from continuing if an error is encountered, providing an opportunity to fix the problem, ensuring that systems are properly quiescent and that all data changes have been completely replicated.
  • Automated Re-Protection. Re-protection is a new extension to recovery plans for use only with array-based replication. Automated re-protect enables the environment at the recovery site to establish replication and protection of the environment back to the original protected site through a single click.
  • Automated Failback. Automated failback returns the entire environment to the originally protected primary site. This can only happen after re-protection has ensured that data replication and synchronization have been established to the original primary site. Failback will run the same workflow that was used to migrate the environment to the protected site, ensuring that the critical systems encapsulated by the recovery plan are returned to their original environment. Automated failback, like re-protection, is only available for use with array-based replication protected virtual machines.
  • Enhanced Dependency Definition. This includes the addition of more (5) priority groups, and the ability to set virtual machine dependencies within a priority group. Virtual machine dependencies can be defined to ensure that required systems are available before dependent virtual machines are powered on. This enables highly organized workflow control, ensuring that required services are available before dependent virtual machines are powered on.
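The changed-block idea in the vSphere Replication bullet can be illustrated with a toy model. The Python sketch below is a simplified, assumption-based model, not how vSphere Replication or Changed Block Tracking is actually implemented: writes mark fixed-size blocks as dirty, and a replication pass copies only those blocks to the shadow copy and then clears the dirty set. The block size and disk size are hypothetical.

```python
BLOCK_SIZE = 4  # tiny block size in bytes, purely for illustration


class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last sync."""

    def __init__(self, num_blocks: int):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.dirty: set[int] = set()

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at offset and mark every block it touches as dirty."""
        if not payload:
            return
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("write falls outside the disk")
        self.data[offset:end] = payload
        first, last = offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def replicate_to(self, replica: bytearray) -> int:
        """Copy only the dirty blocks to replica, clear them, and return the count."""
        if len(replica) != len(self.data):
            raise ValueError("replica must be the same size as the disk")
        for block in sorted(self.dirty):
            start = block * BLOCK_SIZE
            replica[start:start + BLOCK_SIZE] = self.data[start:start + BLOCK_SIZE]
        sent = len(self.dirty)
        self.dirty.clear()
        return sent


if __name__ == "__main__":
    disk = TrackedDisk(num_blocks=4)
    shadow = bytearray(len(disk.data))  # starts identical (all zeros)
    disk.write(5, b"ab")                # touches bytes 5-6, i.e. block 1 only
    print(disk.replicate_to(shadow))    # 1
    print(shadow == disk.data)          # True
```

Only one of the four blocks crosses the wire, which is why replication traffic scales with the rate of change rather than with disk size.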


Aug 16 2011

As you probably know by now, SRM5 is just over the horizon.  You have probably heard me mention numerous times that SRM has always been my favorite non-vSphere product from VMware.  The great news is that VMware has made some big improvements in SRM5 and added the most-requested functionality.  Here we go:

  • vSphere Replication – The biggest feature add.  An additional replication option that allows you to replicate your VMs without having the storage perform the replication.  It even allows you to replicate to/from local storage on the ESXi hosts.  There are some important limits to vSphere Replication.  It’s not for everything/everyone, but it does quite a bit for a first release.
    • Requires vSphere 5
    • Managed from the vSphere client directly
    • ISOs and floppies are not replicated
    • Powered off/Suspended VMs are not replicated
    • Non-critical files are not replicated (swap files, dumps, logs, etc.)
    • VMs can have snapshots on the protected side but they are automatically collapsed on the recovery side
    • Physical RDMs not supported (but virtual RDMs are)
    • Fault Tolerant VMs, Linked Clones and VM Templates are not supported
    • Automated Failback of vSphere Replicated VMs is not supported in SRM 5.0
    • Requires VM Hardware version 7 or 8 (required for Changed Block Tracking)
    • Supports up to 500 VMs
    • Asynchronous only
    • Shortest replication interval (RPO) is 15 minutes; longest is 24 hours
    • Initial copy can be seeded by sneakernet (copying the initial data to a portable HD and importing it at the destination, i.e. the initial copy does not need to be seeded over the wire)
    • File-level consistency (except for planned migration – see below): the guest OS file system is quiesced before changed blocks are sent to the DR site (applications are not quiesced)
    • Included in both Standard and Enterprise Editions of SRM
    • vSphere Replication is not available outside SRM5
  • Scalability Improvements
    • 1000 Total Protected VMs (Same as SRM4.1)
    • 500 Protected VMs in a single protection group (same as SRM v4.1)
    • 250 Protection Groups (Up from 150 in v4.1)
    • 30 Simultaneous running recovery plans (Up from 3 in v4.1 – this is the biggest improvement in scalability)
  • Planned Migration – This is a big feature add.  It is another option when you are going to fail over.  In 4.1 the only option was to start up the VMs from the last good replication and go.  This option now allows you to migrate when there is an impending disaster and the protected side is still up.  Planned migration shuts down the VMs on the protected side, then initiates a replication of the storage frames (or vSphere Replication) to get the last drop of changed data to the recovery side before powering on the VMs and bringing them up.  One extremely important advantage of this method: the VMs are always in an application-consistent state when they come up in DR.  (Absolutely love this feature.)
  • Failback – the single most-requested feature in SRM4.  Once a failover occurs, the admin clicks the “Reprotect” link to reset the recovery plan for failback and reverse replication.  Once that completes, the recovery plan can be tested or run in the reverse direction to recover the VMs to the original protected site.  (This is outstanding for enterprises that are required to do a true failover for DR testing.)
  • User Interface improvements – Slightly different look and feel.
    • Both sides are visible without vCenter linked mode
    • IP changes for VMs during recovery can now be entered in the GUI (thank you VMware!)
    • Placeholder VMs at the DR side now have a unique icon (with a thunderbolt through it) to identify them easily in the DR vCenter.
    • Reports now include the user ID that initiated the Failover or DR test.
    • Reports now include more information about the storage steps (including the device friendly names)
  • IPv6 Support – IPv6 is now supported for all links.
  • IP Customization performance increase – big performance improvement in the actual IP conversion in the VM
  • In guest callouts – now you can run a script inside the VM, run a script on the SRM server or insert a breakpoint to post a message (these also now have maximum timeouts as an option) during the recovery plans
  • New APIs on both the Protected and Recovery Sides – new commands for 3rd party integration (note these are SOAP based and not PowerShell or PowerCLI)
  • Dependency Improvements – There are now 5 priority groups for each recovery plan.  Each priority group has to finish completely before the recovery plan starts the next group.  Within a single priority group, you can also set dependencies (similar to how Windows Services set dependencies) so that a particular VM will not recover before its dependencies have recovered (note: this works within a single priority group and cannot span priority groups).
  • Licensing – There are now two editions of SRM, Standard and Enterprise.  Both are feature-identical.  Standard is for sites with up to 75 VMs and Enterprise is for sites with up to 1000 VMs (the technical limit).  All existing customers who maintain support will get SRM Enterprise when they go to SRM5.  SRM Standard is a new offering for SMBs and remote offices.  When customers need to grow beyond 75 VMs at a site, they can upgrade their existing VM licenses to SRM Enterprise and then continue buying SRM5 Enterprise VM packs.  Licensing is still sold in packs of 25 VMs, and you only need to purchase licenses for the VMs that you are going to protect.
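The priority-group rules in the Dependency Improvements bullet (groups finish in order; dependencies only within a group) can be sketched as one topological sort per group. This minimal Python sketch uses the standard library's graphlib (Python 3.9+). The VM names are hypothetical, and it produces one valid sequential order rather than modeling how SRM actually schedules power-ons.

```python
from graphlib import TopologicalSorter


def recovery_order(groups: list[list[str]], depends_on: dict[str, list[str]]) -> list[str]:
    """Return one valid power-on order for the VMs in a recovery plan.

    groups[0] is priority group 1, groups[1] is priority group 2, and so on.
    depends_on maps a VM to the VMs that must be up before it starts.
    """
    group_of = {vm: i for i, members in enumerate(groups) for vm in members}
    order: list[str] = []
    for i, members in enumerate(groups):
        graph: dict[str, list[str]] = {}
        for vm in members:
            deps = depends_on.get(vm, [])
            for dep in deps:
                # Dependencies cannot span priority groups.
                if group_of.get(dep) != i:
                    raise ValueError(f"{vm} depends on {dep}, which is not in the same priority group")
            graph[vm] = deps
        # Every VM in this group is ordered before any VM in the next group.
        # A dependency cycle raises graphlib.CycleError (a ValueError subclass).
        order.extend(TopologicalSorter(graph).static_order())
    return order


if __name__ == "__main__":
    plan = [["dc01"], ["db01", "app01", "web01"]]
    deps = {"web01": ["app01"], "app01": ["db01"]}
    print(recovery_order(plan, deps))  # ['dc01', 'db01', 'app01', 'web01']
```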

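For a quick sanity check on the SRM5 licensing math above, this hypothetical helper picks the smaller edition whose per-site limit covers the protected VMs and rounds up to whole 25-VM packs. It encodes only the numbers in the Licensing bullet; it is not a VMware pricing tool, and existing customers with support receive Enterprise regardless.

```python
import math

PACK_SIZE = 25            # licenses are sold in packs of 25 VMs
STANDARD_MAX_VMS = 75     # SRM5 Standard covers sites with up to 75 protected VMs
ENTERPRISE_MAX_VMS = 1000 # technical limit per site


def srm5_license_estimate(protected_vms: int) -> tuple[str, int]:
    """Return (edition, packs) for the number of VMs protected at one site."""
    if protected_vms < 1:
        raise ValueError("protect at least one VM")
    if protected_vms > ENTERPRISE_MAX_VMS:
        raise ValueError("exceeds the 1000-VM technical limit")
    edition = "Standard" if protected_vms <= STANDARD_MAX_VMS else "Enterprise"
    return edition, math.ceil(protected_vms / PACK_SIZE)


if __name__ == "__main__":
    print(srm5_license_estimate(60))  # ('Standard', 3)
    print(srm5_license_estimate(76))  # ('Enterprise', 4)
```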

Mar 31 2011

I was installing Site Recovery Manager in Miami today for a customer who replicates to Atlanta.  I’ve been working with them for the last few days to set up SRM and get their Protection Groups and Recovery Plans in place.  One suggestion I made was to add a Message to their Recovery Plans to pause after the shutdown of the primary site VMs.

Here’s why:  In a true failover in SRM, the first step of the Recovery Plan is to shut down all of the protected VMs in the primary site.  This is so that when the VMs restart in the recovery site, they do not conflict on the network with the original VMs.  Here in Miami, the typical disaster is, of course, a hurricane.  Hurricanes are typically predictable with decent notice, so when someone hits the failover button here, the odds are higher than normal that both sites are still available and that I am avoiding a disaster rather than recovering from one.  For this reason, I recommended that my customer add a message to their Recovery Plan right above “Prepare Storage”.  You can do this by right-clicking “Prepare Storage” and selecting “Add Message”.

I added the following message to their Recovery Plan:

“If this is a true failover and Miami is still available, perform a final replication of the storage to get the last transactions from Miami.  Once all replication is completed, you may click Continue.

If this is only a test of the plan, you may click Continue at any time.”
My customer replicates hourly between sites.  If they were to start a failover 45 minutes after the last replication, they would lose the last 45 minutes of transactions when they ran the Recovery Plan, because SRM does not kick off replication in any way.  When we add the message above, the storage admin can run a final sync from the production site with the VMs in a down state.  This will extend the Recovery Time Objective, since the whole plan has to wait for the final replication to finish.  However, this last replication gets the final transactions over to the Recovery Site and minimizes the clean-up work on recovery.  Of course, you can script this last replication if you want to; however, be very careful, as there is no way to mark a script to run in failover mode only (vs. test mode).  You may not want a script to force a replication in both test mode and full recovery mode.
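The data-loss arithmetic above can be written down as a small Python sketch. The timestamps are hypothetical; the function reports how much time has passed since the last completed replication, which is the window of transactions at risk, and treats a completed final sync (the manual step the recovery-plan message asks for) as closing that window.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replication: datetime, failover_start: datetime,
                     final_sync_done: bool = False) -> timedelta:
    """Return the span of transactions lost if failover starts now.

    Anything written after the last completed replication is lost, unless a
    final sync finished after the primary VMs were shut down.
    """
    if failover_start < last_replication:
        raise ValueError("failover cannot start before the last replication")
    if final_sync_done:
        return timedelta(0)
    return failover_start - last_replication


if __name__ == "__main__":
    last_sync = datetime(2011, 3, 31, 9, 0)   # hypothetical hourly sync at 09:00
    failover = datetime(2011, 3, 31, 9, 45)   # failover starts at 09:45
    print(data_loss_window(last_sync, failover))                        # 0:45:00
    print(data_loss_window(last_sync, failover, final_sync_done=True))  # 0:00:00
```

With hourly replication, the worst case without a final sync is just under an hour of transactions.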
Just a little tip to optimize your Disaster Failover.  Now let’s hope we never need to use it.