Debunking the myths of why VMware is better than Hyper-V – Clustered File System…

Today I'm going to write about something called Cluster Shared Volumes (CSV) and, as usual, the stories that are told by VMware.

When I talk to VMware customers these days, one of the stories I hear (out of many) is that Hyper-V does not have a clustered file system that makes it possible to share storage between cluster nodes.
Looking at VMware's web site, I can see where this story originates, as they claim Hyper-V has no clustered file system.

Well, that is simply not true. Hyper-V has for many years now (since the release of Windows Server 2008 R2) had a feature called Cluster Shared Volumes, which makes it possible to share the same storage across multiple cluster nodes and thereby also makes live migration possible.

But how does CSV work then?

Well, as mentioned, CSV makes the same storage available to all nodes in the cluster, as shown in the picture below.

CSV

It does this by using Failover Clustering in Windows Server to make the storage accessible to all nodes, together with something called a CSV coordinator node, which coordinates certain types of transactions to the storage to ensure data consistency.
This coordinator role is completely interchangeable within the cluster and can be moved from one node to another as you please.
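To put this in practical terms, here is a minimal PowerShell sketch (the disk name "Cluster Disk 1" and node name "HV02" are placeholders, and the FailoverClusters module is assumed) of adding a disk to CSV and moving the coordinator role:

    # Add an existing cluster disk to Cluster Shared Volumes
    Add-ClusterSharedVolume -Name "Cluster Disk 1"

    # See which node currently owns (coordinates) each CSV
    Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State

    # Move the coordinator role for a CSV to another node
    Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "HV02"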

But CSV offers more than sharing the same storage between nodes; it also offers fault tolerance. Let's say that in the above scenario, the link to the storage dies for one of the hosts (for example, someone unplugs a fibre cable, a SAN admin makes an error in zoning, and so on). Normally, one would assume that this would cause the virtual machines on that host to die as well and be failed over to other hosts, causing a loss of service (which would be the case in VMware, for example).
Well, if you're using CSV it is a bit different, as the node will simply begin to ship its disk I/O over the network to the CSV coordinator node, as shown in the picture below.

Failed CSV

This makes it possible for Hyper-V admins to live migrate these machines to other fully working nodes without causing a loss of service to the end users.
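To see whether a node is accessing a CSV directly or in redirected mode, a quick check (a sketch, assuming Windows Server 2012 or newer, run from any cluster node) looks like this:

    # Show per-node access state for each CSV: Direct, FileSystemRedirected or BlockRedirected
    Get-ClusterSharedVolume | Get-ClusterSharedVolumeState |
        Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason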

Another cool feature of CSV is the ability to allocate a portion of the physical memory on the cluster node to a read cache, keeping frequently accessed data in RAM (which of course is far faster than disk) and thereby increasing performance both for the server requesting the data and for the other servers, by sparing the disk subsystem many of the read I/Os.
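On Windows Server 2012 R2 this CSV cache is configured as a cluster-wide property measured in MB (the 512MB below is just an example value; on Windows Server 2012 the property name differs):

    # Check the current CSV block cache size (0 = disabled)
    (Get-Cluster).BlockCacheSize

    # Allocate 512MB of host RAM to the CSV read cache
    (Get-Cluster).BlockCacheSize = 512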

For more reading, see this TechNet article: http://technet.microsoft.com/en-us/library/jj612868.aspx

Debunking the myths of why VMware is better than Hyper-V – Performance

With this post, I’m continuing my series on the myths surrounding Hyper-V that I encounter in my workdays. Previously, I’ve written about Transparent Page Sharing, Memory overcommit and disk footprint.

Today I'm going to write about another major issue I hear about from customers, namely performance, and one that is completely understandable. Everyone can relate to wanting the most bang for your buck.
When looking at the VMware sales pitch on their website (one that is repeated time and again by their sales reps), it sounds like VMware is the best option to buy:

VMware vSphere—the industry’s first x86 “bare-metal” hypervisor—is the most reliable and robust hypervisor. Launched in 2001 and now in its fifth generation, VMware vSphere has been production-proven in tens of thousands of customer deployments all over the world.

Other hypervisors are less mature, unproven in a wide cross-section of production datacenters, and lacking core capabilities needed to deliver the reliability, scalability, and performance that customers require.

You can clearly see from this text that VMware believes their hypervisor to be superior in every way, including performance. But is this true?

Well, I can’t answer that and neither can most of the other people out there. Why is that?

Let's snip a bit more from VMware's material, this time their EULA:

2.4 Benchmarking. You may use the Software to conduct internal performance testing and benchmarking studies. You may only publish or otherwise distribute the results of such studies to third parties as follows: (a) if with respect to VMware’s Workstation or Fusion products, only if You provide a copy of Your study to benchmark@vmware.com prior to distribution; (b) if with respect to any other Software, only if VMware has reviewed and approved of the methodology, assumptions and other parameters of the study (please contact VMware at benchmark@vmware.com to request such review and approval) prior to such publication and distribution.
As you can see, this clearly states that if you wish to do a benchmark pitting Hyper-V against VMware, you need VMware's approval both for how you conduct the test and, afterwards, for the results.
Wanna bet that a test that turns out badly for VMware isn't going to be approved?
So can you trust a vendor that claims superior performance, but is unwilling to allow publication of tests unless it has approved the results itself? That's up to you to decide…

Debunking the myths of why VMware is better than Hyper-V – Disk footprint…

I'm continuing my posts tackling the myths I encounter when talking to customers about Hyper-V. I've previously written about Transparent Page Sharing and Memory overcommit.
For today's post, I'm going to talk about hypervisor disk footprint.

One of the things I hear mentioned by customers when comparing ESXi to Hyper-V is disk footprint. I've heard this mentioned numerous times at various customers, but none of them has gotten the terms right, hence I can only speculate that this argument comes from the marketing machine at a certain vendor 😉

The argument goes like this:

ESXi only consumes 144MB of disk space, while Hyper-V consumes 5GB of disk space.

So, is this true?

Well, yes it is, but only if you compare the two hypervisor architectures the way you would compare apples to bananas. While it is true that the hypervisor kernel in VMware consumes only 144MB of disk space, the same is not true for Hyper-V: here the hypervisor kernel consumes only about 600KB. So why the claim that Hyper-V uses 5GB of disk space?
Well, VMware counts the management partition in this calculation while forgetting to do the same for their own operating system.

To compare the two, we need to have some understanding of the two hypervisor structures.
VMware uses a hypervisor type called monolithic, as shown in this picture:
ESXi kernel type
In this kernel type, drivers are included in the hypervisor kernel.

If we take a look at Hyper-V, it uses a kernel type called micro-kernelized:
Hyper-V kernel type
In this type, only the hypervisor runs in the kernel and all drivers, management and so forth are located in the parent (management) partition.

So, as shown above, VMware is both right and wrong when claiming that ESXi consumes 144MB of disk space while Hyper-V uses 5GB; it depends on how you look at it. But to make the comparison on a fair basis, when VMware claims that their hypervisor takes only 144MB of disk space, they should also say that Hyper-V's hypervisor uses only about 600KB.

Furthermore, when comparing these two designs there are some distinct differences that are worth mentioning.

  • As drivers are not loaded in the hypervisor kernel, the need for specialized drivers is removed in Hyper-V. Any driver that works with Windows Server will work with Hyper-V, as opposed to VMware, where drivers need to be written specifically for it.
  • All drivers in Hyper-V run in the parent partition, thus “isolating” them from acting directly in the hypervisor layer. With the VMware approach, where drivers run in the kernel, a malfunctioning driver could impact virtual machines, or a malicious driver could gain access to the virtual machines (for example through the vShield API).
  • The amount of code running in the hypervisor kernel is about 600KB, as opposed to 144MB in ESXi.

Lastly, another selling point that derives from this footprint discussion is security. VMware states that their product is safer due to the so-called smaller disk footprint, based on the argument that a smaller code base equals a more secure product.
If that argument were to hold up, Windows 95 would have to be considered more secure than Windows 7, as the former consumes only about 80MB of disk space while the latter uses several GB.
Today, most attackers focus on getting in the same way the admins of the given product do. This is a side-effect of the products getting more and more secure, and so it's your security policies and processes that keep your infrastructure secure, not the amount of disk space (or lines of code).

Debunking the myths of why VMware is better than Hyper-V – Memory overcommit…

As I wrote previously, I've decided to tackle some of the myths and lies that surround Hyper-V as I hear them from either customers or VMware sales reps. Previously, I've written about Transparent Page Sharing and why it isn't useful anymore.

For this article, I'm going to talk about the Memory Overcommit feature of the vSphere Hypervisor.
VMware's description of it:

VMware’s ESX® Server is a hypervisor that enables competitive memory and CPU consolidation ratios. ESX allows users to power on virtual machines (VMs) with a total configured memory that exceeds the memory available on the physical machine. This is called memory overcommitment.

So, a really cool feature by its description.
Microsoft of course has a feature which accomplishes the same goal but in a very different way, called Dynamic Memory. More on that in a bit.

To go a little more in depth on this feature, I'll snip some from VMware's documentation:

For each running virtual machine, the system reserves physical memory for the virtual machine’s reservation (if any) and for its virtualization overhead.

Because of the memory management techniques the ESXi host uses, your virtual machines can use more memory than the physical machine (the host) has available. For example, you can have a host with 2GB memory and run four virtual machines with 1GB memory each. In that case, the memory is overcommitted.

To compare VMware's memory overcommit to Microsoft's Dynamic Memory: on the surface they both work towards the same goal of offering more memory than is physically available, on the assumption that the virtual machines never actually use all of it.
Dynamic Memory, however, works a bit differently. Where in VMware you just assign the maximum amount of memory you wish to have available, in Hyper-V you define three parameters:

  • Startup Memory
  • Minimum Memory
  • Maximum Memory

Startup memory is somewhat self-explanatory: it is the amount available to the machine at boot. Minimum memory is the least you wish to have assigned to the virtual machine; it will never drop below this. Maximum memory is again self-explanatory, as it is the most that can be assigned to the virtual machine. Hyper-V then adds and removes memory from the guest OS using hot-add and memory ballooning.
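As a concrete example, here is a minimal PowerShell sketch of setting those three parameters on a VM (the VM name "SQL01" and the values are just placeholders):

    # Enable Dynamic Memory and set startup, minimum and maximum values
    Set-VMMemory -VMName "SQL01" -DynamicMemoryEnabled $true `
        -StartupBytes 2GB -MinimumBytes 1GB -MaximumBytes 8GB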
However, the key difference is not in these settings but in the fact that you CANNOT overcommit memory in Hyper-V. This however requires some explaining…

Let's take the above example. You are running a host with 2 GB of memory and create 4 virtual machines with 1 GB of memory each. Your environment is running and something happens that requires the virtual machines to use all of their memory (it could also just be three of them; this is simply to illustrate the scenario where you need more memory than is available).
In ESX all the machines believe they have the memory available and will use it, but the underlying ESX hypervisor cannot do magic and conjure up the extra memory, so it starts swapping pages to disk, and performance goes haywire.
If this were to happen in Hyper-V, the virtual machines would be aware that they did not have all that memory, as Hyper-V will not assign more memory to the servers than is available. So what happens in this scenario? Well, like above, swapping will occur, but this time not at the hypervisor layer but at the virtual machine layer.
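Because Hyper-V only hands out memory it actually has, you can also see at any time what each VM has been given versus what its guest is asking for; a quick sketch, run on the host:

    # Compare what each VM currently has against what its guest is demanding
    Get-VM | Format-Table Name, MemoryAssigned, MemoryDemand, MemoryStatus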

And this is a major difference and why it works so much better in Hyper-V. Who knows better which data can be swapped to disk and which can't than the machine running the workload? An example could be a SQL server, where you would prefer that data related to SQL databases stayed in memory while, for example, pages relating to background processing went to the page file.
In VMware, should swapping occur, you run the risk of swapping out the wrong data and thereby decreasing performance even further; in Hyper-V, the virtual machine decides for itself what is best suited for swapping.

Now, as I've had this discussion a couple of times before, I know the answer from the VMware guys is that you can decide where to place the memory swap file in VMware and put it on SSD.
Well, that's not much of an argument: first off, SSD is still slower than RAM, and second, this is also possible in Hyper-V by letting the virtual machine place its page file on SSD-backed storage (and this is the actual page file of the virtual machine, so it stays there permanently).

So to sum up, having less memory available than needed is not a desirable configuration, as it reduces performance. Should you, however, encounter a scenario where you have too little memory, Hyper-V handles the problem better for you…

Debunking the myths of why VMware is better than Hyper-V – Transparent Page Sharing

When I visit my customers and talk about Hyper-V I get a lot of these “…but VMware is better because…” statements, and it always ticks me off when I know they aren't true.
So, I've decided to go through these myths and argue why they either don't matter, aren't relevant any more, or are outright untrue.

For this first post about this topic, I’ve chosen to talk about the transparent page sharing (TPS) feature from vSphere Hypervisor.

VMware describes the feature like this:

When multiple virtual machines are running, some of them may have identical sets of memory content. This presents opportunities for sharing memory across virtual machines (as well as sharing within a single virtual machine).
For example, several virtual machines may be running the same guest operating system, have the same applications, or contain the same user data.
With page sharing, the hypervisor can reclaim the redundant copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. As a result, the total virtual machine host memory consumption is reduced and a higher level of memory overcommitment is possible.
So, that sounds neat, doesn't it?
Well, in reality this feature isn't as useful as it was back in the days of Windows 2000/2003. And this is because of a “new” feature introduced with Windows Server 2008 called Large Pages. Where a memory page in previous versions of Windows was 4KB in size, it can now be 2MB.
To describe why Large Pages are better to use, I’ll snip a bit from an article from AMD:

Why is it [Large Pages] better? Let’s say that your application is trying to read 1MB (1024KB) of contiguous data that hasn’t been accessed recently, and thus has aged out of the TLB cache. If memory pages are 4KB in size, that means you’ll need to access 256 different memory pages. That means searching and missing the cache 256 times—and then having to walk the page table 256 times. Slow, slow, slow.

By contrast, if your page size is 2MB (2048KB), then the entire block of memory will only require that you search the page table once or twice—once if the 1MB area you’re looking for is contained wholly in one page, and twice if it splits across a page boundary. After that, the TLB cache has everything you need. Fast, fast, fast.

It gets better.

For small pages, the TLB mechanism contains 32 entries in the L1 cache, and 512 entries in the L2 cache. Since each entry maps 4KB, you can see that together these cover a little over 2MB of virtual memory.

For large pages, the TLB contains eight entries. Since each entry maps 2MB, the TLBs can cover 16MB of virtual memory. If your application is accessing a lot of memory, that’s much more efficient. Imagine the benefits if your app is trying to read, say, 2GB of data. Wouldn’t you rather it process a thousand buffed-up 2MB pages instead of half a million wimpy 4KB pages?

So Large Pages are a good thing to use, as they reduce the lookups needed to get data from memory. But why is that a problem for Transparent Page Sharing?
Well, let's assume you have a bunch of servers running on your ESX host. These all hold data in memory, which is scanned by the TPS feature (which by the way uses CPU resources to do so, but that's another story). If you are running Windows 2003, the servers write 4KB pages to memory, and the chance that two pages are identical, letting you save memory, is of course present.
But if you are running Windows 2008 or newer, here come the 2MB pages. For TPS to be useful here, two pages would need to have 16,777,216 bits (that is, almost 17 million bits) that are EXACTLY the same for TPS to kick in and work. And that's not very likely to happen…
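A quick sanity check on that number, straight from a PowerShell prompt:

    # A 2MB page expressed in bits: 2 * 1024 * 1024 bytes * 8 bits per byte
    2MB * 8    # returns 16777216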
So to summarize, Transparent Page Sharing, which is a selling feature for VMware (and one I know for a fact they use to badmouth Hyper-V), isn't really relevant any more. You just don't need it anymore…

Hyper-V Server 2012 R2 – or the free hypervisor that can do it all…

You probably already know about the vSphere Hypervisor (ESXi), the free hypervisor product from VMware. Some of you may already be using it, either for testing or in smaller-scale production deployments.
Looking at the feature set compared to the paid editions, there is a distinct lack of features in this free version:

  • No live migrations (vMotion or Storage vMotion)
  • No High Availability
  • No Replication
  • No integration with management tools (e.g. vCenter)

So basically, it is meant to run a single server with non-critical virtual machines (which usually describes your typical test environment).

But what if there was another option… Well there is.

Microsoft Hyper-V Server (which has been around since Windows Server 2008) has just been updated and released in a new version, 2012 R2. With this you get all the Hyper-V features of Windows Server 2012 R2 Datacenter FOR FREE. A quick rundown of some of the new features can be found here on TechNet.

And all the limitations of the vSphere Hypervisor are gone, since you also get:

  • High Availability in the form of Failover Clustering
  • Live migration of virtual machines, both “normal” and storage migration
  • Shared-nothing live migration, where you can migrate a virtual machine between two non-clustered hosts without incurring downtime (see the sketch after this list)
  • Replication of virtual machines using Hyper-V Replica
  • Full integration with the System Center portfolio, in case you have those products
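To make the shared-nothing live migration point concrete, here is a minimal PowerShell sketch (the host name "HVHOST2", the VM name "VM01" and the destination path are hypothetical) of moving a running VM, including its storage, between two standalone hosts:

    # Allow live migrations on the hosts (run once per host)
    Enable-VMMigration
    Set-VMHost -UseAnyNetworkForMigration $true

    # Move a running VM and its storage to another standalone host
    Move-VM -Name "VM01" -DestinationHost "HVHOST2" `
        -IncludeStorage -DestinationStoragePath "D:\VMs\VM01"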

So, if you are looking to provision some virtualization hosts but don't like the feature set of the vSphere Hypervisor, Hyper-V Server is the product for you.

Swing by TechNet and grab your copy of it.

And for those of you virtualizing Linux VMs, there is of course also a wide range of distributions supported in Hyper-V. See the list here.

The case of the missing performance counters in VMM…

I upgraded a Virtual Machine Manager (VMM) 2012 installation to SP1 a little while ago, which, since the base OS of the server was Windows Server 2008 R2, involved uninstalling VMM, upgrading the OS to Windows Server 2012 and then installing VMM 2012 SP1 on the server.

Since then, the counters for CPU, Assigned Memory and Memory Demand were not working, as you can see in the screenshot below:

VMM no counters

The only ones showing anything were the machines with fixed memory, which of course displayed assigned memory.

I tried the usual troubleshooting options of updating the integration tools in the VMs, re-installing the VMM agent on the hosts, rebuilding the performance counters and so on, but to no avail. Since then I've been beating my head against this, and was pondering opening a support case with Microsoft PSS.
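For anyone wanting to try the counter rebuild themselves, one common way to do it (a sketch, run from an elevated prompt on the Hyper-V hosts; not necessarily the exact procedure I used) is:

    # Rebuild the performance counter registry settings from the backup store
    lodctr /R

    # Re-sync the WMI performance classes with the rebuilt counters
    winmgmt /resyncperf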

But tonight, as I was doing some routine maintenance and also troubleshooting another problem, I installed the hotfix from KB2580360, and after rebooting the hosts and moving the virtual machines back, the counters were working again.
So it seems to be related to some WMI queries failing on the servers, although Operations Manager didn’t log any errors on this.

For reference, the hotfix is only intended for systems running Windows 2008 R2 SP1 or Windows 2008 R2 RTM with hotfix KB974930 installed.

So for now, I can at least close that case… The big question will then be whether the error I was originally troubleshooting disappears as well. Hopefully more on that later…