Virtualizing domain controllers used to be a big no-no…

So, I’ve been away from blogging for a while. Work has kept me busy during the day, and my little daughter, who arrived last year, did the same at night 🙂

But for this post I’d like to tackle the issue of virtualizing a domain controller.

In the past this could cause some rather unforeseen and dramatic consequences if one was not aware of how to do it properly. Most businesses I’ve visited in the years since virtualization began to ramp up have virtualized every server they could, and what better candidates than domain controllers, which most of the time don’t use the resources allocated to them?
Most of these businesses have also adapted their backup solutions to these new scenarios, and one of the most common approaches is based on some form of virtual machine snapshotting.
Well, that works okay for a file server, a print server, or other crash-consistent applications. But what happens if you restore a domain controller from a snapshot?

Let’s start with some background on how domain controllers work. Each of them, as we all know, holds a copy of the Active Directory database. Each database instance is assigned a value known as an InvocationID, and every update (that does not require the involvement of a FSMO role master) is made against the local database and then replicated to the DC’s replication partners. Replication progress is tracked using Update Sequence Numbers (USNs). Together, the InvocationID and the USN uniquely identify a change on the local database, and replication partners use this pair to determine whether updates still need to be processed.
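To make that concrete, here is a minimal Python sketch of the idea. All class and field names are my own illustration, not actual Active Directory internals:

```python
# Illustrative sketch of USN-based replication tracking; names are
# simplified and not actual Active Directory internals.

class DomainController:
    def __init__(self, invocation_id):
        self.invocation_id = invocation_id
        self.highest_usn = 0              # highest USN committed locally
        self.partner_high_watermark = {}  # partner InvocationID -> highest USN already replicated

    def local_update(self):
        """Every local change consumes the next USN."""
        self.highest_usn += 1
        return (self.invocation_id, self.highest_usn)

    def needs_sync_from(self, partner):
        """A partner has news for us only if its USN is beyond the
        watermark we keep for its InvocationID."""
        seen = self.partner_high_watermark.get(partner.invocation_id, 0)
        return partner.highest_usn > seen

    def pull_from(self, partner):
        if self.needs_sync_from(partner):
            self.partner_high_watermark[partner.invocation_id] = partner.highest_usn

dc1 = DomainController("invocation-dc1")
dc2 = DomainController("invocation-dc2")
dc1.local_update(); dc1.local_update()  # two changes on DC1 (USN 1 and 2)
dc2.pull_from(dc1)                      # DC2 replicates; its watermark for DC1 is now 2

# Now simulate a snapshot restore that rolls DC1 back to USN 1:
dc1.highest_usn = 1
dc1.local_update()                      # a brand-new change re-uses USN 2...
# ...and DC2 believes it has already seen USN 2 from this InvocationID,
# so needs_sync_from(dc1) is False and the change never replicates.
```

The last few lines show the rollback problem in miniature: after the restore, a new change reuses an already-acknowledged USN and is silently ignored by the partner.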

Now, take the scenario from before, where a domain controller is restored from a snapshot. What happens then?
Well, the domain controller that comes up from the restored state doesn’t know it was rolled back, so it will attempt to sync with its partners. This is where the problems start: the USNs it issues have already been acknowledged by the other domain controllers for its InvocationID, and therefore no sync will happen.
And as changes keep being made against the local Active Directory database, the domain controllers drift further and further out of sync over time. A mild example: a user changes his or her password on the restored domain controller, and the change never replicates to the others. You then have one password accepted on some systems and another on others, depending on which DC validates the logon.

This is really not a desired scenario, but thankfully Microsoft has done something about this…

In Windows Server 2012, Microsoft made the domain controller virtualization-aware (on supported hypervisors, that is). This means the domain controller now knows it is virtual and can take steps to prevent the scenario described above.
So how is this done?

Well, Microsoft introduced a new attribute in Active Directory, the VM-Generation ID (stored in the msDS-GenerationId attribute on the domain controller’s computer object).
The Windows operating system monitors this attribute to ensure it matches the value the server has stored locally. If it doesn’t match, the domain controller assumes it has been restored and takes action to prevent the scenario above.
What happens in that case is that the domain controller resets its InvocationID and discards its issued RID pool, effectively preventing the re-use of USNs. It then marks its local SYSVOL share for a non-authoritative restore.
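As a rough sketch of that safeguard logic, with hypothetical function and field names (this is not the actual Windows implementation):

```python
import uuid

# Hypothetical sketch of the boot-time safeguard; the function and field
# names are illustrative, not the actual Windows implementation.

def check_vm_generation_id(current_gen_id, state):
    """Compare the VM-Generation ID exposed by the hypervisor with the
    copy the DC saved locally; on mismatch, assume a snapshot restore."""
    if state["stored_gen_id"] == current_gen_id:
        return state                              # normal boot, nothing to do
    safe = dict(state)
    safe["invocation_id"] = str(uuid.uuid4())     # new database identity -> old USNs can't collide
    safe["rid_pool"] = None                       # discard the pool; request a fresh one from the RID master
    safe["sysvol"] = "non-authoritative-restore"  # re-sync SYSVOL from a partner
    safe["stored_gen_id"] = current_gen_id
    return safe
```

The key point is the InvocationID reset: because the restored DC starts issuing USNs under a brand-new InvocationID, its partners treat everything it sends as new.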

This picture shows how the process works:

Virtualization Safeguards During Normal Boot

The requirements for using this cool new feature are the following:

  • Supported hypervisor platform (Hyper-V in Windows Server 2012 or newer, vSphere 5.0 U2, ESXi 5.1)
  • Windows Server 2012 domain controller
  • PDC emulator running Windows Server 2012


Debunking the myths of why VMware is better than Hyper-V – Clustered File System…

Today I’m gonna write about something called Cluster Shared Volumes (CSV) and, as usual, the stories that are told by VMware.

When I talk to VMware customers these days, one of the stories I hear (out of many) is that Hyper-V does not have a clustered file system that makes it possible to share storage between cluster nodes.
Looking at VMware’s website, I can see where this story originates, as they claim Hyper-V has no clustered file system.

Well, that is simply not true. Hyper-V has for many years now (since the release of Windows Server 2008 R2) had a feature called Cluster Shared Volumes, which makes it possible to share the same storage across multiple cluster nodes, thereby also making live migration possible.

But how does CSV work then?

Well, as mentioned, CSV makes the same storage available to all nodes in the cluster, as shown in the picture below.


It does this by using Failover Clustering in Windows Server to make the storage accessible to all nodes, with the help of something called a CSV coordinator node, which coordinates certain types of transactions against the storage to ensure data consistency.
This coordinator role is completely interchangeable within the cluster and can be moved from one node to another as you please.

But CSV offers more than sharing the same storage between nodes; it also offers fault tolerance. Let’s say that in the above scenario, the link to the storage dies for one of the hosts (for example, someone unplugs a fibre cable, or the SAN admin makes a zoning error). Normally, one would assume this would cause the virtual machines on that host to die as well and be failed over to other hosts, causing a loss of service (which would be the case in VMware, for example).
Well, if you’re using CSV it is a bit different, as the node will simply begin to ship its disk I/O over the network to the CSV coordinator node, as shown in the picture below.

Failed CSV

This makes it possible for Hyper-V admins to live migrate these machines to other fully working nodes without causing a loss of service to the end users.
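A toy model of this redirected I/O behaviour might look like the following (class names and return strings are purely illustrative, not the actual Failover Clustering implementation):

```python
# Simplified model of CSV redirected I/O; classes and behaviour are
# illustrative, not the actual Failover Clustering implementation.

class CsvNode:
    def __init__(self, name, has_storage_path=True):
        self.name = name
        self.has_storage_path = has_storage_path  # direct SAN connectivity?

class CsvVolume:
    def __init__(self, coordinator):
        self.coordinator = coordinator  # the coordinator role can move between nodes

    def write(self, node, data):
        if node.has_storage_path:
            return f"{node.name}: direct I/O to shared storage"
        # Storage path lost: ship the I/O over the cluster network
        # to the coordinator node, which writes it on our behalf.
        return f"{node.name}: redirected I/O via {self.coordinator.name}"

node1, node2 = CsvNode("node1"), CsvNode("node2")
volume = CsvVolume(coordinator=node1)
node2.has_storage_path = False  # e.g. a fibre cable is unplugged
```

The point of the sketch: losing the storage path degrades I/O (it now crosses the network) but does not stop it, which is what buys the admin time to live migrate.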

Another cool feature of CSV is the ability to allocate a portion of the physical memory on a cluster node as a read cache, keeping frequently accessed data in RAM (which of course is far faster than disk). This increases performance both for the server requesting the data and for the other servers, since the disk subsystem is spared the many read I/Os.

For more reading, see this TechNet article:

Debunking the myths of why VMware is better than Hyper-V – Performance

With this post, I’m continuing my series on the myths surrounding Hyper-V that I encounter in my workdays. Previously, I’ve written about Transparent Page Sharing, Memory overcommit and disk footprint.

Today I’m gonna write about another major topic I hear from customers, namely performance, and one that is completely understandable. Everyone can relate to wanting the most bang for your buck.
When you look at the VMware sales pitch on their website (one that is repeated time and again by their sales reps), it sounds like VMware is the best option to buy:

VMware vSphere—the industry’s first x86 “bare-metal” hypervisor—is the most reliable and robust hypervisor. Launched in 2001 and now in its fifth generation, VMware vSphere has been production-proven in tens of thousands of customer deployments all over the world.

Other hypervisors are less mature, unproven in a wide cross-section of production datacenters, and lacking core capabilities needed to deliver the reliability, scalability, and performance that customers require.

You can clearly see from this text that VMware believes their hypervisor to be superior in every way, including performance. But is this true?

Well, I can’t answer that and neither can most of the other people out there. Why is that?

Let’s snip a bit more from VMware’s material, this time their EULA:

2.4 Benchmarking. You may use the Software to conduct internal performance testing and benchmarking studies. You may only publish or otherwise distribute the results of such studies to third parties as follows: (a) if with respect to VMware’s Workstation or Fusion products, only if You provide a copy of Your study to benchmark@vmware.com prior to distribution; (b) if with respect to any other Software, only if VMware has reviewed and approved of the methodology, assumptions and other parameters of the study (please contact VMware at benchmark@vmware.com to request such review and approval) prior to such publication and distribution.
As you can see, this clearly states that if you wish to publish a benchmark pitting Hyper-V against VMware, you need VMware’s approval of the methodology, assumptions, and other parameters of the study before you can publish the results.
Want to bet that a test that turns out badly for VMware isn’t going to be approved?
So, can you trust a vendor that claims superior performance but only allows publication of test results it has itself approved? That’s up to you to decide…

Debunking the myths of why VMware is better than Hyper-V – Disk footprint…

I’m continuing my posts tackling the myths that I meet when talking to customers about Hyper-V. I’ve previously written about Transparent Page Sharing and Memory overcommit.
For today’s post, I’m gonna talk about hypervisor disk footprint.

One of the things I hear mentioned by customers when comparing ESXi to Hyper-V is disk footprint. I’ve heard this numerous times at various customers, but none of them has gotten the terms right, hence I can only speculate that the argument comes from the marketing machine at a certain vendor 😉

The argument stands as this:

ESXi only consumes 144MB of disk space, while Hyper-V consumes 5GB of disk space.

So, is this true?

Well, yes it is, but only if you compare the two hypervisor architectures the way you would compare apples to bananas. While it is true that the hypervisor kernel in VMware consumes 144MB of disk space, the equivalent number for Hyper-V is not 5GB: the Hyper-V kernel itself only consumes about 600KB of space. So why the claim that Hyper-V uses 5GB of disk space?
Well, VMware counts the management partition in this calculation for Hyper-V, while forgetting to do the same for their own product.

To compare the two, we need to have some understanding of the two hypervisor structures.
VMware uses a hypervisor type called monolithic, as shown in this picture:

ESXi kernel type

In this kernel type, drivers are included in the hypervisor kernel.

If we take a look at Hyper-V, it uses a kernel type called micro-kernelized:

Hyper-V kernel type

In this type, only the hypervisor runs in the kernel, and all drivers, management and so forth are located in the parent (management) partition.

So, as shown above, VMware is both right and wrong when claiming that ESXi consumes 144MB of disk space while Hyper-V uses 5GB; it depends on how you look at it. To make the comparison fair, when VMware claims that their hypervisor only takes 144MB of disk space, they should also say that the Hyper-V hypervisor takes only 600KB.

Furthermore, when comparing these two designs there are some distinct differences that are worth mentioning.

  • As drivers are not loaded into the hypervisor kernel, the need for specialized drivers is removed in Hyper-V. Any driver that works with Windows Server will work with Hyper-V, as opposed to VMware, where drivers need to be written specifically for the hypervisor.
  • All drivers in Hyper-V run in the parent partition, thus “isolating” them from acting directly in the hypervisor layer. With the VMware approach, where drivers run in the kernel, a malfunctioning driver could impact virtual machines, or a malicious driver could gain access to them (for example through the vShield API).
  • The amount of code running in the kernel is 600KB, as opposed to 144MB in ESXi.

Lastly, another selling point that derives from this footprint discussion is security. VMware states that their product is safer due to the so-called smaller disk footprint, based on the argument that a smaller code base equals a more secure product.
If that argument held up, Windows 95 would have to be considered more secure than Windows 7, as the former only consumes about 80MB of disk space while the latter uses several GB.
Today, most attackers focus on getting in the same way the admins of a given product do. This is a side effect of the products themselves getting more and more secure, so it is your security policies and processes that keep your infrastructure secure, not the amount of disk space (or lines of code).

Debunking the myths of why VMware is better than Hyper-V – Memory overcommit…

As I wrote previously, I’ve decided to tackle some of the myths and lies surrounding Hyper-V as I hear them from customers and VMware sales reps. Previously, I’ve written about Transparent Page Sharing and why it isn’t useful anymore.

For this article, I’m going to talk about the Memory Overcommit feature of the vSphere hypervisor.
VMware’s description of it:

VMware’s ESX® Server is a hypervisor that enables competitive memory and CPU consolidation ratios. ESX allows users to power on virtual machines (VMs) with a total configured memory that exceeds the memory available on the physical machine. This is called memory overcommitment.

So, a really cool feature by its description.
Microsoft of course has a feature that accomplishes the same goal, but in a very different way, called Dynamic Memory. More on that in a bit.

To go a bit more in depth on this feature, I’ll snip some from the VMware documentation:

For each running virtual machine, the system reserves physical memory for the virtual machine’s reservation (if any) and for its virtualization overhead.

Because of the memory management techniques the ESXi host uses, your virtual machines can use more memory than the physical machine (the host) has available. For example, you can have a host with 2GB memory and run four virtual machines with 1GB memory each. In that case, the memory is overcommitted.

Comparing VMware’s memory overcommit to Microsoft’s Dynamic Memory, on the surface they both work toward the same goal: provisioning more memory than is physically available, on the assumption that the virtual machines never actually use it all at once.
Dynamic Memory, however, works a bit differently. Where in VMware you just assign the maximum amount of memory you wish to have available, in Hyper-V you define three parameters:

  • Startup Memory
  • Minimum Memory
  • Maximum Memory

Startup memory is somewhat self-explanatory: it is the amount available to the machine at boot. Minimum memory is the least you wish to have available to the virtual machine; it will never drop below this. Maximum is again self-explanatory, as it is the most the virtual machine can be given. Hyper-V then assigns and reclaims memory from the guest OS using hot-add and memory ballooning.
However, the key difference is not in these settings but in the fact that you CANNOT overcommit memory in Hyper-V. This however requires some explaining…

Let’s take the above example: you are running a host with 2 GB of memory and create 4 virtual machines with 1 GB of memory each. Your environment is running, and something happens that requires the virtual machines to use all of their memory (it could also be just 3 of them; this is simply to illustrate the scenario where you need more memory than is available).
In ESX, all the machines believe they have the memory available and will use it, but the underlying hypervisor cannot do magic and conjure up the extra memory, so it starts swapping pages to disk = performance goes haywire.
If this were to happen in Hyper-V, the virtual machines would be aware that they did not have all that memory, as Hyper-V will not assign more memory to the guests than is physically available. So what happens in this scenario? Well, swapping will occur as above, but this time not at the hypervisor layer but inside the virtual machines.
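The no-overcommit behaviour can be sketched in a few lines of Python (the function and its logic are my own illustration, not the actual Hyper-V memory balancer):

```python
# Sketch of the no-overcommit behaviour of Dynamic Memory; numbers and
# logic are illustrative, not the actual Hyper-V balancer.

def assign_memory(demand_mb, minimum_mb, maximum_mb, host_free_mb):
    """Clamp the guest's demand to [minimum, maximum], but never grant
    more than the host physically has free. Any shortfall is visible to
    the guest, which does its own paging - there is no hypervisor swap."""
    wanted = max(minimum_mb, min(demand_mb, maximum_mb))
    return min(wanted, host_free_mb)
```

For example, a guest demanding 2048 MB on a host with only 1024 MB free is granted 1024 MB; the guest sees the shortfall and pages for itself, which is the key difference from hypervisor-level swapping.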

And this is a major difference, and why it works so much better in Hyper-V. Who knows better which data can be swapped to disk and which can’t than the machine running the workload? Take a SQL server as an example: you would prefer that data belonging to the SQL databases stayed in memory while, say, pages belonging to background processing went to the page file.
In VMware, should swapping occur, you run the risk of swapping out the wrong data and thereby decreasing performance even more, but in Hyper-V the virtual machine decides for itself what is best suited to be paged out.

Now, as I’ve had this discussion a couple of times before, I know the answer from the VMware guys is that you can decide where to place the memory swap file in VMware and put it on SSD.
Well, that’s not much of an argument: first, SSD is still slower than RAM, and second, this is also possible in Hyper-V, where the virtual machine can place its own page file on SSD (and this is the actual page file of the virtual machine, so it stays there constantly).

So to sum up, having less memory available than needed is never a desired configuration, as it reduces performance. Should you nevertheless end up in a scenario where you have too little memory, Hyper-V handles the problem better for you…

Debunking the myths of why VMware is better than Hyper-V – Transparent Page Sharing

When I visit my customers and talk about Hyper-V, I hear a lot of these “…but VMware is better because…” claims, and it always ticks me off when I know they aren’t true.
So, I’ve decided to go through these myths and argue why they either don’t matter, aren’t relevant any more, or are outright untrue.

For this first post on the topic, I’ve chosen to talk about the Transparent Page Sharing (TPS) feature of the vSphere hypervisor.

VMware describes the feature like this:

When multiple virtual machines are running, some of them may have identical sets of memory content. This presents opportunities for sharing memory across virtual machines (as well as sharing within a single virtual machine).
For example, several virtual machines may be running the same guest operating system, have the same applications, or contain the same user data.
With page sharing, the hypervisor can reclaim the redundant copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. As a result, the total virtual machine host memory consumption is reduced and a higher level of memory overcommitment is possible.
So, that sounds neat, doesn’t it?
Well, in reality this feature isn’t as useful as it was back in the days of Windows 2000/2003, because of a “new” feature introduced with Windows Server 2008 called Large Pages. Where a memory page in previous versions of Windows was 4KB in size, it can now be 2MB.
To describe why Large Pages are better to use, I’ll snip a bit from an article from AMD:

Why is it [Large Pages] better? Let’s say that your application is trying to read 1MB (1024KB) of contiguous data that hasn’t been accessed recently, and thus has aged out of the TLB cache. If memory pages are 4KB in size, that means you’ll need to access 256 different memory pages. That means searching and missing the cache 256 times—and then having to walk the page table 256 times. Slow, slow, slow.

By contrast, if your page size is 2MB (2048KB), then the entire block of memory will only require that you search the page table once or twice—once if the 1MB area you’re looking for is contained wholly in one page, and twice if it splits across a page boundary. After that, the TLB cache has everything you need. Fast, fast, fast.

It gets better.

For small pages, the TLB mechanism contains 32 entries in the L1 cache, and 512 entries in the L2 cache. Since each entry maps 4KB, you can see that together these cover a little over 2MB of virtual memory.

For large pages, the TLB contains eight entries. Since each entry maps 2MB, the TLBs can cover 16MB of virtual memory. If your application is accessing a lot of memory, that’s much more efficient. Imagine the benefits if your app is trying to read, say, 2GB of data. Wouldn’t you rather it process a thousand buffed-up 2MB pages instead of half a million wimpy 4KB pages?

So Large Pages are a good thing to use, as they reduce the lookups needed to get data from memory. But why are they a problem for Transparent Page Sharing?
Well, let’s assume you have a bunch of servers running on your ESX host. They all have data in memory, which is scanned by the TPS feature (which, by the way, spends CPU resources doing so, but that’s another story). If you are running Windows 2003, the servers write 4KB pages to memory, and the chance that two pages are identical, thereby saving memory, is of course present.
But if you are running Windows 2008 or newer, here come the 2MB pages. For TPS to be useful here, two pages would have to have 16,777,216 bits (almost 17 million bits) that are EXACTLY the same before TPS could kick in. And that’s not very likely to happen…
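The arithmetic behind this is easy to check, using the figures from the AMD quote above:

```python
# The arithmetic behind the large-pages argument, using the figures
# from the AMD quote above.

KB, MB = 1024, 1024 * 1024

# Reading 1 MB of cold data:
walks_4k = MB // (4 * KB)  # 256 page-table walks in the worst case
walks_2m = 1               # one large page can cover the whole read

# TLB coverage:
small_page_coverage = (32 + 512) * 4 * KB  # L1 + L2 entries x 4 KB each -> just over 2 MB
large_page_coverage = 8 * 2 * MB           # 8 entries x 2 MB each -> 16 MB

# And the TPS point: two 2 MB pages must match bit for bit to be shared.
bits_per_large_page = 2 * MB * 8           # 16,777,216 bits
```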
So to summarize: Transparent Page Sharing, which is a selling feature for VMware (and one I know for a fact they use to badmouth Hyper-V), isn’t really relevant any more. You just don’t need it…

VMware – possible data corruption in virtual machines…

I came across this article on the VMware support forums, and even though I haven’t encountered the error myself, I thought I’d post it anyway so as many people as possible get the information.


On a Windows 2012 virtual machine using the default e1000e network adapter and running on an ESXi 5.0 or 5.1 host, you experience these symptoms:

  • Data corruption may occur when copying data over the network.
  • Data corruption may occur after a network file copy event.


The root cause of this issue is currently under investigation.

Please read this KB from VMware on how to avoid this issue in case you are running ESXi 5.0 or 5.1 and have Windows 2012 VMs.