Updated in Mar 2023:
Ignore the May 2022 update, I was right all along. Microsoft actually changed the default VMQ settings in Azure Stack HCI deployments, so that tells me all I need to know.
**** ALWAYS change base processor to core 2 (or 1 if HT is not enabled) ****
Updated in May 2022:
For about a year now, since MS made some tweaks and confidence grew in the out-of-the-box settings, the optimal position on WS2019 has changed. Essentially, leave VMQ at the default settings on Windows Server 2019.
A few months after this ranty post, Dan Cuomo @ Microsoft published some blogs cleaning up some of the confusion around VMQ and its friends:
Synthetic Accelerations in a Nutshell – Windows Server 2012 – Microsoft Tech Community
Synthetic Accelerations in a Nutshell – Windows Server 2012 R2 – Microsoft Tech Community
Synthetic Accelerations in a Nutshell – Windows Server 2016 – Microsoft Tech Community
Synthetic Accelerations in a Nutshell – Windows Server 2019 – Microsoft Tech Community
Also, since this rant, some blogs and posts have been taken down, removing some of the confusion, so hopefully life is a little simpler in the Hyper-V world of networking.
Windows Server 2016 settings below still stand.
Cheers
Dan 🙂
End of May 2022 update.
Ok, so this has been something that’s been brewing for years now…
Having worked with many clients who have had Hyper-V issues related to VMQ, and spending a lot of my time debating incorrect/stale blog posts, I feel it’s time to set the story straight. So please excuse me if I come across a little too brash…
Yes, this is a bit of a “blowing of steam” post, but hopefully this can be helpful to those who have been misdirected or just need some assistance with VMQs on Hyper-V in scaled environments…
Ultimately, the main goal of this post is to help you succeed in your Hyper-V experience.
Target audience
Before I continue, let’s set the context of the post. This is for Hyper-V environments running at least 10GbE networks and using SUPPORTED hardware and drivers… What do I mean by supported? Well, it’s on the Windows Server Catalog (WSC) and has “Virtual Machine Queue (VMQ)” listed as a certification. This is also targeted at those using legacy storage such as iSCSI or FC SANs (wow, I really do have a bee in my bonnet).
This post may also be helpful for those who are having random performance or stability issues in their new Hyper-V environment. This is often because VMQ is using core 0 and pinning it at 100%, starving the host’s ability to communicate with the cluster and/or perform other critical low-level operations. To be honest, those issues are often very hard to qualify and it’s just a matter of making some tweaks and seeing how things go.
Not for you…
If you’re running a single Hyper-V host with 1GbE NIC, then this post is not for you. If you are using HCI or your storage is SMB, then whilst this post would be helpful, there are considerations outside of what I will cover here.
Also, this post is not intended to teach you what VMQs are or how they work (I may touch on it); it’s a how-to for gathering the information required and setting them correctly.
If you want more in-depth info on what VMQs and the other settings are, then these two blogs are your go-tos:
Darryl van der Peijl
Altaro Hyper-V Dojo
Note: There are more, but these two blogs are from others that I know have extensive experience deploying and managing Hyper-V in scaled production environments. Not labs.
Ok, the disclaimers are out of the way and we’re all on the same page? Great 🙂
Sections of this post:
- Misconceptions of VMQs
- Common VMQ rules
- Teaming Modes
- VMQ modes
- VMQ Settings
1. Misconceptions of VMQs (and some of Hyper-V in general)
Firstly, let’s address a few common misconceptions:
- The out of the box settings are best – WRONG
- Both VMQ and RSS need to be configured on the same interface – WRONG
- Working out optimal VMQ is complicated – WRONG
- Disable VMQ on the team interface – WRONG
- Disable VMQ for any reason – WRONG
- NUMA = physical Proc – WRONG
- Dynamic load balancing algorithm is best – WRONG
- LACP gives more network perf to VMs – WRONG
- Switch Independent won’t work on my switches – WRONG
Note: I bet you’re a bit confused right now because you’ve read some blogs or support posts saying the opposite to much of the above. That’s fine, I’ll help you with that. They’re either outdated or wrong. 🙂 (no fence-sitting going on here…)
2. Common VMQ (and aligning with VMQ) rules
Some common rules for VMQ:
- Never use Core 0
- Do not overlap NUMA nodes
- Physical cores only (ignores HT)
- Max 64 physical CPUs
- In VMs with more than 1 vCPU, enable vRSS (quick in-guest sketch just below)
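For that last point, here’s a minimal in-guest sketch. Assumptions: this runs inside the VM (not on the host) and “Ethernet” is a placeholder vNIC name for your environment.
# Inside the guest OS: make sure RSS is enabled on the vNIC so vRSS can spread
# receive traffic across the VM's vCPUs. "Ethernet" is a placeholder adapter name.
Enable-NetAdapterRss -Name "Ethernet"
Get-NetAdapterRss -Name "Ethernet" | Select-Object Name, Enabled   # confirm Enabled : True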
When it comes to VMQ and RSS, think of it this way: RSS is a physical adapter setting, but once you bind a virtual switch to the adapter, the setting that matters becomes VMQ. That’s kinda correct and kinda incorrect, but if you think about it that way, you’ll increase your likelihood of getting the VMQ and RSS settings right.
You mentioned core 0, what’s this all about?
Well, I won’t go too deep here, but essentially some primary processing functions of the OS happen on core 0. If high network IO is also being processed there, it can pin core 0 at 100% utilization and cause all sorts of performance and stability issues.
In older environments with 1Gb networks, the amount of processing required to compute the network IO was minimal. But with 10GbE and higher networks, the amount of throughput utilized requires more processing. This is what VMQ helps with, distributing the compute requirements of network IO across multiple cores and removing this overhead from core 0.
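If you want to see where you stand before changing anything, a quick read-only check (no settings are changed here; run it on the host):
# A BaseVmqProcessor of 0:0 on a 10GbE team member is the classic
# "everything lands on core 0" setup.
Get-NetAdapterVmq
# Per-queue view: which VM/vNIC owns each queue and which core is servicing it.
Get-NetAdapterVmqQueue | Format-Table Name, QueueID, Processor, VmFriendlyName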
Image: Incorrectly configured VMQ at work…
(Original source unknown. A client sent this to me but I am totally using it…)
3. Teaming Modes
Before we configure VMQ, we need to understand our teaming modes and, more importantly, which one is best.
To clarify what these are:
Switch Independent
This is the default recommendation for all Hyper-V virtual switches. Each interface is its own trunk and the physical switch sees these as separate, independent interfaces.
This is your go-to config and will give a higher likelihood of success.
Summary: Switch Independent is the way to do Hyper-V switches properly
Switch Dependent
This is LACP or Static Teaming. Quite often I have customers whose networking teams dictate this and try to mandate LACP as the methodology for fail-over. I STRONGLY recommend this be overruled and Switch Independent be the methodology used.
Whilst technically LACP works (I ran LACP in prod for a long time), too many times have I seen incorrect, misaligned, or just plain broken configs that give organisations a poor Hyper-V experience. Additionally, with Switch Independent and VMQ we can get more networking performance out of our environments. Ok, networking people will argue this, but they’re thinking physical aggregation, not virtualization.
So which is best? I think you know my answer already 🙂
The long politically correct answer – it sometimes depends on the environment but ultimately Switch Independent is typically the optimal configuration.
Simple direct answer – Switch Independent is best.
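Not sure what you’re currently running? These read-only checks show the teaming mode and load balancing algorithm you’ll need for the table in the next section:
# LBFO teams: TeamingMode and LoadBalancingAlgorithm tell you which column applies.
Get-NetLbfoTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm
# SET switches (WS2016+): always Switch Independent; the algorithm should be HyperVPort.
Get-VMSwitchTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm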
4. VMQ modes
When configuring VMQ, there are two modes we must be aware of: Min-Queues mode and Sum-of-Queues mode.
Sum-of-Queues mode:
All NICs in the team/switch must have separate (non-overlapping) processor assignments
Min-Queues mode:
All NICs in the team/switch must have identical processor assignments
A good blog from Charbel Nemnom has a little more info on the two modes.
But which mode when? This common table should help with the overall selection…
| Teaming mode | Hyper-V Port | Dynamic | Address Hash |
| --- | --- | --- | --- |
| Switch Independent | Sum-of-Queues | Sum-of-Queues | Min-Queues |
| Switch Dependent | Min-Queues | Min-Queues | Min-Queues |
If you’ve been reading properly though, you’ll note I stated above that you should always be using Hyper-V Port as your teaming algorithm. Following that rule, your table really should look like this:
| Teaming mode | Hyper-V Port |
| --- | --- |
| Switch Independent | Sum-of-Queues |
| Switch Dependent | Min-Queues |
Much simpler right? 😀
5. VMQ Settings
This formula has seen successful implementations of Hyper-V for the last 6-7 years across varying manufacturers and configurations. The one thing that has been consistent is the correct application of VMQ. Ok, I won’t be THAT jerk; this has taken many hours/days/weeks of trial and error, and many late nights with my own environment or assisting customers, to work out the best results… i.e. a lot of blood, sweat and tears.
Firstly get your numbers right… (This example is of a dual proc Intel 5118 with Mellanox ConnectX-4 NICs)
Get-WmiObject -Class Win32_Processor | select Name,SocketDesignation,NumberOfCores,NumberOfLogicalProcessors
What we know so far: 2 Sockets. Each socket has 12 physical cores and 24 hyper-threaded cores.
How many NUMA nodes per socket?
(Get-VMHostNumaNode).count
We have 4 NUMA nodes in total, meaning 2 NUMA nodes per socket. Just to reiterate the point: NUMA is not equal to physical proc, i.e. we can have more than one NUMA node per proc, as shown here. Although NUMA boundaries are important when designing VM configs (i.e. don’t give a VM more RAM or vCPU than a single NUMA node provides, etc.), Hyper-V is NUMA aware, so this is more of an FYI at this stage…
Taking into consideration that we ignore HT and avoid core 0, we are left with 11 usable physical cores per socket…
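If you’d rather not do the Excel gymnastics, here is a rough sketch of the same arithmetic. Assumptions: hyper-threading is enabled, logical processors are numbered contiguously per socket, and physical cores sit on the even-numbered LPs (which is what the example hardware above shows).
$cpu = Get-WmiObject -Class Win32_Processor | Select-Object -First 1
$lpPerSocket = $cpu.NumberOfLogicalProcessors    # 24 per socket in this example
$socket0Base = 2                                 # skip core 0 (LP 0) on socket 0
$socket1Base = $lpPerSocket + 2                  # 26 here, mirroring socket 0's offset
$usableCores = $cpu.NumberOfCores - 1            # 11 usable physical cores per socket
"$socket0Base / $socket1Base / $usableCores"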
Now we have either Sum-of-Queues mode or Min-Queues mode. I will give examples of both below.
For Switch Independent configs
So if we use the magic Excel table that we see in many blogs (yes, I use this too…), then based on my hardware the optimal settings become quite clear.
i.e. Switch Independent > Hyper-V Port > Sum-of-Queues mode
For WGE2-0-17, find the first ‘x’. This is our -BaseProcessorNumber. Now count the number of ‘x’s, and this is your -MaxProcessors value.
Edit: After Windows Server 2016 (i.e. on WS2019 and later), we no longer need the -MaxProcessors setting, as the hypervisor takes care of this for us. Technically core 0 is also handled automagically in 2019 as well, but it still poses a risk if we have driver or firmware bugs… so I still strongly recommend bypassing core 0. There is absolutely zero downside to doing this, whilst there is a small risk in not doing it. Fairly simple decision for me 🙂
Windows Server 2016:
Set-NetAdapterVMQ -Name WGE2-0-17 -BaseProcessorNumber 2 -MaxProcessors 11
Set-NetAdapterVMQ -Name WGE1-0-17 -BaseProcessorNumber 26 -MaxProcessors 11
Windows Server 2019: **Refer to May 2022 update at top of this post**
Set-NetAdapterVMQ -Name WGE2-0-17 -BaseProcessorNumber 2
Set-NetAdapterVMQ -Name WGE1-0-17 -BaseProcessorNumber 26
Here we see those settings applied…
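To verify from PowerShell as well (the wildcard simply matches the example NIC names above; adjust for your own adapter naming):
Get-NetAdapterVmq -Name "WGE*" | Format-Table Name, Enabled, BaseVmqProcessor, MaxProcessors, NumberOfReceiveQueues
# BaseVmqProcessor should now read 0:2 and 0:26 rather than 0:0.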
For Switch Dependent configs (or those stuck in the dark ages)
The same logic applies for building your config; the difference is that because we’re using Min-Queues mode (i.e. Switch Dependent > Hyper-V Port > Min-Queues mode), we need to pin both NICs to the same processor set.
Windows Server 2016:
Set-NetAdapterVMQ -Name WGE2-0-17 -BaseProcessorNumber 2 -MaxProcessors 8
Set-NetAdapterVMQ -Name WGE1-0-17 -BaseProcessorNumber 2 -MaxProcessors 8
Windows Server 2019:
Set-NetAdapterVMQ -Name WGE2-0-17 -BaseProcessorNumber 2
Set-NetAdapterVMQ -Name WGE1-0-17 -BaseProcessorNumber 2
Note: for Switch Dependent adapters, we are limited to a -MaxProcessors value of 1, 2, 4, 8 or 16. (Just another reason why Switch Independent is better 🙂 )
Unfortunately I can’t show that setting applied in my environment as I’m not stuck in 2007 using LACP.
Much, much more to consider…
Some other variables to consider… If you are using LACP (you poor soul), you are limited to a Load Balancing/Failover (LBFO) team for your config. If using Switch Independent, you can use LBFO or, preferably, Switch Embedded Teaming (SET) on Windows Server 2016 and above.
If you are using Switch Independent, use SET unless there is a specific reason not to. If you think you have a reason not to use SET, then reassess as you’re likely just over-complicating your environment.
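For completeness, a minimal sketch of standing up a SET switch with the settings recommended in this post. Assumptions: the switch name “vSwitch-SET” is a placeholder, the adapter names reuse the example NICs from above, and you don’t want a management vNIC on this switch.
# Create a SET switch from the two pNICs (SET is always Switch Independent).
New-VMSwitch -Name "vSwitch-SET" -NetAdapterName "WGE2-0-17","WGE1-0-17" -EnableEmbeddedTeaming $true -AllowManagementOS $false
# Make sure the load balancing algorithm is Hyper-V Port, per the table above.
Set-VMSwitchTeam -Name "vSwitch-SET" -LoadBalancingAlgorithm HyperVPort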
After configuring an LBFO team, we are presented with a new interface in the OS with the interface description ‘Microsoft Network Adapter Multiplexor Driver’. By default this has VMQ enabled and it should remain that way; don’t modify the team interface’s VMQ settings, just leave them at default.
NUMA distance is also relevant when it comes to finding the absolute optimal config; using the defaults will not affect stability, but it can have a small impact on performance. The rule here is to keep your NIC’s NUMA distance at 0 to ensure the most efficient operation. More on this in Darryl’s blog here… go to the section on NUMA Node Assignment.
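A quick, read-only way to see which NUMA node each pNIC hangs off (so you can keep its VMQ processor set local):
# NumaNode in this output is the node the physical adapter is attached to.
Get-NetAdapterHardwareInfo | Select-Object Name, NumaNode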
Some example configs
Here are some examples where we have 10 core procs with hyper-threading enabled
Below is a correctly configured LBFO switch in Switch Independent mode.
Below is a correctly configured LBFO switch in Switch Dependent (LACP) mode
Below are correctly configured SET switch adapters (SET only supports Switch Independent)
i.e. Don’t overthink this. Just optimize VMQ for the team members, as shown in the above examples and the sketch below.
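For those 10-core hosts, the Switch Independent/SET version boils down to something like this. Assumptions: “pNIC1”/“pNIC2” are placeholder adapter names, and this is the WS2019 style (no -MaxProcessors).
# With HT on, socket 0 owns LPs 0-19 and socket 1 owns LPs 20-39 (physical cores = even LPs).
Set-NetAdapterVmq -Name "pNIC1" -BaseProcessorNumber 2    # socket 0, skipping core 0
Set-NetAdapterVmq -Name "pNIC2" -BaseProcessorNumber 22   # socket 1, mirrored offset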
Note: For HCI or SMB environments, we need to make some extra considerations for RSS on the physical adapters. If you stick to the same guidelines as VMQ (because VMQ and RSS are the same technology under the hood), you’re on the right track.
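As a hedged example only: if you have dedicated SMB/storage pNICs (the names and processor values here are placeholders for the dual-socket example host), the RSS equivalent follows the same pattern.
# Same rules as VMQ: skip core 0 and stay within the NIC's own NUMA node.
Set-NetAdapterRss -Name "SMB1" -BaseProcessorNumber 2 -MaxProcessors 8
Set-NetAdapterRss -Name "SMB2" -BaseProcessorNumber 26 -MaxProcessors 8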
I’d really like to keep going and cover other features and settings, but the purpose of this post is to assist with VMQ, so I won’t go into all of that here.
Let’s wrap this up – If you’re still confused, have some feedback or simply disagree with some of the above then please reach out. Whilst I’ve been in many Hyper-V environments over the years, I haven’t been in them all, so there will be outliers that go against the rules I’ve stated above.
I’ll maybe add to this over time but hopefully this dispels some of the myths and misconceptions out there.
Happy Hyper-V’ing!
Dan
Hello,
Great post. You nailed it, simple and easy to follow.
I have one question and I can’t find answer.
On our Hyper-V cluster, we have 4x 10Gbps network adapters. As our network infrastructure is divided into two segments, we needed to create two SET switches (2x 10Gbps per team), one switch per network segment.
My question is about vmq pinning to cores:
Would we follow the same logic with two SET switches? So the number of cores, excluding core 0, would be divided across the 4x NICs in my case, regardless of them being part of two different SET switches?
Thanks and great post.
Hey, cheers for the feedback 🙂
Two switches do make it a little more difficult, but essentially follow the same rules. Don’t overlap the cores and queues and you’ll be sweet.
i.e. cores 2-18 = SET #1 (NIC1 on cores 2-10, NIC2 on cores 12-18), and cores 22-38 = SET #2 (NIC3 on cores 22-30, NIC4 on cores 32-38)
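In PowerShell that would look something like this (adapter names are placeholders; WS2019 style with -BaseProcessorNumber only, per the post):
# SET switch #1 members
Set-NetAdapterVmq -Name "NIC1" -BaseProcessorNumber 2
Set-NetAdapterVmq -Name "NIC2" -BaseProcessorNumber 12
# SET switch #2 members
Set-NetAdapterVmq -Name "NIC3" -BaseProcessorNumber 22
Set-NetAdapterVmq -Name "NIC4" -BaseProcessorNumber 32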
Hope this helps
Hi Dan, thanks for the great post. I have a quick question….I’m configuring VMQ on a team with two Intel 10Gbe NICs, and when I use the calculation for cores (2 Intel CPUs, 16 cores apiece, 32 HT Procs apiece) I come up with 15 usable processors per socket. However, Powershell won’t let me use the odd number 15, it instead makes it 16. Should I be bothered? Below is the text of the output I get:
PS C:\Windows\system32> Set-NetAdapterVmq -name NIC1 -BaseProcessorNumber 2 -MaxProcessors 15
Set-NetAdapterVmq : No matching keyword value found. The following are valid keyword values: 1, 2, 4, 8, 16
At line:1 char:1
+ Set-NetAdapterVmq -name NIC1 -BaseProcessorNumber 2 -MaxProcessors 15
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (MSFT_NetAdapter…363026B6498F}”):ROOT/StandardCi…rVmqSettingData) [Set-NetAdapterVmq], CimException
+ FullyQualifiedErrorId : Windows System Error 87,Set-NetAdapterVmq
Hey, is this SET switch or LBFO? Also is it 2016 or 2019? If memory serves, you can only use odd numbers for SET switches.
For 2019 though, don’t use the -MaxProcessors parameter at all. Just set the -BaseProcessorNumber and let dynamic VMQ sort the rest out.
Hello there,
You say that Switch Dependent adapters are limited to maxprocessor values of 1,2,4,8 and 16, but have you ever seen where Powershell also gives that same error for Switch Independent adapters?
PS C:\Windows\system32> Set-NetAdapterVMQ -Name "Slot 5 Port 2" -BaseProcessorNumber 36 -MaxProcessors 15
Set-NetAdapterVMQ : No matching keyword value found. The following are valid keyword values: 1, 2, 4, 8, 16
At line:1 char:1
+ Set-NetAdapterVMQ -Name "Slot 5 Port 2" -BaseProcessorNumber 36 -MaxP …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (MSFT_NetAdapter…6BF8B8838B01}”):ROOT/StandardCi…rVmqSettingData) [Set-NetAdapterVmq], CimException
+ FullyQualifiedErrorId : Windows System Error 87,Set-NetAdapterVmq
PS C:\Windows\system32> Get-NetLbfoTeam
Name : 10GB Team
Members : {SLOT 5 Port 1, SLOT 5 Port 2}
TeamNics : 10GB Team
TeamingMode : SwitchIndependent
LoadBalancingAlgorithm : Dynamic
Status : Up
If memory serves, you can only use odd numbers for SET switches.
If the OS is 2019 though, don’t use the -MaxProcessors parameter at all. Just set the -BaseProcessorNumber and let dynamic VMQ sort the rest out.
I need to update the post
Hi Dan,
Excellent article/post. Appreciated your efforts!
I’ve some problem with my newly configured Windows Server 2019 two node Hyper-V cluster.
I’m seeing system Event ID106 in every reboot. Source – “Hyper-V-VmSwitch” Available processor sets of the underlying physical NICs belonging to the LBFO team NIC /DEVICE/{798D75DE-3951-4F42-9162-65769F015FF2} (Friendly Name: Microsoft Network Adapter Multiplexor Driver #2) on switch C479DB71-5D0C-437C-B9BB-C6B4810C2D00 (Friendly Name: vSwitch) are not configured correctly. Reason: The processor sets are not identical when LBFO is configured with min-queue mode.
Googled for above error and found your blog !
I have 4x 10Gig NICs bundled as one NIC team for VM traffic/virtual switch. It’s configured in LACP/Dynamic mode. I’m planning to change this to Switch Independent/Hyper-V Port mode.
Each node runs with 2 CPUs and 24 cores total (48 LPs), and my query is how many processors I should assign to each NIC. Please help!
Name SocketDesignation NumberOfCores NumberOfLogicalProcessors
—- —————– ————- ————————-
Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz CPU1 12 24
Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz CPU2 12 24
PS C:\Windows\system32> (Get-VMHostNumaNode).count
2
PS C:\Windows\system32> Get-NetLbfoTeam
Name : PROD
Members : {VM-T-S 1 P 1, VM-T-S 1 P 2, VM-T-S 8 P 1, VM-T-S 8 P 2}
TeamNics : PROD
TeamingMode : Lacp
LoadBalancingAlgorithm : Dynamic
LacpTimer : Fast
Status : Up
Hi Saravana, sorry for replying so late. I assume by now you’ve changed your vSwitches to use SET?
Hi Dan,
Reconfigured the teaming mode and algorithm from LACP/Dynamic to Switch Independent/Hyper-V Port, but this time I’m getting the following error: “Reason: The processor sets overlap when LBFO is configured with sum-queue mode”
Not sure where I’m making a mistake. Could you please help?
FYI – It’s 2 NUMA nodes, 2 sockets with 12 cores each, and I’m trying to allocate processors for 4x 10Gig teamed NICs.
PS C:\Windows\system32> (Get-VMHostNumaNode).count
2
Name SocketDesignation NumberOfCores NumberOfLogicalProcessors
—- —————– ————- ————————-
Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz CPU1 12 24
Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz CPU2 12 24
Hi Dan,
we have been discussing the same problems at Spiceworks for a long time.
https://community.spiceworks.com/topic/2225989-server-2019-network-performance
I have just made decisive progress regarding the wrong default values, see
https://community.spiceworks.com/topic/post/8923556
The fault with the wrong default settings lies with the NIC manufacturers and not Microsoft.
Best Regards from Germany
Alex
Hi Dan,
you write:
“If you’ve been reading properly though, you’ll note I stated above that you should always be using Hyper-V Port as your teaming algorithm”
where do you describe why you would pick Hyper-V Port and not Dynamic, and why? 🙂
thanks
Hey, I didn’t really go into the why of using Hyper-V Port. It’s faster. Dynamic theoretically gives aggregated bandwidth outbound, but it comes with processing overheads. Size your NICs right for your workloads; if 10Gbps isn’t enough bandwidth for a single vNIC, then you need bigger pNICs.
In a failover, Hyper-V Port will likely lose a ping, whereas in testing Dynamic seems to keep going; but given that’s a failover scenario, I’ll take the day-to-day performance improvements.
I’m going to go out on a limb and say I already know your answer, but does having multiple 10GB switches change your mind about using LACP with MCLAG vs SET?
Hey, No. For WS 2016 and above, always use SET. Never use LACP. Hard rule 🙂
Hi Daniel, great article. I have an issue with a Hyper-V/Citrix implementation that I think this solution can solve, but I’m not sure how to apply the configuration because I have many network interfaces that come from multiple teaming groups. I have done all the first steps (Excel included) but I’m lost with the interfaces.
This is a VDI implementation deployed by a third-party company. One symptom, for example: when we restart a virtual desktop VM via Citrix, if the NetScaler is on the same Hyper-V host as the VM, it loses connection for 20-30 seconds to all the services (Exchange, web servers, VDI). The VDI service is very sensitive, so the clients lose their desktops during this lost connection. I think it could be a momentary MAC address confusion. I had some 106 errors in the system logs, but they don’t match all the failures.
When I run the Get-NetAdapterVmq command, the result is:
LiveMigration1 QLogic BCM57810 10 Gigabit E…#7 False 0:0 16 29
iSCSI2 QLogic BCM57810 10 Gigabit E…#6 False 0:0 16 14
VLAN810_SrvAcademicos Microsoft Network Adapter Mu…#7 False 0:0 0
Management Microsoft Network Adapter Mult… False 0:0 0
VLAN140_Estudantes Microsoft Network Adapter Mu…#5 False 0:0 0
VLAN811_SrvAdministrativos Microsoft Network Adapter Mu…#8 False 0:0 0
VLAN828_Infraestruturas Microsoft Network Adapter Mu…#9 False 0:0 0
Management0 QLogic BCM57810 10 Gigabit E…#5 True 0:0 16 29
LiveMigration Microsoft Network Adapter Mu…#3 False 0:0 0
HB-VMAccess0 QLogic BCM57810 10 Gigabit E…#4 True 0:0 16 29
HB-VMAccess1 QLogic BCM57810 10 Gigabit E…#9 True 0:0 16 29
HeartBeat Microsoft Network Adapter Mu…#4 False 0:0 0
iSCSI1 QLogic BCM57810 10 Gigabit …#10 False 0:0 16 14
VLAN902_Infraestruturas Microsoft Network Adapter M…#10 False 0:0 0
Management1 QLogic BCM57810 10 Gigabit E…#8 True 0:0 16 29
VLAN808_DSI Microsoft Network Adapter Mu…#6 False 0:0 0
LiveMigration0 QLogic BCM57810 10 Gigabit E…#3 False 0:0 16 29
VMAccess Microsoft Network Adapter Mu…#2 True 0:0 58
Thanks for your help,
Nelson Matias
Wow, why so many NICs? The cabling must be tedious.
If I am reading this correctly (excluding the 2x NICs for iSCSI), you have 2x NICs for LM, 2x NICs for Mgmt, and 2x NICs for VM traffic? And assuming because of the naming they are all LBFO with VLAN interfaces?
I’d seriously consider changing this to a single SET switch with all 6x pNICs and creating host vNICs for your requirements. Or, if you absolutely must have physical separation, 2x NICs in a SET switch for MGMT/host and everything else in another SET switch.
For WS 2019 use default VMQ settings. Don’t overcomplicate it.
Hi Daniel
This is a great article.
Can you please explain, why “you should always be using Hyper-V Port as your teaming algorithm”?
We use SCVMM and a Switch Embedded Team, with one network card with two NICs (2x 10Gbps).
But we are not sure about the LBA: Dynamic vs Hyper-V Port.
We just know that Dynamic has double the bandwidth, so why use Hyper-V Port instead? You didn’t explain that in the article.
Thx and best regards
Hi, Dynamic has processing overheads, and it is not entirely accurate to claim that Dynamic gives double the bandwidth. Dynamic can leverage more pNICs outbound, but inbound can only be as fast as a single physical adapter. Hyper-V Port removes the overheads and will be as fast as a single pNIC. Microsoft now recommends Hyper-V Port as the default in virtual environments because of the overheads and a few other reasons.
If 10Gbps throughput is not fast enough for your individual workloads then I’d suggest the NIC sizing needs a revisit.
Cheers
Hello Dan,
Thank you so much for this in-depth article. I have one question: if we have LACP on the switch side, can we still use Switch Independent mode in Hyper-V? Please advise.
Thanks,
Kaushik
Hi Kaushik
No, LACP is unsupported from the virtual switch. Configure each physical switch port as an individual trunk; the virtual switch software manages all load balancing and redundancy.
Cheers