WORK IN PROGRESS

Editor's Note: this is still a working document, as my priorities have to be on other work right now, but in the interest of sharing I have made it available now… If you find any issues or errors, let me know – thanks for reading!

 

There are many S2D build blogs out there and I don't want to just add to the list, but given I'm doing this build with SCVMM and SCOM integration, I thought I'd run through the additional steps.

Ok, I'll confess now that I'll add my specific S2D build process too, but this is more for completeness. Much of the PowerShell is gathered from other sources, primarily Microsoft Docs – which are improving by the day!

First, the BOM can be found here

I won't go too deep into the decision-making process behind the BOM, but at a high level our requirements were for a disaggregated storage solution using Scale-Out File Server, and we wanted dedicated front-end (SOFS) and back-end (S2D) networks.

Note: I do make references to HCI, but be aware that running S2D as HCI and as SOFS at the same time is not supported.

Overview of the management environment:

  • SC Virtual Machine Manager 1801
  • SC Operations Manager 1801
  • Windows Admin Center 1804.25

SCVMM is our primary DC management solution. It works brilliantly and will still be our day-to-day tool. We also have a recent HA deployment of Windows Admin Center which will be used to support the management of the S2D environment. Maybe SCVMM will be superseded by WAC one day, but for the functionality we use, that day is still very far away.

A few caveats for the code below. For those that have worked with me or read my blogs, you'll know that I am almost the furthest thing from a developer, however I still try to script everything – sometimes taking an hour to script something that would take 5 minutes manually, but it's all about growth.

The build process

We received the servers assembled and latest firmware applied. We racked and cabled them and ran through the OS install via iLO.

After much debate, we (because I was far too stubborn) chose to deploy Server Core.

At this point in time, the tasks that have been completed are: installed the OS, used sconfig to apply the latest updates (2018-05), named and IP'd the nodes, then joined them to the domain.
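If you'd rather script that part too, a rough sketch of the rename/IP/domain join is below – the adapter alias, addresses and OU path are placeholders, so adjust for your environment.

# Per node (run from the iLO console) - example values only
$mgmtNic = "Ethernet"    # management adapter alias - placeholder
New-NetIPAddress -InterfaceAlias $mgmtNic -IPAddress 10.0.10.11 -PrefixLength 16 -DefaultGateway 10.0.0.1
Set-DnsClientServerAddress -InterfaceAlias $mgmtNic -ServerAddresses 10.0.0.10,10.0.0.11
Rename-Computer -NewName "pfshn01"
Add-Computer -DomainName "domain.corp" -OUPath "OU=S2D,OU=Servers,DC=domain,DC=corp" -Credential (Get-Credential) -Restart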

This is where PowerShell took over…

Note: A lot of this is not my own code – it is sourced from MS Docs and other blogs. Where possible I will reference the original authors and give them credit.

Ok, let’s rock and roll!

Set the variables:

$nodes = @(
    "pfshn01.domain.corp",
    "pfshn02.domain.corp",
    "pfshn03.domain.corp",
    "pfshn04.domain.corp"
)

$storA = @{"VLANID"="1001";"IP"="192.168.191.1"}
$storB = @{"VLANID"="1002";"IP"="192.168.192.1"}
$clustername = "pfshc01"
$clusterIP = "10.0.10.9/16"
$sofsname = "psofs01"
$CSVCacheSize = 4096 #Size in MB

I’m a bit of a control freak and like to do things a certain way, so I set the scene for each node by disabling, removing and enabling the services and features first… then reboot.

ForEach ($node in $nodes){
    Invoke-Command -ComputerName $node -ScriptBlock {
        Write-Host -Foreground CYAN "Working on host $($env:computername)"
        Get-Service -Name *spool* | Stop-Service
        Get-Service -Name *spool* | Set-Service -StartupType Disabled
        Disable-WindowsOptionalFeature -Online -FeatureName smb1protocol -NoRestart
        netsh advfirewall set allprofiles state off
        Install-WindowsFeature -Name "Data-Center-Bridging","Failover-Clustering","Hyper-V","RSAT-Clustering-PowerShell","Hyper-V-PowerShell","FS-FileServer"
        Restart-Computer -Force
    }
}
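Once the nodes come back up, I like to make sure the features actually landed everywhere before moving on. A quick sanity check along these lines does the job:

# Confirm the required features are installed on every node
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Get-WindowsFeature -Name "Data-Center-Bridging","Failover-Clustering","Hyper-V","RSAT-Clustering-PowerShell","Hyper-V-PowerShell","FS-FileServer"
} | Select-Object PSComputerName, Name, InstallState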

This might not suit some, but I build the cluster early on and add it to SCVMM.

Always test your cluster!!

# Test the new cluster
Test-Cluster -Node $nodes -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"

[Screenshots: Test-Cluster output and validation report]

My report identified that my servers were in different OUs – oops! So I moved them to the target OU and continued.

Because we're deploying Storage Spaces Direct, there is a process to clean the disks. This is typically not required for new clusters, but in the spirit of completeness I run it anyway.

# Clear the disks - not required for new build but ran for completeness

ForEach ($node in $nodes){
    Invoke-Command -ComputerName $node -ScriptBlock {
        Update-StorageProviderCache
        Get-StoragePool | ? IsPrimordial -eq $false | Set-StoragePool -IsReadOnly:$false -ErrorAction SilentlyContinue
        Get-StoragePool | ? IsPrimordial -eq $false | Get-VirtualDisk | Remove-VirtualDisk -Confirm:$false -ErrorAction SilentlyContinue
        Get-StoragePool | ? IsPrimordial -eq $false | Remove-StoragePool -Confirm:$false -ErrorAction SilentlyContinue
        Get-PhysicalDisk | Reset-PhysicalDisk -ErrorAction SilentlyContinue
        Get-Disk | ? Number -ne $null | ? IsBoot -ne $true | ? IsSystem -ne $true | ? PartitionStyle -ne RAW | % {
            $_ | Set-Disk -IsOffline:$false
            $_ | Set-Disk -IsReadOnly:$false
            $_ | Clear-Disk -RemoveData -RemoveOEM -Confirm:$false
            $_ | Set-Disk -IsReadOnly:$true
            $_ | Set-Disk -IsOffline:$true
        }
        # List what's left - everything non-OS should now be RAW
        Get-Disk | Where Number -Ne $Null | Where IsBoot -Ne $True | Where IsSystem -Ne $True | Where PartitionStyle -Eq RAW | Group -NoElement -Property FriendlyName
    } | Sort -Property PsComputerName, Count
}

Ok, let’s create the cluster.

# Create the new cluster
New-Cluster -Name $clustername -Node $nodes -NoStorage -StaticAddress $clusterIP

[Screenshot: New-Cluster output]

Be sure to review the report… I was happy with this.

[Screenshot: cluster creation report]

First, enable S2D and add the SOFS role.

# Enable S2D
Enable-ClusterStorageSpacesDirect -CimSession $clustername

# Enable Scale-Out File Server
Add-ClusterScaleOutFileServerRole -Name $sofsname -Cluster $clustername
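While we're here, the $CSVCacheSize variable from the top of the script can be applied as the CSV in-memory read cache. It's optional and very much workload dependent, but roughly:

# Set the CSV in-memory read cache (value in MB, set in the variables section)
(Get-Cluster -Name $clustername).BlockCacheSize = $CSVCacheSize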

At this point I added the cluster to SCVMM.

Normal rules apply here. Have a Run As account that is a local admin on the cluster nodes. I have an account called Fabric Admin that gets added to the local admins via GPO. I suggest you do the same.

I guess you could use PowerShell for this process, but my preference is to do this part manually. It's just me 🙂
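If you do want to script it, something along these lines should do the trick (the host group name is a placeholder):

# Add the new cluster to SCVMM using the Fabric Admin Run As account
$runAs = Get-SCRunAsAccount -Name "Fabric Admin"
$hostGroup = Get-SCVMHostGroup -Name "S2D"    # placeholder host group
Add-SCVMHostCluster -Name $clustername -VMHostGroup $hostGroup -Credential $runAs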

[Screenshot: adding the cluster via the SCVMM console]

Review the jobs to ensure all is good. Make sure you let the Refresh host cluster job finish before continuing. This can take a few minutes.
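If you'd rather watch from PowerShell than the console, a quick way to see what's still running:

# List the running SCVMM jobs
Get-SCJob | Where-Object { $_.Status -eq "Running" } | Select-Object Name, Status, Progress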

[Screenshot: SCVMM jobs view]

While it is refreshing, you should see the storage provider, array and Scale-Out File Server appear under Storage in SCVMM.

[Screenshots: storage provider, array and SOFS objects in SCVMM]

At this point I created the Logical Switch on each node.

We had the Logical Switch already created in SCVMM with the desired settings. In this case, the Switch Embedded Team (SET) is the main component.

[Screenshot: Logical Switch properties in SCVMM]

I'm using the existing Uplink Port Profile, which has the required network definitions (VLANs and subnets) already defined.

[Screenshot: Uplink Port Profile settings]

That's it for the GUI – let's get back to PowerShell.

First I run the below to remove the management VLAN tag from the physical adapter.

# remove the inherited VLAN tag
ForEach ($node in $nodes){
    Invoke-Command -ComputerName $node -ScriptBlock {
        get-netadapter | Set-NetAdapter -VlanID $null
    }
}

Next is the bulk of the work. This is an ever changing script I keep handy to build my S2D host fabrics.

This script is particular to my build so review carefully before using on your environment.

In summary, it does various things, such as:

  1. Disables the 1GbE adapters
  2. Creates a SET switch for the backend network
  3. Creates the SMB virtual adapters and IPs them
  4. Disables DNS registration on the SMB vNICs
  5. Creates the QoS policy and traffic class
  6. Configures Flow Control
  7. Enables RDMA
  8. Enables Jumbo Frames
  9. Sets the VMQs (just in case we do end up with VMs on there)
  10. Sets Live Migration to SMB
  11. Bounces the hosts

ForEach ($node in $nodes){
    Invoke-Command -ComputerName $node -ScriptBlock {
        Write-Host -Foreground CYAN "Working on host $($env:computername)"
        $SETSwitch = "vSwitch-SET"
        $storA = @{"VLANID"="1101";"IP"="192.168.101.1"}
        $storB = @{"VLANID"="1102";"IP"="192.168.102.1"}

        # Confirm NIC driver version
        get-netadapter | where {$_.Name -like "HPE*640*"} | select Name,InterfaceDescription,DriverVersionString
        
        #Disabled the 1GbE NICs
        get-netadapter | where {$_.InterfaceDescription -like "*Ethernet 1Gb*"} | Disable-NetAdapter

        # Skip Backend Config if exists - do not modify
        IF($BESwitch = Get-VMSwitch -Name $SETSwitch -ErrorAction SilentlyContinue){
            Write-host -fore cyan "$($SETSwitch) exists..."
        }
        ELSE
        {
            Write-host -fore Cyan "Storage Net A = $(($StorA.IP + $env:computername.Substring(14))) "
            Write-host -fore Cyan "Storage Net B = $(($StorB.IP + $env:computername.Substring(14))) "
            $FENICS = (Get-VMSwitchTeam | select NetadapterInterfaceDescription).NetAdapterInterfaceDescription
            $BENICS = get-netadapter | where {$_.Name -like "HPE*640*" -and $_.InterfaceDescription -notlike $FENICS[0] -and $_.InterfaceDescription -notlike $FENICS[1]}
            $BESwitch = New-VMSwitch -Name $SETSwitch -AllowManagementOS 0 -NetAdapterName $($BENICS[0].name),$($BENICS[1].name) -MinimumBandwidthMode Weight -Verbose
            $vNICA = Add-VMNetworkAdapter -ManagementOS -Name 'vNIC-Storage-A' -SwitchName $SETSwitch -Passthru | Set-VMNetworkAdapterVlan -Access -VlanId $storA.VLANID -Verbose
            $vNICB = Add-VMNetworkAdapter -ManagementOS -Name 'vNIC-Storage-B' -SwitchName $SETSwitch -Passthru | Set-VMNetworkAdapterVlan -Access -VlanId $storB.VLANID -Verbose

            New-NetIPAddress -InterfaceAlias 'vEthernet (vNIC-Storage-A)' -IPAddress ($StorA.IP + $env:computername.Substring(14)) -PrefixLength 24 -AddressFamily IPv4 -Verbose
            New-NetIPAddress -InterfaceAlias 'vEthernet (vNIC-Storage-B)' -IPAddress ($StorB.IP + $env:computername.Substring(14)) -PrefixLength 24 -AddressFamily IPv4 -Verbose

            Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName 'vNIC-Storage-A' -ManagementOS -PhysicalNetAdapterName $BENICS[0].name -Verbose
            Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName 'vNIC-Storage-B' -ManagementOS -PhysicalNetAdapterName $BENICS[1].name -Verbose

            $SMB = Get-NetAdapter | where {$_.name -like "*storage*"}
            $SMB | Set-DnsClient -RegisterThisConnectionsAddress $False

            $pNICS = Get-NetAdapter -Physical
            $pNICS | Enable-NetAdapterRdma -Verbose

            #Enable RDMA vNICs
            $SMB | Enable-NetAdapterRdma -Verbose
            $SMB | Restart-NetAdapter -Verbose
        }

        New-NetQosPolicy -Name 'SMB' -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3 -Verbose
        New-NetQosTrafficClass -Name 'SMB' -Priority 3 -BandwidthPercentage 50 -Algorithm ETS -Verbose

        Enable-NetQosFlowControl -Priority 3 -Verbose
        Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7 -Verbose

        Enable-NetAdapterQos -InterfaceAlias "*"

        # RDMA front end NIC (for SOFS)
        $MGMT = Get-NetAdapter | where {$_.Name -like "*SDDC*"}
        $MGMT | Enable-NetAdapterRdma -Verbose

        # Enable Jumbo Frames on all 10/25GbE adapters
        $MGMT | Get-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" | Set-NetAdapterAdvancedProperty -RegistryValue 9014
        $pNICS | Get-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" | Set-NetAdapterAdvancedProperty -RegistryValue 9014

        Set-NetAdapterAdvancedProperty -Name "*" -DisplayName "Encapsulated Overhead" -DisplayValue "160"

        Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Services\spaceport\Parameters -Name HwTimeout -Value 0x00002710 -Verbose

        netsh int ipv6 isatap set state disabled

        # Configure the base and maximum processors to use for VMQ queues
        Set-NetAdapterVmq -InterfaceDescription $FENICS[0] -BaseProcessorNumber 2 -MaxProcessors 11
        Set-NetAdapterVmq -InterfaceDescription $FENICS[1] -BaseProcessorNumber 22 -MaxProcessors 12

        Set-VMHost -VirtualMachineMigrationPerformanceOption SMB -Verbose
        Restart-Computer -Force
    }
}

Once the cluster comes back up, we start the storage build.

If you've deployed at least 4 nodes in your cluster, the desired resiliency should already be set correctly, however I just give it a quick confirmation.

# Confirm Fault Tolerance
Get-StoragePool -FriendlyName S2D* | FL FriendlyName, Size, FaultDomainAwarenessDefault

[Screenshot: Get-StoragePool output]

# Confirm Storage Tiers
Get-StorageTier | Select FriendlyName, ResiliencySettingName, PhysicalDiskRedundancy

[Screenshot: Get-StorageTier output]

Make sure the physical disk redundancy is set to 2.

Next, I disable placement on the OS drive and the NICs in SCVMM to ensure compliance.

Important Note: Using an S2D HCI cluster and the SOFS role together is not supported, i.e. when choosing S2D you should either use it as HCI only, or as disaggregated SOFS only. Some of the below is for demo purposes only.

# Disable placement on non S2D storage

ForEach($node in $nodes){
    $guid = New-Guid
    $vmHost = Get-SCVMHost -ComputerName $node
    $vmHostVolume = Get-SCStorageVolume -Name "C:\" -VMHost $vmHost
    Set-SCStorageVolume -StorageVolume $vmHostVolume -AvailableForPlacement $false

    # Get Host Network Adapter 'HPE Ethernet 1Gb 4-port 331i Adapter'
    $vmHostNetworkAdapters = Get-SCVMHostNetworkAdapter | where {$_.VMHost -eq $node -and $_.Name -like "*Ethernet 1Gb*"}

    ForEach($vmHostNetworkAdapter in $vmHostNetworkAdapters){
        Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $vmHostNetworkAdapter -Description "" -AvailableForPlacement $false -UsedForManagement $false -JobGroup $guid
    }

    # Get Host Network Adapters in Logical Switch
    $vmHostFrontEndAdapters = Get-SCVMHostNetworkAdapter | where {$_.VMHost -eq $node -and $_.Name -like "*SFP*" -and [string]$_.VirtualNetwork -eq "LSwitch-SDDC"}
    ForEach($vmHostFrontendAdapter in $vmHostFrontendAdapters){
        Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $vmHostFrontendAdapter -Description "" -AvailableForPlacement $false -UsedForManagement $false -JobGroup $guid
    }

    # Get Host Network Adapters in Backend SET Switch
    $vmHostBackendAdapters = Get-SCVMHostNetworkAdapter | where {$_.VMHost -eq $node -and $_.Name -like "*SFP*" -and [string]$_.VirtualNetwork -eq "vSwitch-SET"}
    ForEach($vmHostBackendAdapter in $vmHostBackendAdapters){
        Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $vmHostBackendAdapter -Description "" -AvailableForPlacement $false -UsedForManagement $false -JobGroup $guid
    }
    Set-SCVMHost -VMHost $vmHost -JobGroup $guid -RunAsynchronously
}

At this point I do a visual check to ensure all the networks are happy.

Pro tip – this is something you should always check in SCVMM after making any host changes.

[Screenshot: host network and virtual switch view in SCVMM]

If you want to do some stress testing of your HCI using VMFleet, this is the time to do it. (In the above image, the ‘internal’ switch is left over from VMFleet. This is to be removed for production use.)

A good overview on how to use VMFleet can be found here by Roman Levchenko

This is also the step in the process where I start testing resiliency by ripping out network cables etc.

Whilst VMFleet was running, the below was shown in Windows Admin Center – hopefully I'll share more on this later.

[Screenshot: VMFleet workload in Windows Admin Center]

Things to check before moving on:

Test-ClusterHealth.ps1 (found in VM Fleet)

Test-RDMA.ps1 (found in the SDN git here)
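As an example of the kind of check I mean (parameter names quoted from memory – check the script's own help before running), Test-RDMA validates RDMA traffic from a storage vNIC to a partner node:

# Example only - validate RDMA from this node's Storage-A vNIC to a partner node
$if = Get-NetAdapter -Name "vEthernet (vNIC-Storage-A)"
.\Test-RDMA.ps1 -IfIndex $if.ifIndex -IsRoCE $true -RemoteIpAddress 192.168.101.2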

Once I was happy, I destroyed VMFleet using the destroy script. BEWARE! This will indiscriminately delete all VMs on the cluster… You've been warned.

Once you've run the destroy script there will still be remnants of VMFleet, so we need to clean up the virtual disks and remove the internal switches.

Invoke-Command -ComputerName $nodes[0] -ScriptBlock {
    Write-Host -Foreground CYAN "Working on host $($env:computername)"
    Get-VirtualDisk | Remove-VirtualDisk -Confirm:$false
}

foreach($node in $nodes){
    Invoke-Command -ComputerName $node -ScriptBlock {
        Write-Host -Foreground CYAN "Working on host $($env:computername)"
        get-vmswitch -SwitchType Internal | Remove-VMSwitch -Force
    }
}

Tip: at this point I would reboot all of the nodes, one at a time. If you don't, you will likely get an alert about the missing VMFleet adapter.
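A cluster-aware way to do that rolling reboot might look something like this (a sketch – it drains each node, restarts it, then waits for storage repair jobs to finish before moving on):

# Rolling reboot - drain, restart, resume, then wait for repair jobs
ForEach ($node in $nodes){
    $nodeName = $node.Split('.')[0]
    Suspend-ClusterNode -Cluster $clustername -Name $nodeName -Drain -Wait
    Restart-Computer -ComputerName $node -Force -Wait -For PowerShell
    Resume-ClusterNode -Cluster $clustername -Name $nodeName -Failback Immediate
    # Let any storage repair jobs finish before moving to the next node
    while (Get-StorageJob -CimSession $clustername | Where-Object JobState -eq "Running"){ Start-Sleep -Seconds 30 }
}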

Create some volumes and shares

As with most things Microsoft, there are many ways you can achieve this: the SCVMM UI & PowerShell, Failover Cluster Manager UI & shell, or directly from one of the cluster nodes.

If you've ever been within earshot of one of my SCVMM rants, then you're probably familiar with the principle of only using SCVMM to perform tasks, i.e. don't use FoCM or HVM, as this could lead to SCVMM being out of sync. And in some very rare scenarios, you can end up with locked, or even worse, deleted resources…

But to be completely hypocritical, I am going to do this by PowerShell via PS-Remote on a node 🙂

My reason is simple: I want to use the same share name that exists on another SOFS cluster, but SCVMM doesn't like that… I want my way, so this is how I am doing it #stubborn

For optimal performance, you should have the same number of volumes per node. In our deployment we will be starting with a single 10TB volume per node. (You can use WAC to resize a CSV online.)

S2D is smart enough to auto-balance volumes, so as long as these are the first 4 volumes created, one will land on each node.

I am creating each volume as a 3-way mirror.

Enter-PSSession -computername $nodes[0]

# Create volumes and shares
(1..4) |% {
    $volume = "Volume$_"
    New-Volume -StoragePoolFriendlyName S2D* -FriendlyName $volume -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 10TB
    $share = "Disk0$_"
    md C:\ClusterStorage\$volume\Shares\$share
    New-SmbShare -Name $share -Path C:\ClusterStorage\$volume\Shares\$share
}

Rescan the storage provider – note, this takes a while. In my case about 8 minutes.
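The rescan can be kicked off from the SCVMM console or via PowerShell, roughly like this (the name filter is an assumption – match it to your provider):

# Rescan (refresh) the storage provider for the new SOFS
Get-SCStorageProvider | Where-Object { $_.Name -like "*$sofsname*" } | Read-SCStorageProvider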

Now let's fix up the shares and bring them under the management of SCVMM.

[Screenshot: file shares listed in SCVMM]

Open the properties of the share.

Tick 'File share managed by Virtual Machine Manager' and select the classification you want to use.

[Screenshot: file share properties in SCVMM]

 

Let’s prep for production…

SCOM Agent

Deploy the SCOM agent to each node.
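This can be pushed from the console, or from the Operations Manager shell with something like the below (the management server name is a placeholder):

# Push the SCOM agent to each S2D node
$mgmtServer = Get-SCOMManagementServer -Name "scom01.domain.corp"    # placeholder
Install-SCOMAgent -DNSHostName $nodes -PrimaryManagementServer $mgmtServer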

Make sure the S2D management pack is installed (available here) – always read the MP notes prior to installing!!!

[Screenshots: S2D management pack views in SCOM]

DPM Agent

Make sure you deploy the DPM agent – I use Run Script from SCVMM for this.

Add to Windows Admin Center – I will link the post when done.

At this point I start ripping out cables and doing my benchmarking prior to loading up the system.

Congrats, you’re ready to start using your S2D SANKiller in your production environment.

Enjoy!
Dan
