Part 8: Server 2016 Software Defined Networking Overview

Previous Post in Series: Part 7: Expose Shielded VMs to Windows Azure Pack Portal

Welcome back folks. In part 8 of my Server 2016 Features Series I’ll be providing an overview of SDN v2. This was originally going to be a single post, but as I started working on the overview, I realised it would be a bit of a beast by the end.  It’s for that reason I’ve decided to split SDN into 3 posts, which are:

  • SDN v2 Overview
  • Network Controller Service deployment
  • Software Load Balancer Service deployment

The deployment and configuration pieces will be covered in parts 9 and 10.  As such, I assume that at the very least you’ve deployed a 3 node Hyper-V cluster on Server 2016 and it’s managed by the latest version of SCVMM.

SDN Roles

Before we get into the deployment, here’s a little detail on each role and its place within your environment. Alternatively, you’ll find a much more in-depth read HERE (with links to the other roles):

Network Controller:

The Network Controller is the brains of your SDN environment. It’s a highly available Server 2016 role that “provides a centralized, programmable point of automation to manage, configure, monitor and troubleshoot virtual and physical network infrastructure in your datacentre.” Our primary interaction with this role will be via the SCVMM console, which we’ll be using for the following (as an example):

  • Manage our Hyper-V host virtual switches
  • Manage our Software Load Balancers
  • Manage our RAS Gateway
  • Deploy/manage our fabric and tenant SDN VM Networks
  • Deploy/manage our fabric and tenant SDN Static IP Pools
  • Configure/manage the datacentre firewall for both East/West and North/South network traffic
  • Store and distribute virtual network policies

The Network Controller communicates with network devices, services and components using its southbound API. The northbound API, which is implemented as a REST interface, is used by administrators to configure, monitor, troubleshoot and deploy services, via PowerShell or SCVMM for example.
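Just to make that northbound REST interface a bit more concrete, here’s a minimal sketch of querying the Network Controller straight from PowerShell once it’s deployed. The REST name (ncrest.contoso.com) is made up for the example and I’m assuming Kerberos authentication:

# Minimal sketch - query the Network Controller's northbound REST API directly.
# "ncrest.contoso.com" is a placeholder for your own REST endpoint name.
$ncUri = "https://ncrest.contoso.com"

# List the logical networks the Network Controller knows about, with their provisioning state
$logicalNetworks = Invoke-RestMethod -Uri "$ncUri/networking/v1/logicalNetworks" -UseDefaultCredentials
$logicalNetworks.value | Select-Object resourceId, @{n="provisioningState";e={$_.properties.provisioningState}}

# Tenant virtual networks can be listed the same way
Invoke-RestMethod -Uri "$ncUri/networking/v1/virtualNetworks" -UseDefaultCredentials |
    Select-Object -ExpandProperty value |
    Select-Object resourceId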

SDN on Server 2012 used Network Virtualisation using Generic Routing Encapsulation (NVGRE, an extension of GRE) as its only encapsulation method. Server 2016 supports both NVGRE and Virtual Extensible Local Area Network (VXLAN), with VXLAN being the default. GRE is also one of the options available for tunnelling out of the RAS Gateway (a little more on that later).

Being that we’re deploying our Network Controller role as highly available, it’s worth noting that this isn’t done by means of failover clustering and instead uses Service Fabric.

The Azure Service Fabric platform “provides functionality that is required for building a scalable distributed system”. Here are some of the features Service Fabric provides across multiple operating system instances (more information HERE):

  • Synchronising state information between instances
  • Electing a leader node
  • Failure detection
  • Load Balancing of workloads

The Network Controller can be deployed as a single node for testing scenarios, but for the purposes of this guide we’ll be deploying the 3-node production option by means of an SCVMM service template. Our Network Controller deployment will therefore be a Service Fabric application made up of several stateful services spread across the nodes. For each service there is one primary replica, which services all requests for that service, with two secondary replicas existing for high availability purposes only. Workload load balancing is achieved because not all primary replicas will exist on the same node…does that make sense? No? OK, hopefully it’s just down to my poor explanation, so here’s a diagram to try and help clear things up.

NOTE: Our deployment will only have 3 Network Controller nodes across 3 Hyper-V hosts, but the theory is the same nonetheless:

[Diagram: primary and secondary service replicas distributed across the Network Controller nodes]
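We’ll be doing the actual deployment with an SCVMM service template in part 9, but purely to illustrate the 3-node idea, here’s a rough sketch of what the equivalent native PowerShell deployment looks like using the NetworkController module. The node names, interface name and REST IP below are placeholders only:

# Rough sketch only - in this series we'll deploy via the SCVMM service template instead.
# Node names, interface name and REST IP below are placeholders.
$nodes = @()
$nodes += New-NetworkControllerNodeObject -Name "NC01" -Server "NC01.contoso.com" -FaultDomain "fd:/rack1/host1" -RestInterface "Ethernet"
$nodes += New-NetworkControllerNodeObject -Name "NC02" -Server "NC02.contoso.com" -FaultDomain "fd:/rack1/host2" -RestInterface "Ethernet"
$nodes += New-NetworkControllerNodeObject -Name "NC03" -Server "NC03.contoso.com" -FaultDomain "fd:/rack1/host3" -RestInterface "Ethernet"

# Create the Service Fabric based cluster first, then the Network Controller application on top of it
Install-NetworkControllerCluster -Node $nodes -ClusterAuthentication Kerberos
Install-NetworkController -Node $nodes -ClientAuthentication Kerberos -RestIpAddress "172.10.10.100/24"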

OK, that’s enough about the Azure Service Fabric…well almost. Why use it over traditional failover clustering, at least for our Network Controllers? Here are some of the benefits:

  • Very fast failover due to multiple hot secondary service replicas being available at all times.
  • Easy to scale up and down as required.
  • Persistent storage by means of a Key-Value store that is replicated and transactional.
  • Service modularity. Each service can be developed and updated without affecting the others.

That should do it for the Network Controller for the time being, other than to say that, as the brains of the operation, it’s the first component that we’ll be deploying on our hosts.

Software Load Balancer:

As mentioned above, our Network Controllers are the brains but we’ll still need a Software Load Balancer as it provides the following capabilities:

  • Layer 4 load balancing services for North-South and East-West TCP/UDP traffic
  • Public and Internal network traffic load balancing
  • Supports dynamic IP addresses (DIPs) on VLANs and on virtual networks using Hyper-V Network Virtualisation
  • Health probe support
  • Quick and easy scale out ability

So among other things, this means that not only is SLB responsible for load balancing our inbound tenant traffic, it’s also responsible for providing the inbound and outbound NAT for our tenant SDN VM networks.

So how exactly does the SLB role work then? It works by mapping virtual IP addresses (VIPs) to dynamic IP addresses (DIPs), but what the hell are VIPs and DIPs?

Firstly, where SLB is concerned, there is a public VIP and a private VIP, NAT’d together. These are single IP addresses that sit at the front of your VM network and provide public access to a group of VMs attached to said VM network. These VIPs are held within the SLB Multiplexer and are advertised out to the physical network as a /32 route using Border Gateway Protocol (BGP).

DIPs are the IP addresses assigned to member VMs within an SDN VM Network.
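Once everything is up and running you can actually see those VIP resources for yourself, as the Network Controller exposes them as load balancer and public IP address objects. Here’s a quick sketch using the NetworkController PowerShell module (again, the REST endpoint name is just an example):

# Sketch - list the VIP-related resources held by the Network Controller.
# "ncrest.contoso.com" is a placeholder for your REST endpoint.
$uri = "https://ncrest.contoso.com"

# Load balancer resources contain the front-end VIP and back-end (DIP) pool configuration
Get-NetworkControllerLoadBalancer -ConnectionUri $uri |
    Select-Object ResourceId, @{n="ProvisioningState";e={$_.Properties.ProvisioningState}}

# Public IP (VIP) resources can be listed too
Get-NetworkControllerPublicIpAddress -ConnectionUri $uri |
    Select-Object ResourceId, @{n="IpAddress";e={$_.Properties.IpAddress}}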

So with all that in mind, here’s a quick diagram of a packet’s travels from the internet to a VM on an SDN VM Network.

[Diagram: a packet’s journey from the internet to a VM on an SDN VM Network]

Hopefully that gives you an idea of why we need the SLB service.

Moving on…

RAS Gateway:

The RAS Gateway is a role you may or may not need to deploy, depending on the requirements of your SDN scenario. This guide will be skipping the deployment of the RAS service as I’ve not had the chance to get much hands-on experience with it and I didn’t want to delay the guide until that changes. I will however add a separate guide on it when I get time over the next couple of months…or so.

As with the Network Controller and Software Load Balancer services, RAS can be deployed as a single node or as a high availability pool. Here’s a list of the main features it provides:

Site-to-Site VPN

I won’t go into too much detail here as I’m sure you all know what a site-to-site VPN is. It’s the ability to connect two networks in two different physical locations across the internet over a Virtual Private Network…there, done.

Dynamic Routing With BGP

BGP is a dynamic routing protocol. When tenants or organisations have multiple sites connected using BGP-enabled routers (or RAS gateways in this case), BGP allows routes to be calculated automatically, which has the added benefit of cutting down on the need for manual route configuration by admins.
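For reference, the BGP configuration on a RAS gateway is surfaced through the standard RemoteAccess BGP cmdlets, so once a gateway is up you can sanity-check the peering from PowerShell. I’ve not deployed the role in this lab, so treat the below as illustrative only:

# Illustrative only - run on (or remote into) a RAS gateway VM.
Get-BgpRouter             # the local BGP router configuration (ASN, router ID)
Get-BgpPeer               # peering sessions and their connectivity state
Get-BgpRouteInformation   # routes learned and advertised via BGP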

Layer 3 Forwarding

The RAS gateway can also act as a router that connects virtual networks with physical VLANs.

Network Topology

During my testing, this is the area that caused me the biggest headache, mainly because no two networks are the same and I’m not a networking engineer, although I’m closer now than ever because of this.

For the purposes of this guide, I’ll run you through the sample network topology that I used in my lab testing. This will include:

  • Any prerequisites pertaining to the network
  • Physical network configuration
  • The networks you’ll need and what purpose they serve
  • Various diagrams by way of explanation

Prerequisites

If you’re going with a fully converged network model (all data logically split but sharing the same physical NIC(s)), there are a few things to be aware of:

Let’s talk about Data Center Bridging (DCB). If you’re using Remote Direct Memory Access (RDMA), you’ll need Top of Rack switches that are DCB capable. There are two DCB features we’re interested in, which are as follows:

  • Enhanced Transmission Selection (ETS) for RDMA (Chelsio T5 NICs for example)
  • Priority-based Flow Control (PFC) for RDMA over Converged Ethernet (RoCE) based RDMA (Mellanox X-3 NICs for example)

NOTE: For the sake of completeness, the Chelsio T5 adapters use internet Wide Area RDMA Protocol (iWARP) based RDMA.

The following link is a good read if you want to learn more about RDMA, DCB and SET teaming. As I had separate RDMA NICs for my SMB storage, this part of the process was a little more straightforward for me.
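Before going any further it’s worth checking what your NICs and hosts actually report for RDMA and DCB, so you know which of the above applies to you. Here’s the sort of quick check I’d run (nothing environment-specific in this one):

# Which physical adapters report RDMA capability, and is it enabled?
Get-NetAdapterRdma

# Current DCB/QoS state on the host: per-priority flow control and the ETS traffic classes
Get-NetQosFlowControl
Get-NetQosTrafficClass

# DCBX willing mode - for RoCE deployments this is commonly set to $false so the host config wins
Get-NetQosDcbxSetting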

NOTE: At present only IPv4 addressing is supported for Server 2016 Software Defined Networking

Sample Network Topology

As I mentioned above, I’m going to run through the network topology I used during my lab deployment, all VLAN IDs, prefixes, gateways etc. will need to be configured for your own environment:

| Network Name     | Subnet      | Prefix | VLAN ID | Gateway     | Terminates on  |
|------------------|-------------|--------|---------|-------------|----------------|
| Management       | 172.10.10.0 | /24    | 10 *1   | 172.10.10.1 | Firewall       |
| Provider Address | 172.10.20.0 | /24    | 20      | 172.10.20.1 | ToR Switches   |
| Transit          | 172.10.30.0 | /24    | 30      | 172.10.30.1 | ToR Switches   |
| Public VIP       | 50.50.50.0  | /24    | NA      | 50.50.50.1  | See note below |
| Private VIP      | 172.10.40.0 | /24    | NA      | 172.10.40.1 | See note below |
| GRE VIP          | 172.10.50.0 | /24    | NA      | 172.10.50.1 | See note below |
| SMB 1            | 172.10.60.0 | /24    | 60      | 172.10.60.1 | ToR Switches   |
| SMB 2            | 172.10.70.0 | /24    | 70      | 172.10.70.1 | ToR Switches   |

*1 – Set native VLAN on TOR ports, set as 0 within OS/SCVMM logical networking.

NOTE:  The above VIP networks have NA against their VLAN ID because those subnets are reached via BGP-advertised routes to the SLB/MUXes and gateways; they are not configured on any interface on the physical switches.

Management Network

The following servers will have at least one IP on this network, assigned by DHCP or statically (can make use of SCVMM static IP pool for everything but the hosts):

  • Hyper-V hosts
  • Scale-Out file server nodes
  • Network Controllers – requires an additional IP reserved as the REST IP address
  • Software Load Balancers
  • RAS gateways

This will be the default gateway network for all of the above, possibly with the exception of the RAS gateways. As mentioned above, I’ve yet to spend time configuring that role, so I’ll defer committing until I have.

It will also be the network used to reach your configured DNS servers.

Provider Address Network

The following servers will have at least one IP on this network, which are assigned by the system (using SCVMM static pool):

  • Hyper-V hosts – 2 IPs assigned per host by the Network Controller
  • Software Load Balancers

This network carries the encapsulated virtual network traffic.

Transit Network

The following servers will be statically assigned an IP on this network (using SCVMM static pools)

  • Software Load Balancers
  • RAS Gateways

This network is used for exchanging BGP peering information and external/internal tenant traffic (North/South). The Hyper-V hosts that run SLB or RAS VMs need connectivity to this subnet.

Public VIP Network

The following servers will be statically assigned an IP on this network

  • RAS Gateway – Statically assigned at the point of deployment and is used as the Site-to-Site endpoint

This network will contain the public IP addresses that will be internet routable. These will be the front-end IPs used by external clients to connect to tenant workloads running on SDN virtual networks.

Private VIP Network

The following role will be statically assigned an IP on this network

  • Software Load Balancer VIP – Statically assigned at the point of deployment

This network does not need to be routable outside of the cloud as it is used for VIPs that are only accessed from internal cloud clients, such as the SLB manager or private services.

GRE VIP Network

The GRE VIP network is a subnet that exists solely for defining VIPs that are assigned to gateway virtual machines running on your SDN fabric for a S2S GRE connection type. This network does not need to be pre-configured in your physical switches or router and need not have a VLAN assigned.

SMB 1 Network

The following servers/roles will be assigned an IP on this network

  • Hyper-V Hosts
  • Scale-Out File Server Hosts

If this network is RDMA capable, it should be tagged with a VLAN as this is a requirement of most physical switches to allow them to apply Quality of Service (QoS) settings correctly.

SMB 2 Network

The following servers/roles will be assigned an IP on this network

  • Hyper-V Hosts
  • Scale-Out File Server Hosts

If this network is RDMA capable, it should be tagged with a VLAN as this is a requirement of most physical switches to allow them to apply Quality of Service (QoS) settings correctly.
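If you end up doing the RDMA/QoS side of the host configuration yourself rather than leaving it to SCVMM, the VLAN tagging and the SMB Direct QoS policy end up looking something like the sketch below. The adapter names, VLAN IDs and bandwidth split are lifted from my sample topology above and are examples only:

# Example only - adapter names, VLAN IDs and the bandwidth split are from my lab values.
# Tag the dedicated SMB adapters with their VLANs
Set-NetAdapter -Name "SMB1" -VlanID 60 -Confirm:$false
Set-NetAdapter -Name "SMB2" -VlanID 70 -Confirm:$false

# Classify SMB Direct (port 445) traffic and mark it with priority 3
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control for priority 3 only, and reserve bandwidth for it via ETS
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply the DCB settings to the physical adapters
Enable-NetAdapterQos -Name "SMB1","SMB2"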

Routing Infrastructure

Now that we know what each of the networks are and what they do, let’s have a look at what’s required in your routing infrastructure.

NOTE:  This guide will be using SCVMM to deploy our SDN environment; however, if you’re deploying it using scripts, your Management, Provider Address, Transit and VIP subnets need to be routable to each other…in our case, they do not.

The VIP subnets we covered above do not have a VLAN assigned and are not pre-configured in the Top of Rack switches. The next hop for these subnets is advertised by the SLB/MUX and RAS Gateways to the Top of Rack switches using internal BGP peering.

You will need to create a BGP peering on your Top of Rack switches that will be used to receive routes for our VIP logical networks advertised by our SLB/MUXes and RAS gateways.

NOTE:  This peering should only be one way from the SLB/MUXes and RAS gateways to the Top of Rack switches.

In our example, these routes will then be distributed to the rest of our core network using Open Shortest Path First (OSPF).

NOTE:  The IP subnet prefixes for the VIP logical networks do not need to be routable from the physical network to the external BGP peer.

You will need your networking team to provide you with the following pieces of information before configuring your SDN infrastructure:

  • Router ASN
  • Router IP Address(es)
  • ASN for use by SDN (this can be any number from the private ASN range)

NOTE: Four-byte ASNs are not supported by the SLB/MUX, therefore a two-byte ASN must be used between the SLB/MUX and the router it connects to.

Once your SLB/MUXes (and RAS gateways if required in your scenario) are deployed, your network team will have to configure your Top of Rack switches to accept connections from the ASN and the Transit logical network they’re using.
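Once the SLB/MUXes are deployed and the switch side of the peering is in place, you can sanity-check things from the SDN side before going back to the network team. A hedged sketch (the REST endpoint name is an assumption, and Debug-NetworkControllerConfigurationState comes from Microsoft’s SDN diagnostics tooling):

# Sketch - check the SLB/MUX resources and the overall configuration state after deployment.
$uri = "https://ncrest.contoso.com"   # placeholder REST endpoint

# Each MUX resource should show a ProvisioningState of Succeeded
Get-NetworkControllerLoadBalancerMux -ConnectionUri $uri |
    Select-Object ResourceId, @{n="ProvisioningState";e={$_.Properties.ProvisioningState}}

# Reports per-resource configuration state, which can surface BGP peering problems on the MUXes
Debug-NetworkControllerConfigurationState -NetworkController "ncrest.contoso.com"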

One of the things it took us a while to work out was that the TCP three-way handshake needs special handling on the firewall your management network terminates on. There are times when traffic from your management network is routed up to the firewall (where its default gateway lives) and u-turned straight back out to the SLB/MUX VIP address on the Private VIP network, while the return traffic from the VIP comes back via your Top of Rack switches without passing through the firewall. For this reason, the following changes need to be applied to the firewall:

Configure TCP State Bypass – This is required due to “u-turning” as traffic is entering and leaving the firewall via the same interface but with a different destination.

An example config on a Cisco ASA would look something like this:

(config)#access-list tcp_bypass extended permit tcp any SLB/MUX_VIP 255.255.255.255
(config)#access-list tcp_bypass extended permit tcp SLB/MUX_VIP 255.255.255.255 any
(config)#class-map tcp_bypass
(config-cmap)#description "SLB/MUX TCP traffic that bypasses stateful firewall"
(config-cmap)#match access-list tcp_bypass
(config-cmap)#policy-map tcp_bypass_policy
(config-pmap)#class tcp_bypass
(config-pmap-c)#set connection advanced-options tcp-state-bypass
(config-pmap-c)#service-policy tcp_bypass_policy interface Management_Interface_Here

Disable TCP sequence number randomisation – As above, the return traffic from the SLB/MUX VIP address doesn’t go back through the firewall, therefore its sequence number can’t be restored after being randomised on the way in. This can be disabled for the VIP address only and doesn’t need to be a global change.

An example config on a Cisco ASA would look something like this:

access-list ACL-SEQUENCE-NUMBER extended permit tcp <SLBM VIP> any
access-list ACL-SEQUENCE-NUMBER extended permit tcp any <SLBM VIP>
class-map CM-SEQUENCE-NUMBER
match access-list ACL-SEQUENCE-NUMBER
policy-map global_policy
class CM-SEQUENCE-NUMBER
set connection random-sequence-number disable

On the Top of Rack switch ports your Hyper-V hosts are plugged into, change the MTU to 1702 to accommodate the Encapsulation overhead of SDN.

The standard Ethernet frame can reach up to 1542. The encapsulation overhead of SDN is 160, which gives us…1702

If your physical NICs support the EncapOverhead property, a value of 160 will be applied to them during the deployment of the SDN roles; otherwise you’ll need to modify the NIC settings yourself.
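To see whether your NICs expose EncapOverhead, and what it ends up set to once the SDN roles are deployed, the advanced property cmdlets do the job. The adapter name in the last line is just an example:

# Do the physical NICs expose the EncapOverhead keyword, and what is it currently set to?
Get-NetAdapterAdvancedProperty -Name "*" -RegistryKeyword "*EncapOverhead" -ErrorAction SilentlyContinue

# NICs without EncapOverhead support can have their frame size raised via JumboPacket instead
Get-NetAdapterAdvancedProperty -Name "*" -RegistryKeyword "*JumboPacket" -ErrorAction SilentlyContinue

# Setting it manually on a NIC that supports it (value in bytes)
Set-NetAdapterAdvancedProperty -Name "NIC1" -RegistryKeyword "*EncapOverhead" -RegistryValue 160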

OK, so this turned out to be a lot more information than I was planning on supplying, but as it took me a while to get my head around some of this, I’m hoping you can benefit from it.

Microsoft have a massive amount of documentation now available on SDN that wasn’t there when I was building this out the first couple of times. What I’m hoping to do is give you everything you need to get started without it being too overwhelming (as it was for me).

Before we finish up, here is a diagram that was put together during our lab build out to try and illustrate how all this is connected up.

[Diagram: how the networks and components in our lab build-out are connected]

Phew, so now that we’ve gotten that out of the way, I’ll see you in Part 9 when we actually deploy this to our infrastructure.

9 Replies to “Part 8: Server 2016 Software Defined Networking Overview”

  1. This is a great series Dave! Thanks for deploying and sharing. I too have managed to get my dNAT working and received connections inbound. (My provisioning platform is based on Windows Azure Pack that uses SPF to forward provisioning jobs to SCVMM 2016.) Unfortunately I am still working on outbound connections, no Internet access for the tenant VM yet. Can you share the path between hops that the packet travels or should travel? Where does it start, where does it go through how it returns… Network controller, Edge Gateways and SLB machines do not seem to have troubleshooting tools…

  2. My SNAT config seems wrong… My firewall/BGP router/ToR admin told me that he sees traffic originating from one of the HNV network IP addresses.

    1. Hi Alexandros,

      Thanks and I’m glad it’s helping. Here’s a quick list of things to check:

      1. Ask your network admin to check that he/she can see a successful BGP session coming in from the SLB/MUX VMs on the transit network. If so, check what routes are being shared; you should see IPs from the Private VIP network if you’ve enabled a tenant VM for outbound access.
      2. What’s the output of running debug-networkcontrollerconfigurationstate -networkcontroller , is everything showing as healthy?
      3. Have your network admins had a look at whether 3-way handshake and/or disabling sequence randomisation are required in your setup?

      There is a diagram at the bottom of this post that should show the traffic flow.

      Hopefully that’ll help get things moving again, unfortunately I no longer have access to the kit I require to keep a lab up and running so most of this is from memory.

        1. After your suggestions we narrowed down the issue to the BGP router. Well, the routing guys figured it out. Since the packets go through many VLANs there were some unusual requirements on the BGP router that the SLBs talk to. After sorting it out there seems to be good functionality, besides the fact that ICMP does not get NATed outside! HTTP out, RDP in, all works fine. Any ideas why I don’t see ping getting NATed out?

        1. Hi Alexandros,

          Really happy you’re all up and running. Regarding the ICMP issue, I could be remembering incorrectly here as it’s been a while, but I don’t think ICMP will/can route over the SDN network, much in the same way that it doesn’t in Azure.
          Happy to be corrected on this though, as I said, it’s been a while.

  3. Hello David.
    Thank you for the great guides about SDN that were very helpful for me.
    It’s been a while I’ve been trying to implement a LAB environment based on VMM with SDN of Microsoft.
    However, I’m stuck on some communication issues.

    After a correct installation (without any errors) of a single Network Controller VM and all network settings, I connected some VMs to the same tenant network. And there I noticed that the VMs cannot see each other on the same network. OK, I started walking through the MS troubleshooting guides and found these symptoms that I don’t know how to resolve:
    1. NC leaves error messages in event log every 5 minutes. They report that NC can’t connect to some devices specified with IDs. I found that these IDs belong to hosts.
    2. Netstat check on port 6640 reports that every host or NC itself listens on the port but no connection have been established.
    3. Every next check like Get-ProviderAddress, Get-PACAMapping is without any result obviously because of the previously mentioned problems.

    I checked also these points:
    1. The whole network is capable of transmitting packets greater than 1674 – checked with ping between almost all devices.
    2. All related services and components both on NC or hosts runs.
    3. I think that almost all diagnostics of NC via Powershell are ok.
    4. I’ve also upgraded every OS, firmware and drivers.

    Can you give me some advice what could be wrong, please?

    1. Hi Michal,

      I’m glad the guides helped. It’s been a good while since I worked on SDN so I’m a little rusty and I’d imagine the process has changed a little since then. It sounds like there’s an issue with the Provider address network, I assume you’ve created all that within VMM?

      https://docs.microsoft.com/en-gb/system-center/vmm/sdn-controller?view=sc-vmm-1807#validate-the-deployment

      Run the following command from the Hyper-V host:

      ipconfig /allcompartments

      Assuming your PA network is represented, run the command below to see if you can ping the host’s PA IP (change “3” to match the compartment number shown in the output of the earlier command)

      ping -c 3 “Provider Address IP” (without quotes)

      I hope that helps as it’s about all I can remember from the troubleshooting side of things 🙂

      Good luck.
