#VCDX Constraint: LBT (and the vSphere Distributed Switch)?

While creating a VCDX design, you consider common decision criteria and use cases and revisit them over and over. You baseline and compare design decisions against alternatives, and the panel may raise those alternatives during your defence.

One of the most common and subjective examples is the choice of logical network design in a 10Gb/s environment. 10Gb/s is becoming standard, especially with the resurgence of blades in converged platforms such as Hitachi Unified Compute Platform and Cisco UCS, among other solutions.

Converged and hyper-converged infrastructure with embedded 10Gb/s Ethernet connectivity has fewer physical uplinks. It’s normal to see a blade or host with at most 20Gb/s of bandwidth, presented either as two 10Gb/s logical devices, as 8 x 2.5Gb/s, or in other combinations.

Compared with the (good or bad?) old days of 6-8 x 1Gb/s Ethernet plus 2 x 4Gb/s FC, this is an embarrassment of riches, right? That’s kinda true until we layer virtualisation on top. With every traffic type now sharing the same one or two physical uplinks, this raises the spectre of reduced redundancy and increased risk in the network design.

Let’s be honest: sometimes there’s just no right way to do this. It boils down to experience, planning, and expecting that when things change, the design can be adapted to accommodate them.

Some Background Reading

For anyone who wants to understand vSphere teaming and failover design considerations, I recommend this VMware document. It’s an excellent resource for decision-making in vSphere, cloud and Hyper-V networking design:

http://www.vmware.com/files/pdf/vsphere-vnetwork-ds-migration-configuration-wp.pdf

When I started designing vSphere solutions, I was used to Tier-1 storage and to using Active-Active architectures at all times; Active-Passive didn’t cut it. I applied this mindset to networking as well as storage. In hindsight, much of this was due to a lack of knowledge on my part. That made me want to learn more, which is how I ended up going down the VCAP-VCDX route to get to the bottom of it.

The document above shows why this is not always optimal. It made everything clear in terms of good practice and helped me move away from slavishly adhering to “active/active” topologies. Some protocols, such as NFS v3, find it hard to leverage LACP. In those cases LACP provides no performance benefit and, in my view, only increases management complexity.

There are many excellent posts on the subject, such as this one by Chris Wahl:

http://wahlnetwork.com/2011/06/08/a-look-at-nfs-on-vmware/

Chris has written a series in which he tested LBT in his lab and established definitively what happens when a link nears saturation.

There is also this post by Michael Webster:

http://longwhiteclouds.com/2012/04/10/etherchannel-and-ip-hash-or-load-based-teaming/

and this one by Frank Denneman (complete with gorgeous Mac OmniGraffle logical topologies):

http://frankdenneman.nl/2011/02/24/ip-hash-versus-lbt/

The VMware document is also useful in showing how clear documentation makes for nice, easy deployments, rather than “back of a fag/cigarette pack” designs. You cannot put a value on good, clear documentation laid out like this. When it’s in a visual 2-D format you can really get a picture of how different traffic types will route. This is the stage at which you should make changes, not while you’re installing the platform.

How often do VMware partners skip this step when installing vSphere and vCenter? I’ve seen it a lot, and it can lead to many issues.

LACP will take care of it 

Sometimes it’s assumed that LACP will “take care of it” because we now have more bandwidth.

This is not the case for a discrete TCP/IP session from a virtual machine to an external destination, or for NFS. With IP hash, each such session will only ever use one uplink, because the hash is calculated from the same source and destination addresses every time, as Frank has shown. Yes, a VM might use multiple uplinks across sessions to different destinations, but never for a single point-to-point “conversation”.

The typical NFS use case, a vSphere host mounting a datastore from a VIP on a NAS device, will likewise only ever use a single uplink, as Chris has clearly shown.
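To make the single-uplink behaviour concrete, here is a minimal Python sketch of IP-hash uplink selection. It assumes the commonly described scheme (source IP XOR destination IP, modulo the number of active uplinks); VMware’s actual implementation may differ in detail, and all addresses and datastore names are made up. Because TCP ports never enter the calculation in this model, every session between the same two addresses, including an NFS vmkernel port talking to a single NAS VIP, lands on the same uplink.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Simplified model of 'Route based on IP hash' uplink selection.

    Assumption: uplink index = (source IP XOR destination IP) mod number of
    active uplinks. TCP/UDP ports play no part in this calculation.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


def uplink_for_session(src_ip, src_port, dst_ip, dst_port, num_uplinks=2):
    # Ports are accepted only to show that the hash ignores them in this model.
    return ip_hash_uplink(src_ip, dst_ip, num_uplinks)


if __name__ == "__main__":
    vm, server = "10.0.0.21", "10.0.1.50"
    # 100 TCP sessions from the same VM to the same server all hash identically.
    used = {uplink_for_session(vm, p, server, 443) for p in range(49152, 49252)}
    print("uplinks used for VM -> server:", used)

    # A different destination address can hash to a different uplink.
    print("uplink for VM -> 10.0.1.51:", ip_hash_uplink(vm, "10.0.1.51", 2))

    # A hypothetical NFS vmkernel port mounting several datastores from one NAS VIP.
    vmk, vip = "192.168.50.11", "192.168.50.100"
    for ds in ("ds01", "ds02", "ds03"):
        print(ds, "-> uplink", ip_hash_uplink(vmk, vip, 2))
```

Only a different source or destination address changes the result, which is why IP hash can spread many conversations across uplinks but never splits a single one.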

The VMware document also highlights the importance of keeping it simple and avoiding LACP and other complex topologies that may not deliver any material benefit. Use cases, requirements and constraints drive conceptual and logical design decisions.

Your logical network design is a template that will be replicated many times. If the logic behind it is questionable, any adverse issues will be amplified when it is deployed across one or more clusters. From experience, I believe this is the most critical element of a design for ensuring cluster and host stability and availability, with storage a close second.

Logical Network Design

Balancing different workload types across a pair of 10Gb/s adapters can be a game of chance. If you’ve completed your current-state analysis and understand your workload, you can apply some science. Even so, you might still be caught out by massive organic growth or by the effect of previously unknown technology such as vMotion.

From a design perspective there are so many things to consider:

  • Understanding the current workload
  • Upstream connectivity (logical and physical)
  • Traffic type(s)
  • Traffic priority (relative)
  • Latency requirements for different types of traffic
  • Application dependencies (some may benefit or suffer from virtual machine to host affinity)
  • Workload profile (across day/month/year)
  • Bandwidth required
  • Performance under contention
  • Scalability
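Several items in the list above (bandwidth required, performance under contention, scalability) come down to simple arithmetic. The sketch below is a hypothetical Python helper that checks whether a host’s peak demand fits on 2 x 10Gb/s with both uplinks up and after one uplink fails. The traffic types and figures are placeholders, not measurements, and it pessimistically assumes all peaks coincide, so feed it figures from your own workload profile.

```python
from dataclasses import dataclass


@dataclass
class TrafficType:
    name: str
    peak_gbps: float  # hypothetical per-host peak from a current-state analysis


# Placeholder figures for illustration only; substitute your own measurements.
SAMPLE_TRAFFIC = [
    TrafficType("Management", 0.5),
    TrafficType("vMotion", 8.0),
    TrafficType("NFS", 4.0),
    TrafficType("VM traffic", 3.0),
]


def headroom_gbps(traffic, uplink_gbps=10.0, uplinks=2, failed=0):
    """Spare bandwidth in Gb/s (negative means a shortfall).

    Pessimistically assumes every traffic type peaks at the same time.
    """
    available = uplink_gbps * (uplinks - failed)
    demand = sum(t.peak_gbps for t in traffic)
    return available - demand


if __name__ == "__main__":
    for failed in (0, 1):
        spare = headroom_gbps(SAMPLE_TRAFFIC, failed=failed)
        verdict = "fits" if spare >= 0 else "contention"
        print(f"{failed} uplink(s) failed: {spare:+.1f} Gb/s -> {verdict}")
```

With these placeholder numbers the host has 4.5 Gb/s to spare with both uplinks up but is 5.5 Gb/s short after a single uplink failure, which is the reduced-redundancy risk that converged 10Gb/s designs raise.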

A load-aware teaming policy such as “Route based on physical NIC load” (load-based teaming, or LBT) is an excellent way to move traffic between uplinks non-disruptively. It’s like DRS for your network traffic.
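The “DRS for your network traffic” analogy can be sketched as a rebalancing loop. This Python model is a simplification, not VMware’s code: it assumes the commonly quoted LBT behaviour of evaluating each uplink’s mean utilisation over a sample window (often given as 30 seconds) and moving a virtual port off any uplink above 75%. The port names and loads are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed LBT trigger: mean utilisation above 75% over the sample window.
# This is a commonly quoted figure, not a value read from vSphere.
THRESHOLD = 0.75


@dataclass
class Uplink:
    name: str
    capacity_gbps: float
    # virtual port -> mean load in Gb/s over the last sample window (hypothetical data)
    ports: dict = field(default_factory=dict)

    def load(self) -> float:
        return sum(self.ports.values())

    def utilisation(self) -> float:
        return self.load() / self.capacity_gbps


def rebalance(uplinks):
    """One simplified evaluation cycle: for each saturated uplink, move the
    busiest virtual port that fits onto the least-utilised uplink."""
    moves = []
    for src in uplinks:
        if src.utilisation() <= THRESHOLD:
            continue
        target = min(uplinks, key=lambda u: u.utilisation())
        if target is src:
            continue  # every uplink is saturated; nowhere better to go
        for port, gbps in sorted(src.ports.items(), key=lambda kv: kv[1], reverse=True):
            # Only move a port if doing so keeps the target at or below the threshold.
            if (target.load() + gbps) / target.capacity_gbps <= THRESHOLD:
                target.ports[port] = src.ports.pop(port)
                moves.append((port, src.name, target.name))
                break
    return moves


if __name__ == "__main__":
    vmnic0 = Uplink("vmnic0", 10.0, {"vm-a": 5.0, "vm-b": 3.5})  # 85% busy
    vmnic1 = Uplink("vmnic1", 10.0, {"vm-c": 1.0})               # 10% busy
    print(rebalance([vmnic0, vmnic1]))
    print(f"vmnic0 {vmnic0.utilisation():.0%}, vmnic1 {vmnic1.utilisation():.0%}")
```

In this model only the port-to-uplink mapping changes; nothing is asked of the physical switch, which fits the point that LBT needs no special upstream configuration.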

So for me, LBT without LACP is a good solution for many use cases. Personally, I would hold off on using Network I/O Control (NIOC) on day one. It’s better to give all traffic types access to all the bandwidth and only put on the handbrake for good reason; NIOC can be applied later, on the fly.
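“Putting on the handbrake” with NIOC means assigning shares that only bite when an uplink is saturated. As a rough illustration, the Python sketch below splits one uplink’s capacity by weighted max-min fairness. Treating NIOC shares this way is a simplifying assumption, the share values and demands are hypothetical, and NIOC limits and reservations are not modelled.

```python
def allocate_by_shares(capacity, demand, shares):
    """Weighted max-min fair split of one uplink's capacity in Gb/s.

    demand: traffic type -> offered load in Gb/s
    shares: traffic type -> relative share value (must be positive)
    """
    alloc = {t: 0.0 for t in demand}
    active = {t for t, d in demand.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_shares = sum(shares[t] for t in active)
        # Each still-hungry traffic type's proportional slice of what is left.
        slices = {t: remaining * shares[t] / total_shares for t in active}
        satisfied = {t for t in active if demand[t] - alloc[t] <= slices[t]}
        if not satisfied:
            # Real contention: everyone left gets exactly their weighted slice.
            for t in active:
                alloc[t] += slices[t]
            break
        # Types that need less than their slice are capped at demand;
        # their unused bandwidth is redistributed in the next round.
        for t in satisfied:
            remaining -= demand[t] - alloc[t]
            alloc[t] = demand[t]
        active -= satisfied
    return alloc


if __name__ == "__main__":
    shares = {"vMotion": 50, "VM": 100, "NFS": 50}  # hypothetical share values
    print(allocate_by_shares(10.0, {"vMotion": 8.0, "VM": 6.0, "NFS": 2.0}, shares))
    print(allocate_by_shares(10.0, {"vMotion": 3.0, "VM": 4.0, "NFS": 2.0}, shares))
```

When total demand fits on the uplink, every traffic type simply gets what it asks for and the shares make no difference, which is the argument for leaving NIOC off on day one and adding it later if contention appears.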

And now the constraint

Unfortunately, LBT is only available on the vSphere Distributed Switch (vDS). With this post I want to raise this within the community and make the point that the vDS has been around for a long time now (more than four or five years) and it’s time it made its way down the SKUs to Standard edition.

After all, Microsoft has no such limitation at present.

10Gb/s is now pervasive, which means we increasingly need a software (policy) based solution to ensure fairness and optimum use of bandwidth. I believe we have reached the point where VMware load-based teaming without LACP is a great solution for balancing traffic under load in many common use cases today.

I haven’t gone into Network I/O Control and the additional priority and Class of Service (CoS/QoS) tagging that can be applied to different traffic types. That’s another weapon in the armoury. Maybe more on that later.

KISS

For me, network design comes down to KISS: Keep It Simple, Stupid.

So LBT may be a great solution, but without Enterprise Plus you can’t use the vSphere Distributed Switch, and without the vDS you cannot use LBT. This also rules out Network I/O Control, which likewise requires the vDS.

As vSphere has evolved, we have seen awesome features appear and become commonplace in lower SKUs. It’s strange that the vDS is still an Enterprise Plus feature, and I don’t like having to present it as a design constraint to customers who can’t afford Enterprise Plus.

I hope someday soon this awesome technology will be available to all customers regardless of which license tier they reside in.

Thoughts?

Comments

  1. Good article, Paul. My view is that KISS should be the default unless you have a compelling reason not to, e.g. application performance or availability.

    In relation to NIOC, I would enable it by default, as otherwise in times of contention the wrong VM could be ‘hogging’ an uplink’s bandwidth, especially when Test, DMZ and LAN traffic all share the same bandwidth.

    I want to call out iSCSI: it uses a VMkernel port (as NFS does), but it has built-in MPIO capabilities, so 1:1 port binding would be preferred over LBT, because each bound VMkernel port cannot use more than one physical network adapter.

    Keep the posts coming!

  2. Great comments, Craig. Fair point about NIOC. I agree that PRD/DEV/TEST separation, and the whole noisy-neighbour debate, is an interesting question, and not just at the network layer but also for compute. Common sense usually dictates not using resource pools for compute or network by default, but as you say this can lead to noisy neighbours, so IMHO it sometimes depends on the customer and on making sure you have robust monitoring in place to catch these situations.

    I didn’t mention iSCSI, but of course that’s another use case: we cannot use multiple vmnics for a bound iSCSI VMkernel port, so this also rules out LACP in that instance.

    Paul

  3. Good post, Paul, thanks for sharing. I agree completely with your premise and Craig’s comment. I just wanted to add that, very similar to iSCSI, multi-NIC vMotion is another use case for a 1:1 mapping of port groups to NICs, and using LACP rules these out. Another reason is that LBT does not require any special configuration on the physical switch; it just works. LACP also disables the ability to use port mirroring, if that is required for troubleshooting purposes. All in all, I find LACP more restrictive and complex than LBT, which “just works” and does a good job at it.

  4. Thanks, Niran, for adding those two key points to the mix 😉
    I completely agree. This is a little bit of SDN goodness, with software making intelligent decisions and making the physical networking much simpler for all concerned.

    Paul

  5. Nice post, Paul!

    In some cases and designs LACP can be justified; I covered this in my post about LACP design considerations.

    For info, LACP requires vDS too 😉
    And you cannot do LBT with LACP.

    1. Hi Romain,

      Thanks for your comments and further suggestions.

      I’m including your excellent post here, since you should have linked it yourself. Another excellent resource!!
      http://cloudmaniac.net/vsphere-enhanced-lacp-support/

      In your post you mention the scenario where we use LACP for VM traffic. The behaviour is that if we have traffic from a source to a destination VM with a fixed IP and the same TCP port, the new teaming and failover settings would not result in a change of uplink, from uplink 0 to uplink 1 for example. So in that case, route based on originating virtual port is a dumb yet pretty fair way of distributing load, especially when combined with proper monitoring. That’s from a performance perspective. It seems to me that for more random traffic coming from multiple sources, these new schemes will be highly effective.

      Considering the additional complexity and the lack of any performance benefit, I’m still not sure it’s justified. As you rightly pointed out, having a good network SME is important, but it is also important that they understand vSphere networking.

      In terms of failure scenarios (physical link failures), what benefits does LACP provide in your experience? Link convergence time and loops should not be an issue anyway when the right settings are applied to the physical switch ports in a vSphere environment (PortFast/BPDU Guard). Also, what do you see as the benefits with NFS v3?

      Thanks,
      Paul

  6. Hi Paul!
    I just wanted to point out that LBT is not the Holy Grail of performance, even if you have Enterprise Plus licenses…
    …but neither is LACP; both have their own benefits. 🙂

    I don’t understand your example “The behaviour is that if we have traffic from source to destination VM with fixed IP and the same TCP port, the new teaming and failover settings would not result in a change of uplink from uplink 0 to uplink 1 for example.”

    BTW, if you have mostly east/west traffic between VMs and a Multi-Chassis LAG (VSS or vPC), you will benefit from a unified MAC table and thus reduce the possible number of hops between VMs (traffic will never cross the ISL between the switches in the VSS/vPC pair).

  7. Paul, great post. I had this tagged in my feed and am only now catching up on my reading. I feel that in converged 10G scenarios you really paint yourself into a corner without the advanced features of the VDS, specifically NIOC, QoS tagging and LBT.

    I’ve been encouraging people without Enterprise Plus to use 10G adapters that can be partitioned into virtual adapters with separate bandwidth, like the Cisco VIC or Broadcom NPAR technology. Partitioning the NIC lets them mimic the 1Gb switch topologies that we know work (switch0 for MGMT and vMotion, switch1 for VMs, switch2 for storage).

    I really would like to see the VDS moved into more vSphere license levels.
