The VVol Blog Series:
Part 1: Introduction
Part 2: With and Without VVols
Part 3: New Storage Constructs
Part 4: How are VVols instantiated
Part 5: Storage Containers & Capability Profiles
Part 6: The Mythical Protocol Endpoint
Last time we looked at Storage Containers, Capability Profiles and storage policy-based data placement (based on rule matching).
This time I want to delve deeper into the Protocol Endpoint (PE), a mystical object that many people find hard to understand…
This post focuses on block storage rather than NFS. For more on deploying VVol over NFS (supported since the first release of VVol), read this great post by my colleagues Valentin Hamburger and Christian Hess HERE.
The PE is not a data LUN: it stores no data, and serves only as an access path, or entry point, for ESXi hosts to reach VVol objects.
The PE is presented as an ALU (Administrative Logical Unit) and is associated with a storage container (SC). One point to flag: with HDS storage you can have multiple pools within an SC, but with VVol you do not set a PE at pool level. The PE governs access to the container itself, so it is set at container level.
As the VMware administrator, working with the storage admin, you decide which applications and VMs have access to which VM Storage Profiles. That is what governs whether VMs can live inside particular pools. If this is unclear, check my last post, Part 5: Containers and Capability Profiles, or reach out to me.
When you create a storage container you MUST also create a PE; this is part of the workflow in Hitachi Command Suite. Once the container and PE are in place, rescan your ESXi hosts. Provided you have granted the hosts access to the front-end ports over whichever network topology (FC or iSCSI) you are using, the rescan will discover the underlying PE.
You can see that this is presented as LUN 256.
I have probably mentioned it already, but if this is the first post you are reading in the series, you should know that you cannot run VVol without an HBA and associated driver that are listed on the VMware Hardware Compatibility List as supporting the Secondary LUN ID feature, as shown in the figure below.
I describe this as the HBA being able to “talk VVol”; otherwise it has no way of interpreting the I/O conversation that takes place, or of knowing where to direct the I/O from a block perspective. More on that later!
Back to the PE and the ESXi rescan…
After the rescan you can see the set of protocol endpoints mounted on the ESXi host(s):
You are now ready to start using the capacity inside the storage container. That was pretty easy!
As I mentioned in the last post, Part 5: Containers and Capability Profiles, you first need to create capability profiles on the pools to expose their characteristics to your VMware or application admin.
You then mount the “VVol Datastore” on the hosts. This is purely a logical placeholder for accessing VVol objects within the storage container; unlike a traditional VMFS datastore, it has zero administrative overhead.
VVol Access Path
How does the ESXi host talk to a VVol? That is a fundamental question in the journey of an I/O, and the answer is fairly simple: the VASA provider supplies the VVol’s address on the back end.
The ESXi host then sends I/O directly to the VVol object on the back end, using the address of the PE combined with an offset (the secondary LUN ID).
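To make the addressing concrete, here is a toy Python model, a sketch under stated assumptions rather than how ESXi or any array implements it. The names `ProtocolEndpoint`, `VVolAddress`, `bind` and `route`, and the rule that secondary IDs are handed out sequentially, are hypothetical; the only idea taken from the post is that a VVol is reached through the PE’s LUN plus a per-VVol offset.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass(frozen=True)
class VVolAddress:
    """Hypothetical host-side address of a VVol: PE LUN plus an offset."""
    pe_lun: int         # LUN ID of the protocol endpoint, e.g. 256
    secondary_lun: int  # per-VVol offset (secondary LUN ID)


class ProtocolEndpoint:
    """Toy PE: stores no data, only maps secondary LUN IDs to VVols."""

    def __init__(self, lun_id: int) -> None:
        self.lun_id = lun_id
        self._next_id = count(1)          # assumption: IDs issued sequentially
        self._bound: Dict[int, str] = {}  # secondary LUN ID -> VVol name

    def bind(self, vvol_name: str) -> VVolAddress:
        """Stand-in for the VASA provider handing the host a VVol address."""
        sid = next(self._next_id)
        self._bound[sid] = vvol_name
        return VVolAddress(self.lun_id, sid)

    def route(self, addr: VVolAddress) -> str:
        """Resolve an incoming I/O's address to the backend VVol."""
        if addr.pe_lun != self.lun_id:
            raise ValueError("I/O addressed to a different PE")
        return self._bound[addr.secondary_lun]


pe = ProtocolEndpoint(256)
data_addr = pe.bind("vm01-data")
print(data_addr)            # VVolAddress(pe_lun=256, secondary_lun=1)
print(pe.route(data_addr))  # vm01-data
```

The PE object holds only a mapping, never data, which mirrors the point that the PE is an access path rather than a place where VVol contents live.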
All I/O from the ESXi host to the VVol on the storage is out-of-band with respect to the VASA provider (VP). Once the ESXi hosts have obtained the VVol addresses from the VP, the VP does not need to be up for reads and writes. However, the VP must be up for power-on/power-off and provisioning operations. I will cover the design considerations in a later post.
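The split between the data path and the control path can be summarised as a small decision function. This is an illustrative Python sketch only: the `Op` enum and `can_proceed` are hypothetical names, and the operation list is just the one given in the paragraph above, not an exhaustive list of VASA operations.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"
    POWER_ON = "power_on"
    POWER_OFF = "power_off"
    PROVISION = "provision"


# Control-path operations that, per the post, need the VASA provider up.
NEEDS_VP = {Op.POWER_ON, Op.POWER_OFF, Op.PROVISION}


def can_proceed(op: Op, vp_online: bool, address_known: bool) -> bool:
    """Toy check: can this operation run given the VP's state?"""
    if op in NEEDS_VP:
        return vp_online
    # Data-path I/O is out-of-band: it only needs the VVol address
    # that the host already obtained from the VP.
    return address_known


# VP down, VVol address already known: I/O continues, power-on does not.
print(can_proceed(Op.WRITE, vp_online=False, address_known=True))     # True
print(can_proceed(Op.POWER_ON, vp_online=False, address_known=True))  # False
```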
A VVol is still an LDEV (logical device) on the array.
To give an indication of scale, the Hitachi G1000 supports 64,000 VVols, which logically still equates to 64,000 LDEVs.
To do the maths: with three VVols per VM (Config, OS and Data), that is potentially up to roughly 21,000 VMs on a single Hitachi G1000 array.
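The same maths as a quick Python check. The three-VVols-per-VM figure is the post’s; the four-VVol case is a hypothetical, included only to show how quickly the ceiling drops if each VM carries one more VVol (for example, a swap VVol while the VM is powered on, or a snapshot).

```python
def max_vms(vvol_limit: int, vvols_per_vm: int) -> int:
    """Upper bound on VMs if every VM uses the same number of VVols."""
    return vvol_limit // vvols_per_vm


G1000_VVOL_LIMIT = 64_000  # supported figure quoted in the post

print(max_vms(G1000_VVOL_LIMIT, 3))  # 21333 (Config, OS, Data)
print(max_vms(G1000_VVOL_LIMIT, 4))  # 16000 (hypothetical: one extra VVol per VM)
```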
WARNING: vendor bias coming up!
Setting vendor neutrality aside for a moment: based on the published, real (not theoretical) figures I have seen, the Hitachi G1000 supports more VVols than any other vendor’s array. If anyone has official supported figures that say otherwise, please correct me.
Hitachi could also trot out theoretical VVol maximum figures (we have them internally, and they are many times higher), but Hitachi NEVER advises customers to play fast and loose with mission-critical workloads, and does not promote fanciful performance figures.
After all, we are now providing a defined SLA for applications at the level of individual VM disks, and offloading many more operations, such as snapshots, to the array. Metadata management also becomes more critical, because the metadata is held within the array rather than persistently on the vSphere host. So we need to be sure the arrays can handle it.
How many PEs do I need?
Even on an array with 64,000 VVols, at most a single-digit number of PEs is recommended.
In fact, one of the main reasons for having multiple PEs is likely to be security in a multi-tenant environment.
All HDS arrays support array virtualisation (partitioning), so you could build a cloud infrastructure with the following configuration:
- 10 Tenants
- 10 partitions on one or more arrays
- 10 Storage containers
- 1 protocol endpoint per storage container to control host access and segregate I/O
- 10 clusters using the PE as the access control mechanism
- Mixed iSCSI and Fibre Channel for different performance classes
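One way to picture that multi-tenant layout is as a simple inventory, where each tenant gets its own partition, container, PE and cluster, and a cluster can only see its own PE. This is a hypothetical Python sketch: every name (`TenantLayout`, `build_layout`, `visible_pes`, the `tenantNN`/`peNN` labels) is invented for illustration, and alternating FC and iSCSI between tenants is an arbitrary assumption standing in for “mixed iSCSI and Fibre Channel”.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TenantLayout:
    """One tenant's slice of the (hypothetical) cloud design."""
    tenant: str
    partition: str          # array virtualisation partition
    storage_container: str
    protocol_endpoint: str  # one PE per storage container
    cluster: str            # the only cluster granted access to this PE
    transport: str          # "FC" or "iSCSI"


def build_layout(n_tenants: int = 10) -> List[TenantLayout]:
    layout = []
    for i in range(1, n_tenants + 1):
        layout.append(TenantLayout(
            tenant=f"tenant{i:02d}",
            partition=f"part{i:02d}",
            storage_container=f"sc{i:02d}",
            protocol_endpoint=f"pe{i:02d}",
            cluster=f"cluster{i:02d}",
            # Assumption for illustration: odd tenants on FC, even on iSCSI.
            transport="FC" if i % 2 else "iSCSI",
        ))
    return layout


def visible_pes(cluster: str, layout: List[TenantLayout]) -> List[str]:
    """PEs a cluster can see: access control is done at the PE."""
    return [t.protocol_endpoint for t in layout if t.cluster == cluster]


layout = build_layout()
print(visible_pes("cluster03", layout))  # ['pe03']
```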
Protocol endpoint performance, and especially the queue depth of the PE, comes up a lot with customers and colleagues.
We are used to thinking about queueing at many points along the path between a VM and the front-end storage port.
How is I/O handled so that the PE doesn’t become a bottleneck?
In vSphere you still have a queue depth of 64 per device, per host. So how can you funnel all this I/O through a single LUN, the ALU?
With a PE, it is the queue depth of the secondary LUN (the VVol) that matters, not that of the PE.
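A back-of-the-envelope Python model of why that distinction matters. It takes the post’s figure of 64 per device, per host, and deliberately ignores every other queue on the path (HBA, adapter, array port), so the numbers are illustrative upper bounds, not a sizing rule. The function name and the 30 active VVols are hypothetical.

```python
def outstanding_io_ceiling(active_vvols: int, per_device_qd: int = 64,
                           queue_on_pe_only: bool = False) -> int:
    """Simplified per-host ceiling on in-flight I/Os through one PE.

    Ignores HBA, adapter and array-port limits; illustration only.
    """
    if queue_on_pe_only:
        # If the PE were the queued device, every VVol would share its queue.
        return per_device_qd
    # If each secondary LUN (VVol) has its own queue, the ceilings add up.
    return active_vvols * per_device_qd


print(outstanding_io_ceiling(30, queue_on_pe_only=True))  # 64
print(outstanding_io_ceiling(30))                          # 1920
```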
In the high-end G1000 array, an ASIC protocol chip distinguishes incoming I/Os and transfers each one directly to the appropriate VVol rather than to the PE logical device. Put another way, an I/O request issued to a VVol is transferred in hardware directly to the processor that owns that VVol/LDEV.
On other Hitachi arrays the offloading process is similar but is done in microcode, because the G200/400/600/800 have a different architecture and lack the G1000’s ASIC protocol chip.
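Whether it happens in an ASIC or in microcode, the routing described above amounts to two lookups: secondary LUN ID to LDEV, then LDEV to its owning processor. This is a hypothetical Python model; `FrontEndPort`, the LDEV numbers and the processor names are invented for illustration and say nothing about how Hitachi’s hardware or microcode is actually built.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class FrontEndPort:
    """Toy model: resolve a VVol I/O to its LDEV and hand it straight to
    the processor that owns that LDEV, bypassing the PE's own LDEV."""

    def __init__(self, secondary_to_ldev: Dict[int, int],
                 ldev_owner: Dict[int, str]) -> None:
        self.secondary_to_ldev = secondary_to_ldev  # secondary LUN ID -> LDEV
        self.ldev_owner = ldev_owner                # LDEV -> owning processor
        self.queues: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def submit(self, secondary_lun: int, op: str) -> str:
        """Queue the I/O on the owning processor and return its name."""
        ldev = self.secondary_to_ldev[secondary_lun]
        owner = self.ldev_owner[ldev]
        self.queues[owner].append((ldev, op))
        return owner


port = FrontEndPort({1: 0x100, 2: 0x101}, {0x100: "MP0", 0x101: "MP1"})
print(port.submit(1, "read"))   # MP0
print(port.submit(2, "write"))  # MP1
```

Because each I/O lands on the owner’s queue directly, no single shared queue for the PE sits in the data path, which is the property the post is describing.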
Until next time…