vSphere Storage Interoperability Series Part 2: Storage I/O Control (SIOC) 1 of 2

Disclaimer: I work for Hitachi Data Systems. However, this post is not officially sanctioned by my company and does not speak on its behalf. These thoughts are my own, based on my experience. Use at your own discretion.

This is the second part of my storage interoperability series designed to bring it all together.

Here are the other parts:

Part 1: vSphere Storage Interoperability: Introduction

This started from a desire I had to be crystal clear about what these settings are and when you should and shouldn’t use them. Many vSphere experts may be aware of these settings but I’m hoping to bring this to the masses in a single set for easy access. All feedback gratefully accepted !!!!

In this post I will cover some topics around Storage I/O Control. This is the first post related to SIOC; I will have at least one more, and possibly two, depending on how easy each is to digest.

SIOC Overview

Storage I/O Control is an I/O queue-throttling mechanism that is enabled at the datastore level in vSphere. It takes action in real time when I/O congestion occurs on any datastore on which it is enabled.

When I say “takes action” I mean it reduces the queue depth (the pipeline of I/Os queued up for a datastore) in real time, so the effect is immediate. It’s like driving with the handbrake on or in a lower gear.

It is important to note that device latency and device queuing relate to the individual queue associated with the datastore itself, not to the queues at the HBA or VMkernel level. In vSphere storage terminology the device is the LUN backing the datastore; for a single-extent VMFS datastore the two map one to one.

You may have seen this image before; it shows the different queues at play in vSphere and the work the VMkernel is doing, which is quite amazing really:

[Image: the different storage queues at play in vSphere]

SIOC does not take any action during normal operational conditions.

It only intervenes when one of two thresholds set by the administrator is breached:

  • An explicit congestion latency threshold is breached for a datastore. This is an observed latency, or response time, in milliseconds (ms).
  • A percentage of peak performance is breached for a datastore. (The peak performance of the datastore is calculated automatically by vSphere.)

When you enable Storage I/O Control on a datastore, ESXi begins to monitor the device latency observed by connected hosts via a file called IORMSTATS.SF, which is stored directly on the datastore. All connected hosts can read from and write to this file. When device latency exceeds the defined threshold, the datastore is considered congested, and each virtual machine that accesses that datastore is allocated I/O resources in proportion to its shares.
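
To make this concrete, here is a minimal sketch (in Python, and emphatically not VMware code) of the kind of check SIOC performs: each host records the device latency it observes, and the datastore-wide aggregate is compared against the configured congestion threshold. The 30 ms figure and the host names are assumptions for illustration only.

```python
# Minimal sketch (not VMware code) of a SIOC-style congestion check: each host
# records the device latency it observes, and the datastore-wide average is
# compared with the configured congestion threshold.

CONGESTION_THRESHOLD_MS = 30.0  # assumed per-datastore setting for this example

def datastore_is_congested(observed_latency_ms_by_host: dict[str, float]) -> bool:
    """Return True when the aggregate device latency breaches the threshold."""
    if not observed_latency_ms_by_host:
        return False
    avg_latency = sum(observed_latency_ms_by_host.values()) / len(observed_latency_ms_by_host)
    return avg_latency > CONGESTION_THRESHOLD_MS

# Example: three hosts sharing the datastore report their observed latencies.
print(datastore_is_congested({"esx01": 12.0, "esx02": 45.0, "esx03": 38.0}))  # True (average ~31.7 ms)
```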

If all virtual machines accessing the datastore have the same shares (the default is Normal for all), each virtual machine is allowed equal access regardless of its size or whether the VMs are running on the same host or not. So, by default, your most critical virtual machines have exactly the same priority, or shares, as the most unimportant ones.

NOTE:

SIOC is not just a cluster-wide setting. It applies to any host connected to the datastore. Connected hosts write to a file called IORMSTATS.SF held on the datastore, regardless of cluster membership. So you could have standalone ESXi hosts managed by vCenter sharing datastores with a cluster. Not a great idea IMHO but it’s possible.

Recommendation:

In an ideal world, do not share datastores between clusters. It is possible, but it makes isolating performance (or other) problems much more difficult. This might not be achievable when using VSAN or similar SDS/hyper-converged systems.

In the past I recommended using “swing LUNs” to move virtual machine data between clusters. This is less of an issue now with the flexibility supported within vCenter for cross-vCenter vMotion and the like.


Recommendation:

Unless a storage vendor or VMware has issued guidance to the contrary, enable Storage I/O Control on your datastores. It is still a good idea to check before making wholesale changes affecting hundreds or thousands of virtual machines.

Without SIOC, hosts have equal device queue depths, and a single VM could max out a datastore even though that datastore is being accessed by multiple VMs on other hosts. When congestion occurs this situation persists, and it is possible for more critical workloads to be throttled to the benefit of less important ones. It is always a good idea to enable SIOC regardless of the back-end disk configuration, with a threshold high enough not to inhibit normal performance.


SIOC characteristics

Storage I/O Control can very simply apply powerful controls against runaway workloads and prevent so-called “noisy neighbour” syndrome within a vSphere environment.

When configuring SIOC the following should be taken into account:

The feature is enabled once per datastore and the setting is inherited by all hosts. Note the previous point that this can span multiple clusters, or any hosts that share the datastore.

  • It takes action when the target latency has been exceeded OR the percentage of the performance capability of the datastore has been reached.
    • Both of these settings are customizable for each datastore.
  • Enabling virtual machine shares also allows you to set a ceiling of maximum IOPS per VM if you choose, as well as the relative virtual machine shares.
    • This allows relative priority and a level of Quality of Service (QoS) to be defined and enforced during periods of contention.
  • With SIOC disabled, all hosts accessing a datastore get an equal portion of that datastore’s available queue. This means it is possible for a single virtual machine to become a “noisy neighbour” and drive more than its fair share of the available IOPS. In this scenario, with SIOC disabled, no intervention will occur to correct the situation.

By default, all virtual machine shares are set to Normal (1000) with unlimited IOPS. If you adjust shares to prioritize performance, the following settings are available:

  • Low: 500
  • Normal: 1000
  • High: 2000
  • Custom: Set a custom value for the VM

So the ratio of Low:Normal:High is 1:2:4. This can be adjusted if required or custom values can be set.
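
To illustrate what those share values mean in practice, here is a simplified model (my own sketch, not VMware code) that splits a congested datastore's queue slots across VMs in proportion to their shares. The VM names and the 32-slot queue depth are invented for the example.

```python
# Simplified model (not VMware code): divide a congested datastore's device-queue
# slots among VMs in proportion to their shares (Low=500, Normal=1000, High=2000).

SHARE_VALUES = {"low": 500, "normal": 1000, "high": 2000}

def allocate_queue_slots(vm_shares: dict[str, int], total_slots: int) -> dict[str, float]:
    """Split the available device-queue slots in proportion to shares."""
    total_shares = sum(vm_shares.values())
    return {vm: total_slots * shares / total_shares for vm, shares in vm_shares.items()}

vms = {
    "critical-db": SHARE_VALUES["high"],   # 2000 shares
    "web01": SHARE_VALUES["normal"],       # 1000 shares
    "test-vm": SHARE_VALUES["low"],        # 500 shares
}

# During congestion, 32 queue slots are split 4:2:1 across the three VMs.
print(allocate_queue_slots(vms, total_slots=32))
# {'critical-db': ~18.3, 'web01': ~9.1, 'test-vm': ~4.6}
```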

Storage I/O requirements and limitations

The following conditions and limitations apply when considering the use of SIOC.

Datastores that are enabled for SIOC must be managed by a single vCenter Server system. vCenter is not part of the data plane, however; it is only used for management and configuration of the feature.

  • Storage I/O Control is supported on Fibre Channel-connected, iSCSI-connected, and NFS-connected storage.
  • Raw Device Mapping (RDM) is not supported for SIOC.
  • Storage I/O Control does not support datastores with multiple extents. (An extent is analogous to a member of a concatenated, non-striped volume.)

Recommendation:

It’s best not to allow concurrent access to a datastore (backed by a common disk set) from vSphere clusters or hosts managed by multiple vCenter instances. If at all possible, ensure back-end spindles are dedicated to clusters or individual hosts managed by the same vCenter instance.


Recommendation:

Many people wonder whether you can use array tiering with SIOC. I will cover this in the next post, but it is safe to use, with some caveats, and this is one you should check with your vendor to ensure they have qualified it!

Set a threshold that is “impossibly” high so that it is not “in play” in the normal operational state. For an SSD-backed tiered pool, make sure the threshold is at least 10-20 ms; SIOC should then never intervene unless a serious problem occurs.

It should make sense now that you should not allow an external physical host to use the same backend disk pool (HDP or HDT) as a vSphere host or cluster on which SIOC is enabled. This is just bad design and an accident waiting to happen.

NOTE: Don’t forget about the impact of VADP backup proxy servers accessing datastores/LUNs directly for SAN-based backup, and the IOPS impact this could have on your back-end storage.


Until the next time …….

vSphere Storage Interoperability: Part 1 Introduction

 

Disclaimer: I work for Hitachi Data Systems. However, this post is not officially sanctioned by my company and does not speak on its behalf. These thoughts are my own, based on my experience. Use at your own discretion.

VMware vSphere has many features within its storage stack that enhance the operation of a virtual datacenter. When used correctly these can lead to optimal performance for tenants of the virtual environment. When used incorrectly these features can conflict with array-based technology such as dynamic tiering, dynamic (thin) pools and other features. This can obviously have a detrimental effect on performance and lead to an increase in the operational overhead of managing an environment.

It is quite common to see server administrators as well as storage consultants struggle to understand the VMware technology stack and how it interacts with storage subsystems in servers and storage arrays. It is technically complex, and the behaviour of certain features changes frequently as newer versions of vSphere and vCenter are released.


Recommendation:

Don’t ever just enable a feature like Storage I/O Control, Storage DRS or any other feature without thoroughly understanding what it does. Always abide by the maxim “Just because you can doesn’t necessarily mean you should.” Be conservative when introducing features to your environment and ensure you understand the end-to-end impact of any such change. In my experience some vSphere engineers haven’t a clue how these features really work, so don’t always take what they say at face value. Always do your homework.


Scope

This is the first in a series of posts to introduce the reader to some of the features that must be considered when designing and managing VMware vSphere solutions in conjunction with Hitachi or other storage systems. Many of the design and operational considerations apply across different vendors’ technology and can be considered generic recommendations unless stated otherwise.

In order to understand how vSphere storage technology works, I strongly recommend all vSphere or storage architects read the clustering deep-dive book by Duncan Epping and Frank Denneman. It is the definitive source for a deep dive into how vSphere works under the covers. Also check out Cormac Hogan’s posts, which are the source of much clarification on these matters.


http://www.yellow-bricks.com/publications/

http://frankdenneman.nl/publications/

Why write this series?

When questions came up in my head about whether I should or shouldn’t use certain features, I always ended up slightly confused. Thanks to Twitter, and to Duncan, Cormac Hogan, Frank and others who are always available to answer these questions.

This is an attempt to pull together black and white recommendations regarding whether you should use a certain feature or not in conjunction with storage features, and bring this all into a single series.


The first series focuses on Storage I/O Control, Adaptive Queuing, Storage DRS (Storage Distributed Resource Scheduler) and HDLM/multi-pathing in VMware environments, and how these features interoperate with storage. I also plan to cover thick vs thin provisioning (covered in a previous post), VAAI, VASA, and HBA queue depth and queuing considerations in general. Basically anything that seems relevant.

Should I use or enforce limits within vSphere?

Virtualization has relied on oversubscription of CPU, memory, network and storage in order to provide better utilization of hardware resources. It was common to see average peak CPU and memory utilization of less than 5-10% across a server estate.

While vSphere uses oversubscription as a key resource-scheduling strategy, this is designed to take advantage of idle cycles. The intention should always be to monitor an environment and ensure an out-of-resources situation does not occur. An administrator could oversubscribe resources on a server, leading to contention and degraded performance. This is the danger of not adopting a conservative approach to the design of a vSphere cluster. Many different types of limits can be applied to ensure this situation does not arise.

In some environments close to 100% server virtualization has been achieved, so gambling with a company’s full workload running on one or more clusters can impact all of its line-of-business applications. That’s why risk mitigation is, in my opinion, always the most critical strategy in vSphere design.


Recommendation:

If at all possible, be conservative with vSphere design. If you’re putting your eggs in one basket, use common sense. Don’t oversubscribe your infrastructure to death. Always plan for disaster and assume it will happen, because it probably will. And don’t just use tactics such as virtual-CPU-to-physical-CPU consolidation ratios as your design strategy. If a customer doesn’t want to pay, explain that the cost of bad design is business risk, which has a serious $$$ impact.


More on Reservations

Reservations not only set aside resources on a host, preventing other tenants on the same host from using them, but also require other hosts in an HA cluster to hold back resources so the reservations can still be honoured in the event of a failure. The vSphere feature that enforces this is High Availability (HA) Admission Control, which ensures a cluster always maintains spare capacity in preparation for a host failure.
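
As a rough sketch of the idea (this is not VMware's actual admission-control algorithm, which works with slot sizes or percentage-based policies), the check below asks whether the remaining hosts could still honour every reservation if the largest host failed. All of the capacities and reservations are invented for the example.

```python
# Rough sketch of the idea behind HA Admission Control (not VMware's algorithm):
# could the surviving hosts still honour every reservation if the largest host failed?

def can_tolerate_host_failure(host_capacity_mhz, vm_reservations_mhz) -> bool:
    surviving_capacity = sum(host_capacity_mhz) - max(host_capacity_mhz)
    return sum(vm_reservations_mhz) <= surviving_capacity

hosts = [20000, 20000, 20000]                          # three hosts, 20 GHz usable each
reservations = [8000] * 6                              # six VMs, each reserving 8 GHz

print(can_tolerate_host_failure(hosts, reservations))  # False: 48 GHz reserved > 40 GHz surviving
```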


Recommendation:

Implementing limits is a double-edged sword in vSphere. Do not introduce Reservations, Resource Pools or any other limits unless absolutely necessary. These decisions should be driven by specific business requirements and informed by monitoring existing performance to achieve the best possible outcome.

In certain cases, like Microsoft Exchange, it makes complete sense to use reservations, as Exchange is a CPU-sensitive application that should never be oversubscribed! But that is an application/business requirement driving the decision, and it is a VMware/Microsoft recommendation.


 

The following text has been taken from the vSphere Resource Management guide for vSphere 5.5 and provides some important guidance regarding enforcing limits. A small worked example follows the quote.

Limit specifies an upper bound for CPU, memory, or storage I/O resources that can be allocated to a virtual machine. A server can allocate more than the reservation to a virtual machine, but never allocates more than the limit, even if there are unused resources on the system. The limit is expressed in concrete units (megahertz, megabytes, or I/O operations per second). CPU, memory, and storage I/O resource limits default to unlimited. When the memory limit is unlimited, the amount of memory configured for the virtual machine when it was created becomes its effective limit.

 In most cases, it is not necessary to specify a limit. There are benefits and drawbacks:

  • Benefits — Assigning a limit is useful if you start with a small number of virtual machines and want to manage user expectations. Performance deteriorates as you add more virtual machines. You can simulate having fewer resources available by specifying a limit.
  • Drawbacks — You might waste idle resources if you specify a limit. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify the limit only if you have good reasons for doing so.
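
To make the quoted behaviour concrete, here is a tiny worked example (my own simplification, not VMware code): a VM never receives more than its limit even when the host has idle capacity, and an unlimited memory limit effectively falls back to the VM's configured memory size. The figures are invented.

```python
# Tiny worked example (not VMware code) of how a limit caps allocation.

UNLIMITED = None

def effective_allocation_mb(demand, limit, configured, host_free):
    ceiling = configured if limit is UNLIMITED else limit   # unlimited -> configured size
    return min(demand, ceiling, host_free)

# VM configured with 8192 MB, limit of 4096 MB, demanding 6144 MB on an idle host:
print(effective_allocation_mb(demand=6144, limit=4096, configured=8192, host_free=32768))      # 4096
# The same VM with no limit set: the configured size becomes the effective ceiling.
print(effective_allocation_mb(demand=6144, limit=UNLIMITED, configured=8192, host_free=32768))  # 6144
```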

In the next part I cover Storage I/O Control … coming soon.

Opinion Piece: The truth about Converged Systems

Disclaimer: I work for Hitachi Data Systems. This post is not authorised or approved by my company and is based on my personal opinions. I hope you read on.

A couple of weeks back Pure Storage announced their converged stack, called FlashStack. So now Hitachi has UCP, VCE has Vblock and NetApp has FlexPod. Now Pure is in the club.

A Personal Experience of a converged deployment

Last year I worked on two converged system implementations as an independent consultant. This post is written from that perspective and is based on frustrations about over-engineered solutions for the customers in question. I was a sub-contractor, only brought in to build vSphere environments once the solutions had been scoped/sold/delivered.

Both instances involved CCIEs working for the system integrator on the network/fabric/compute design and build. Core network management was the customer’s responsibility, so this was a Layer-2 design only, yet it still required that level of expertise on the networking side.

On one of these projects there were 100+ days of professional services (PS) time for the storage system and snapshot integration, and just 10 days to design and deploy SRM.

I had a hunch that the customer didn’t have the skill sets to manage this environment and would ultimately have to outsource the management, which is exactly what happened. This was for an environment of 50-100 VMs (I know!).

Reality

When I started this post I was going to talk about how Hitachi converged the management stack using UCP Director inside vCenter and Hyper-V. That sounds like FUD, which I don’t want to get involved in, so instead I decided to raise the questions of whether a converged system is better and what “converged” actually means.

Question: What is a converged system?

Answer: It is a pre-qualified combination of hardware, with a reference architecture and a single point of support (in some cases).


Question: So does that make manageability easier?

Answer: In many cases you still manage the hypervisor and server image deployment, as well as the storage and network, separately. So you still need to provision LUNs, zone FC switches, drop images on blades and so on. And each of these activities requires a separate element management system (to clarify: this is not the case with HDS).

Question: So how is this better?

Answer: If you look at some of the features of the blade systems in question, they are definitely an improvement, for example the ability of blades to inherit a “persona”. The approach of oversubscribing normally under-utilised uplinks for storage and network traffic is also a good idea. However, you have to ask yourself at what scale many customers would actually require this functionality. How many customers deploy new blades every day of the week? And with modern hardware, server failures are relatively uncommon, so will they really benefit from the more modern architecture?

Question: So wouldn’t it maybe be simpler to just use rack servers?

Answer: It depends on your requirements, what you’re comfortable with and, in most cases, whether the vendor you are most comfortable with has a solution that suits your needs. It might also make sense to spread a fault domain across multiple racks, which can be a challenge with a single blade chassis; that was the case for the customers I worked with.

Question: So when should you deploy a converged platform?

Answer: A good use case could be consolidating from many legacy server models onto a single infrastructure reference architecture/combination, which can reduce support overhead via standardisation.

I don’t know about other vendors, but Hitachi has just released a much smaller converged solution, the 4000E, which comes in a form factor of 2-16 blades. So at the very least make sure you have a system that is not too big for your needs, or over-engineered.


A bridge too far

FCoE has not taken off, has it?

I think this is one of the great failures of the whole converged story, i.e. the ability to take advantage of the undoubted benefits of converging storage and Ethernet traffic onto a single medium. For the most part this has not happened. I think it’s widely accepted that Cisco was championing FCoE as part of its Data Center 3.0 initiative.

In relation to FCoE, I’m suggesting that most customers use native FC on the array and only “convert” to FCoE inside the converged stack by running FC over Ethernet there, thereby removing one of the main value propositions of the whole concept, i.e. eliminating a separate Fibre Channel fabric and reducing HBA, cabling and FC switch costs. I’d welcome feedback on that point.

Bottom Line

Only go down this route if your requirements and scale justify it, and if you have the skills to understand the technology underpinning such a solution.

As has been said many times … KISS … Keep It Simple, Stupid. In my humble opinion converged is sometimes not the simplest route to go. That said, some hyper-converged solutions such as VMware EVO:RAIL have simplified management and made this a great option for many customers.

What do you think ?

Comment or ping me on Twitter @paulpmeehan

Hitachi HDLM Multipathing for vSphere Latest Docs

HDLM

HDLM stands for Hitachi Dynamic Link Manager, which is our multi-pathing software for regular OSes such as Windows and Linux, but also for vSphere. You can install it as a VIB and use it to optimise storage performance in Hitachi storage environments.

I came across these documents yesterday as part of something I was doing. I thought I’d share them here as sometimes people tell me it can be hard to find stuff related to HDS.

For those customers and partners using HDLM, attached is the latest documentation (October 2014) for HDLM for vSphere. First the User Guide; this is Revision 9.

Download

 

Then the Release Notes from October 2014. This is Revision 10.

Download

 

Regarding default PSPs for different array types:

  • Active-Active: Fixed (although Round Robin can be used in many cases)
  • Active-Passive: MRU
  • ALUA: MRU (although some arrays need to use Fixed)

For more info start at this KB article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011340

If you want more balanced performance and awareness of path failures/failovers for Hitachi arrays you can use HDLM. Please consult the user guide but to summarise, the main benefits are:

  • Load balancing (Not based on thresholds such as hitting 1000 IOPS)
  • Path failure detection and enhanced path failover/failback
  • Enhanced path selection policies
    • Extended Round Robin
    • Extended Least I/Os
    • Extended Least Blocks

Extended Least I/Os is the default.
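
For illustration only, here is the general idea behind a least-I/Os style policy (my own sketch of the concept, not Hitachi's implementation): each new I/O is issued down the active path that currently has the fewest I/Os outstanding. The runtime path names and counters are made up.

```python
# Conceptual sketch of a "least I/Os" style path selection policy (not HDLM code):
# send each new I/O down the path with the fewest I/Os currently in flight.

def select_path(outstanding_ios: dict[str, int]) -> str:
    """Return the path that currently has the fewest outstanding I/Os."""
    return min(outstanding_ios, key=outstanding_ios.get)

paths = {"vmhba1:C0:T0:L10": 7, "vmhba1:C0:T1:L10": 3, "vmhba2:C0:T0:L10": 5}
print(select_path(paths))  # 'vmhba1:C0:T1:L10'
```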

Why would you use it? A quick recap

If you are using Hitachi storage and are entitled to use HDLM, then you should probably use it to get optimal performance and multi-pathing between host and target. I would typically suggest that for production at the very least; you can make a call on whether to use it for dev/test.

The main benefit is a more balanced use of the available paths in real time. I’d equate it to the way DRS smooths out performance imbalances across a cluster; HDLM does the same thing with the I/O directed to a datastore/LUN, in real time, based on its enhanced path selection policies.

You can use the native ESXi multipathing (NMP) if you don’t use HDLM or EMC PowerPath, and in most cases it will be more than sufficient for your needs. There is, however, a common misunderstanding about the native NMP Round Robin (RR) Path Selection Policy (PSP).

To clarify:

  • For a given datastore, only a single path will be used at any single point in time.
    • The important thing to note is that this remains the case until a given amount of I/O has been transferred. Consider this the switching threshold.
  • This is a bit like using LACP on a vSphere host for NFS: you cannot split a single Ethernet frame across two uplinks and reassemble it at the other end, and the same applies to an individual storage I/O.
  • The switching threshold can be based on either IOPS or bytes. The default is 1000 IOPS (see the sketch below).
    • I suggest leaving this alone. Read on.
  • When the threshold set for the path is reached, the host switches to the next path.
  • The host will not use the previous path for that datastore again until it comes around in the rotation.
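
Here is a simplified model (not ESXi code) of how that switching threshold behaves: the same path is used until the threshold number of I/Os has been issued, then the PSP rotates to the next path. The path names are invented; the 1000-IOPS default comes from the list above.

```python
# Simplified model (not ESXi code) of the Round Robin IOPS switching threshold.

class RoundRobinPSP:
    def __init__(self, paths, iops_threshold=1000):
        self.paths = paths
        self.iops_threshold = iops_threshold
        self.current = 0            # index of the path in use
        self.ios_on_current = 0     # I/Os issued on that path since the last switch

    def next_io_path(self):
        if self.ios_on_current >= self.iops_threshold:
            self.current = (self.current + 1) % len(self.paths)  # rotate to the next path
            self.ios_on_current = 0
        self.ios_on_current += 1
        return self.paths[self.current]

psp = RoundRobinPSP(["vmhba1:C0:T0:L5", "vmhba2:C0:T0:L5"])
used = [psp.next_io_path() for _ in range(2500)]
# The path changes after every 1000 I/Os: I/Os 1-1000 on vmhba1, 1001-2000 on vmhba2, and so on.
print(used[0], used[999], used[1000], used[2000])
```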

From experience it is entirely acceptable that a particular datastore is accessed using one path at a time.

I have seen extremely high I/O driven from a single VM with a very large block size. Using the infamous IOPS=1 setting is not enough to get the I/O to the required level; then you need to look at queue depth and the number of outstanding I/Os (DSNRO).

Once you start playing with these parameters the impact can be system/cluster-wide which is why it’s probably not a good idea.

Queuing in vSphere

Also consider the different queues involved in the IO path:

[Image: the different storage queues at play in vSphere]

Consider the job the hypervisor is doing for a second. It is co-ordinating I/O from multiple VMs, living on different datastores, across multiple different hosts. Think of the number of moving parts here.

Imagine a cluster with

  • 20 hosts
  • 2 HBAs per host
  • 180 datastores
    • That’s a total of 360 active paths (180 devices, each seen through 2 HBAs) that each host is managing, with multiple queues on top and underneath; a quick back-of-the-envelope check follows below.
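
A quick back-of-the-envelope check of those numbers (all taken straight from the list above):

```python
# Just arithmetic on the figures from the example above; nothing vSphere-specific.
hosts = 20
hbas_per_host = 2
datastores = 180

paths_per_host = hbas_per_host * datastores   # 360 active paths each host manages
paths_cluster_wide = paths_per_host * hosts   # 7200 paths across the whole cluster

print(paths_per_host, paths_cluster_wide)     # 360 7200
```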

That’s not trivial so before you let the handbrake off, take the time to consider the decision you’re making to ensure it will not have an unforeseen impact somewhere down the line.

VCAP-DCA Lab Talk Part 3: Homelab Physical Networking

It’s been a long time since I wrote one of these posts about home labs. I’ve been using work labs for a while now, but I need to start working on the gathering rust at home too. I’ve kind of neglected the homelab of late, but I just did a refurb on the office, so now it’s a sweet place to while away the witching hours geeking out.

Powerline

Last time I wrote a blog on this topic I was enjoying home networking nirvana. Everything was going swimmingly until the black magic hit. I’ve been using TP-LINK WPA-281 wireless extenders in the house for both wired and wireless (repeated) extension, and I’ve run into some problems, probably related to the size of the house or the way it’s wired.

I actually really like a lot of the TP-LINK stuff. It’s really easy to configure and feature-rich and in general it does work. Don’t let my experience put you off but it’s probably more likely to be suited to smaller houses (or at least not the larger ones with brick walls throughout).

I’ve been seeing intermittent network disconnects on the Mac – back to my login prompt – with “No Network Accounts Available”. It crashes my apps which is not good.

So it’s time for a wired solution. I didn’t realise that my electrician (who I thought wired the house with phone cable) wired the house with Cat5E but just didn’t connect RJ-45 faceplates. So now it’s time for a bit of crimping and rooting around.

I need to consider getting a decent network switch that supports VLANs, Layer-3 routing and some other stuff that allows it to more closely mimic a proper network.

Community Power as usual

I really hate reinventing the wheel. It’s great to do your homework, but sometimes there’s no point when there’s a huge support network out there who’ve probably tested every model imaginable. So I put out a request for help and got some great options back.

I’d like to thank my community comrades for some great suggestions and problem solving. Follow these guys – you never know when you might need help, and they might even follow you back :-) Also check out their blogs.

  • Joe Silvagi @VMPrime
  • Mads Fog @hazenet
  • Manfred Hofer (Shouldn’t he take over The Hof from David Hasselhoff ?) …..@Fred_vbrain
  • Bobby Stampfle @BobbyFantast1c
  • Todd Simmons @trainingrev
  • Gareth Hogarth @viGareth
  • Craig Waters @cswaters1

The options proposed:

HP

Joe proposed the HP Procurve 2824. Check out the specs here:

http://h18000.www1.hp.com/products/quickspecs/archives_Division/11993_div_v1/11993_div.HTML

I checked it out on ebay.ie and it’s likely to cost anything from €80 to €180.

It supports Layer 3, VLANs, gigabit speeds and LACP. That’s a pretty good price for a Layer-3 switch.

Gareth suggested the following HP switch:

http://h17007.www1.hp.com/us/en/networking/products/switches/HP_1910_Switch_Series/index.aspx#.VIi4a0tXvPk

A good model of this one will likely come in around €200+.

Cisco

Next there were a couple of Cisco options, thanks to Manfred and Mads: the SG200-26 and SG300-28. Both support VLANs as well as LACP, and both are very quiet.

The SG200 is a Layer-2 switch only, so there is no inter-VLAN routing.

http://www.cisco.com/c/en/us/products/collateral/switches/small-business-100-series-unmanaged-switches/data_sheet_c78-634369.html?mdfid=283771818

You could probably use a Vyatta router or equivalent as a VM if you like, although it’s obviously best to use an external network device rather than have your router inside the virtual infrastructure. This one is likely to cost around €200 for the 20+ port version.

You can move up to the 300 series, which a few of the guys recommended and I’ve heard good things about from other people on Twitter. At a price tag of about €300 minimum, this is a better buy than the 200 series.

http://www.cisco.com/c/en/us/products/collateral/switches/small-business-smart-switches/data_sheet_c78-610061.html?mdfid=283019617

Finally on the Cisco front, Todd proposed a 2960G for the access layer and a 3560G for Layer 3.

http://www.cisco.com/c/en/us/products/switches/catalyst-2960-series-switches/models-comparison.html

This will probably come in just under €200.

For the Layer-3 switch you can find the specs here:

http://www.cisco.com/c/en/us/products/switches/catalyst-3560-series-switches/index.html

This one is likely to cost €250ish. So you can have separate physical Access and Aggregation layers for about €400 which is also not a bad option.

Dell

Gareth came up with a really good option: the PowerConnect 6224, which is also a full Layer-3 switch and comes up on eBay starting at €250, but more likely €350-400.

http://www.dell.com/us/business/p/powerconnect-6200-series/pd

Again a really capable switch. I do like Dell switches – much underrated like a lot of their equipment – especially Dell servers.

UPDATE: Mikrotik Cloud Switch

Thanks to Dmitry, who has suggested the following Mikrotik switch in the comments. It looks like a really great option at a great price. You can get these for €140 on eBay, which is an awesome price for a full-featured Layer-3 switch. This could be a popular one!

Check out the specs here:

http://www.balticnetworks.com/docs/CRS125-24G-1S-131025130632.pdf

Mikrotik Cloud Router Switch CRS125-24G-1S-RM

Conclusion

These are some good options. For me you need LACP, VLANs and Layer-3 support (or two devices), and possibly priority tagging using 802.1p, but aside from that it’s down to your own individual taste.

HDSPOST: Hitachi Blades …. Have your cake and eat it. Part 1: The introduction

At the Hitachi Data Systems Community website you will find some posts that are related to my role at HDS. Several people have asked me about our blades. I plan to write the second in a number of articles shortly, but wanted to link across to give people an introduction. The original article is posted here:

 

https://community.hds.com/people/pmeehan/blog/2014/11/14/hitachi-blades-have-your-cake-and-eat-it-part-1-the-introduction

I have also posted the article on my personal blog here to avoid the login issues some people have had on the community website.

Disclaimer: Take note that while this article contains my own personal thoughts, it does not express official HDS opinions. It also contains views that are specific to our technology, so bear that in mind and don’t be surprised or complain when you get to the bottom of the article :-)

……………………

Hitachi Blades …. Have your cake and eat it. Part 1: The introduction

When I joined Hitachi I didn’t understand the power of our blades or the use cases they could address. I admit I assumed they were just like any other blades. Furthermore, I have always had a sneaking disregard for blades; my preference, all things being equal, was always rack servers. I think people either love ’em or hate ’em. I was never quite sure blades were being properly utilised, and I frequently saw half-empty chassis wasting power.

I knew Hitachi had our fully converged platform (UCP Pro) with fully integrated management of the entire stack – orchestrated by UCP Director software. That was the first differentiator, one that nobody else can match. Check out the demo here, and if you know of similar solutions I’d love to hear about them:

Hitachi Unified Compute Platform Director —  Full Length Demonstration – YouTube

I was pretty impressed by UCP Director, as it does resolve a common problem: having to separately provision network (VLANs), storage, hosts and so on. It even lets you drop images onto bare-metal servers that are not part of the virtual infrastructure, as well as deploy ESXi images from within vCenter. It just makes things easier. I have a different opinion to some companies who argue that managing storage is incredibly complex. I believe the overall complexity of virtual infrastructures is what has made things more complex. It’s not the fault of the storage – you could surely make the same statement about the network, particularly in a virtual datacenter. Just think about planning a virtual network design for NAS/NFS. It’s the network that introduces most of the complexity. Right?

So reducing management complexity and offloading it to an orchestration layer is what it’s all about. Just look at VAAI and what it has meant for template deployment, Storage vMotion and other tasks. In my view, intelligent orchestration software like UCP Director is an equally valid approach to starting from scratch with a blank sheet of paper.

Multi-Hypervisor

Of course you can get UCP Director for VMware and Hyper-V with the same orchestration layer for both vCenter and System Center. So this is about choice for our customers.

LPARs

When I first heard of LPARs (logical partitions) I didn’t get it. As a dyed-in-the-wool virtualisation-first advocate, I couldn’t see why there would still be a use case now that monster-sized VMs are possible with vSphere 5.5. By way of introduction, LPARs came from Hitachi mainframe and UNIX platforms and have been around for years.


Now I get it. If you’re a customer and want to benefit from a consolidated converged platform from Hitachi with UCP Director, you may be concerned about software licensing, or worried about an audit from Oracle, Microsoft, IBM or others for workloads running on a vSphere or Hyper-V cluster.

Now you can have your cake and eat it!

I heard of a customer the other day with 5500 servers which had cores “switched off” for licensing purposes; they averaged 1.3 cores per server. Isn’t it sad that customers have to do this because of punitive licensing policies?

Let’s say you want to use vSphere (or Hyper-V) on a UCP Pro converged platform, but you need to consolidate a bunch of physical servers that must stay on single CPUs for licensing reasons. You can run 30 LPARs on a single blade, which enables significant consolidation, and you can mix and match within the same chassis. Note that in the case of Oracle you still need to license all the cores on a blade, but that shouldn’t be too much of an issue if you consolidate a number of instances onto a single blade or a few blades, and then use Hitachi hardware protection to automate identity failover if a blade fails.

SMP

What if you want to run a super-sized server that needs more than the typical two-socket footprint common today? You can do that too, with up to 80 cores in a single instance. Within our customer base these use cases are not uncommon.

What else can we do ?

N+M

N+M allows you to do two things:

  • Protect up to four blades with a matching set of four standby blades. If the hardware dies, N+M kicks in and a substitute blade takes the place of the first-team player. This is called hot standby N+M. The only potential limitation is the requirement for dedicated standby nodes. A use case could be a large SAP HANA instance – I saw one in a bank two weeks ago – the 80-core example I mentioned already.
  • Protection for a group of up to 25 blades. This is called cold standby. In this case you could have 22 blades running production with SAP, Oracle, VMware and other apps in separate clusters. If one fails, one of the 3 standby blades replaces it, mirroring the failed blade from a hardware perspective.

Nesting

What if you want to do something like Hu Yoshida described in his blog here, in terms of the performance benefits of the new Haswell-class processors and VMCS functionality:

HDS Blogs: Hitachi LPARs Provide Safe Multi-Tenant Cloud With Intel E5v3

This resolves the performance impact of multiple VMMs running on the same hardware.

Conclusion

Why should you care?

This is significant because it reduces lock-in for customers and allows them to do more with less, by satisfying many use cases with a single uniform form factor and solution type. As a virtualisation architect this is always what I like to see: options! Options are what allow you to make smart design decisions. And as we all know, use cases should drive all decisions, and there are some real, genuine use cases here.

This is particularly true in what I perceive as a final battleground in the quest to achieve 100% virtualisation in many enterprise customers, and that is software licensing.

My future posts will cover some real use cases with real examples. Until then ……. Enjoy your cake 

It’s all about the Community: Let’s talk storage at UK VMUG [minus Koolaid]

Next week I’ve been asked to run a Community round table session at the UK VMUG.

Not a word of a lie when I say this is a real honour for me. I haven’t done many of these so to be able to get involved and help out at this is a real pleasure.

This is in Birmingham at the National Motorcycle Museum.

It’s an awesome agenda with some really great sessions and speakers like Duncan Epping, Frank Denneman, Cormac Hogan, Alan Renouf, Joe Baguley, not to mention Chris Wahl. So it’s an amazing event with almost 500 people.

The sessions are superb: Check them out here:

 http://www.vmug.com/p/cm/ld/fid=5166 

It’s a kind of open-ended session called Storage For Dummies. It could be shallow or deep – the idea is just to foster a conversation and see where it goes.

So let’s try the old Conceptual -> Logical -> Physical top-down approach

It is a community session so I can personally guarantee there will be no Hitachi Koolaid on display.

I have huge respect for the VMUG UK team (Alaric, Jane, Simon etc)  and have many many friends in the UK VMware Virtualisation community. So I can’t wait to meet up with them over there.

On Monday night we have the vCurry, which in Birmingham is always memorable, so I’m looking forward to the mandatory amount of socialising with good buddies old and new.

Hope to see you all there :=)

Hitachi at VMworld 2014: Introduction

Looking forward to meeting many VMware community peers and friends next week at VMworld in Barcelona. As always it promises to be an awesome week.

This year I’m working for Hitachi Data Systems, so without being too pushy I want to increase awareness of our fantastic integration with VMware solutions as well as our partnership with VMware. VMware and SAP are two of our key global partners that you will see much tighter integration with in the future. I will be posting some further information on SAP HANA and why I believe we can do things no other server vendor in the industry can do today, using our mainframe-derived LPAR hardware partitioning technology. More on that later :-)

Back to VMworld …..  In that regard I’d like to recommend some genuine sessions for those in the community wishing to find out more about some really cool tech.

Powerpoint Light

You will find our approach much fresher and more interactive. Come talk to experts and discuss your requirements using whiteboarding and pen and paper, rather than PowerPoint-heavy presentations.

My picks!!!

Firstly!!! UCP Director is the most awesome management and orchestration software in the industry today. It is part of our Unified Compute Platform converged stack, a solution based on Hitachi storage and servers together with Ethernet switches from Brocade or Cisco.

Come to the booth and catch a demo, and see how easy it is to drop an ESXi image onto a blade and watch it boot, right from inside vCenter.

What if you could compare VLANs on your Ethernet switches and identify mismatches with your virtual switches? And wouldn’t it be great if you could remediate those differences from inside vCenter? Using UCP Director you can do that for vSphere (and soon for Hyper-V) without using another tool.

Wouldn’t it be great to be able to upgrade SAN switch firmware and have a smart software engine decide the sequence based on path allocation? Well, UCP Director has that intelligence built in, and it can even decide which front-end array port to map/zone to, based on the existing SAN topology, port assignments and usage.

I’ve previously blogged here …… http://paulpmeehan.com/2014/07/11/vmetro-storage-cluster-hds-rise-storage-virtual-machine/
about vSphere Metro Storage Cluster and our Global Active Device (GAD) active-active array topology for customers, without the use of appliances. Come talk to some of our product managers and find out more about this unique offering in the industry and see a demo.

Breakouts: VVols and UCP Director

In terms of breakout sessions, my countryman Paul Morrissey is Hitachi’s Global Product Manager for VMware and will be presenting a deep dive on VVOLs, discussing how Hitachi plans to address this awesome opportunity.

STO2752-SPO – VVOL Deep Dive
‒Implementing and managing VMware VVOL with Hitachi Data Systems.
Date: Tuesday, October 14, 17:00 – 18:00
Speakers: Suzy Visvanathan – Product Manager, VMware & Paul Morrissey – Global PM, Virtualization, Hitachi Data Systems

The next session will give you a great overview of UCP Director for those who don’t like to get too close to vendors :-) with a slightly more focused look at vCOps.

INF2920-SPO – Making vCenter and vCOPS from VMware smarter
‒Unified management and deeper insight for all physical and virtual
‒resources with UCP Director.
Date: Wednesday, October 15, 12:30 – 13:30
Speaker: Hanoch Eiron – Solutions Marketing, Hitachi Data Systems & Ruchit Solanki – Product Manager, Hitachi Data Systems

Meet the Experts

You can come to our booth and speak to subject matter experts about the following topics.

  • Unified Compute Platform Director integration into vCenter and vCOPS
  • Simplified Disaster Recovery with UCP Director Operations Center and VMware SRM
  • Active/Active Datacenter: Global Active Device for vSphere Metro Storage Cluster
  • Policy Based Storage Management: Hitachi VVOL integration
  • Data Protection and View Composer hardware offload
  • Storage workflow automation with the Hitachi vCenter Orchestrator Plugin
  • vCOPs Management Pack for Hitachi Storage platforms
  • Hitachi Storage Management Tools
  • End User Compute Solutions: VMware Mirage and Hitachi Storage/HNAS
  • Virtualizing SAP
  • Cloud Automation Suite powered by Ensim

I hope you can take the time to find out about our solutions. Hitachi is focused on bringing solutions to the market that are enterprise-grade and add value and simplify management.

For anyone who wants an introduction to any of the awesome team we’ve assembled get in touch and I’d be happy to oblige.

Viva Barcelona !!
