I have been working lately with VMware and storage in environments where the presentation of storage at different layers of the stack, and the ability to report and understand usage and over-provisioning, is a difficult challenge. In vSphere 4.x this can be a ticking time bomb – at least that’s been my recent experience – so I’m blogging about how it can adversely impact space utilisation in an environment. It’s not the fault of either vendor, but it does illustrate why the upcoming VMware storage capabilities in vSphere 5.x, and where I see storage moving, are so necessary. More on those in a future blog.
Firstly I am going to refer to Chad Sakac at EMC for his excellent description of what thick and thin provisioned storage objects are, and where each can be used. Please read his post if you need to understand more about what this technology is and how it works.
The key point from that blog that is relevant here is that zeroedthick VMDK files are effectively thin provisioned from a VMware perspective, i.e. blocks on the VMFS filesystem are only zeroed on access, not on creation of the virtual hard disk or VMDK file. More on that later. BTW, the only way to make them eager zeroed is at the command line using vmkfstools (in this version of VMware):
# vmkfstools -d eagerzeroedthick -i <virtual-disk-source>.vmdk <virtual-disk-target>.vmdk
Last thing: let’s assume the VMware and SAN vendor best-practice white papers for this specific environment state that we must use standard thick provisioned virtual machines on top of LUNs presented from the SAN as thin provisioned.
For the purpose of this blog, let’s take the following scenario:
We want to present a 1TB datastore to an ESXi 4.1 cluster to add a data volume to an existing VM called oradb01. We have a fibre channel SAN and, as per the best practice above, use thin (dynamic) provisioned pools at the storage layer and thick provisioning on top. As per Chad’s blog, this means a regular zeroedthick disk/VMDK file is used by the VM. This is the only option currently available in the GUI, though that has changed in version 5.0, where a lot of improvements arrive.
The steps are as follows, with some description of the state of the presented storage objects along the way.
- Create the LUN on the SAN. We will call this entity a “virtual volume” and name it ESX_DATA01.
- As this is a thin volume we are only touching the blocks on raw disk on first write. Only at that point are disk blocks/sectors initialised and flagged as “used”, to put it simply.
- Now we present to the ESXi cluster, and create a VMFS3 filesystem on top
- At this point we now have our 1TB datastore presented and ready for use – call it DS_DATA01
- The only blocks actually written to the virtual volume at this point are metadata, akin to a superblock on a UNIX filesystem – a small number of blocks
- At this point the SAN and vCenter will report the same block count
- We now add the new virtual hard disk to the VM. Let’s say it needs to be 500GB
- At this point the VMDK is created inside the folder on the VMFS volume.
- In vCenter, used space and provisioned space will be the same: vCenter thinks all blocks assigned to oradb01 are used
- On the SAN almost no blocks will be used.
- Now let’s say we create a database file without using instant file initialisation – so we touch all the blocks on disk.
- vCenter does not report any change from the previous state, where used = provisioned
- The blocks on the virtual volume on the SAN are now flagged as used, so about 500GB used space.
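The steps above can be sketched as a toy model of the two accounting layers – vCenter (VMFS) versus the thin SAN volume. This is illustrative only; the class and variable names are invented for the sketch and are not any vendor API.

```python
# Toy model of block accounting at two layers: vCenter (VMFS) and the
# thin-provisioned SAN "virtual volume". All figures in GB. Illustrative only.

class ThinVolume:
    """A thin SAN volume: blocks only count as used once they are written."""
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.san_used_gb = 0          # blocks the SAN has flagged as used

    def write(self, gb):
        # First writes to fresh blocks allocate them from the pool.
        self.san_used_gb = min(self.size_gb, self.san_used_gb + gb)

lun = ThinVolume(1024)                # 1TB virtual volume ESX_DATA01

# Attach a 500GB zeroedthick VMDK: vCenter marks all 500GB as used
# immediately, but blocks are only zeroed (written) on first access.
vcenter_used_gb = 500
assert lun.san_used_gb == 0           # SAN side: almost nothing written yet

# Fully initialise a database file, touching every block of the VMDK.
lun.write(500)
print(vcenter_used_gb, lun.san_used_gb)   # 500 500 -- the two layers agree
```

At this point both layers tell the same story; the interesting part is what happens when the VMDK moves.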
So everything is hunky dory. Now let’s say we need to move the data volume to another datastore for performance reasons. What happens to the space?
We storage vMotion the 500GB data VMDK file to another datastore. Let’s call this DS_DATA02.
- vCenter now reports datastore DS_DATA01 as almost empty, again apart from the metadata.
- The SAN still thinks that all blocks on DS_DATA01 are used. Why would it think otherwise? It knows nothing about the blocks being freed at the VMFS level by the ESXi host
- So we have a 1TB virtual volume with 500GB flagged as used.
- Now let’s say we attach a 300GB hard disk to another VM with a database.
- vCenter will report 300GB used on the datastore
- The SAN will still think that around 500GB is used
- Let’s now pad out the new data volume with a fully initialised database file
- vCenter still thinks there is 300GB used
- The SAN will need to allocate blocks from the remaining empty space in the virtual volume.
- The SAN will now think that 800GB is used
- Let’s again storage vMotion the 300GB data VMDK to another datastore called DS_DATA03, again for performance reasons.
- vCenter will report almost no space used on the volume
- The SAN will report 800GB used
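The whole drift can be traced with a few lines of arithmetic – a toy model, not any vendor tooling, with the GB figures from the walkthrough above:

```python
# Toy model (figures in GB): why Storage vMotion makes vCenter and SAN
# usage drift apart on a thin volume. Illustrative only.

san_used = 0        # blocks flagged used on the 1TB thin volume (DS_DATA01)
vcenter_used = 0    # space vCenter reports used on the datastore

# 500GB zeroedthick VMDK, fully initialised by the database:
vcenter_used += 500
san_used += 500

# Storage vMotion the VMDK away: VMFS frees the space, but nothing in
# ESXi 4.1 tells the SAN those blocks are now dead.
vcenter_used -= 500               # san_used unchanged!

# New 300GB VMDK, again fully initialised -> fresh blocks get allocated:
vcenter_used += 300
san_used += 300

# Storage vMotion it away too:
vcenter_used -= 300

print(vcenter_used, san_used)     # 0 800 -- "empty" datastore, 800GB burned
```

Each vMotion only ever decrements the vCenter side, so the SAN figure ratchets upward until the pool runs dry.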
So we now have a scenario where we can end up with an out-of-space condition on this virtual volume. Effectively it has become over-provisioned at the SAN layer, even though vCenter reports the datastore as nearly empty, and even though we all think we are not using thin provisioning.
There is no way to automatically recover from this condition in vSphere 4.1!
The effective solution to this problem is to use zero page reclaim technology on the SAN in conjunction with vmkfstools at the command line. This gets the SAN to hunt for zeroes at the virtual volume level and, where a whole page of them is found, reset the markers saying those blocks are used. This is how it’s done:
- We first need to create a “dummy” hard disk – let’s make it 800GB and use the regular zeroedthick format.
- We need to attach it to a dummy VM – let’s use an old converter machine that is no longer in use, called converter01.
- Let’s assume the VMDK file is actually called converter01.vmdk. This will live on DS_DATA01 in the folder “converter01”.
- The trick now is to “re-zero” the blocks on the VMFS filesystem so we can later reclaim them on the SAN
- We need to power off the dummy VM to allow this operation to succeed
- At the command line we use vmkfstools -w to write zeros to all blocks in the dummy VMDK. This is a destructive operation! Do not perform it on a production VMDK!
- Go to the folder containing the dummy VMDK file on any ESXi host that can see DS_DATA01 and run:
- # vmkfstools -w converter01.vmdk
- This zeros out all blocks of the VMDK at the VMFS level
- vCenter still reports 800GB used.
- The dummy hard disk should now be removed from the VM. You must select the option that deletes the underlying files from disk, so the VMDK file is removed from the folder
- Now vCenter will report almost no space used – just metadata
You can now go to the SAN and run a zero page reclaim on this volume. This will return the zeroed blocks to the free pool within the dynamic or thin provisioned pool.
Note that this technique is also very useful where free space in a datastore may have previously been written to. In this case, as above, the blocks will report as used on the SAN and free in vCenter. So you add a dummy hard disk into the free space, attach it to a dummy VM, and write zeros into it. Then you run your zero page reclaim job on the SAN. This is a good quarterly activity to keep what vCenter reports as used and what the SAN reports as used more closely aligned.
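For the curious, the SAN-side reclaim logic amounts to scanning pages for all-zero content and unmapping them. The sketch below is a toy page map with invented names – real arrays do this in firmware against their own allocation metadata, not like this:

```python
# Sketch of what a SAN-side zero page reclaim does, on a toy page map.
# A "page" is a fixed-size chunk of the thin volume; a used page whose
# blocks are all zero (e.g. re-zeroed by vmkfstools -w on the dummy VMDK)
# can be released back to the free pool. Illustrative only.

def zero_page_reclaim(pages):
    """Release all-zero used pages to the free pool; return how many."""
    reclaimed = 0
    for page in pages:
        if page["used"] and all(block == 0 for block in page["blocks"]):
            page["used"] = False      # hand the page back to the pool
            reclaimed += 1
    return reclaimed

# Two leftover pages from deleted VMDKs: one re-zeroed by the dummy-disk
# trick, one still holding stale (non-zero) data.
pages = [
    {"used": True, "blocks": [0, 0, 0, 0]},   # zeroed -> reclaimable
    {"used": True, "blocks": [7, 0, 3, 0]},   # stale data -> stays used
]
reclaimed = zero_page_reclaim(pages)
print(reclaimed)                              # 1
```

This is also why the dummy-disk zeroing step is mandatory first: the reclaim job can only free pages that actually read back as zeros.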