Wednesday, 22 February 2012

Compellent iSCSI Optimisation on Windows

We now have our shiny new pair of Compellent Storage Center's installed, up and running.
This means that I get to spend some quality time testing, benchmarking and evaluating them before they go into production.
Previously I mentioned that the Compellent's were too smart for SQLIO which really disappointed me no end.
Not only did we use it as a tool to compare arrays through the purchase decision, but I find it significantly easier to test with than IOMeter.

Owing to a suggestion from @tonyholland00 (thank you Tony!!) it turns out that if you pre-allocate the disk and turn off caching, you can get realistic results from SQLIO again. Now most other array admins would cry out in horror at the idea of turning off caching on their arrays, but that's because their arrays are completely dependent on their write cache to deliver their advertised performance. Compellent arrays are different - they only have a tiny (512MB) write cache, and are properly architected around the number of spindles required for their target workload and don't require cache as a crutch. In addition, when testing the SSD Tier we have, the SSD's are actually faster than the cache. Pre-allocation is slightly out of the ordinary, but if that last 1-2% is that important to you for bulk loading, you would be pre-allocating your LUN anyway!

So there I was merrily testing away, and I wanted to both confirm the performance I was promised in our POC, and to see how far I could push one of these arrays. I'll definitely do some more posts on my findings, but I came across a bottleneck that may be common in many iSCSI configurations.

We are using 10GbE and I configured a blade with 4 ports, but was only able to get 2.4GB/sec even on bigger block sizes. Interestingly, the performance data looked almost identical to the POC data I received from Dell, but I assumed they had only used 2 HBA's.
All targets were connected, but I couldn't get as much info on the connections as I had been getting with my EQL arrays, as there is no connections tab comparable to what you get with the EQL HIT kit.
After a bit of investigation, it turned out that we were effectively getting one session on each fault domain (VLAN), and hence restricting us to just over 20Gb.

The solution is to manually create a session for each NIC to each target. Through the GUI, you can go to each connection, go to the properties, and add a new session. (Remembering to enable multi-pathing both when you create the initial connection, and when you add the additional session).
Now that's great if you are setting up a single server, going to a single array, but it's going to get very tedious when you are doing large numbers of servers, and connecting to more than one array.

We have 2 arrays, dual controller, each with 6 ports - resulting in 48 sessions to be manually configured. That's 10 min of tedium per server that I want to avoid. Thankfully, I found this link:
You configure your list of targets once, and then on each server simply choose which nics you want to use. Run the batch file, reboot and you should be good to go.

After configuring the additional sessions, I was able to max out bandwidth on all 4 10GbE ports using a block size as low as 128K on a single LUN.
I suspect this oversight is common, as I think the same happened even in our POC setup, and it's not intuitive that you need to do it.
Coming from EQL, you expect it to be done automatically... the HIT Kit makes you lazy... :-)
Hopefully this will help even if you have fewer nics, by saturating all bandwidth available.

1 comment:

  1. Currently I work for Dell and thought your post is really impressive. I think server is a computer or device on a network that manages network resources.The information you shared with us is very useful, thanks for sharing with us.