Sunday, 6 November 2011

Storage Concepts - Part 1



SQL Storage 101

Having completed my session at SQLBits, I felt that there was too much information that I didn’t cover as comprehensively as I would have liked. So I’m going to break up some of my session into a series of posts to try to flesh out a high-level understanding of SQL storage.
Welcome to the first part of SQL Storage for the Real World – Storage 101.

Disk IO

Disk IOs are the physical reads and writes that make their way to the storage medium of your SQL Server. As the names imply, writes go from memory on the server to the drive, and reads come from the drive into memory. SQL Server predominantly uses 8K – 64K block sizes, but these can be bigger when the server is doing large sequential operations such as read-aheads, backups, CHECKDBs and so on. In reality you will want to test your storage subsystem at 8K, 64K, 256K and possibly 1024K block sizes to get a feel for what it is capable of. I will expand on testing your storage, especially with SQLIO, in the near future.
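As a rough illustration of that kind of block-size sweep, here is a minimal Python sketch that drives SQLIO across those sizes. The flags and the test file path are assumptions based on typical SQLIO usage rather than anything specific to my environment, so check them against the tool's own documentation before relying on them.

import subprocess

# Hypothetical test file, pre-created on the volume under test.
TEST_FILE = r"E:\sqliotest.dat"

# Block sizes (in KB) worth sweeping, per the text above.
BLOCK_SIZES_KB = [8, 64, 256, 1024]

for bs in BLOCK_SIZES_KB:
    # Assumed SQLIO switches: -kR = read, -frandom = random access,
    # -b = block size in KB, -o = outstanding IOs per thread,
    # -t = threads, -s = duration in seconds, -LS = report latency,
    # -BN = no buffering. Verify against SQLIO's own help output.
    cmd = [
        "sqlio",
        "-kR", "-frandom",
        f"-b{bs}",
        "-o8", "-t4", "-s120",
        "-LS", "-BN",
        TEST_FILE,
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)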

Read / Write Ratio

To understand the requirements that your storage subsystem will have to fulfil, you need to find out the ratio between reads and writes under peak load. The reason is that, depending on the storage you choose, the number of writes the system can sustain will vary drastically with this ratio. For example, SSDs generally have different specifications for reads and writes: some are drastically slower at writing data, and some are significantly faster. This comes down to manufacturer and model. Equallogic SSDs, for example, have much lower write speeds than read speeds, and as a result the specified IO performance of a member is highly dependent on the read/write ratio. On the other hand, Fusion-io PCIe SSDs can write much faster than they read, so read speed may well be the limiting factor.
Individual hard drives, by contrast, have more symmetrical read/write performance, but are generally used in RAID configurations, and depending on the type of RAID this can have a massive effect on the number of writes a given number of disks can handle. For example, a RAID 0 stripe has no write overhead, so its write throughput should be roughly the sum of what the individual drives can do, and roughly equal to its read throughput. RAID 10 arrays, on the other hand, have to write each block twice, and hence can only sustain half as many write IOs as read IOs. This gets progressively worse for RAID 5 and RAID 6. This phenomenon is termed the RAID Factor. Some vendors have put mechanisms in place to mitigate the impact – NetApp is an example that comes to mind.
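To put some rough numbers to the RAID Factor, here is a minimal sketch assuming the commonly quoted write penalties (RAID 0 = 1, RAID 10 = 2, RAID 5 = 4, RAID 6 = 6) and an illustrative per-spindle IOPS figure; controller caches and vendor-specific optimisations will make the real-world picture kinder than this naive arithmetic suggests.

# Naive RAID Factor arithmetic: effective IOPS for a given read/write mix.
# Write penalties are the commonly quoted values; controller caching and
# vendor optimisations (NetApp, for example) change the real-world picture.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}

def effective_iops(disks, iops_per_disk, read_fraction, raid_level):
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    penalty = WRITE_PENALTY[raid_level]
    # Each logical write costs `penalty` physical IOs; each read costs one.
    return raw / (read_fraction + write_fraction * penalty)

# Illustrative example: 8 drives at ~180 IOPS each, 70/30 read/write mix.
for level in ("RAID0", "RAID10", "RAID5", "RAID6"):
    print(level, round(effective_iops(8, 180, 0.70, level)))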
Determining your read/write ratio is relatively easy: run perfmon over the busiest periods of disk activity, capturing reads/sec and writes/sec, pull the data into Excel and work out the 95th percentile of each. It is common to see a 70/30 read/write ratio on an OLTP or mixed workload, but this can skew massively in either direction – on one of my systems I have seen a sustained 45/55 ratio, meaning there were more writes than reads.
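If you would rather script it than use Excel, the same 95th-percentile calculation can be sketched in a few lines of Python. The file and column names below are placeholders; a real perfmon CSV export uses full counter paths such as \\SERVER\PhysicalDisk(_Total)\Disk Reads/sec, so adjust them to match your capture.

import pandas as pd

# Perfmon data exported (via relog or otherwise) to CSV. Column names are
# placeholders; real exports use the full counter path, e.g.
# "\\MYSERVER\PhysicalDisk(_Total)\Disk Reads/sec".
df = pd.read_csv("disk_counters.csv")

reads_p95 = df["Disk Reads/sec"].quantile(0.95)
writes_p95 = df["Disk Writes/sec"].quantile(0.95)
total = reads_p95 + writes_p95

print(f"95th percentile reads/sec:  {reads_p95:.0f}")
print(f"95th percentile writes/sec: {writes_p95:.0f}")
print(f"Read/write ratio: {reads_p95 / total:.0%} / {writes_p95 / total:.0%}")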

Random vs Sequential

There are 4 main pillars of disk performance:

• Random Read
• Random Write
• Sequential Read
• Sequential Write

Each of these profiles needs to be tested and validated against what you will be expecting from your system. Depending on your requirements, some types of access will matter more than others: sequential writes are extremely important for log files, random reads are more important for OLTP data files, and data warehouses generally prefer sequential reads. The good/bad news is that no load is 100% random and no load is 100% sequential, and I’m not aware of an accurate way of measuring random vs sequential. You can argue that some access types will be more prevalent in certain types of databases, but remember that almost all disk access is abstracted or virtualised in some way. For example, some SSD controllers cache and then compress incoming writes before they are stored in NAND, and more than likely your SAN is not storing your writes on a single disk in the exact order they came in. In the world of hypervisors, almost all access will end up being random in any case.
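For completeness, the block-size sweep sketched earlier can be extended to cover all four profiles by varying the (again assumed) SQLIO read/write and random/sequential switches at a fixed block size:

import itertools
import subprocess

TEST_FILE = r"E:\sqliotest.dat"  # same hypothetical test file as before

# -kR/-kW select read or write, -frandom/-fsequential select the pattern.
for kind, pattern in itertools.product(("-kR", "-kW"),
                                       ("-frandom", "-fsequential")):
    cmd = ["sqlio", kind, pattern, "-b64", "-o8", "-t4", "-s120",
           "-LS", "-BN", TEST_FILE]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)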
Given all of this, all you can really do is understand what your system is capable of, so that you can firstly check that it is performing as you expect, and secondly test your application to see whether its performance is sufficient for your needs. How do you know if it is sufficient?

That's where IO latency comes in, and that will be covered in the next post.

Tuesday, 18 October 2011

Equallogic Automated Performance Load-balancing


About a week ago I was troubleshooting some latency issues on our Equallogic SAN, and came across a nice illustration of APLB in action. The basic test was this – run a consistent load on a LUN to see if we could reproduce the latency spikes I was intermittently picking up on our SQL instances. That way we could determine if it was a constant issue, or something to do with the load that SQL was putting on the SAN.

Our SAN is made up of 1 x PS6510X and 3 x PS6010S units, all 10GbE, running on their own set of switches. We upgraded to FW5.1.1-H1 about a month ago (just before H2 was released) and had enabled load balancing to allow unused SSD capacity to enhance our SUMO (the PS6510X).

IOMeter settings:

8K request size
Single worker
100% Read
100% Random
1 outstanding request (To make sure it wasn’t going to affect the rest of the SAN)
10GB test file (To make sure it wasn’t cached)
LUN on the SUMO (to begin with…)

Initially we were seeing reasonably consistent performance, and we also managed to get some of those spikes to appear.
Avg latency  - 8ms
Avg IOPS – 180

I then left this running overnight to see if the issue became more obvious over time.
In the morning, it looked like this:
Avg latency - <1ms
Avg IOPS - 1K+

What had happened is that APLB had kicked in: over time it realised that some blocks in this LUN were doing more work than blocks that had already been migrated to the SSD members, and proceeded to move them over too. By morning the entire 10GB LUN had been moved and was running an order of magnitude faster than it had the day before.

You may be wondering why there were only 1K IOPS when this was sitting on SSD, but remember that I only had a single request outstanding, and therefore was more than likely being throttled by the network latency rather than the SSD storage itself.
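A quick back-of-the-envelope calculation shows why: with a queue depth of 1, each IO has to complete before the next one is issued, so throughput is capped at roughly one IO per round trip. The latency figure below is an assumed value for illustration only.

# With a queue depth of 1, the IOPS ceiling is roughly
# outstanding IOs / round-trip latency.
outstanding_ios = 1
round_trip_latency_s = 0.0008  # assumed ~0.8 ms network + stack round trip

max_iops = outstanding_ios / round_trip_latency_s
print(f"Approximate IOPS ceiling at queue depth 1: {max_iops:.0f}")
# ~1,250 IOPS, in the same ballpark as the 1K+ observed above, which is why
# the bottleneck here is round-trip latency rather than the SSDs themselves.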

To be fair, EQL SSD units are not the fastest – in fact I would go as far as to say that they are among the slowest rackmount 10GbE SSD implementations out there. However, they are faster than any configuration of HDDs you could fit into the same form factor, and we purchased them on the basis that they were good enough for what we wanted to do with them at the time.

So in conclusion:
• Equallogic load balancing across SAN members works, and does so especially well on consistent loads (the jury is still out on volatile workloads).
• Mixing SSD and HDD models can have large benefits, in particular if there is a subset of a large LUN’s data that you want to accelerate but you don’t want to put the whole LUN on SSD.
• The load balancing happens automatically – you just turn it on for the group and it does the rest. This is both a positive and a negative: it works, but you lose control over the placement of the data. You can use volume pinning to fix a volume to a member, but that’s another story…
• Load balancing is not instant – it takes time to kick in. It appears to me that capacity balancing happens first, with performance load balancing following afterwards.

Wednesday, 5 October 2011

My final slide deck from SQLbits 9.
http://bit.ly/qtlLyM

Thanks to all who attended - please send in your feedback on my session.

Sunday, 25 September 2011

SQLBits Slide Deck

Here is my provisional slide deck for my SQLBits 9 session:
SQL Storage for the Real World:

Any feedback will be appreciated!
I will be presenting in the Aintree room, on Saturday, at 16:00 - hope to see you there!

Wednesday, 21 September 2011

SQLBits Prep...

An overview of what I’ll be covering in my SQLBits session

I have pretty much finished up my slides for my session at SQLBits, and now I’m going over the content I want to discuss for each slide. To give you some idea of what I’m going to discuss, here is the summary page from the session:

  • Requirements
  • Storage concepts
  • Storage options and priorities
  • SAN / DAS
  • HA / DR implications
  • Decide on a storage strategy
  • General guidance

It feels fairly easy to talk about the storage concepts and technologies for ages on end, but the problem is that it can come across as a little dry. I did a quick run-through with my team last week and nearly put some of them to sleep. So this week and next is all about making the content interesting!


I was originally going to avoid using vendor names and naming products, but then the whole discussion becomes very theoretical and difficult to relate to. Therefore I’m going to try to be fair and balanced with everyone I discuss and not mention any products that I really don’t like...


I promise to make the session as informative and interesting as possible!

Tuesday, 20 September 2011

The reason for my silence is...

My submission for the SQLBits 9 conference (Query across the Mersey) has been accepted!!!
Been spending a lot of time working on my presentation, so hopefully it will go well!

I have never presented at a conference before, so I'm a little anxious but also very excited at the prospect!
My session is the last session on Saturday (community day), so if you can make it I would be most grateful!
It is titled: SQL Storage for the real world.


If it goes well, I will expand it into a series of articles here on my blog.

More info on the conference:
http://sqlbits.com/default.aspx

And this is what I was talking about...

In my previous entry I mentioned that you could build a SAN that performed better and cost less than your commercial SANs out there. Now I'm not going to make light of doing so, but here is an article to validate what I was trying to say:

http://www.theregister.co.uk/2011/09/20/nexenta_vmworld_2011/

"Cheap and cheerful storage from Nexenta matched EMC and NetApp's multi-million-dollar systems in VMworld 2011's Hands On Lab, and took up some of the slack from its rivals when difficulties arose on the first day. EMC and NetApp have cooly brushed off Nexenta's claims it outperformed them."

Really good to see some competition out there, as well as some proof that there isn't too much magic underneath the SAN covers...