Sunday, 6 November 2011

Storage Concepts - Part 1



SQL Storage 101

Having completed my session at SQLBits, I felt that there was too much information that I didn’t cover as comprehensively as I would have liked. So what I’m going to do is break up some of my session into a series of posts to try flesh out a high level understanding of SQL storage.
Welcome to the first part of SQL Storage for the Real World – Storage 101.

Disk IO

Disk IO are the physical reads and writes that make their way to the storage medium of your SQL server. The two types are writes and reads, and as their name entails, are either written to / read from the drive, from / into memory on the server. SQL server predominantly uses 8K – 64K block sizes, but these can be bigger when the server is doing large sequential operations such as read-ahead’s, backups, checkDB’s etc. In reality you will want to test your storage subsystem on 8k, 64K, 256K and possibly 1024K block sizes to get a feel for what it is capable of. I will expand on testing your storage, especially with SQLIO in the near future.

Read / Write Ratio

To understand the requirements that your storage subsystem will have to fulfil, you will need to find out what the ratio is between reads and writes under peak loads. The reason for this is, depending on the storage you choose, the number of writes the system can sustain will be drastically different depending on this ratio. For example, SSD’s generally have different specifications for reads and writes. Some can be drastically slower to write data, and some can be significantly faster. This will come down to manufacturer and model. Equallogic SSD’s for example have much lower write speeds than read speeds and as a result the specified IO performance of a member is highly dependent on the R/W ratio. On the other hand, FusionIO PCI SSD’s are able to write much faster than read, so read speed may well be the limiting criteria.
On the other hand, individual hard drives have more symmetrical R/W ratios, but are generally used in RAID configurations. Depending on the type of RAID, this can have a massive effect on the number of writes a set number of disks can handle. For example, a RAID 0 stripe has no overhead, and hence the write throughput should be roughly equal to the sum of the number of drives in the stripe and also be equal to the number of reads possible. On the other hand, RAID10 arrays have to write each block twice, and hence are only able to write at half the number of IO’s that they are able to read. This gets progressively worse for RAID 5 and RAID 6. This phenomenon is termed the RAID Factor. Some vendors have put in place mechanisms to mitigate the impact of this – Netapp is an example that comes to mind.
To determine your read/write ratio is relatively easy – simply run perfmon over the busiest periods of disk activity, capturing reads/sec and writes/sec, put this into excel and work out a 95th percentile average. It is common to see a 70/30 read/write ratio on an OLTP or mixed workload. However this can skew massively in both directions - on one of my systems I have seen a continuous 45/55 ratio - meaning that there were more writes than reads.

Random vs Sequential

There are 4 main pillars of disk performance:

·         Random Read
·         Random Write
·         Sequential Read
·         Sequential Write

Each of these profiles needs to be tested and validated against what you will be expecting from your system. Depending on your requirements, some types of data access will be more valuable than others. For example: sequential writes are extremely important for log files, while random reads are more important for OTLP data files. On the other hand, data warehouses generally prefer sequential reads. The good / bad news is that no load is 100% random and no load is 100% sequential and I’m not aware of an accurate way of measuring random vs sequential. You can argue that some access types will be more prevalent in certain types of databases, but remember that almost all disk access is abstracted or virtualised in some way. For example, some SSD controllers cache and then compress incoming writes before they are stored in NAND, more than likely your SAN is not storing your writes in the exact order they came in, on a single disk. In the world of hypervisors, almost all access will end up being random in any case.
As a result, all you can really do is understand what your system is capable of, so that you can firstly check that it is performing as you expect it to, and secondly test your application and see if its performance is sufficient for your needs. How do you know if it is sufficient?

That's where IO latency comes in, and that will be covered in the next post.

No comments:

Post a Comment