RAID mumbo jumbo

Ok, this blog post will be filled with technical jargon and is quite engineer friendly, and not particularly reader friendly. Sorry about that. I'll do something campy and silly on Monday to compensate.

I'm sure when I put RAID mumbo jumbo as the title, you prolly thought I was going to talk about raiding in WoW. Ha! fooled you!

I just came out of a meeting with some salespeople from a consulting company that is trying to sell us some storage. Their Sales Engineer made the statement that "The proprietary version of RAID-6 that [company name withheld] is offering is EQUAL to the performance of RAID 1+0".

I am flabbergasted by the statement. So much so that I asked him to repeat it twice.

I need to explain how very wrong this statement is, so to do that I have a quick overview of RAID terminology:

RAID = Redundant Array of Inexpensive Disks. RAID is used because, quite frankly, in any given computer system the hard drive is absolutely the item most likely to fail. The reason? It has moving parts that are constantly moving while the system is in use. Any item with moving parts will fail; it is only a matter of time. You can extend that time by days, weeks, even years with good engineering, but it will fail, guaranteed.

So, you have a ticking time bomb on your hands that is guaranteed to blow up eventually, you just don't know when. On a PC, it isn't usually too big a deal when it fails. Unless of course you aren't backing up your data. You are backing up your data, right?

In the world of Enterprise IT applications though, backups aren't really enough because even though you have a backup, you have to repair the hardware and restore that backup, right? This could take hours or even days in some cases. In applications that people are relying on to do their jobs, their jobs can't be put on hold every time a drive fails while they wait for an IT guy to fix the hardware and do a restore.

This is where RAID comes in. You package the disks all together, and you add in the capability to restore the data should you lose a disk (or multiple disks, depending on the RAID method you choose).

Speaking of that, here are the methods you have to choose from (well, many more besides this, but these are probably the most common).

RAID 1+0: Mirrored and striped. Mirroring (also referred to as RAID 1 by itself) means that whatever data you have to store on the disk, you also store on an identical disk. That way if you lose a single disk, you don't lose the data. That part is pretty easy to understand. Striping (referred to as RAID 0 by itself) is breaking the data out into smaller chunks (or stripes) and spreading them across all of the disks. This increases the performance of the array.

Let me illustrate that because it is important. Let's say I need five non-sequential 8 KB chunks of data off a disk. If I go to a single disk for that data, that single disk has a seek time of (for the sake of this example) 10ms. I have to do this 5 times on the single disk, so that means I have a total of 50ms to service that request.

In the case of a striped system, I could make the request, and that data could actually exist on 1, 2, 3, 4, or if I was extremely lucky, 5 disks in total. Each disk still has a 10ms seek time, so unless I'm extremely unlucky and all of the data chunks I want exist on a single disk, I stand to gain performance. If the chunks exist on 5 different disks, the disks seek in parallel and all 5 requests are serviced in 10ms, not 50ms.
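Here is a quick back-of-the-napkin sketch of that math in Python. It is a toy model, not how a real controller schedules I/O: the function name service_time_ms and the disk numbering are made up for illustration, and it assumes every seek costs a flat 10ms and that separate disks seek fully in parallel.

```python
from collections import Counter

SEEK_MS = 10  # flat per-request seek time, same figure as the example above


def service_time_ms(chunk_disks, seek_ms=SEEK_MS):
    """Total time to fetch every chunk in one request.

    chunk_disks lists which disk each requested chunk lives on.
    Requests to the same disk queue up one after another, while
    different disks seek at the same time, so the busiest disk
    sets the total.
    """
    if not chunk_disks:
        return 0
    requests_per_disk = Counter(chunk_disks)
    return max(requests_per_disk.values()) * seek_ms


single_disk = service_time_ms([0, 0, 0, 0, 0])  # all five chunks on one disk
five_disks = service_time_ms([0, 1, 2, 3, 4])   # one chunk per disk
mixed = service_time_ms([0, 0, 1, 2, 2])        # chunks spread over three disks

print(single_disk, five_disks, mixed)  # prints: 50 10 20
```

The in-between case is the one you usually get: not the full 5x win, but still better than hammering one disk five times.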

In the RAID 1+0 configuration, in fact, because the same data exists on more than one disk, a read can be serviced by either copy. Even chunks that land on the same mirrored pair can be fetched concurrently, further increasing the overall performance.

Of course, this great performance comes at a price. Essentially, all of your storage costs double. For 100 GB of usable storage, you have to buy 200 GB of raw disk.

Enter RAID-4 (or RAID 5...the concepts between the two technologies are similar enough for our purposes).

RAID-4 solves the problem of doubling your storage costs by the use of parity. The easiest way I can explain parity is this: Let us say you have 4 disks in total, with your data striped across them.

The chunks of data in this example are going to be expressed by numbers.
Disk 1: 1
Disk 2: 3
Disk 3: 29
Disk 4: 19

Let's say the numbers represent your data. In a standard stripe, if you lose Disk 3, you lose the number 29. Period. The system has no way of knowing what the number was....your data is now corrupt, you are completely screwed.

What RAID-4 does is add another disk into the equation. This disk contains the 'parity information'. What the parity is (basically) is the sum of the stripes on disks 1-4. So, to calculate that out, 1+3+29+19 = 52.

Now you lose Disk 3. You can replace the disk and it can figure out what the data was from parity: 52 - 19 - 3 - 1 = 29....congratulations, you have no data loss.
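If you want to see that rebuild in code, here is a small Python sketch using the same four numbers. The function names are mine, purely for illustration. The sum version mirrors the example above; the second pair uses bitwise XOR, which is what RAID 4 and RAID 5 parity is actually built on, and it works the same way: combine everything that survived with the parity and the missing piece falls out.

```python
from functools import reduce
from operator import xor


def sum_parity(chunks):
    """Parity as described in the post: the plain sum of the data chunks."""
    return sum(chunks)


def rebuild_from_sum(surviving, parity):
    """Recover the one missing chunk by subtracting the survivors."""
    return parity - sum(surviving)


def xor_parity(chunks):
    """XOR flavour of parity: fold all data chunks together with XOR."""
    return reduce(xor, chunks, 0)


def rebuild_from_xor(surviving, parity):
    """XOR the survivors into the parity; what is left is the lost chunk."""
    return reduce(xor, surviving, parity)


data = [1, 3, 29, 19]    # Disks 1-4 from the example
surviving = [1, 3, 19]   # Disk 3 (holding 29) has died

p_sum = sum_parity(data)
print(p_sum, rebuild_from_sum(surviving, p_sum))  # prints: 52 29

p_xor = xor_parity(data)
print(p_xor, rebuild_from_xor(surviving, p_xor))  # prints: 12 29
```

Note that either way you can only rebuild ONE missing number from one parity value. Lose two disks and you are back to being screwed.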

The PROBLEM with this approach is a performance loss. To calculate the parity, you have to use CPU time every time a piece of data changes on the disk. In the case of my example, the CPU has to recalculate the sum, and the new parity has to be written out to the parity disk as well...this takes precious time. Let's say that just the write itself takes 10ms. In a mirrored configuration, you are done after that 10ms. But if you have to calculate parity, and that takes 1ms...your total time to service that disk request is now 11ms.

RAID 6 is a little addition to RAID 4 that is a little difficult to explain, but basically it doubles the parity. Where RAID 4 does horizontal parity (pretty much the process I describe above), RAID 6 adds on diagonal parity, which is more difficult to describe without the use of a chart. The point is that it also has to calculate parity (twice in fact, so whatever CPU overhead there was for RAID 4 is effectively doubled).
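To put numbers on that, here is the same napkin math as a few lines of Python. This is only the simplified model from this post (a 10ms write plus an assumed 1ms per parity calculation); the function name and the figures are illustrative, not measurements from any real array, and it ignores everything else a controller does.

```python
WRITE_MS = 10   # time for the write itself, from the example above
PARITY_MS = 1   # assumed cost of one parity calculation


def write_time_ms(parity_calcs, write_ms=WRITE_MS, parity_ms=PARITY_MS):
    """Write latency in this toy model: the write plus each parity calculation."""
    return write_ms + parity_calcs * parity_ms


raid10 = write_time_ms(0)  # mirrored: no parity to calculate
raid4 = write_time_ms(1)   # single parity
raid6 = write_time_ms(2)   # double parity

print(raid10, raid4, raid6)  # prints: 10 11 12
```

However small you make the parity cost, as long as it is more than zero, the parity configurations come out behind the mirrored one in this model.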

My point in all of this is that to say that any implementation that uses parity is at all equivalent (performance-wise) to an implementation that does NOT use parity is ridiculous, and I can't believe a storage 'engineer' would repeat such crap. Parity requires overhead, plain and simple. It may not be COST EFFECTIVE, but RAID 1+0 will always be faster. Always!

I hate Salespeople (and I can pretty much guarantee they hate me at this point. :P)

Comments

1 Response to "RAID mumbo jumbo"

Jac said... August 19, 2009 at 12:40 PM

Diagonal parity sounds nasty if you end up in the middle of the matrix. Does it go in both directions? You should tell them you want a snakes and ladders style system...


Oh, and you're such a nerd :D
