
Read this article at https://www.pugetsystems.com/guides/29

Why RAID is (usually) a Terrible Idea

Written on February 5, 2007 by Jon Bach

Introduction

As President of Puget Custom Computers, I get a unique perspective on computer products and technology. Our company specializes in selling high-performance custom computers, so the question of RAID naturally comes up often. There is an overwhelming opinion out there that if you have the money and want a blazing fast and stable computer, you should put your hard drives in RAID. We have known for years that this perception is just flat-out wrong, but the problem is that the idea is so widely accepted that it is nearly impossible to convince our customers otherwise. In fact, if we try too hard to talk them out of it, we end up losing the sale! So, should we be selling configurations that we know are flawed, for the sake of making the sale? We think the answer comes in the form of transparency and education! This article is just the latest effort in educating the public about RAID.

To be clear, there are definitely times when RAID is a good route. I will get into that later, but I need to start by saying that I am not talking about enterprise servers. That is a completely different type of computer. I am talking about home desktops, enthusiast computers, and professional workstations.

What is the Problem?

So what do I have against RAID? I have no problem with the concept. The problem is that for 90% of the people out there, it isn't a good idea. Since high-end enthusiast machines are some of our most common builds here at Puget, we are frequently asked to build machines with RAID when it doesn't make sense. I am here to say that those RAID configurations are one of the biggest sources of frustration for our customers, as evidenced by the fact that they make up a very large portion of our support tickets. That is the root of my problem with RAID -- I see the frustrations it causes all the time. When I weigh that against the benefits of RAID, I just can't recommend it except in very select cases.

The underlying problem with RAID is the fact that by using it, you are making your computer significantly more complicated. Not only are you now relying on two hard drives to work properly, but you are also relying on a much more complicated controller (the RAID controller). If any one of those items has a problem, the array fails. If one hard drive has a problem even for a moment, you have a degraded array on your hands. At that point, you are relying on the RAID controller for error correction and array management, and the fact of the matter is that all (yes, all) RAID controllers built onto a motherboard are low quality. They have been added to the motherboard as an afterthought -- a feature added simply because the manufacturer knows that if they add any feature they can, they're more likely to sell their product. At a time when nearly every modern motherboard has built-in RAID, they have to offer it just to be considered as feature-rich as their competitors.

RAID1 (mirroring) for Data Loss Protection

One of the commonly accepted beliefs is that if you want your data to be ultra-secure, then a RAID1 array for hard drive redundancy is the best route. This is an area where the firsthand data we have accumulated gives me a very strong argument to the contrary. As we sell hundreds of computers a month, our self-built administrative systems log every single failed piece of hardware and trouble ticket we open. With this, we can see hard data on how often hard drives fail, and how often a RAID array has problems. This is really useful information to look at! Here is the data I have for our hard drive sales in the last year, for models where we have sold at least 200 units:


Hard Drive                              | # Units Sold | Failure Rate
Seagate Barracuda 7200.9 250GB SATAII   | 280          | 3.21%
Seagate SATA Barracuda 80GB             | 271          | 2.58%
Western Digital SATA Raptor 74GB        | 592          | 2.03%
Seagate Barracuda 7200.10 320GB SATAII  | 202          | 1.98%
Seagate Barracuda 7200.9 160GB SATAII   | 265          | 1.89%
Seagate Barracuda 7200.9 80GB SATAII    | 403          | 1.74%
Western Digital ATA100 80.0GB WD800JB   | 290          | 1.72%
Western Digital SATA Raptor 150GB       | 278          | 1.44%

When I look at those numbers I see excellent reliability. Specifically, the Western Digital Raptor hard drives impress me. We sell a huge number of those drives, and have only had a handful fail. In fact, two of those failures were our fault -- one we wired incorrectly and fried, and the other we dropped on the concrete warehouse floor...so technically, the Raptor failure rate should be a bit lower. Impressive! Neither of these damaged hard drives ever even left our facilities, obviously.

Unfortunately, the numbers are not as clear when it comes to how many RAID failures there have been. Since it is not a black and white failure issue, I do not have hard data. However, with the agreement of our support staff, I estimate that anywhere from 25% to 30% of our customers with RAID will call us at some point in the first year to report a degraded RAID array or a problem directly resulting from their RAID configuration. Granted, a failed RAID1 array does not mean data loss, but it certainly means a long, frustrating hassle. On the other hand, a single hard drive will often give warning signs before failure, so that scenario doesn't necessarily mean data loss either.

The real question is: Is RAID1 really worth being 15-20 times more likely to have a problem? Keep in mind, RAID1 does nothing to protect you from:

  1. Accidental deletion or user error
  2. Viruses or malware
  3. Theft or catastrophic damage
  4. Data corruption due to other failed hardware or power loss
So if you are going with a RAID array to protect your data, just look at the numbers, and make an informed decision. My personal recommendation is that if a 3% or lower risk of possible data loss is still too high, then get yourself an external SATA or USB hard drive and run a scheduled backup. Not only does that get you nearly all the protection of RAID1, but it also protects you from the four things above, leaving you better protected in the end. Not only that, but it vastly simplifies your computer, making you 15-20 times less likely to have frustrating problems with your data storage.
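To make the scheduled-backup suggestion concrete, here is a minimal Python sketch of the idea: it mirrors a data folder to an external drive, copying only files that are new or changed. The paths are hypothetical placeholders; in practice you would point it at your own folders, schedule it with Windows Task Scheduler or cron, or simply use dedicated backup software.

```python
import os
import shutil

# Hypothetical paths -- substitute your own data folder and external drive.
SOURCE = r"C:\Users\me\Documents"
DESTINATION = r"E:\Backups\Documents"

def backup(src, dst):
    """Mirror src into dst, copying files that are missing or newer in src."""
    for root, _dirs, files in os.walk(src):
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(target_dir, name)
            # Copy only if the backup copy is missing or older than the original.
            if (not os.path.exists(dst_file)
                    or os.path.getmtime(src_file) > os.path.getmtime(dst_file)):
                shutil.copy2(src_file, dst_file)

if __name__ == "__main__":
    backup(SOURCE, DESTINATION)
```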

RAID0 (striping) for Performance

It is generally well accepted that RAID0 carries a sizable risk of data loss. When that is acceptable, people will often desire RAID0 for the speed benefits. What is not understood is that the speed benefits are dependent on the type of disk usage. To make a vast simplification of the issue, there are two main factors in disk performance: access time, and throughput. Access time dictates how quickly a hard drive can go from one operation to the next, and throughput dictates how quickly data can be read or written. RAID0 does increase throughput, but it does absolutely nothing to help the access time. What does that mean? It means that if you are reading and writing a large number of smaller files, the performance benefit will be very minimal. If you are reading or writing a large amount of data at one location on the disk, that is where you will see a benefit. Therefore, in times where you are working with transferring or copying very large files, RAID0 can make sense.
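As a rough illustration (a back-of-the-envelope model with assumed numbers, not a benchmark), take about 8 ms of access time per file and 70 MB/s of sustained throughput per drive, and let RAID0 double the throughput while leaving access time untouched. The small-file workload barely moves, while the single large file nearly doubles in speed:

```python
# Rough model: read time = (number of seeks * access time) + (data / throughput).
# All numbers below are illustrative assumptions, not measurements.

ACCESS_TIME_S = 0.008      # ~8 ms average seek + rotational latency
SINGLE_DRIVE_MBPS = 70.0   # sustained throughput of one drive
RAID0_MBPS = 140.0         # striping roughly doubles sequential throughput,
                           # but does nothing for access time

def read_time(file_count, file_size_mb, throughput_mbps):
    seek_cost = file_count * ACCESS_TIME_S
    transfer_cost = (file_count * file_size_mb) / throughput_mbps
    return seek_cost + transfer_cost

workloads = {
    "10,000 small files (64 KB each)": (10_000, 0.0625),
    "one 4 GB video file": (1, 4096),
}

for label, (count, size_mb) in workloads.items():
    single = read_time(count, size_mb, SINGLE_DRIVE_MBPS)
    raid0 = read_time(count, size_mb, RAID0_MBPS)
    print(f"{label}: single drive {single:.1f}s, RAID0 {raid0:.1f}s ({single / raid0:.2f}x)")
```

With these assumed numbers, the small-file workload comes out only about 5% faster on RAID0 because nearly all of its time is spent seeking, while the large sequential read is roughly twice as fast.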

Video editing is a good example of when RAID0 might make sense. Now, you still need to be sure that the speed of the hard drives is the right place to focus. For example, if you are editing a video, and when doing so your CPU usage is pegged at 100%, then you can be fairly certain that moving to RAID0 will not be a help, because you'll still be limited by your CPU. Therefore, video editing alone does not mean RAID0 will be useful...it has to be video editing in which your CPU or memory is not the bottleneck, which honestly is very uncommon. My personal recommendation -- do your homework. Do not take on the hassles of RAID unless you know it will be a help. Go into your research with the knowledge that it is actually very uncommon for RAID0 to be faster in desktop usage.

Don't take my word for it! Storagereview.com, an authority in hard drive performance analysis and discussion, has a whole page talking about RAID0.

The other time when RAID can make sense is if you need an extremely large volume of space. RAID0 or RAID5 would be able to let you combine drives together. If you're working with HD video and need to be able to save 2-3TB of data, then a RAID array is necessary to have a working space that large.

Other Types of RAID

Discussing RAID1 and RAID0 gives me the most well defined discussion points, but the same principles can be combined to apply to RAID5 and RAID10. Remember that I am not talking about server usage. With servers, RAID can often bring more benefits, though a large reason for that is because the quality of the RAID controllers in a server environment is often much higher.

RAID Controllers

If you do decide that RAID makes sense for you, make sure you use a quality RAID controller. Software RAID controllers are not good quality. Not only are they lower performance, which negates a major point of RAID, but they are much more prone to failure and bugs in general. Make sure you get a hardware RAID controller, such as one from 3Ware. The quality is higher, the performance is better, and when you have a problem with your array, the rebuilding and diagnostic tools are far superior. A good RAID controller starts at $300. If you have a RAID controller on your $150 motherboard, what does that say about its quality? I know it is difficult to justify the cost of a hardware controller when your motherboard may already have a software controller onboard, but if you can't bring yourself to cover that cost, then I submit that RAID is not important enough to you.

Conclusions

It is quite obvious by now that in a desktop environment, I am dead set against RAID. Problems are common, and benefits are few. Just remember that I am taking this position due to experience. I even fell prey to the hype myself once! At one time I ran my office workstation on a RAID0 array. It had problems once or twice over the years, but because I kept a second hard drive for storage and I am perfectly comfortable with a Windows reinstall, it was never a large issue for me. I ended up moving back to a single drive because I didn't notice any performance difference.

My conclusions are based on benchmark data, as well as over six years running a custom computer company, a company whose target market is made up of all the people that are targeted by the hype of RAID. If you have anything to add, please email me! I'd be happy to consider adding it to this article.

youngbeard

I mostly agree, but wish you had more hard evidence of RAID failure (like the table about hard drive failures) than just employees saying "25-30%".

The simple version that I tell to my gaming buddies who ask me about building a system: A good backup schedule is way better than RAID 1. And, at best you'll get slightly faster load times with a RAID 0. Once the game/level/video is loaded and everything is in memory that RAID isn't doing you a bit of good. Only if you are swapping to HD would it give any speedup -- and if you're swapping in a game you will have crap performance regardless of your storage speed and you need to add more RAM, not a fancy RAID array.

Posted on 2007-02-06 18:15:50
LocutusEstBorg

All my data losses over the years were from RAID failures - array corrupted, array resize failed, array disappeared. I used top of the line $800 controllers with flash cache protection. Ironically I never had any disk failures in the arrays when using stackable vibration resistant drives like the WD Red.

Individual disk failures always gave me time to recover most of the data, but RAID failures nuked everything instantly.

Posted on 2016-04-14 05:16:30
Brian Buresh

With a proper backup, RAID wouldn't have caused you data loss. RAID isn't a backup; it's designed for redundancy.

Posted on 2018-09-28 14:15:48
WaltC

Hello Jon! I actually signed up here to briefly comment on your opinions as expressed about RAID in general. So first, let me say that in your position and from your perspective I can certainly understand your comments. This about sums them all up for me:

It is generally well accepted that RAID0 carries a sizable risk of data loss.

...pretty much echoes what I consider to be a myth about RAID 0 and "the heightened risk" of data loss. I'll explain.

First of all, let's consider a situation where a person purchases two of these: Western Digital ATA100 80.0GB WD800JB's, just as an example. From your chart in the article, it has been your experience that this drive has a 1.72% failure rate in terms of how many of these drives your company has sold that have required replacement because they failed. Here's the point I want to suggest to you:

If your customer decides on purchasing two of these drives to be used as normal IDE drives instead of RAID drives, then based on your company's experience to date he's running about a 1.72% chance that either one or both of the drives he's purchased could be defective from the factory and fail prematurely. If it happens that one of the drives should fail while the other doesn't, then your customer will lose the data from that drive (assuming here no backups were done and so on) even though he may not even know how to spell the word RAID. If the drive that fails is a boot drive, then your customer will lose his boot drive and be unable to boot his system, and he'll have to reconfigure his second hard drive as the boot drive and reinstall his OS there if he wishes to boot his system at all prior to getting a replacement drive. All of this, of course, is possible even if no RAID was ever deployed in the system from the start.

The thing that most people do not understand in general is that two drives running as normal IDE drives are still the identical drives when configured under RAID 0. The drives themselves do not know about any differences between RAID and IDE, and they do not behave any differently at all when running as IDE instead of RAID 0, or vice-versa. Thus, the failure rate for each of the two drives running IDE is *exactly the same* as it would be for each of those same two drives running in a RAID 0 configuration. Single drives running in IDE have no fault tolerance, just as single drives running in RAID 0 have no fault tolerance. Again, the drives themselves just do not care whether you run them in IDE or in RAID 0--the drives have no preference--they run and operate exactly the same way regardless. This is an important point to remember when thinking about RAID and IDE drive operation.

OK, now, some people will say, "Aha, yes, but when running in RAID 0 you need two drives instead of one, and that means that your two drives, because they are two, are twice as likely to fail as the single IDE drive you could use instead of RAID 0."

Really? So, by that logic, then, the more components of a type that I have in my system, the higher the risk of failure accordingly? I do not believe, thankfully, that this is the way things work...;) If things worked this way in reality, well, all of us would have a very tough time finding anything that worked to any degree of reliability...;) Here's what I mean:

Let's go back to your drive failure chart for a moment and consider the Western Digital ATA100 80.0GB WD800JB drive. According to your numbers you've sold 290 of those drives and yet the chance of failure for each drive you've sold is but 1.72%--which means that your company has seen ~5 (4.998...or so) of those 290 drives fail.

So, if we take the logic that says using two drives instead of one means that the possibility of a drive failing is 100% greater than the possibility of a single drive failing if only a single drive is deployed, then ought not the possibility of one of those 290 drives failing be 29000% greater than if your company had sold but ONE Western Digital ATA100 80.0GB WD800JB? Obviously, though, even though you have sold 290 Western Digital ATA100 80.0GB WD800JBs, the odds of a drive failure are only 1.72%, or roughly 5 drives out of the 290, so we can see plainly that deploying more than a single drive does not increase the odds of a drive failure by 100% x the total number of drives deployed. Does it?

And yet, this is the way that many people think about that issue, it seems to me. They think, wrongly, that if they have two drives in a system the odds of a single drive failing are 100% greater than if they had but a single drive. Obviously, the experience of your company in selling hundreds of these drives and, indeed, the experience of Western Digital in selling millions of these drives is *not* that the risk of drive failure accelerates by 100% for every additional drive you deploy after the first one. Allow me to suggest that if it was then neither you nor Western Digital could conduct business...;)

So what *is* the risk that *a* drive will fail, regardless of how many such drives you either sell or else deploy in a single system? You'll find that risk estimated by every hard drive manufacturer: it's called the MTBF-hours number. You'll also find that number listed plainly in the specifications each hard drive maker lists for each drive he manufactures and sells. So whether you have two drives in a system, five drives or ten drives, etc., that have a MTBF-hour rating of, say, 10,000 hours (just to throw out a number) then you can be assured that on average the manufacturer expects that *each* drive he makes and sells will run that long before failing. Of course, this is an average, and the actual drive you buy may last half that long or twice that long, but on average this is the kind of operational durability you should expect from each of your drives--whether you have one, or two or ten, etc. The *number* of drives in a system has, of course, no impact at all upon the manufacturer's MTBF-hours estimates of operational life for each of his drives that you own.

The second point that needs addressing is this: are you really any safer in terms of your data if you have a single 300Gb drive running as IDE than if you had two 150Gb drives running as one RAID 0 300Gb drive? Well, if it turns out that the MTBF-hours estimates for the 300GB drive are the same as for the 150Gb drives--then the answer is "no." In that case, there is exactly the same risk between the 300Gb drive failing and one of the two 150Gb drives that comprise your RAID 0 drive. Indeed, in this scenario, if you lose your single 300Gb IDE drive, or you lose one of your two 150Gb RAID 0 drives, then you lose *all* your data, don't you? Likewise, if the 300Gb drive you buy to use as an IDE drive has a MTBF-hour rating of 50,000 hours, but each of the 150Gb drives you use in your RAID 0 configuration has a MTBF-hour rating of 25,000 hours, then you may expect that your IDE drive will possibly run twice as long as your RAID drives before failing. You could as easily reverse that to see your RAID 0 drives logically being expected to outlive your single IDE drive by twice as long, too.

So, it isn't the number of drives a user has in his system that determines the likelihood of a drive failure, it is the MTBF-hour rating that each drive has that is the only barometer for judging how likely it is that a drive will fail, and whether you are running your drives as IDE or running them as RAID 0 makes no difference whatever.

At home, for instance, I am in my *fourth year* of running RAID 0 configurations--and although I've used several different types of drives and RAID controllers over that span, I have yet to have a single RAID 0 drive failure. Conversely, though, in the years prior to that before I ever ran RAID 0 and was running either SCSI or IDE, I had two (that I can remember) drives fail--both of which were replaced under warranty by the manufacturer. If I thought like many people do about RAID 0 then certainly I would reach the conclusion that running IDE or SCSI was much more risky than running RAID 0--heh...;)--but of course I don't think like that so that's not what I think. If anything, I think that drives today are just made a lot better than they were a decade ago, and they just last a lot longer, too--and of course, whether I'm running them as RAID 0 or as IDE makes not a whit of difference there.

But, to answer your complaints about some RAID 0 configurations, I think you'll agree with me that the weakest link in a RAID 0 configuration is the RAID *controller* one chooses to use. In the last four years, since I decided to try RAID 0 for myself, I've used only Promise FastTrack TX RAID controllers at home--first the TX2200 and most recently the TX4200 (the TX4200 is coupled to two Maxtor 300Gb SataII drives--it's two years now using the 4200 without a single drive error or failure. The TX2200 I moved to my wife's box, with a pair of WD1000JBs, and it has been operating for *four years* without a failure of any kind.) OK, these are dedicated, hardware, PCI RAID controllers which I consider to be several steps above the mostly software-RAID-type controllers found on most motherboards these days as standard equipment. Yes, people *are* reporting a sizable number of RAID 0 (and other RAID mode) problems with these controllers. I think the issue boils down to the quality of the RAID controller--which is exactly what I mean when I say that the efficacy of a RAID setup is greatly dependent on "how" you set it up, and "what" you set it up with in terms of controllers, and of course hard drives, too.

IE, you go "cheap" then you get cheap, if you know what I mean...;) When it comes to components of any sort I believe that very often you get exactly what you pay for--which is also why I'm not a believer in motherboard sound, or motherboard graphics, either. Generally, it's been my experience that the drivers for motherboard-integrated devices of all kinds are just not as good nor as reliable as the kind of driver support you get with name-brand discrete peripherals.

Anyway, Jon, this has been my experience at home with RAID 0 over the last four years, and I appreciate the opportunity to share it. Thanks again.

Posted on 2007-02-06 18:19:50

Thanks for the comments!

Youngbeard, exactly....for gaming there are a lot of other things in line for your money before you should consider RAID. Video card, CPU, memory...

WaltC, yes...the situation and the hardware definitely play a role, which is why I was specifically talking about desktop usage with onboard RAID controllers. However, your math is incorrect. If one drive has a 1.72% chance of failure, then if you have two drives, your chance of failure on one of the drives doubles. That's just simple math. Both drives still have a 1.72% chance of failure individually, but together as a set, the number doubles. You're right that looking at MTBF is another way of approaching it, but it is just another way of saying the same thing. My stats are in failures per year, and those stats are in hours per failure; it is just an inverse number with different units.
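For readers wondering how those two kinds of numbers relate, here is a small sketch of the standard conversion, under the simplifying assumptions of 24/7 operation and a constant (exponential) failure rate; the 600,000-hour MTBF below is just an illustrative spec-sheet-style figure, not one taken from the article.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the drive runs around the clock

def afr_from_mtbf(mtbf_hours):
    """Annualized failure rate implied by an MTBF figure (exponential model)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def mtbf_from_afr(annual_failure_rate):
    """MTBF in hours implied by an annual failure rate (exponential model)."""
    return -HOURS_PER_YEAR / math.log(1 - annual_failure_rate)

print(f"1.72% per year is roughly {mtbf_from_afr(0.0172):,.0f} hours MTBF")
print(f"A 600,000-hour MTBF is roughly {afr_from_mtbf(600_000):.2%} per year")
```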

Posted on 2007-02-06 19:56:59
cowboyshootist

WaltC, yes...the situation and the hardware definitely play a role, which is why I was specifically talking about desktop usage with onboard RAID controllers. However, your math is incorrect. If one drive has a 1.72% chance of failure, then if you have two drives, your chance of failure on one of the drives doubles. That's just simple math. Both drives still have a 1.72% chance of failure individually, but together as a set, the number doubles. You're right that looking at MTBF is another way of approaching it, but it is just another way of saying the same thing. My stats are in failures per year, and those stats are in hours per failure; it is just an inverse number with different units.

Jon, I don't think your math is correct here. Just because a drive has a 1.72% chance of failure doesn't mean that with 2 drives you have a 3.44% chance of having a drive failure. For example (using numbers that make the math simple), if a drive has a 1% chance of failure, then having 100 drives does not mean that you have a 100% chance that a failure will occur. Sure, if you extend the time frame out long enough then yes, you will have a failure, because as we know drives are mechanical devices that wear out over time.

With respect to your article on RAID I think your title should have been "Why RAID "CAN" be a terrible idea". I respectfully disagree that RAID is "usually" a terrible idea. There are many good reasons to use RAID on the desktop including performance, data availability and system high availability. Suggesting that implementing RAID is a bad idea because the built-in RAID controllers are of low quality is akin to saying that doing 3-D graphics is a bad idea because the built-in graphics cards are of low quality. If the controller isn't sufficient to the task then get one that is.

The main reason, IMHO, to use RAID in a desktop environment is data protection and availability. As drive sizes increase, the impact of losing a drive becomes greater, and as we all know SATA drives have lower duty cycles than Fibre Channel or SCSI drives and are more susceptible to failure. In a RAID 1 or RAID 5 configuration you'll be able to recover from degraded mode much quicker than with no RAID protection. Even if you back up your data to tape or CD, restoring that data will take hours, especially with one of the larger SATA drives. With tape you may not even be able to recover the data due to a 25-30% failure rate of tape media. In degraded mode you can still access your data and operate your computer while the drive is being rebuilt.

Personally I am not a fan of just doing RAID 0. If I wanted to get the performance benefits of RAID 0 I would combine it with RAID 1 or RAID 5 to get both striping and data protection. Naturally this type of configuration comes with a cost but what is that cost compared to having to recover data or not being able to recover data at all?

In the end, the decision to RAID or not to RAID should be based on the cost-benefit analysis not whether it's likely to generate a support call. Many people operate small businesses on desktop computers out of their homes and the loss of data or the inability to access that data due to drive failures can be costly to the business.

Sincerely,
TC

Posted on 2007-02-17 12:04:45
Slaw

This 9 year old math failure bothers me. Sorry for the necro. The chance of two independent events occurring would be r^n=1.74%^2=1.74%*1.74%=0.0303%. That is based on your percentage which was over a period of a year so really that would be the chance of two drives in RAID 1 failing in a year, but depending on your RAID type introducing additional reads/writes and the fact that the two drives aren't necessarily independent in different RAID types, your percentages are really invalid in this application. Here is the proper math to use with your individual hard drive failure rate: https://en.wikipedia.org/wi...

Posted on 2015-12-28 21:47:21
Giwrgos Saramas

Yes, to find the probability of 2 independent events occurring, we use multiplication. But with this logic, 100 drives would be 0.0174^100, so the more drives I add, the lower the probability of failure. No way. On the other hand, we use addition when we want either one of the events to happen, but adding 0.0174 a hundred times equals 1.74 > 1, which is wrong from a probability perspective. Since it seems we are talking about whether adding a drive increases the failure probability, I'm going to assume a RAID 0 config. What is really happening is this: let's assume we have a bag with 100 drives and we decide to pick 2 for our RAID 0. On average, every bag contains 1.74 bad drives. What are the possible outcomes? Either GoodGood, BadGood, GoodBad, or BadBad. In RAID 0, if either drive fails, the whole array fails. Prob(fail) = BG + GB + BB = 1 - GG = 1 - (1-0.0174)*(1-0.0174) = 1 - 0.96550276 = 0.03449 ~ 0.035 = 3.5%. Prob(3-drive) = 1 - GGG = 1 - (1-bad)^3 = 1 - 0.948 ~ 5.1%. Finally, Prob(100-drive) = 1 - (1-0.0174)^100 ~ 82.7%. Probabilities like these work like a binary tree, with a B or G branching at every node except the last.

Posted on 2016-02-14 01:31:25

For RAID1, 2 drives will have a lower failure rate *as a system* than one drive. Use the formula r^n. If the probability of losing data with one drive is 1.74%, then in RAID1 with two drives, the probability of losing data will be 1.74%^2 ≈ 0.03%.
For RAID0, however, the data will be lost if *any* drive fails. So in RAID0 the probability of failure increases with every additional drive. Use the formula 1 − (1 − r)^n, so with the same two drives the probability of losing data will be 1 − (1 − 0.0174)^2 ≈ 3.45%.
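To spell out both formulas for anyone following along, here is a short sketch that treats drive failures as independent and uses the 1.72% per-drive annual rate from the article's table; it ignores rebuild windows, controller problems, and correlated failures, all of which make real arrays worse than this idealized picture.

```python
def raid0_failure_probability(per_drive_rate, drives):
    """RAID0 loses data if ANY drive fails: 1 - (1 - r)^n."""
    return 1 - (1 - per_drive_rate) ** drives

def raid1_failure_probability(per_drive_rate, drives):
    """RAID1 loses data only if EVERY mirrored drive fails: r^n."""
    return per_drive_rate ** drives

r = 0.0172  # per-drive annual failure rate from the article's table

print(f"RAID0, 2 drives:   {raid0_failure_probability(r, 2):.2%}")    # ~3.41%
print(f"RAID1, 2 drives:   {raid1_failure_probability(r, 2):.4%}")    # ~0.0296%
print(f"RAID0, 100 drives: {raid0_failure_probability(r, 100):.1%}")  # ~82%
```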

Posted on 2016-05-16 06:59:03
Christopher

The two drives being independent do not have a dependence relationship. Therefore, if you have any number of drives, the likelihood any of the drives will fail is still 1.72%. It is similar to flipping a coin multiple times, or flipping multiple coins. Just because the first toss is tails, does not mean that the next toss is going to be heads, or less likely to be tails. The odds are still 50%.

Posted on 2016-01-14 21:56:08

I am not a math major, but this doesn't sound right. There is no dependence, certainly, but simply the fact that you have more drives means additional chances to fail. To borrow from recent headlines, if I were to buy two lotto tickets I have a better chance of winning than with just one... not much better, but the difference is there. Or if I flip two coins, the chance that each one individually comes up heads is 50/50... but the chance that both come up heads is lower. I can't speak to the exact calculations involved, as it has been more than a decade since I last took a statistics class, but common sense bears the general principle out.

Posted on 2016-01-14 22:01:52
Giwrgos Saramas

William, if you want a refresher on your statistics, look up my answer and the theory of decision or probability trees.

Posted on 2016-02-14 01:36:55
Parrish

Agreed. It's an inventory problem. 1.72% of units sold went bad and came limping home; 98.28% of the units are still out in the wild. The chances of two units dying in the same system is phenomenal, or the chances of grabbing two new units off the shelf and having them both be bad. But to speak about drives distinctly as one unit of all the units in a warehouse...WAITA. Nevermind.

Posted on 2016-07-17 21:06:54
mirak

Exactly. That's the example I had in mind.
So I will go for RAID 0 and a backup.

Posted on 2016-07-18 23:12:09
Ansky

Imagine you had 100 drives in a RAID 0 configuration. What is the chance of failure? Isn't it 100%?

Posted on 2017-07-03 07:22:04
Hornet Corset

Absolutely not. By that logic, adding another drive to the RAID 0 configuration would cause the chance of failure to be 101%, which is literally impossible.
When working with probabilities, you multiply the probabilities instead of adding them together. After all, flipping a coin gives you a 50% chance of getting heads, but flipping two coins doesn't give you a 100% chance of heads. The tricky part in this problem is determining if you need to multiply the rate of failure or the rate of success.
When you multiply two probabilities together, the probability you get is the likelihood that both events will occur. If you multiply the failure rate of one drive by itself, the resulting number is less than the original failure rate: 1.72 percent is equivalent to a rate of 0.0172. 0.0172 * 0.0172 = 0.00029584, which is equivalent to 0.029584%. That is less than three percent of one percent.

That is, however, the likelihood that both--both--of two drives will fail. In other words, that's the risk of data loss in a RAID 1 system with two drives. If you want to find the likelihood that either of two drives will fail--which is the actual metric of risk when using RAID 0--you must find the likelihood that both drives will not fail: that is, the likelihood of both drives "succeeding."

If the failure rate is 1.72% per drive, then that means that the "success rate" (going whatever period of time is the metric for this, without failure) is 98.28%. The likelihood of two drives succeeding, therefore, is 98.28% times itself, or 96.59%. Since that's the likelihood of both drives succeeding, then we can easily figure that the likelihood of either drive failing is 3.41%. Now, that's close to twice the likelihood of one drive failing (3.44%, if you're wondering), making the doubling an okay estimate, but it's not actually right: the true answer is just a smidge less.

If you want to imagine why it's just a little less, imagine that there were 10,000 people who set up identical RAID 0 systems with two drives each. Statistically, you'd find that 9,659 of them would go the entire period without either drive failing. 338 of them would have one of their drives fail during that time, and the very unfortunate three people remaining would find that both of their drives failed. That means that of the 20,000 drives, the expected number of 344 of them failed, but that doesn't mean 344 RAID 0 systems failed, because someone has to take the brunt of that tiny likelihood that both drives fail at the same time.

In your example, 100 drives in a RAID 0 system still doesn't give a 100% chance of failure. The likelihood of 100 drives (each having a 98.28% chance of success) going the entire period of time without failure is fairly small, but it's not even close to zero. It's a full 17.64 percent! That means that the failure rate of the 100-drive RAID 0 system is not 100% as you projected, but "only" 82.36%.
Of course, I would never trust my data to something that risky, but the likelihood of failure isn't 100%. In fact, to reach 99% failure rate, you'd need 266 drives in your system. As you continue to add more drives, the failure rate will always increase, but the amount it increases will be less each time, causing the number to approach 100%, but not even with a centillion (that's 1 x 10^303) drives would the chance of drive failure ever be truly absolute.
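The expected counts in that 10,000-system thought experiment fall straight out of a binomial calculation; here is a quick sketch reproducing them with the same 1.72% per-drive rate.

```python
from math import comb

r = 0.0172               # per-drive annual failure rate
systems = 10_000         # hypothetical two-drive RAID0 systems
drives_per_system = 2

for failures in range(drives_per_system + 1):
    # Binomial probability of exactly `failures` failed drives in one system.
    p = (comb(drives_per_system, failures)
         * r ** failures
         * (1 - r) ** (drives_per_system - failures))
    print(f"{failures} failed drive(s): {p:.4%} of systems, about {p * systems:.0f} of 10,000")
```

That works out to roughly 9,659 systems with no failures, 338 with one, and 3 with both, matching the figures above.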

Posted on 2018-08-14 02:58:51

LOLs, people talking statistics. I didn't do statistics at university, but I wonder a few things.

As someone kindly mentioned, if you flip a coin two times it's not guaranteed you are going to get 1 head and 1 tail, but in my opinion the law of averages is that you will hit it more often than not. Probably more realistic is to push it out to 10 coin tosses and say getting 5 heads and 5 tails is not guaranteed, but I suspect it is going to happen more often than not compared to any other result.

If you get two drives and they don't fail, it doesn't mean there was a zero percent chance of failure, or if 1 drive fails, that there was a 50% chance of failure.
Further to this, the figure of 1.72% will almost definitely not be correct in the first place if you are dealing with the absolute percentage chance of failure! You would have to take an infinite sample of data to get the correct figure (and an infinite sample of data is, well... not possible!)

I don't think it's unreasonable to say that by the law of averages you double your chance of a hard drive failure if you have two drives.

Posted on 2016-05-13 12:53:25
AustinM

With two drives in a RAID 0 configuration, data loss occurs when either drive A *OR* drive B fails.

Take any introductory statistics course and you'll learn that the probability of an *OR* event is the sum of the probabilities of its component events minus any overlapping probability between component events. Because these component events are mutually exclusive, there is no overlapping probability - which means that the probability of data loss is the probability of drive A failure plus the probability of drive B failure. .0172 + .0172 = .0344, or 3.44% - indicating that chances of data loss are exactly twice as likely with two drives in a RAID 0 configuration compared with a single standalone drive.

Apologies WaltC...

Posted on 2007-02-06 21:35:44
chartguy

Take any introductory statistics course and you'll learn that the probability of an *OR* event is the sum of the probabilities of its component events minus any overlapping probability between component events. Because these component events are mutually exclusive, there is no overlapping probability - which means that the probability of data loss is the probability of drive A failure plus the probability of drive B failure. .0172 + .0172 = .0344, or 3.44% - indicating that chances of data loss are exactly twice as likely with two drives in a RAID 0 configuration compared with a single standalone drive.

Apologies WaltC...

Posted on 2007-06-22 10:09:08

OK, I'm not nearly skilled enough in mathematics to argue whether the correct method of calculating this is to use the complement or to use the sum of the probabilities (or know the difference between them). Your answers are pretty close to one another in any case.

However, I do have a background in engineering, and it appears to me that you've both left out an important component. What about the RAID controller? What are the typical failure rates for those, and how does that affect the rest of the system? It would certainly seem a controller failure or problem could also cause data corruption, so there's an additional risk factor you've introduced into the system.

Posted on 2008-05-14 17:30:31
Guest_47

You feel your point is so impressive, that you need to repeat it one year later?

Posted on 2015-11-22 18:16:35
Mike

Came upon this in my research and I feel I should correct this even though it is quite late.

Your calculation is not correct. The events are not mutually exclusive, i.e. drive A and B can both fail.

As an experiment, using your numbers, consider a RAID 0 of 100 drives, with each drive's probability of failure as 1.72%, as you indicated. If we simply added them, we would receive a 172% probability of failure, an impossible value.

The correct methodology for a network such as this is to take the complement of the product of the complements of the probability of failure. So, for example, the probability of a RAID 0 of 100 drives failing is 1-(0.9828^100), or ~82%.

Posted on 2013-06-15 23:43:13
Guest_47

The problem is that AustinM/chartguy TOOK the introductory statistics course, but he failed it.

Posted on 2015-11-22 18:18:00
Roger Pham

No, AustinM and chartguy understood it correctly. Please kindly read my reply to Mike above for details.

Posted on 2016-02-15 21:40:18
Guest_47

Not really. Let's recap. AustinM/chartguy, who is so proud of his education in introductory statistics, made the incorrect assertion that the events are mutually exclusive. Mike came along and pointed out that the drives can fail at the same time. Yet he made a different mistake, one where he incorrectly said that the 172% failure rate for the entire system of 100 drives is not possible. I don't have a problem with Mike because it was a silly mistake, and he wasn't being all haughty about it. Then you come along and pointed out Mike's mistake, and correctly noted that the failure rate refers to a single drive and not the entire system. However, you are still wrong, because like AustinM/chartguy, you have failed to take into account multiple drive failures, the point made by Mike. In addition, you did not account for the variance of failures, which means that the statement "100% probability of failure of at least x drives in y months" is not correct. I am sure this is just an oversight, so it's no big deal. I am not saying I am absolutely right and you guys are all wrong. We all make mistakes. This is just some simple maths. What irks me is AustinM/chartguy's attitude, that's all. Not really that interested in talking about statistics.

Posted on 2016-02-22 16:38:30
Roger Pham

>>>>>>>"As an experiment, using your numbers, consider a RAID 0 of 100 drives, with each's probability of failure as 1.72% as you indicated. If we simply added them, we would receive a 172% probability of failure, an impossible value."

1.72% of failure in ONE YEAR for ONE DRIVE. 172% probability of failure of at least ONE DRIVE out of an array of 100 drives in one year means that a 100% probability of failure of at least ONE DRIVE in about in about 7 months!
(100% / 172% = 0.58 year = 7 months)

Posted on 2016-02-15 21:38:07
Midas

Roger Pham -
That's not how probability works. A 172% probability is, by definition, not possible. Probability is ALWAYS 0 - 1 (i.e. 0%-100%). The time factor is irrelevant.

Let's say they manufactured 10,000 drives. At a 1.72% failure rate, that means 172 drives will fail, but 9,828 will work.
According to your logic, it is IMPOSSIBLE to have luck and buy 100 drives without getting one of the bad ones, which obviously is not correct. As long as at least 100 good drives were made, there is a probability for anyone to receive 100 good drives.

What actually happens is that for every additional drive you buy, the probability of getting at least one bad drive grows closer to 1 (100%), but never equal to or above it.

Compare with rolling a die (buying a drive). Rolling it once, getting a 6 (broken drive) is not very likely. Roll it again, again, again... The more times you roll, the higher the probability of at least one roll being a 6. But it is still possible to roll it thousands of times without getting a 6. Still never equal to or above 100%.

Posted on 2017-04-16 01:05:24
Roger Pham

Semantically, you're correct. By definition, there cannot be over 100% probability. However, conceptually, if there's a 1.72% chance of failure of one drive per year, then for an array of 100 drives the sum of those adds up to a 172% failure rate in ONE year, conceptually, though by definition we must cap that at 100%, the ceiling of probability. To make up for this, we can shorten the time interval from 1 year down to 7 months to reach a 100% probability of failure, to account for the summed 172% chance of failure per year. So, a more appropriate statement would be a 100% probability of at least a single hard drive failure in an array of 100 drives in 7 months.

Posted on 2017-04-16 05:45:28
Midas

Sorry, but no, you are completely incorrect. The probability is 1-(0.9828^100) = ~82%, as others have mentioned.

Posted on 2017-04-16 09:09:10
Roger Pham

Well, Midas, you sure have the golden touch! Yes, the law of permutation is such that if the probability of ONE drive failing in one year is 1.72%, then the probability of that drive NOT failing in one year is 0.9828. If one rolled that die 100 times, then the probability of zero failures for those 100 rolls would be 0.9828^100 = 17.6%. So there is a 17.6% probability that zero drives out of 100 will fail in 1 year, and thus the probability of at least a single drive out of a RAID 0 of 100 drives failing in one year would be 100% - 17.6% = 82.4%. Makes sense. Thanks for the correction.

Posted on 2017-04-17 06:33:22
Bert

You can't be 100% certain that something with a probability of 1.72% will happen within any given timeframe.

Look at it this way. Imagine that you run the drives for 14 months. Maybe for those first 7 months everything goes really well and you have zero failures. However, maybe the latter half of your testing doesn't go so well and you have two failures. Now you have a 1.72% failure rate even though zero drives failed in the first 7 months. This gives you the expected failure rate, but it doesn't line up with your incorrect prediction that there is a 100% chance that a drive will fail every 7 months.

You should take a statistics class sometime. It's quite a fascinating subject and it really isn't as straightforward as it appears at a glance.

Posted on 2017-04-16 17:08:28
Roger Pham

Read my reply to Midas above. It's been too many years since I've taken a statistics class.

Posted on 2017-04-17 06:34:23
Bert

It's all good. Mistakes happen.

Posted on 2017-04-17 12:03:37
Guest_47

"Take any introductory statistics course and you'll learn that...", said the anti-social guy who thinks he knows something about statistics. I hope you've matured in the last nine years.

Posted on 2015-11-22 18:15:29
Lee Thompson

I don't think I'd ever use RAID 0; if anything goes wrong on any drive in that array, you are SOL. Seems like a lot of risk for just a minor read speed bump. Moreover, the more disks in the array, the worse it gets.

I use RAID5 for data storage, and I purposely mix brands (but not too much, so they still work together) so that they don't all have the same MTBF. (I'm using Infrant ReadyNAS NV boxes, which are also doing hardware RAID.) Even RAID5 is pushing your luck with 4 drives (3+1); you're better off doing RAID6 (3+2) or at least having a hot spare.

I mix brands also because, over the years, I've had some kind of problem with every brand of drive. It happens, and it will happen again. Sometimes it's Hitachi, sometimes Seagate, sometimes Maxtor, sometimes WDC... hardware is being pushed so rapidly that nothing is really getting *perfected* and glitches are guaranteed to show up. My RAID5 strategy is both to make sure the MTBFs do NOT match and to vary brands, because they all seem to have bad batches, and at least this way I will (hopefully) only have one failure at a time. :) Is it perfect? Is there still a chance of data loss? "No and Yes." But hopefully I've minimized it as much as I can in this price range... :)

On my workstations/gaming box, I don't do any RAID at all. There's no need for it.

Posted on 2007-02-08 23:43:58
baneaic

So here's a question.

Say you've got a RAID0 configuration and want to get rid of it, using the 2 drives as separate drives.

Is it as simple as backing up, disconnecting the drives and reconnecting them each to the board just by their own individual SATA cables? Would the drives then have to be reformatted to be used (since some data could be broken into parts with some on one drive, some on the other)?

Posted on 2007-02-12 08:24:28

Yep, that'd be a perfect use of disk imaging software. You can just recable the hard drives to non-RAID ports, or oftentimes you can simply disable RAID in the BIOS, which makes those same ports into standard SATA ports. Both hard drives would need to be repartitioned and formatted.

Posted on 2007-02-12 11:02:26
raohara

I am not sure the simple addition of failure rates is the right approach to this problem. Instead, I suggest evaluating the system reliability using parallel and/or series availability methods. For example, if two drives are in a redundant configuration such that the system will continue to function if one drive fails, then use the parallel approach. When calculating system availability for devices operating in parallel, the combined availability of the two devices is always higher than the availability of the individual parts.

There are numerous articles that can explain this in more detail. Here is one:

http://www.eventhelix.com/R...

Regards,

Rich

Posted on 2007-02-13 09:10:39

At that point in the article, I was talking about failure of the array from the standpoint of hassle, not data loss. If we're looking at the chances of a degraded array, then addition of failure rates is appropriate.

Sorry!

Posted on 2007-02-13 10:10:27
PCC- Builder Dan

From a purely technical perspective, as the numbers of drives both sold and failed are finite, yes, having one drive fail does reduce the odds of another failing. I believe that the statistically correct way of looking at it is to multiply the odds of a drive not failing with itself as many times as you have drives, to get the odds of no drive failures at all. When the numbers of drives are sufficiently large (and I'd call our total hard drive sales sufficiently large), adding the two probabilities of failure together gives at least a close approximation of the correct probability.

However, those differences are practically nil compared to, say, the odds of the drives falling out of RAID, or any of a half-million other controller-related issues. While you and I are certainly power-users enough to replace drives and rebuild the array without calling tech support, my Aunt Tillie most certainly is not. She, and not us, is who the article is aimed at.

Posted on 2007-02-17 19:59:52
a.lizard

Using RAID, if something like a power surge eats both drives, you're SOL.

With a simple offline drive image on a mirror drive, one is back up as soon as the mirror is hooked up... I keep the mobile rack with the mirror drive somewhere else and only have it plugged in when actually backing up.

I do an rsync image backup (I use Linux, I presume there's a Windoze equivalent, though I think one could actually use what's described here) to a drive mirror using a slightly customized Knoppix disk. (if one uses LVM, building a UUID gets kind of ugly, but it's doable) With Knoppix running as an OS as a LiveCD, there's no disk concurrency problem, and rsync only updates changed files, i.e. once the drive image is transferred, one does differential backup only. So I let the computer boot to the Knoppix home page, click to the backup page, click the rsync icon which is hooked to a script... and go away for a few minutes.

Drive recovery, assuming that nothing else is fried is simply a matter of bolting in the backup, changing the drive LVM ID using Knoppix, and rebooting in the native OS. (Debian Etch in my case)

Posted on 2007-04-11 00:07:03
Locke

RAID 0 performance boosts for gaming is probably one of the most overrated things on the PC market today.

RAID setups see the biggest advantages in arrays of 8+ disks, like the ones found in high end video editing computers.

I have my Vista on a RAID 0 setup, but I don't see a huge performance boost over using a single drive, especially in every day applications.

On the other hand, my HD editing setup (which is probably running at the bare minimum required) has an awesome performance boost when raiding.

In fact, it's the only way to even capture/edit/view uncompressed HD video.

Posted on 2008-05-15 09:27:14

I have my Vista on a RAID 0 setup, but I don't see a huge performance boost over using a single drive, especially in every day applications.

On the other hand, my HD editing setup (which is probably running at the bare minimum required) has an awesome performance boost when raiding.

OK, so what is the difference between the two systems? How are they set up and what hardware are they? Since I'm not familiar with either one of your systems, it's hard for me to understand why it works great for one, but is barely noticeable on the other... (I'm guessing the video editing system has higher specs, but I'd like specifics)

Posted on 2008-05-15 13:26:18
Locke

OK, so what is the difference between the two systems? How are they set up and what hardware are they? Since I'm not familiar with either one of your systems, it's hard for me to understand why it works great for one, but is barely noticeable on the other... (I'm guessing the video editing system has higher specs, but I'd like specifics)

Actually, both RAID configurations are on the same setup.

My OS and applications are installed on 2 x WD Raptors, while my HD storage array is 4 x Seagate Barracudas.

I am not using on-board RAID, but rather a 6-port PCI-Express RAID controller.

System Specs:

Q6600 @ 2.8GHz
ASUS P5E
2 x 2GB Crucial Ballistix DDR2
XFX GeForce 8800GT

Everything after that is just icing on the cake.

Like I mentioned before, there's not a huge difference running the OS and apps in a RAID0, at least not enough to justify the cost.

Posted on 2008-05-20 14:42:23

> both RAID configurations are on the same setup...

OK, so I guess I'm just not understanding the difference then...

You say one isn't much difference at all, but on the other it's a huge difference. Is it the software that's making the big difference, or the fact that video needs so much data your single drive just can't keep up?

Posted on 2008-05-21 09:07:11
Locke

> both RAID configurations are on the same setup...

OK, so I guess I'm just not understanding the difference then...

You say one isn't much difference at all, but on the other it's a huge difference. Is it the software that's making the big difference, or the fact that video needs so much data your single drive just can't keep up?

Capturing uncompressed HD with a single drive isn't possible (using conventional 7200RPM or even 10k RPM drives).

In order to capture/edit/playback these files, you need to have a hard drive array capable of very high read/write capabilities.

For this particular application, one can obviously see the benefits of RAID because a single drive simply isn't an option.

RAID for your applications (which I also have set up) was really a bit of a disappointment to me. I bought two Raptors; my buddy bought one. He and I get equal load times for the operating system and apps.

Posted on 2008-05-21 14:29:57
dwight

Great article. It's good to see people starting to wake up to the fact that RAID doesn't deliver on the hype which has been attributed to it. Though it's hard for some (if not many) to admit that they've blown money on a solution which just isn't close to delivering on its promises.

Some researchers at Google published a similar conclusion a year ago last fall, IIRC. That's one of the reasons why (they claimed) Google doesn't waste money on RAID. But there's so much mindset in place with RAID that people have difficulty accepting the fact that it's really just a waste of money for many, if not most, of its designated uses.

Regarding data integrity, no one has mentioned that parity(!) is a joke, especially as it's applied to modern day RAID. Parity might have been cutting edge 1980's technology, but honestly, it's a joke IMHO now. Use cryptographic checksums if you want real data integrity checking. Multiway cryptographic checksums if you absolutely have to be certain of the data. But not parity.

Thanks again.

Posted on 2008-05-23 21:53:27
Giaour

I use RAID solutions on my workstations at home, but I also use enterprise level hardware (SCSI / separate controller / etc.). I have had little issue over nearly six years of usage (albeit across multiple machines).

The hardware identified by the article is still designed around a desktop platform, which to me means not running 24/7, so the failure rates may be missing the all-important time-to-failure aspect, which could be accelerated by using non-enterprise hardware in an enterprise way.

In my opinion, as the gap between enterprise and non-enterprise hardware gets smaller, the cost of buying enterprise-grade hardware shouldn't be so steep.

Anywho... I wanted to say that if you *MUST* do RAID, please use a dedicated controller to ensure a better chance of longevity, and determine the typical weekly/hourly usage of the drives.

Posted on 2008-05-30 14:42:31
SergeantMajor15

I was just wondering...why do people use raid when they can just get one big hard drive? Wouldn't that just be easier?

Posted on 2008-06-01 13:03:01

I was just wondering...why do people use raid when they can just get one big hard drive? Wouldn't that just be easier?

There are two reasons to use RAID.

1) Speed - Since a hard drive is a physical device, it can only transmit data so fast. The drive spins at a certain speed, and the drive head moves back and forth to access the data.

By using two drives in a RAID configuration, you can access your data faster. Just how much faster depends on a huge number of factors, and can vary from not at all to nearly twice as fast. Gamers like RAID since the game is never waiting to load data, or if it is waiting, the wait is less than it would be otherwise.

2) Data Security - RAID drives can be set up to make a copy of all your data. They can do it by mirroring, which simply makes two copies, or in more complex installations by striping, which I won't try to explain here, but data security striping takes 3 or more drives and complex software.

No matter which technique, the end result is that if you lose a hard drive due to disk failure, your data won't be lost. That is an important feature in some applications.

Please note that it pretty much only protects you from physical disk problems. If you get a nasty virus for example, or delete a file by mistake, RAID users are usually no better off than regular computer users.

Posted on 2008-06-01 18:23:20
dwight

I agree. Those are the only two real justifications for RAID. But then you have to ask yourself whether it's worth the money.

Personally, I prefer to spend the money on a hard disk somewhere else, and just use rsync. That way, I don't have to worry about someone accidentally deleting all of the files.

It really depends on what you're using it for, of course. But what I chuckle at are the people who toss out the raid buzzword, and spend a lot of money, without really understanding what their requirements are.

Hey, if it meets your needs, that's great. But just don't expect to impress everyone because you're using RAID. Some will laugh at you. :)

Posted on 2008-06-02 19:14:02
Semper Fuzz

Say hi to the "VelociRaptor." I have two Raptor 150s that were in RAID 0, and when they worked it was great. But have a crash or need to reload, and the average Joe like me could not get the RAID back up. Now I am running on one Raptor while the other sits there, liquid cooled, laughing at me.

I am buying the VelociRaptor as my main operating drive and will use my Raptors as storage!

Posted on 2008-06-30 19:45:41
Grease Monkey

Hello,

Good info here.

On the topic of RAID 0 in the context of gaming, I understand that the benefits are almost nil with hard disks because of the seek time.

Now what if we use solid state drives, or HyperDrive 4s? I would expect a RAID 0 of those to significantly outperform a single SSD or HyperDrive 4, perhaps two-fold.

Any thoughts?

Posted on 2008-11-15 17:18:19
scottish

I apologize for reviving this post, but I have to say that I am SO appreciative of Jeff Stubbers and his answers to my questions regarding a RAID set-up for the new system I ordered with Puget.

I initially submitted the configuration with two WD Raptor 300GB 10k drives in RAID 0 - Jeff immediately recommended a good quality controller. A bit later (another 3-4 emails) I was questioning whether I really needed a RAID set-up (this is a new gaming system btw) and Jeff detailed exactly what a RAID set-up is for. He didn't say I should or shouldn't have one, but gave me the information I needed to make a decent decision on my own.

In the end I decided against the RAID, and after reading this thread I'm tickled to death that I did so (Jeff commented as well that he thought it a wise decision). An added benefit was the $700+ savings that resulted from the change! :)

Just awesome service from you guys here!

Posted on 2009-05-22 17:08:15
Computer_Newbie

Jon, you're my hero. Reading that article made me realize what has been wrong with my computer for 2 years. My computer will (it seems just to spite me) go BSOD on me. Sometimes it rarely happens and sometimes it happens every 20 minutes. Now I have realized that it has been caused by a degraded RAID array. My computer has been sick and giving very loud hints, and I've been blaming it for all the problems it's been having. I feel so ashamed! Damn, it feels good getting that out. Now I've got to go do something. *Gets down on hands and knees begging the computer for forgiveness* *Computer slaps me* "OUCH!!" Now where was I... Oh yeah! Now who's to blame for all of these problems? The computer companies who sold computers with RAID with insufficient RAID controllers! Let's tar and feather the lot of them! *Goes to General Store to pick up tar* "Hello kind sir. I'm having a bit of a dispute with my local computer company. Where do you keep the tar and feathers?" "Aisle 8. Anything else, sir?" "Um yes, where do you keep the AR-15 rounds and those things you bury and when you step on them they explode?" "Landmines?" "Landmines, landmines! That's what I was thinking of!" "They're on aisle 11." "Thank you." "Have a nice day." *Returns to forum thread* Well, I'm ready now! You know, that was a very nice stock boy; I hope they're paying him well. And if not, they'll be the next target! ;) :cool:

Posted on 2011-03-09 21:14:53
Lukas

This article is retarded! All of those failure rates should be at 100%. Do you think hard drives last forever? You're looking at a window of time. And guess what, life as we know it doesn't end at the end of your window of time; it keeps going, and going, and going. Every hard drive will fail; there is no "super" hard drive that will last forever.

If you use your computer for anything meaningful, have a desire not to have any down time due to a single hard drive failure and want to protect your data: Implement a RAID configuration.

This is the most ridiculous article I have read on the internet to this date.

Quote:  "The real question is: Is RAID1 really worth being 15-20 times more likely to have a problem?"

WTF!!!!  Are you on crack?  Having a RAID 1 does not make you 15-20 times more likely to have a problem.  If you ever work for a company that provides SMB server solutions and spew any of this ridiculous "anti-raid" nonsense, they will laugh you out the door.

Just because some noob plays with his hard drives and degrades his RAID, or doesn't know how to install RAID drivers during Windows setup, does not mean it's 15-20 times more likely to have problems.

You state that:

"
....Not only are you now relying on two hard drives to work properly, but you are also relying on a much more complicated controller (the RAID controller). If any one of those items has a problem, the array fails. If one hard drive has a problem even for a moment, you have a degraded array on your hands...
"

So if you have one hard drive, you are not relying on a "complicated controller"? Like, if you have a non-RAID controller, it's easier to fix the electronics in it or something? Get real.

If one hard drive has a problem, even for a moment, you have a degraded array - as opposed to a complete system failure? What is your point exactly? So having ONE hard drive AND experiencing a problem with that ONE hard drive (even for a moment) is better?

Walk into any SMB, look at their Dell/HP/IBM server, and see if it has RAID. Look at any mission-critical workstation that has a decent system administrator behind it; it has RAID.
Just because retards play with their system and screw up their RAID doesn't mean it's something you shouldn't recommend. It means you should slap these noobs upside the head and say: Don't fuck around with your system, just use it!

If I sound angry or upset in this article, it's because I AM! This is absolutely ridiculous and you are anti-educating the population about technologies that are meant to protect.

Posted on 2012-04-18 23:23:29

Hi Lukas, I think you missed a few things in the article. The failure rates cited were over the period of one year, and I am specifically talking about desktop onboard RAID (I even call out server applications as NOT what I'm talking about).

"This is the most ridiculous article I have read on the internet to this date." I should get a plaque!

Posted on 2012-04-19 23:10:53
no reply

Dedicated RAID at a price point of ~$300 vs. a cheap PCI or onboard controller makes a difference when we're talking about RAID 5 or other more complicated configurations. When it comes to RAID 0, 1, 0+1, etc., they are virtually identical in performance. The expensive hardware and brains on those RAID controllers are for the overhead associated with calculating and distributing parity across all of the drives. Spending $300 on an expensive controller only to stripe or mirror is a waste of money; those are not complicated procedures and don't need a complicated brain to run them like you find on those expensive 3ware/Adaptec controllers.

It's all about what you want. If you want speed and don't care about data integrity (and there are some desktop and workstation situations where this can be true), go ahead and set up a RAID 0, but expect equal or less data integrity than running a single disk. If you want protection against hard disk failure, go ahead and set up a RAID 1, but expect the same protection (read: none) against deleting files, trojans, viruses, and malware. If you want speed and protection against hard drive failure, set up a striped+mirrored RAID setup. There are end-user, non-enterprise-server settings where each of these might work.

For my setup, I have a striped+mirrored array running on a file server, on an old motherboard+CPU, that automatically images on a weekly basis to one of two removable drives, which are rotated and stored off premises. Works great. This box runs as a file server, web server for our office intranet, and SQL server as the back end to our ASP pages. We're in a small office of about 3-5 people and it suits our needs, but it could just as easily be my home server hosting my library of movies etc., and it is a fine setup for an enthusiast/sophisticated user. Sure, know what you're getting into, but dabbling in this sort of thing, asking these questions, and working through these issues to find what's right is a great way for people to learn. I'm sure the cost of your service calls means you want things to work all the time; the flip side is, instead of just telling your clients that one thing or another is good or bad, tell them why - educate them. The only problem with RAID 1 is the misconception that it provides protection against deletion of files, trojans, viruses, etc., but in the end it always comes back to creating an offsite copy of your data that can be reverted to in the case of not just data loss, but also data corruption.

In terms of statistics, if we're going to talk numbers, let's do them right. If Drive X has a failure rate of 1% and Drive Y has a failure rate of 1%, it is not an issue of adding up the two failure rates; instead you have to look at the rate at which they won't fail. If each has a 99% chance of not failing, then the chance of both of them not failing is 0.99 x 0.99 = 98.01%, or a 1.99% chance that one of them will fail. To put this in terms of the numbers you cited, it would be (1.0 - 0.0172)(1.0 - 0.0172) = 0.96589584, or a 3.41% chance one of them will fail. Yeah, that's pretty darn close to 3.44%, but that's because we're dealing with probabilities at either extreme: either a very HIGH likelihood that they will NOT fail, or a very LOW likelihood that they WILL fail. The point is, it's not accurate simply to say that it's double, and as you add more disks to the array, this difference will be magnified.
Here's an example of a power-user setup that is simple and would benefit from RAID 0: the user has the OS installed on a striped RAID, and has a second disk where images of the OS are stored and where data files are stored. When all is well, the OS and programs run fast; when it shits the bed, the data is on a separate drive, and you simply reimage. User cares about data safety? Back it up to a NAS or USB thumbdrive. Easy sleazy.
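To make that arithmetic concrete, the general form for n drives with a per-drive annual failure rate p is 1 - (1 - p)^n, the chance that at least one of them fails. A small illustrative Python sketch, using the ~1.72% per-drive rate cited above and assuming independent failures:

    # Illustrative sketch only: chance that at least one drive in an
    # n-drive array fails within a year, given per-drive failure rate p.
    def chance_any_fails(p, n):
        return 1.0 - (1.0 - p) ** n

    p = 0.0172  # per-drive annual failure rate cited above
    for n in (1, 2, 4):
        print(f"{n} drive(s): {chance_any_fails(p, n):.2%}")
    # 1 drive : 1.72%
    # 2 drives: 3.41%  (close to, but not exactly, 2 x 1.72% = 3.44%)
    # 4 drives: 6.70%  (vs. 4 x 1.72% = 6.88%; the gap from n*p grows with n)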

Posted on 2012-05-17 19:18:32
no reply

And remember - servers are not just limited to the old VAX/VMS at the local uni. Look at any household that has more than one computer: tons of these people are running their boxes as servers, home servers are ubiquitous nowadays, and tons of enthusiasts have some incarnation of a server running, even though they aren't "enterprises".

Posted on 2012-05-17 19:22:39
whiterabbit

I just had an SSD drive go bad after only 10 months.  I was down for a week before the new drive got to me and I got it set up again.  In the interest of preventing downtime in the future, I went with RAID 1 this time.  If a drive goes bad I can just switch to the other one.  I also do backups, but I think this gives me the least amount of downtime in case of a single disk failure.

Then the question is hw vs sw RAID?  It seems to me that when using RAID 1 you are moving the single point of failure from the drive to the RAID controller: if the RAID controller fails, you've just lost everything *and* you have to get an *identical* RAID controller to access the data on your drives.  RAID controllers (esp. on motherboards) change so often that I think this might be a difficult task.  This seems to be a point that I never read about.  Most articles strongly recommend hw RAID vs sw RAID, but leave out this important point that you will need to find an identical controller if yours goes bad.  Maybe I'm missing something, but this is the way it looks to me.  I'd really like to know if I'm mistaken here.  On my motherboard, switching the disk from AHCI to RAID requires the disk to be reformatted, so you can't just easily switch back and forth.  It's a Z68 chipset; say I used the motherboard RAID and the motherboard dies in a couple of years... motherboards will likely have a different controller by then, and I won't easily be able to test my disks with a new motherboard.  Will I be able to find a controller (even a discrete controller) that can read my disks? 

I chose Windows 7 software mirroring so that I have duplicated disks that can be plugged into any computer in the case of a motherboard failure.  I don't notice any performance hit at all.  I thought it was interesting that this article refers to the motherboard RAID controller as "software" RAID; if that's the case then Windows RAID should be just as good.

During this event (and after buying two drives), I realized that Windows Backup backs up your entire system, including all installed programs. I had no idea it backed up all programs. Amazing. This seems like huge progress, and should only take a few hours to recover the entire system. This is my secondary backup plan. :) If I'd realized this I could've saved a lot of time. The more I've learned about WinRE and Windows Backup, the more I think downtime could be very minimal if you use these tools, and the need for any type of RAID becomes less.

Puget Systems was excellent during this adventure. :) Having all of the drivers so easily accessible, up-to-date, and easy to see which ones you need was really helpful and impressive! Also having the BIOS screenshots was helpful (but I wish I'd realized all I needed to do was load the saved profile!) My only suggestion might be to have an article recommending a backup strategy and the steps to recover from a disk failure; for example, using Windows Backup with a system backup, and then how to recover it using WinRE. And noting that the BIOS can be loaded from profile 1 would be good too. They have provided an excellent set of tools and disks with the computer that would've been extremely helpful had I had them with me. :) But just a little more info on how to recover would've been great. I'm still highly impressed! Thank you!!

Posted on 2012-07-16 00:59:25
Chris Collins

It's about making the right decision.
Personally I will never use RAID 5 or RAID 6; they are overly complex forms of RAID and are prone to complete failure on a rebuild.
This leaves RAID 0, 1 and 10.
I think in the era of super fast SSDs, RAID 0 no longer has a place; RAID 1 provides the only truly stable form of RAID redundancy, and RAID 10 the same redundancy but with double the performance. The main issue with RAID 1 is cost, as you are writing off 50% of your storage in return for some protection against a drive failure.

This is where I think software raid is supreme as its much more flexible than hardware or firmware raid.

So let's say you have 2 types of data.
1 - Data that would be annoying but not a big deal to lose, because you either have backups or can just redownload it. This might be e.g. Steam games, movies that are easy to redownload, or other content that is easy to get again; this sort of data can be stored on an individual disk with no redundancy.
2 - Data that would be tragic to lose, such as family photos, work documents, emails, media that's not easy to reobtain, and so forth.

The thing is, the #2 data might not be enough to fill even half a 3TB drive, and as such doing a RAID 1 on two 3TB drives is wasting tons of space needlessly. What you can do instead, e.g. in Windows, is make dynamic disks: make a 1TB partition on each 3TB disk and also a 2TB partition on each disk. The 2TB partitions are used as single partitions, but the 1TB partitions are mirrored; you put your #2 data on the mirrored 1TB and your #1 data on the 2TB partitions. Best of both worlds. If you need high I/O performance, get a decent SSD.

ZFS is even better as it provides protection against bit rot. A dumb RAID, which is pretty much all hardware RAID and also a big chunk of software RAID solutions, doesn't know which copy of the data is correct or wrong; it has to guess when it comes across a mismatch. ZFS knows which is good or bad due to how it works (checksums), and as such you have protection against bit rot.

Posted on 2016-11-17 01:54:51
SXXX

If a drive has 1,000,000 or 1,500,000 hours until failure, converting that to Earth years works out to around 100-170 years. Good luck, RAIDers.

Posted on 2012-08-03 02:11:41

Using the MTBF to indicate expected lifespan isn't quite how it works.  I found this explanation amusing:

"What is the MTBF of an 25 year old human being? 70 years? 80? No, it’s actually over 800 years which highlights the difference between lifetime and MTBF. Take a large population of, say, 500,000 over a year, and seeing how many ‘failed’ (died) that year – e.g. 600 – so the failure rate is 600 per 500,000 ‘people-years’, i.e. 0.12% per year and the MTBF is the inverse of that which is 830 years. An individual won’t last that long, they will wear out long before then (unless they are Doctor Who), but for the population as a whole, in that ‘high reliability’ portion of their lifespan, it holds true – in a typical year you will only have to ‘replace’ 600 of them."

Quoted from:
http://www.qualityandproduc... 
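To put rough numbers on that distinction, here is a tiny illustrative sketch of the same idea (the drive figures are just example values, not measurements):

    # Illustrative sketch only: MTBF is the inverse of the annualized failure
    # rate (AFR) across a large population, not an expected lifespan.
    population = 500_000            # individuals (or drives) observed for one year
    failures   = 600                # failures ("deaths") observed in that year

    afr        = failures / population   # 0.12% per year
    mtbf_years = 1 / afr                 # ~833 years (the quote rounds to 830)

    # A drive spec'd at 1,000,000 hours MTBF implies roughly:
    hours_per_year = 8766
    drive_afr = hours_per_year / 1_000_000   # ~0.9% per year, not a 100+ year lifespan

    print(f"Population AFR: {afr:.2%}/yr -> MTBF ~{mtbf_years:.0f} years")
    print(f"1,000,000-hour MTBF drive -> AFR ~{drive_afr:.2%}/yr")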

Posted on 2012-08-03 02:33:05
smr

It is hard to apply probabilities, which are only dependable over large quantities, to the failure rate of a single personal disk.
We know that when throwing a die, we can expect to get the number 4 with a 1/6 probability. But in reality, you can throw a die 10 or 20 times without getting a 4. Probabilities only become dependable over large numbers of trials.
Jon Bach got 3% failure rates on 300 or more disks sold. It doesn't mean your particular disk will have 3 failures in a 100-day period. It could have 0 failures (or more).
Another practical example: I'm very happy having two cars, because it is hard to face a car failure when you have only one - you can't drive for many days while the car is under repair. When you have two cars, you can use the second car when one fails.

So it is with the RAID concept. The probability of a failure on one disk is, say, 3%. Having 2 disks gives roughly a 6% (3+3) probability of a failure on one of the two disks, but, as Whiterabbit said, the probability of failures on both disks at the same time is the product of each case, so 3% * 3% = 9/10,000. This is why RAID is interesting. The probability of a disk failure is higher, but the probability of two simultaneous disk failures is much lower. And in reality it is much, much lower.

More than this, as said, this is only true over a large number of trials. In my own experience, I have never had both of my cars fail at the same time! They are both old cars, with a higher failure rate than a Rolls Royce, but I'm confident because they never fail at the same time!

Posted on 2012-10-13 10:06:48

Statistics are NOT only true for a large number of trials! That's not how statistics works :) If you personally have hard drives that don't fail, it means that someone else has twice as many that fail. It all evens out! It is true that the purpose of RAID is to insulate the user from the ill effects of disk failure. The point of this article is that for desktop users, the other downsides of poor RAID implementations outweigh those benefits. To your analogy, it is like owning two junker cars vs. one good quality car. Yes, you have "redundancy" in the two junker cars, but you'll constantly be dealing with failures and headaches. That's not a great analogy, because with RAID the quality of the disks isn't the variable factor, but I can't think of a better analogy at the moment :) With desktop-class RAID, it isn't the disks you should worry about. It is the poor controller, the buggy drivers, the poorly written UI, and the false sense of security that uneducated users derive regarding things RAID does nothing to protect you from.

Posted on 2012-10-13 17:45:53
Montrose439

Your case of having a 6% probability of failure on one of two disks is proven wrong when the probability exceeds 100 percent (for example, 50 drives times 3% = 150 percent). Because you cannot have more than a 100 percent chance of failure in real life, the formula is proven wrong. You cannot prove that one drive will go bad in the first hour, even if you have 200 drives running at once.
But you CAN have an exponential-curve formula that never goes over 100 percent and is not proven wrong.

So this is what I believe it is:
I could be wrong, but I had a good statistics teacher in school and think it's right.

n = 1 hard drive. A disk array is 1+1 and so forth.
If the failure rate is 3%, then:

So one drive is 3%/n.
2 drives is (3%/n) + (3%/2n).
3 drives is (3%/n) + (3%/2n) + (3%/3n)
4 drives is (3%/n) + (3%/2n) + (3%/3n) + (3%/4n)

This gives the probability of one drive going out.

Then divide that number by the number of hard drives in your system to get the total unrecoverable error rate.

Let me know if you think of a better formula that does not go over 100 percent.
Thanks. Marty C.
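For what it's worth, the standard textbook model here (just a sketch, with 3% as an assumed per-drive rate) is that the chance of at least one failure among n drives is 1 - (1 - p)^n, which approaches but never exceeds 100 percent no matter how many drives you add:

    # Illustrative sketch only: probability of at least one failure among n drives,
    # assuming independent failures and a 3% per-drive rate.
    p = 0.03
    for n in (1, 2, 10, 50, 200):
        print(f"{n:>3} drives: {1 - (1 - p) ** n:.1%}")
    # 50 drives comes out near 78% (not 150%), and 200 drives ~99.8%, still under 100%.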

Posted on 2012-11-01 04:59:19
Frank Van Der Mast

Jon,

You've written a very compelling article. I completely agree with you, but would like to point out that you are making it sound as if 'RAID' itself is flawed, when you should be saying 'on-board software RAID' is flawed.

I had a 4x1TB RAID5 (ICH10R) array on my previous PC and, as you might have guessed, it gave me nothing but trouble. Disks dropped out of the array for no apparent reason, rebuilding could be tiresome and complicated, and at one point it even went horribly wrong when I tried to restore the array forcefully (using 3rd-party tools) after 2 disks had dropped out simultaneously. I managed to restore the array, but (through lack of knowledge/research) I screwed up the partitioning, ending up with a 2TB partition instead of the 2.7TB it should have been... basically almost half of my files got corrupted due to that 'oversight'.

Now, in my new computer, I did things differently.

I bought an ARECA ARC 8-port RAID controller card and hooked up 6x2TB in RAID6 and 2x1TB in RAID1. I have an SSD with Windows and apps, the RAID1 stores all my games, and the RAID6 stores all my data and multimedia. I back up my entire OS disk to the RAID6 and to an external 2TB HDD weekly, and I also back up the vital data (documents and photos etc.) from the RAID6 to the external HDD.

You mention 4 other possible causes of data loss in your article, but I am afraid I have to admit that I have never EVER lost data that way. I hope my setup at least satisfies your request to always back up your stuff. (If a RAID6 + an external HDD with its own power source isn't safe enough, I don't know what is.)

Posted on 2013-02-16 18:40:15

I'll throw in my 2 cents. When it comes to software RAID and integrated RAID support on motherboards, I'll agree that if you're seriously considering RAID, get the right stuff. But when it comes to a small business that isn't going to spend the extra bucks for all the fancy stuff, a simple RAID 1 solely for the purpose of having a mirror (in case of failure) isn't a bad idea. While a good backup schedule is essential, it is much easier to swap drives around should one fail. I've seen plenty of drives fail well before their time, and I also have one here that is over 11 years old with 65,000 hours and keeps on running. While it is far more likely that something else fails before the hard drives, you can't replace data. I don't know how many businesses around here have zero backups, or a crappy one at that. While RAID doesn't protect against accidental data loss or a severe power surge, it does protect against a drive failure. Had one of these businesses used RAID 1, they wouldn't be forking out $1,500 to recover their data. I've seen it happen time and time again with small businesses.

Posted on 2013-04-14 13:39:24
Archibald

You're missing out on a major market here. If people want it, give it to them, but develop a system where the RAID makes sense. Perhaps a 5-drive RAID 10 setup: two drives can fail and life goes on as normal, striped and mirrored. Perhaps source a decent PCI-E x3 RAID controller and sell it with your high-end rig as the RAID option. Don't offer to configure consumer-board RAID features on lower-tiered builds. People who want RAID get a quality RAID with a solid backbone.

Posted on 2014-01-22 07:12:01

Yep! That's exactly what we've been doing for years. That prices it out of a range that makes sense for most consumers, which is who this article was targeted for. Professionals, businesses, and server applications....RAID most certainly makes more sense there, and for those situations, we do it right.

Posted on 2014-01-22 17:14:40

Great article!

Posted on 2014-03-29 21:50:04
Amy Gwin

I spent two years dealing with failing RAIDs and, more importantly, failing RAID controllers. The more parts to the system, the more points of failure. Yes, a new drive can replace the old one, but while all the parity bits get into place... the dreaded REBUILDING... your storage is at a slow turtle pace until complete. It isn't always as turnkey as it should be.

Posted on 2014-12-12 03:57:54
WDE MAN

Why am I keeping this thread alive?... sigh. Anyhow, you are all mostly wrong. Yes, RAID increases the probability of device failure, but it also decreases the probability of data loss (when using the appropriate RAID level and devices). With that said, RAID is largely an irrelevant technology in the consumer space because no one bothers to implement it correctly.

Nearly every consumer implementation I see uses non-battery-backed write-cache controllers and consumer-grade disk units as opposed to enterprise-grade SAS or SCSI setups. No consumer drive being sold can reliably survive the high-I/O environment that RAID creates; yes, that's right, RAID setups (ignoring RAID 0) significantly increase I/O strain on drives. Any honest storage engineer will tell you that this is the reason RAID fails in the consumer space. Drives built for RAID are engineered differently: they have less aggressive head-parking firmware, time-limited error recovery, high-RPM spindles, usually small-diameter platters (for rotational and seeking efficiency), and better bearings... the list just goes on and on. Realistically, a real RAID drive (just one) usually costs five to six times more than a consumer equivalent and usually has half the space. This is why RAID fails in the consumer space: people don't buy the right kit. If you are not spending at least $2k for a decent controller and a set of perhaps 3-5 drives, you are wasting your time and money.

Another thing that people just don't understand is that the most probable time for a disk failure is actually during a rebuild or array transformation event; these are high-I/O events and they simply nail disk drives. It doesn't help that most folks don't even bother to assign hot spares when setting up arrays. When it's all said and done, RAID is not for the brain-dead consumer masses; people are too cheap and stupid to do it right. Shame on motherboard manufacturers for giving people the idea they can pull it off with cheap desktop drives and brittle integrated firmware-based controllers.

Posted on 2015-04-20 04:13:05

Yup, you are exactly right! That is why we don't encourage RAID for the majority of users, but when we do it we use enterprise-grade drives (like the WD RE series) which are purpose-built for it. I would say, though, that $2k for a controller seems excessive. A good LSI controller in the 4-8 port range with a power backup is usually ~$1k or less (not including the drives). Maybe the $2k figure was including drives? That would make sense.

Posted on 2015-04-20 15:41:07
WDE MAN

Yes, $2k = controller + drives.

Posted on 2015-04-21 01:16:28
Bob aka PuterPro

Just throwing in my 2 cents ... I have run my main Home / Small Business PC setup like this (over MANY years, across three different PCs):

2 mirrored Raptors for Boot / OS / Programs. I have graduated from 74GB to 150GB to 300GB Raptors, and finally to WD2000FYYZ (2TB RE) drives. 2 failures, both covered under warranty, no data loss.
Have run them both from the motherboard software RAID and from my PCI-E RAID card. Currently on an Intel motherboard setup that supports a 64GB mSATA SSD as cache. Reasonably fast, but about to replace it all with a new system. Currently thinking of a 1TB Samsung SSD with backup instead of a RAID 1 mirror.

Raid Card is Highpoint 4320, far from a $1k card, but has been reliable for me. There are faster cards, but more $$

For my DATA drives, I ran RAID 5 for years until I had a drive failure AND an unrepaired bad block on another drive that crashed the rebuild. THEN my backup drive acted up. (Wasn't a good week.) Thank GOD for SpinRite; it got me back up with a couple of files trashed, but mostly OK.

BUH-BYE, RAID 5!!! Switched to RAID 6 and never looked back. Running WD RE drives (5 x 750GB), stable as a rock.

As far as $2K, yeah, you can spend that, and more! You can ALWAYS spend more $, they make it easy for you ;-)

That said, in my about-to-occur new build, assuming I get the replacement HighPoint 4520 card - $299 - plus 5 x 4TB WD Reds, that gets us to $1099 for the whole thing.

They're NOT, however, RE's, which would cost $1235, pricing out to $1534; but that's still WELL shy of $2K.

Personally, I like the security RAID 6 + Backup (Actually two ...) offers my data. Of course, being a Tech who's been factory trained to repair up to 5Kva UPS systems, I ALWAYS run my hardware behind a UPS.
Heck, I have 'em all over my house: On my Routers, my TiVo’s (great for not missing recording a show, LOL), and most of my other electronics.
Living in Florida, I'd say they've saved me thousands in killed equipment over the years. But it IS a cost factor that many don't think about.

If you're using RAID, you should have a UPS, budget for it! Surge protectors alone don't cut it, I know, having been an Electronics Tech for > 44 years and a Computer Tech for > 34 years. (It IS rather amusing when we lose power and they start beeping all over the house, never in sync! LOL!)

So, for me, it's RAID 6 until someone shows me a safer data setup. At least for me, I saw little speed difference from RAID 5, but I'm not hammering on it...! Even if I was, I'd gladly trade a slight speed difference for my data's safety.

IMHO ...

All the Best, PuterPro

Posted on 2015-06-20 13:31:08
Malcolm

I agree with what you say, having set up a few enterprise SANs in banking - although with many more than 3-5 drives :)

Posted on 2015-05-06 19:08:52

Very useful advice.
I'm currently setting up a server and realised the motherboard does not support RAID... which turns out to be a good thing.
I also use a QNAP NAS with 5x 2TB consumer-grade drives. It was set to RAID5 with the assumption that coverage for one drive failure would be enough.
Well, we had two drives fail within minutes of each other. So we rebuilt with RAID6, but lost all data. Ironic, since the whole purpose of having this NAS is for data backup! Also, RAID5 and RAID6, as pointed out, require a lot of I/O. This also results in quite high power consumption.

Posted on 2015-06-02 01:21:47
Melody

Actually, I have been running a 4-drive RAID 0 and a 6-drive JBOD (Just a Bunch Of Drives); the JBOD makes 6TB and my RAID 0 is 2TB, and I have never had an issue with it. It keeps my server running and I can play my games. My OS is on the JBOD and my data server is on the RAID 0; never had a problem.

Posted on 2015-08-14 15:58:09
aazzyyy

Don't use the BIOS-configured "fake" RAID described in this article. Instead, use software RAID. Then you're not tied to that motherboard: when you want to move to a new system, just move all the SATA drives over and their RAID configuration moves with them. Even if you had a redundant drive and don't move it with the others, you can still run in degraded mode and add a new drive to rebuild from. The motherboard-supported "RAID" is a feature that I'd rather NOT have on a motherboard. If the BIOS loses its settings, it could corrupt your drives.

Posted on 2015-09-15 18:47:42
name1234

For most people, access time is irrelevant. All most people care about is how long it takes to transfer over their 5 GB movie files, which of course is all about throughput. RAID0 is an excellent choice for anyone who backs up their data.

"If you're working with HD video and need to be able to save 2-3TB of data, then a RAID array is necessary to have a working space that large."
This is not exactly true. It is possible to span a single logical volume across multiple disks (of any size) using LVM. You do not get data striping and the associated performance gains of RAID0, but if one drive fails you can still recover data from the other drives.

Posted on 2015-10-12 08:01:28
Leo SigloXX

What's so bad about RAID 1? It's like backing up, only the backing up happens by itself and on the fly.

[quote]Accidental deletion or user error
Viruses or malware
Theft or catastrophic damage
Data corruption due to other failed hardware or power loss[/quote]

All true, but those things can happen with or without a RAID 1 setup. Therefore, combining a RAID 1 setup with still doing old-fashioned backups (only spaced farther apart) gives you the best of both worlds. Right!?

Posted on 2015-10-19 22:29:57
Skip

I am reading this because my server just died. It had a motherboard RAID 1 mirror, and it just plain died. Luckily I have been good about backups, but motherboard RAID 1 tends to fail MORE often than just a drive. Sometimes you can rebuild a degraded array, but often you cannot. Right now I have one supposedly good drive, but I cannot boot from it. I have seen this a lot on forums.

RAID 1 on your motherboard is more likely to wear out your drives and/or simply die. You are better off not using it. If you want this on a server, then buy an LSI RAID card with a battery and purchase commercial-grade SAS drives. For a home PC, RAID simply does not make sense.

Posted on 2015-11-02 18:13:15
James Gang

RAID-1 chat only..........
I have had an HP E9280T i7, 12GB, with twin 1TB HDs (Drive C) in a RAID-1 config for 6 years with NO failures of any kind (unbelievably), so I can't speak to RAID recovery on a PC. I worked as a disk specialist (in a mainframe/server environment) and typically had RAID-5 in anything from a 4-disk up to a 16-disk array. With a hard drive (only) failure, I never experienced a 'data loss' situation.
With my RAID-1 mirror config, I'm assuming a single HD failure (only) should not cause 'data loss'. Isn't that true?
1. My main concern has been the lack of information from HP on which port my 2 HDs are on. Where physically are HD-0 and HD-1? How can I tell which HD failed if I have a failure?
No one at HP has been able to answer these questions.
If I have a RAID-1 single HD failure and replace the failed HD, is my Drive C (with Win-7) completely backed up?

2. I am also looking for a new HP Win-10 desktop PC with RAID-1, but I have talked RAID-1 with several HP tech folks, and no one seems to even understand what RAID IS.
HP SOMETIMES shows Win-10 desktops with RAID-1, but a month later that RAID PC is no longer advertised as even having RAID.
Sorry to chat so much about RAID, but I am trying to find answers to items #1 and #2.
Thanking You,
Jerry Peterson
jerrylite@fuse.net

Posted on 2015-11-21 05:04:52
Theborgman

The problem is not in the drives, but in the memory. A striped and mirrored array can become corrupt without even a failed drive. To have a safe RAID environment you need ECC memory. RAID is never safe without it.

Posted on 2016-04-04 01:10:21

If you have a RAID failure in a mirror (RAID 1), including back in 2007 when this was written, then you just recover the data by disconnecting the drives from the RAID card and booting them natively.

I think this article was written for desktop users who were trying to use RAID for performance, and I agree with the author that RAID is overkill for a desktop computer in all aspects. With SSDs, you really don't need RAID to get high performance on your workstation.

But for servers and NAS, you need RAID. RAID6 is the one I go to the most, but everyone has their preference.

Posted on 2016-04-26 15:18:09
Paul Ronco

Although RAID does add a significant layer of complexity to a system, as you point out, it also offers two important (and sometimes critical) benefits: speed, real-time data redundancy and integrity protection, or both. A quality, properly configured RAID array should not just arbitrarily fail. In arriving at your failure rate calculations, did you take into consideration those instances where the RAID actually did its job by failing after encountering a faulty third-party hardware component, such as a cable, drive, memory stick, power supply, etc.? Yes, RAID does put more stress on all of these systems, but if what you do with your computer is important enough for you to add RAID, then that is all the more reason to use only quality components in the first place (or upgrade to them).

Thanks for starting and providing a forum for this important discussion. You bring up many great points. Particularly that if true hardware-based RAID is not worth it to the user, then RAID is not important enough to the user to install. RAID is serious stuff. Although I am by no means an expert on RAID, I am convinced of your argument, which is shared by many, such that I ignore any non-fully-hardware-based RAID options as if they do not exist. Software/hybrid RAID expansion cards, and in-built consumer motherboard RAID, might be useful for beginners to tinker around with to get a feel for RAID with noncritical drives and data, but that is all.

That said, what constitutes "requiring" RAID in terms of how the small business or home user intends to benefit from it is ultimately a matter of subjectivity, and not a discussion I would be too interested in pursuing with the client one way or the other short of providing them with some basic information they may not already know. The real sticking point of RAID is its cost, to include the initial setup, future power consumption, potential wear on components, and maintenance if the array should "fail" due to true, unforeseeable hardware errors (not user abuse, or installer error/oversight). At a minimum, for RAID to be worth installing, you need a quality hardware-based RAID controller card; quality, error-free enterprise-class hard drives; quality data cables; a good internal cooling system for your case, and particularly for the card itself, which will get very hot; new, quality surge protection, ideally no more than two years old; and an uninterruptible power supply. The rest of the computer should be made of quality components which have been thoroughly stress-tested, ideally for 24 hours (some argue that ECC memory is a must, but this might technically be overkill for some valid lower-end RAID applications). If you're not ready to implement all of this, then you're not ready for RAID.

Posted on 2016-05-31 07:36:34
kejjer

Wow this post is still active ten years later.

I have to agree--a backup plan is way better than any RAID--I have been doing this for 15 years and work on servers.
Just came across a punctured RAID 10--data loss and half a day of rebuilding the RAID.
We had swapped drives thinking that we just had some bad drives.
Then we thought it was the Dell PERC controller, and finally the backplane and cables.
By the time we realized it for what it was, the backup rotation had been overridden and, blam, DATA LOSS.

Posted on 2016-10-11 05:13:15
Eric Perry

I know this is an old article, but I'm curious if you know of any good tools to detect bit rot, or something that will scrub the drives to detect bad data beyond what the RAID controller does via scrubbing?

Posted on 2016-10-12 23:08:22
whispering malarkey

ZFS, Btrfs, ReFS, or APFS are what you're looking for. We're pushing the limits of "classic RAID" to the extent where RAID5 shouldn't be used anymore because single parity isn't enough for the data sizes currently in use. RAID6 only buys a little more time.

Posted on 2017-08-02 21:52:13
Dale Mahalko

This article may be 10 years old, but it is still as relevant today as it was back then. Though I would go further than this article and say that if you're using a redundant array, and you aren't buying a hardware RAID controller with either a battery-backed cache or a flash-backed cache, then you are buying a piece of garbage for a RAID controller.

The problem is that if you get a sudden power loss or hardware reset before a data write to a redundant array completes, your array is now corrupt and out of sync. The ONLY way to fix this is to:
1. Discard the last incomplete write, which may corrupt data
2. Rebuild the entire array, which can take hours and leaves ALL drive data vulnerable to total loss until the rebuild completes

A RAID controller backup write cache fixes this by storing incomplete writes until they finish. When the computer turns back on and the drives come back online, the FIRST thing the controller does is write out that last blip of data held in the cache before the power failure, and no loss of data or array corruption occurs.

Battery-backed caches are now old technology. The data only stays in the cache as long as the battery can last, usually about 3 days.

Flash-backed caches are the new hot thing. A supercapacitor powers the flash for a short period after a power failure, just long enough to copy data from cache memory into the flash memory. The data now stays in the flash without power, and stays there until the data can be written out. So if you lose power due to a storm and it takes days to get power again, the cached write will still be there waiting to be written out, when power is restored.

Finally, it also has not changed in the last 10 years that RAID controllers with a battery/flash-backed cache are expensive. Expect to shell out at least $500 to $1000 for one of these. This is the only way to get true reliability with redundant RAID arrays.

Posted on 2016-10-18 17:39:57
jeff

Bought a computer from Dell in late 2008. Way over-spec'd it with a hot i7, planning for future growth. It was expensive. Had 12 GB RAM and a 2-drive 1TB RAID 1. Ran Vista 64, still running Vista 64. Love it, very stable. The hardware still works to modern performance expectations, though software compatibility is getting sketchy. As for the RAID, it's been great. I've had drive failures, and then I put a new one in. Keep a freshy on the shelf. 10-minute swap, then let it mirror for a day, no missing data. I then order another drive. RAID is great. Thinking of buying a new machine. Given SSDs, wondering if I should go back to RAID. The Vista issue is getting old; I may just try to upgrade Windows. Any advice?

Posted on 2016-11-27 09:45:47

While I know this is a very old post, the topic is a good one. I have run RAID 1 for the last 5 years and I'm done. It has yet to protect me. Everyone knows you are replicating any bad stuff, but the hope is that if one drive fails, you will keep going. I have an Intel integrated RAID controller, and its performance is less than a single drive. Intel "says" it will improve performance, but it doesn't. I've tested with and without RAID 1 and it is always worse with RAID 1. There are a couple of threads out there confirming this, with someone else testing it too. To date, RAID 1 has never saved me from anything. A corruption goes across both drives, and once it's fixed there is the re-mirroring time. The other day I lost a drive and it corrupted files on the remaining drive. I had to restore from backup. So that is it for me. I am going to forgo RAID 1 for more frequent backups. I'll replace both drives, but use one for an internal backup (and continue to keep an external backup too).

Posted on 2016-12-18 11:47:26
Harshit Gupta

Having a RAID server no doubt provides high speed, and many people can work simultaneously on the server. But in case of a RAID crash, RAID server recovery is always a tricky and costly task.

Posted on 2017-03-22 10:58:42
Harshit Gupta

I do believe that RAID servers are the most complex. And in case of any data loss, RAID server recovery is a very tricky and expensive task.

Posted on 2017-03-24 11:30:40
Suhpreme

You are very right; however, the performance gain from running a RAID configuration is worth a trip to the BIOS to disable the RAID controller when you need to troubleshoot or diagnose a problem.

It's not overcomplicating, just asking the user for more system knowledge when dealing with new firmware and unfamiliar configurations. That's all.

-suhpreme

Posted on 2017-06-10 21:51:49
gpete19

RAID-1 Worked FINE for me.

I have been on a RAID-1 (2 x 1TB HDs) for 8 years, and lo and behold, 3 months ago one of the 2 HDs failed. My PC and Win-7 OS stayed up perfectly with the one remaining working HD. Thanks to my HP's "Intel Matrix Storage Manager" software, I was notified of the error, replaced the failed HD, and re-synced back to a working RAID-1 config with NO DATA LOSS and no interruption, other than the time it took to replace the HD. My OS never missed a beat.

Posted on 2017-06-11 17:10:40
Hussain Akbar

I disagree. Setting up a system from scratch is a very time-consuming process when one has to install the OS, updates, database, users, related software, configuration, and data restoration. With daily or even hourly backups, you still lose the transactions that occurred since the last backup. Backups are fine if it is your gaming PC and you have enough Twinkies to last the night while you wait for the system rebuild.

In a corporate environment, I've had a client cancel the contract when a database rebuild resulted in a 3-hour downtime.

Now I favor RAID1 or RAID5 environments only. My servers are over a decade old and have been running 24/7. A burnt-out disk is a simple task for the operator to replace at night, with a max downtime of 10 minutes while he pops in a new drive and reboots. On that particular set of 6 servers, I've had maybe 5 drive replacements in the last decade, plus a couple of power supplies.

Side note on the age of those machines: I find those no-name-brand, wired-together servers way more reliable than the branded ones, which are a headache (read: expensive) to maintain. The SAS drives are more expensive than SATA ones, and the parts are highly prone to being discontinued by IBM/Dell/et al.

Posted on 2017-10-12 06:02:52
goblin072 .

I disagree on RAID 1. RAID is not a backup; whether you RAID or not, backups are something you should do.

RAID 1 is simple: if a drive fails, that drive can be used on ANY computer, because there is no parity drive to deal with. I've had drives go bad and I just pull the drive and put in a new one. It self-mirrors to the other one while I work. No issues in 20+ years. I used Adaptec RAID cards. I don't suggest software RAID, though.

Most people have SSDs, and they do not give much of a warning before failure. If one fails, you will lose all data from between the time it fails and your last backup, which for many people is days old. If you had RAID 1, you are not going to lose anything.

It's not hype; if done properly there are no issues. It is more complicated than a single drive, but I would not demonize it just because you have to work more (service calls). Maybe the people whose drives have blown don't bother to call, and there could be many of them that did not make it into your statistics.

I think most people can get by without RAID, and they do. But RAID 1 is going to have LESS downtime. I have a RAID 1 setup that is 20 years old; it has had 3 failures in that time, and I have never lost any data and never had to reinstall Windows from a backup.

Again, this was using an Adaptec RAID card, not software RAID.

Yes, I know this is old, but if this information were not useful, the entire thread should be purged from the system.

Posted on 2018-03-20 13:36:50
Brian Buresh

RAID IS NOT A DATA BACKUP MECHANISM. RAID is a REDUNDANCY. It is designed to continue operations through disk failures. It IS NOT and WAS NEVER intended to be a data backup mechanism.

Data backup is always important. Always have a backup, even with/especially with a RAID.

Posted on 2018-08-18 20:44:58
JohnnyWalker2K1

Not sure about "especially with", unless you're running something silly like a RAID 0, but yes you're right. RAID is to protect against drive failure, nothing else.

Posted on 2018-09-28 00:41:44
Brian Buresh

I say "especially with" because RAID failure adds to the potential causes of data loss. You can have controller failures or other things happen that can potentially increase the chances of data loss.

Posted on 2018-09-28 14:14:34
JohnnyWalker2K1

This entire article can be summed up thusly: our customer service department would be less busy if our customers stopped using RAID! Sure, they may fall foul of hard drive failure, but usually not in the first year, so we won't have to deal with it!

Posted on 2018-09-28 00:44:48