Here at Puget Systems, we record a huge amount of information during the production process of our workstations including benchmarks, BIOS screenshots, thermal images, and system photos. We believe deeply in transparency and making data freely available, which is why much of what we gather is available in either our hardware articles or as a part of our part information pages.
While this data is very useful, one of the most important things we track is the failure rate of individual hardware components. This data is invaluable, allowing us to follow individual component failure rates, product line trends, and overall brand failure rates to ensure that we are offering the best possible hardware to our customers. However, without the full picture (including sales numbers, exact failure reason, etc.), this data can easily be misleading, which is why we typically do not show it to the general public.
With 2018 at a close, we want to share some of the general reliability trends we saw over the last year as well as what brands and models we found to be the most reliable. Note that we don't use every brand and model of hardware available, so the models we list are not necessarily the absolute best on the market; there may be an even better model that we did not use in our systems for whatever reason. Instead, this is a way for us to give credit to specific brands and models that we have found to be exceptionally reliable in our workstations.
A reliable motherboard is essential to build a high-quality computer since it is essentially the glue that holds the entire system together. In addition, motherboards are very difficult to swap out and the effects of a faulty motherboard can be far-reaching and difficult to troubleshoot.
Motherboards have among the highest failure rates of any core component we sell – not because the quality is necessarily bad but because they are incredibly complex. There are SATA, USB, fan, and network controllers as well as the physical ports, audio chips, and everything else that is needed to connect every component in your system. This is a huge number of delicate parts that have to work perfectly together and any one of these could potentially have a problem. If there is a single dead USB port, slight static over the audio, or if the voltage levels are measured outside of the norm, it does not meet our standards and is considered to have failed.
However, the good news is that while motherboards seem to be getting more and more complex, 2018 was a really good year in terms of reliability. This last year, the overall failure rate was just 2.1%, or about 1 out of every 49 motherboards failing for one reason or another, which is about half of what we saw in 2017. This may seem like a high failure rate, but the silver lining is that nearly all of these failures were caught in-house before the system was shipped to the customer. In fact, after our stringent production and quality control process, motherboards as a whole only have a .2% failure rate (or 1 out of every 500) for our customers in the field!
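The "1 out of every N" figures quoted throughout this article are simply the reciprocal of the failure rate. A minimal Python sketch of that conversion follows; `one_in_n` is a hypothetical helper name introduced for illustration, not part of any tooling described here. Because the published percentages are already rounded, recomputing from them can land a unit or two away from a quoted figure.

```python
def one_in_n(failure_rate_percent: float) -> int:
    """Convert a failure rate in percent to the N in "1 out of every N"."""
    if failure_rate_percent <= 0:
        raise ValueError("a rate of 0% means no failures, so there is no finite N")
    return round(100 / failure_rate_percent)


# Field failure rate for motherboards quoted above: .2%
print(one_in_n(0.2))  # prints 500
```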
Out of all the boards we carried, there was one that we sold a large number of yet had absolutely no failures whatsoever:
For power supplies, we tend to stick with EVGA for the vast majority of our PSU needs since they are reliable and the supply is consistent. Because of this, we want to make clear that we really can only speak for EVGA in terms of which model, in particular, was extremely reliable in 2018. Still, from an overall perspective (including all the other brands and models we sold) we saw a total failure rate of 1.15% (1 in every 87), while our field failure rate after our production and QC process was just .4% (1 in every 222).
Likely due to the fact that the vast majority of our PSU needs are met by just 5 different EVGA models, there were none that had no failures whatsoever. However, there was one model that had just a single DOA failure in 2018 due to a bad fan bearing causing it to be noisier than it should have been:
Honorable mention goes to the EVGA SuperNOVA 1200W P2 Power Supply which had just two failures in 2018 – both of which were caught by our production department so they did not impact the customer.
Video cards are an interesting category to look at since supply constraints over the last year meant we had to carry a wide variety of individual models. For example, rather than carrying two models of the NVIDIA GTX 1080 Ti (a blower-style card for multi-GPU setups and a dual-fan design for quieter, single-GPU configurations), we actually had to inventory and sell 8 different brands and models!
In addition to the supply constraints, there were also a number of software and driver issues we had to work through in 2018. However, from a physical hardware standpoint, the failure rates were pretty decent. We saw a 1.11% failure rate overall (1 in every 90), while our field failure rate after our production and QC process was just .5% (1 in every 200).
Interestingly enough, if we sort according to consumer (GeForce), workstation (Quadro), and the in-between Titan line, the Titan cards actually ended up being the most reliable. For Titan cards, we saw just a .45% failure rate overall (1 in every 222) with no failures at all for our customers in the field. For reference, GeForce was 1.24% and .62% for overall and field failure rates respectively, while Quadro was 1.16% and .39%.
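One way to read these pairs of numbers is to ask what share of failures was caught before a system shipped. The sketch below assumes that the overall rate includes field failures and that both rates are measured against the same number of units sold. The article does not state this explicitly, so treat the result as a rough estimate. `share_caught_in_house` is a hypothetical helper name.

```python
def share_caught_in_house(overall_pct: float, field_pct: float) -> float:
    """Estimate the fraction of failures caught before shipping.

    Assumes (not confirmed by the source) that the overall rate includes
    field failures and that both rates share the same denominator.
    """
    if overall_pct <= 0:
        raise ValueError("overall failure rate must be positive")
    if not 0 <= field_pct <= overall_pct:
        raise ValueError("field rate must be between 0 and the overall rate")
    return (overall_pct - field_pct) / overall_pct


# Overall and field failure rates (in percent) quoted in the paragraph above.
gpu_rates = {
    "GeForce": (1.24, 0.62),
    "Quadro": (1.16, 0.39),
    "Titan": (0.45, 0.0),
}

for line, (overall, field) in gpu_rates.items():
    print(f"{line}: {share_caught_in_house(overall, field):.0%} caught in-house")
```

Under those assumptions, roughly half of the GeForce failures, about two-thirds of the Quadro failures, and all of the Titan failures were caught before reaching a customer.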
As we stated earlier, it is a bit tough to call out the most reliable GPU since very few individual models sold in high volume. However, there was one card that was not only readily available throughout 2018, but also had no failures at all:
Honorable mention goes to the NVIDIA Titan Xp 12GB, PNY Quadro P2000 PCI-E 5GB, and EVGA GeForce GTX 1080 8GB ACX 3.0 as they each had less than a 1% overall failure rate and had no failures in the field for our customers.
RAM used to be among the least reliable components in a computer, but over the last 5 or so years it has improved greatly. In 2018, RAM in general had an overall failure rate of .41% (1 in every 244), but the field failure rate was just .07% (1 in every 1400). So, while RAM is still at risk of failing (especially since you often have 4 or more sticks in a system), after it goes through our testing and quality control process it is actually very reliable for our customers.
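To see why the number of sticks matters, the sketch below estimates the chance that at least one stick in a system fails, given a per-stick failure rate. It assumes each stick fails independently and that the quoted rates are per stick, neither of which the data here confirms. `any_stick_fails` is a hypothetical helper name.

```python
def any_stick_fails(per_stick_rate_pct: float, sticks: int) -> float:
    """Probability (0-1) that at least one of `sticks` modules fails.

    Assumes independent failures: P = 1 - (1 - p) ** n.
    """
    if not 0 <= per_stick_rate_pct <= 100:
        raise ValueError("rate must be a percentage between 0 and 100")
    if sticks < 1:
        raise ValueError("a system needs at least one stick")
    p = per_stick_rate_pct / 100
    return 1 - (1 - p) ** sticks


# Overall (.41%) and field (.07%) RAM failure rates quoted above, 4 sticks.
print(f"overall: {any_stick_fails(0.41, 4):.2%}")
print(f"field:   {any_stick_fails(0.07, 4):.2%}")
```

With four sticks and these assumptions, the chance of at least one failure works out to about 1.6% overall and under 0.3% in the field, so the per-system risk is roughly four times the per-stick rate.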
However, since RAM comes in two main flavors for desktops, we also wanted to examine the failure rates for both standard and ECC (error correcting) RAM:
- Standard: Overall failure rate of .59% (1 in every 170). Field failure rate of .1% (1 in every 1000).
- ECC: Overall failure rate of .12% (1 in every 833). Field failure rate of .02% (1 in every 5000).
Due to how reliable RAM is these days, there are quite a few models that were extremely reliable in 2018 but the top three were:
Honorable mention goes to the Crucial DDR4-2666 8GB, Samsung DDR4-2400 8GB ECC Reg., and Crucial DDR4-2666 4GB ECC Reg. models which all only had one stick fail in 2018 – all of which were caught during our production process.
Storage (Hard Drive)
Storage, in general, tends to be one of the most reliable components in our workstations with an overall failure rate in 2018 of .18% (1 in every 570) and a field failure rate of just .05% (1 in every 2000). "Storage" is a very wide term, however, so we also decided to separate it out between our three most common types of drives:
- Platter drives (primarily WD): Overall failure rate of .34% (1 in every 292). Field failure rate of .11% (1 in every 910).
- SATA SSD (Samsung 850/860 EVO/Pro): Overall failure rate of .11% (1 in every 874). Field failure rate of 0% (none failed).
- M.2 NVMe (Samsung 960/970 EVO/Pro): Overall failure rate of .08% (1 in every 1250). Field failure rate of .08% (1 in every 1250).
To be honest, all of these failure rates are really good. It is not surprising that SSDs are more reliable than platter drives since they have no moving parts, but we were surprised to find how reliable the Samsung NVMe drives were in 2018. When NVMe first came out, those drives had much higher failure rates than SATA SSDs simply due to how new the technology was. Now that it has matured, it looks like they are right in line with the more traditional SSDs.
Rather than calling out dozens of individual models, we are simply going to call "Samsung SSD/NVMe" drives the winner here since even combined only about 1 drive in 1000 had any issues in 2018:
While CPUs can (and do) fail, once they make it through our production process they are easily the most reliable components in our workstations. CPUs had an overall failure rate of just .2% (1 in every 500), but what is really amazing is that in 2018 not a single CPU failed in the field for our customers. Another tidbit of information is that there was no appreciable difference between the Intel Core series and the Intel Xeon series of CPUs. Each only had a handful of CPU failures, all of which were caught by our production department.
We sell far more Intel CPUs than AMD (although that is starting to shift a bit with the latest Threadripper models), which means we don't have enough data to really dig into the reliability of AMD CPUs. From the data we do have, they look to be just as reliable as Intel, but we do not have a large enough sample size to be 100% confident in that conclusion.
Since there were no real trends showing one Intel CPU model as being better or worse than another in terms of reliability, we are simply going to give Intel processors, in general, our "Most Reliable CPU of 2018" award.
Overall, 2018 was a very good year for hardware reliability in our workstations. We can't speak for the market as a whole since we do a lot of qualification work to ensure that we are only using the best hardware available, but in our systems we saw about half as many parts fail this year versus 2015, 2016, or 2017.
While it is hard to pin down what exactly changed in 2018, we believe it to primarily be a combination of four factors:
- We completed our migration to Gigabyte as our primary brand for motherboards. So far, the reliability of these boards has been excellent.
- Pricing of SSDs has allowed them to be used more often in place of traditional platter-style drives. Since SSDs are more reliable, this means that the overall reliability rates have improved.
- RAM was much more reliable in 2018 with about half as many failures compared to previous years. We didn't change much in terms of brands and models (just minor updates), but DDR4 RAM has been the standard for quite a while now which means that it has become a very mature product.
- Intel CPUs were about 4x more reliable in 2018 compared to 2017. This is likely due to the fact that there were few major architecture changes made in 2018, with most of the new models being simple core count or frequency bumps.
We hope this information has been useful or at the very least an interesting read. We are excited to see what 2019 brings and if this reliability trend will continue for yet another year!