Advantages of ECC Memory
Written on November 5, 2013 by Matt Bach
What is ECC?
ECC (which stands for Error Correction Code) RAM is very popular in servers and other systems with high-value data because it protects against data corruption by automatically detecting and correcting memory errors. Standard RAM uses banks of eight memory chips in which data is stored and provided to the CPU on demand. ECC RAM differs in that it has an additional ninth memory chip, which stores the extra bits used to detect and correct errors in the other eight chips.
Prior to ECC memory, error detection was done via even or odd parity bits. In a computer, data is most commonly stored in 8-bit chunks. When parity is being used, an additional ninth bit - or parity bit - is written which allows the system to detect when there is an error. If the system uses even parity, then the 1's and 0's (including the additional parity bit) should add up to an even number. For example, if the data written to the RAM is "10011011", then with even parity a 1 is appended to the data so that when you add up the bits (1+0+0+1+1+0+1+1+1), you get an even number. If an error were to occur and the data the RAM sends to the system is instead "10011001+1" (which adds up to an odd number), then the system knows that the data is corrupt.
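The even parity example above can be reproduced in a few lines of Python. This is only a sketch of the idea; the function names are made up for illustration and do not correspond to any real memory controller interface.

```python
def even_parity_bit(data_bits: str) -> str:
    """Return the parity bit that makes the total number of 1s even."""
    return "1" if data_bits.count("1") % 2 else "0"


def passes_even_parity(stored_bits: str) -> bool:
    """Check a 9-bit word: 8 data bits followed by 1 parity bit."""
    return stored_bits.count("1") % 2 == 0


data = "10011011"
parity = even_parity_bit(data)        # "1", because the data holds five 1s
written = data + parity               # what is stored in the RAM
corrupted = "10011001" + parity       # one data bit flipped before it is read

print(passes_even_parity(written))    # True
print(passes_even_parity(corrupted))  # False
```

Note that the check only reports that something is wrong. It cannot say which bit flipped, and two flipped bits would cancel each other out and go unnoticed.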
ECC is an extension of parity: it uses multiple parity bits assigned to larger chunks of data to not only detect single-bit errors, but correct them automatically as well. Instead of a single parity bit for every 8 bits of data, ECC uses a short check code that is automatically generated for every 64 bits of data stored in the RAM. Seven check bits are enough to locate and correct a single flipped bit in 64 bits of data; ECC modules typically store eight, with the extra bit allowing double-bit errors to be detected as well. When the 64 bits of data are read by the system, a second check code is generated and compared to the original. If the codes match, then the data is free of errors. If the codes do not match, the system can determine where the error is and fix it by comparing the two codes.
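The codes are most commonly generated and compared using a Hamming code, usually in its SECDED (single error correction, double error detection) form. Some more advanced schemes, designed to survive the failure of an entire memory chip, use symbol-based codes such as the Reed-Solomon code. Warning: only attempt to understand the Reed-Solomon code if you really, really like math.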
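The generate-then-compare scheme can be illustrated with a Hamming(7,4) code, a deliberately tiny stand-in that protects 4 data bits with 3 check bits. It is not the code a real 64-bit memory controller uses, and the function names here are hypothetical, but the principle is the same: the difference between the stored code and the freshly generated code points at the bit that flipped.

```python
def check_bits(data):
    """Generate 3 check bits for 4 data bits [d1, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 5, 6, 7
    return [p1, p2, p4]


# Data bits sit at codeword positions 3, 5, 6 and 7; the check bits
# sit at positions 1, 2 and 4. Map a position to an index into the data.
DATA_POSITIONS = {3: 0, 5: 1, 6: 2, 7: 3}


def read_with_correction(data, stored_code):
    """Regenerate the code on read, compare, and fix a single flipped bit."""
    recomputed = check_bits(data)
    diff = [a ^ b for a, b in zip(stored_code, recomputed)]
    syndrome = diff[0] + 2 * diff[1] + 4 * diff[2]  # 0 means the codes match
    corrected = list(data)
    if syndrome in DATA_POSITIONS:
        corrected[DATA_POSITIONS[syndrome]] ^= 1
    # A syndrome of 1, 2 or 4 means a check bit itself flipped; data is fine.
    return corrected, syndrome


original = [1, 0, 1, 1]
code = check_bits(original)          # stored alongside the data on write

damaged = [1, 0, 0, 1]               # third data bit flipped in memory
fixed, syndrome = read_with_correction(damaged, code)
print(fixed, syndrome)               # [1, 0, 1, 1] 6
```

Like the parity example, this sketch handles only a single flipped bit per word. Detecting a second flipped bit takes one more overall parity bit, which is the "double error detection" half of SECDED.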
What about Registered Memory?
Registered (often referred to as "buffered") memory uses a technology that is often paired with, but not directly related to, ECC RAM. Registered memory has a "register" that resides between the RAM and the system's memory controller which lessens the load that is placed on the memory controller itself. This allows for more memory modules to be used at one time than would otherwise be possible.
While ECC RAM is not always Registered (since you may need the error correction of ECC without the large quantities made possible by Registered memory), almost all Registered memory will be ECC. This is simply due to the fact that systems that use large amounts of memory are almost always going to prioritize stability as well.
ECC Failure Rate Analysis
ECC RAM is theoretically more stable and reliable than standard RAM, but many times theory does not match up with fact. To see if ECC RAM really is more reliable, we looked up our failure rates for ECC and non-ECC RAM over the past 3 years.
One thing to note is that while we have tried many different brands of memory over the years, we have always returned to Kingston due to their consistently lower failure rates - up to 6x better in some cases! Because of this, we decided to include only Kingston desktop/server memory in our failure rate analysis. Including other brands makes ECC RAM look even better, but we feel that comparing within a single brand is a much more realistic comparison.
As the graph above shows, ECC RAM has a much lower failure rate than non-ECC RAM. The ~1% failure rate of the Kingston non-ECC RAM is still very good (which is why we primarily use Kingston), but the ECC RAM is even better, with an average failure rate of 0.24%.
One thing to notice is that Kingston RAM has become even more reliable over the past three years. This is true for both ECC and non-ECC RAM, to the point where we have not had a single stick of ECC RAM fail this year.
While a lower failure rate is certainly great, it is worth investigating a little further to determine what the cause of each failure was. Memory errors or system instability are much worse than a simple no-POST failure. A faulty stick of RAM that prevents the system from POSTing is an inconvenience, but is very unlikely to affect the data stored on the system. Memory errors, on the other hand, are much more likely to corrupt data if left unchecked.
The incredible thing about the graphs above is that over the past three years, we have not had a single case of memory errors or system instability caused by ECC RAM. Every single failure was due to either no POST or the system rebooting when we tested the memory for errors. While the rebooting issue is not ideal, the 25% reboot failure actually adds up to only 2 sticks ever with that specific problem, and both were all the way back in 2011.
The failures for non-ECC RAM, on the other hand, are overwhelmingly caused by memory errors. In fact, only 9% of the failures (no POST, other/misc, and incorrect size/speed) were the type of failures that would not put your data at risk. The other 91% of failures were the type that you absolutely do not want to see in a server or other system that contains valuable data.
One thing we do want to make clear is that although non-ECC RAM currently has about a 1% failure rate, the testing we perform on all of our systems catches the majority of the issues. In the field, the failure rate for non-ECC Kingston RAM is only about 0.4%, or roughly one stick for every 250 sticks we sell. So while ECC RAM is certainly important for servers and systems with high-value data, non-ECC RAM is more than stable enough for use in most home or work systems.
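For readers who find "one in N" easier to picture than a percentage, the conversion is a single division. The helper name below is just for illustration; the rates are the ones quoted in this article.

```python
def sticks_per_failure(failure_rate_percent: float) -> int:
    """Convert a failure rate in percent to 'one failure per N sticks'."""
    return round(100 / failure_rate_percent)


print(sticks_per_failure(0.4))   # 250  (non-ECC, in the field)
print(sticks_per_failure(1.0))   # 100  (non-ECC, overall)
print(sticks_per_failure(0.24))  # 417  (ECC, overall average)
```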
Downsides of ECC RAM
ECC is designed to be more stable than traditional RAM, and our failure records show that this is indeed the case. However, there are a few downsides to using ECC RAM. The first, and most obvious, is that not every computer can use ECC memory. Most server and workstation motherboards require ECC RAM, but the majority of desktop systems either won't work at all with ECC RAM or will run it with the ECC functionality disabled.
Second, due to the additional memory chip and the inherently more complex nature of ECC RAM, it costs more than non-ECC RAM. The amount varies, but you should expect to pay roughly 10-20% more depending on the size of the memory stick. The larger the stick, the higher the price premium.
Finally, ECC RAM is slightly slower than non-ECC RAM. Many memory manufacturers say that ECC RAM will be roughly 2% slower than standard RAM due to the additional time it takes for the system to check for any memory errors. To verify this, we examined multiple benchmarks that we run on each system we produce. By using comparable CPUs (for example: Intel Core i7 4771 3.5GHz Quad Core 8MB versus Intel Xeon E3-1275 V3 3.5GHz Quad Core 8MB), we found this 2% estimate to be roughly correct. Our own benchmarks showed a performance hit ranging from 0.72% to 2.2% which, given normal testing deviations, is right in line with the 2% estimate.
If you have a server or system with high-value data where system stability is of utmost importance, these few drawbacks are very likely not even close to being an issue. The cost of RAM has come down so much recently that even a 20% increase in price only equates to about $10 per stick, which in a server environment is a very worthwhile investment. As for the performance decrease, 2% is such a small amount that it is likely never going to be perceptible outside of performance benchmarks.
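To put those two costs into concrete terms, here is a rough back-of-the-envelope calculation. The $50 base price is an assumption inferred from the "20% is about $10" figure above, and the one-hour job is purely hypothetical.

```python
def ecc_price_premium(base_price: float, premium_percent: float) -> float:
    """Extra cost of an ECC stick over a non-ECC stick of the same size."""
    return base_price * premium_percent / 100


def slowed_runtime(minutes: float, slowdown_percent: float) -> float:
    """Runtime of a job after applying a percentage slowdown."""
    return minutes * (100 + slowdown_percent) / 100


base_price = 50  # assumed non-ECC price in dollars, not a quoted figure

print(ecc_price_premium(base_price, 10))  # 5.0
print(ecc_price_premium(base_price, 20))  # 10.0
print(slowed_runtime(60, 2))              # 61.2
```

In other words, under these assumptions a job that takes an hour on non-ECC RAM would take a little over a minute longer with ECC.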
At the cost of a little money and performance, ECC RAM is many times more reliable than non-ECC RAM. And when high-value data is involved, that increase in reliability is almost always going to be worth the small monetary and performance costs. In fact, anytime it is possible to do so, we would recommend using ECC RAM.