
What is Aggregate CPU Frequency and why is it wrong?

Written on October 3, 2016 by Matt Bach

What is Aggregate CPU Frequency?

While not yet a common term, over the past year or so we have started to see a rise in the use of the term "Aggregate CPU Frequency" as a way to estimate the relative performance of different CPU models. The term appears most often when people are discussing high core count Xeon or dual Xeon CPU configurations, but lately we have seen it used for CPUs with as few as four cores.

At first glance, this term seems reasonable enough: it simply takes the frequency of the CPU (how fast it can complete calculations) and multiplies it by the number of cores (the number of simultaneous calculations it can perform) to arrive at a total or "aggregate" frequency for the processor. For example, below is the number of cores and base frequency for four different CPU models along with their calculated "aggregate frequency":

CPU Model                  # of Cores   Base Frequency   "Aggregate Frequency"
Intel Core i7 6700K        4 cores      4.0 GHz          4 * 4.0 = 16 GHz
Intel Xeon E5-1650 V3      6 cores      3.5 GHz          6 * 3.5 = 21 GHz
Intel Core i7 6850K        6 cores      3.6 GHz          6 * 3.6 = 21.6 GHz
2x Intel Xeon E5-2690 V4   28 cores     2.6 GHz          28 * 2.6 = 72.8 GHz
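
To spell out the arithmetic, here is a minimal Python sketch of the calculation (the core counts and base frequencies are simply the ones from the table above):

    # "Aggregate frequency" is just core count multiplied by base frequency.
    cpus = {
        "Intel Core i7 6700K": (4, 4.0),
        "Intel Xeon E5-1650 V3": (6, 3.5),
        "Intel Core i7 6850K": (6, 3.6),
        "2x Intel Xeon E5-2690 V4": (28, 2.6),
    }

    for model, (cores, base_ghz) in cpus.items():
        print(f"{model}: {cores} x {base_ghz} GHz = {cores * base_ghz:g} GHz")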

Unfortunately, in the majority of cases trying to estimate the relative performance of a CPU in this manner is simply going to give you inaccurate and wrong results. So before this term begins to be used more commonly, we wanted to explain why "aggregate frequency" should not be used and give some examples showing how (in)accurate it really is.

Why is it wrong?

There are quite a few reasons why "aggregate frequency" is an inaccurate representation of CPU performance, but the primary reasons are the following:

  1. It is typically calculated using the advertised base frequency. Most modern Intel CPUs run at a wide range of frequencies, including the base frequency (the frequency advertised in the model name) and the various Turbo frequencies. Turbo Boost allows the CPU to run at higher frequencies depending on three main factors: the number of cores being used, the temperature of the cores, and the amount of power available to the CPU. On modern desktop systems with quality components, however, cooling and power are pretty much non-factors, which means that the frequency of an Intel CPU should only be limited by the number of cores in use. In fact, Turbo Boost is so reliable on modern CPUs that, except in a few edge cases, every system that ships out our door is checked to ensure that it is able to maintain the all-core Turbo frequency even when the system is put under an extremely heavy load.

    How big of a difference would it make to use the all-core Turbo Boost frequency instead of the base frequency? If you were to calculate the "aggregate frequency" for an Intel Xeon E5-2690 V4 CPU, you would get a result of 36.4 GHz since that CPU has 14 cores and a base frequency of 2.6 GHz. However, if you instead use the all-core Turbo frequency of 3.2 GHz (which any well-designed and adequately cooled workstation should be able to sustain indefinitely), the aggregate frequency changes to 44.8 GHz, a difference of roughly 23% (see the sketch after this list).

  2. It does not take the rest of the CPU and system specs into account including the amount of cache, architecture, and chipset. Processors are extremely complex, and just looking at the number of cores and frequency ignores everything else that can make one CPU faster or slower than another. This can include the amount of cache (whether it is L1, L2, L3, or Smart Cache), the bus type and speed, and the type and speed of memory it can use. However, more than almost anything else it ignores the architecture and manufacturing process that was used to produce the CPU. While the amount of difference all of these other specs can make varies from application to application, as an example we saw up to a 35% difference in SOLIDWORKS between a Skylake CPU and a Haswell-E CPU when both were operating with 4 cores at 4.0 GHz.

  3. It assumes that programs can make perfect use of all the CPU cores. More than anything else, this is the main problem with aggregate frequency. Using the base frequency can throw things off, but in most cases probably only by a maximum of about 10-30%. Likewise, as long as you only compare CPUs from the same product family, the architecture of the CPUs likely won't come into play. But working under the assumption that a program is going to be able to make perfect use of all of the CPU's cores is so wrong that it makes using an "aggregate frequency" less accurate in most cases than simply choosing a CPU at random.

    It appears that most people who use this term understand that some programs are single threaded (parametric CAD programs are a prime example), but many of our articles have shown over and over that even if a program tries to use all the available cores, how effectively it can do so varies wildly. This depends on a number of factors, including how the program is coded, how well the task lends itself to multi-threading, and how much the other components in the system (including the hard drive, GPU, and RAM) affect performance. Some programs are very effective at utilizing multiple CPU cores in parallel, but even the best of them (such as offline rendering) top out at roughly 99.5% efficiency, and are often closer to 90%. That is extremely good, but still low enough to throw off any attempt to use an "aggregate frequency" to estimate performance.

    Unfortunately, the only way to know how well a program can use multiple cores is to do comprehensive testing on that specific application. We have tested a number of programs ourselves (including Premiere Pro, After Effects, Photoshop, Lightroom, SOLIDWORKS, Keyshot, Iray, Mental Ray, and Photoscan), but this is only a tiny drop in a giant bucket compared to the number of programs that exist today.
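
To put some numbers on the first point above, here is a small Python sketch of the base-versus-Turbo calculation for the Xeon E5-2690 V4 (the core count and frequencies are the figures quoted earlier; treat this as an illustration, not a measurement):

    # Aggregate frequency of a Xeon E5-2690 V4 using the advertised base
    # clock versus the all-core Turbo clock.
    cores = 14
    base_ghz = 2.6   # advertised base frequency
    turbo_ghz = 3.2  # all-core Turbo Boost frequency

    agg_base = cores * base_ghz    # 36.4 GHz
    agg_turbo = cores * turbo_ghz  # 44.8 GHz

    print(f"Base aggregate:  {agg_base:.1f} GHz")
    print(f"Turbo aggregate: {agg_turbo:.1f} GHz")
    print(f"Difference: {(agg_turbo / agg_base - 1) * 100:.0f}%")  # ~23%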

Examples

We can talk about all the reasons why we believe "aggregate frequency" is wildly inaccurate, but there is no substitute for specific examples using actual benchmark data. To help prove our point, we are going to look at a number of different applications and compare the performance predicted by the "aggregate frequency" against the actual measured performance for a variety of CPUs. We like to be fair, so to give this term the best chance possible we are only going to use CPUs with the same architecture (Broadwell-E/EP). If you were to mix older and newer architectures (such as a Core i7 6700K versus a Core i7 6850K, or a Xeon V3 versus a Xeon V4), expect the "aggregate frequency" to become even more inaccurate.

To make it easier to see how close or far from reality the "aggregate frequency" is, whenever the expected performance using the "aggregate frequency" is within 10% of the actual performance, we will color the results in green. Anything that is 10-50% off will be in orange, and anything more than 50% off will be in red.
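
For anyone who wants to follow along, here is a rough Python sketch of how each result below is scored (the banding thresholds are the 10% and 50% cutoffs just described; the function itself is simply our illustration of the method, not part of any benchmark tool):

    # "Expected" is the ratio of aggregate frequencies; "off by" is the gap
    # in percentage points between the expected and the measured relative
    # performance, which then determines the color band.
    def score(cores_a, ghz_a, cores_b, ghz_b, actual_pct):
        expected_pct = (cores_a * ghz_a) / (cores_b * ghz_b) * 100
        off_by = abs(expected_pct - actual_pct)
        if off_by <= 10:
            band = "green (within 10%)"
        elif off_by <= 50:
            band = "orange (10-50% off)"
        else:
            band = "red (more than 50% off)"
        return expected_pct, off_by, band

    # Example: dual Xeon E5-2630 V4 vs. Core i7 6850K, measured at 207%
    # of the i7 6850K in CineBench R15 (see Example 1 below).
    expected, off_by, band = score(20, 2.2, 6, 3.6, actual_pct=207)
    print(f"expected {expected:.0f}%, off by {off_by:.0f} points -> {band}")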

Example 1 - Cinema4D CPU Rendering

Offline rendering of 3D images and animations is among the most efficient tasks you can run on a CPU, which makes rendering engines like those found in Cinema4D exceptionally good at using high numbers of CPU cores. This also makes it a best-case scenario for the term "aggregate frequency":

CineBench R15 - Multi CPU

CPU Model                  Specs                                   "Aggregate Frequency"   Expected vs. i7 6850K   Actual vs. i7 6850K
Intel Core i7 6850K        6 cores, 3.6 GHz (3.7-4.0 GHz Turbo)    21.6 GHz                100%                    100%
Intel Core i7 6950X        10 cores, 3.0 GHz (3.4-4.0 GHz Turbo)   30 GHz                  139%                    156% (off by 17%)
2x Intel Xeon E5-2630 V4   20 cores, 2.2 GHz (2.4-3.1 GHz Turbo)   44 GHz                  204%                    207% (off by 3%)
2x Intel Xeon E5-2690 V4   28 cores, 2.6 GHz (3.2-3.5 GHz Turbo)   72.8 GHz                337%                    358% (off by 21%)

We run CineBench R15 on nearly every system that goes out our door, and the results above are taken directly from our benchmark logs. Comparing the expected performance to the actual performance, in every case the "aggregate frequency" ended up predicting lower performance than what each CPU was able to achieve in reality. The most accurate result was the dual Xeon E5-2630 V4, where the expected performance difference compared to the i7 6850K was only off by about 3%, which is extremely accurate. However, the other two CPU results were off by about 20%, which means that although "aggregate frequency" has a chance of being fairly accurate, it also has a good chance of being off by a moderate amount.

Example 2 - Premiere Pro

We have found in our testing that Premiere Pro is decently effective at using a moderate number of CPU cores, but with modern hardware there is little need for something like a dual CPU workstation. Still, there are many recommendations on the web to use a dual Xeon workstation for Premiere Pro, so let's take a look at how the actual performance you would see in Premiere Pro compares to what you would expect from the "aggregate frequency":

Premiere Pro - Overall Performance

CPU Model                  Specs                                   "Aggregate Frequency"   Expected vs. i7 6850K   Actual vs. i7 6850K
Intel Core i7 6850K        6 cores, 3.6 GHz (3.7-4.0 GHz Turbo)    21.6 GHz                100%                    100%
Intel Core i7 6950X        10 cores, 3.0 GHz (3.4-4.0 GHz Turbo)   30 GHz                  139%                    123% (off by 16%)
2x Intel Xeon E5-2643 V4   12 cores, 3.4 GHz (3.6-3.7 GHz Turbo)   40.8 GHz                189%                    117% (off by 77%)
2x Intel Xeon E5-2690 V4   28 cores, 2.6 GHz (3.2-3.5 GHz Turbo)   72.8 GHz                337%                    111% (off by 226%)

The results in the chart above are taken from our Adobe Premiere Pro CC 2015.3 CPU Comparison article, which looked at exporting and generating previews in Premiere Pro with a variety of codecs and resolutions. While the results aren't too far off for somewhat similar CPUs, the "aggregate frequency" predicted the i7 6950X to be about 16% faster relative to the i7 6850K than it is in reality. This isn't completely out in left field, but the difference between a 39% improvement in performance and a 23% improvement from a CPU that is more than twice as expensive is likely to matter quite a bit if you are trying to decide which CPU to purchase.

For the dual CPU options, the "aggregate frequency" was much further from reality, being about 77% off for the dual Xeon E5-2643 V4 and a huge 226% off for the dual Xeon E5-2690 V4. In fact, where the "aggregate frequency" predicted the dual E5-2690 V4 CPUs to be the fastest option, they were actually slower than the dual E5-2643 V4 CPUs (or even the Core i7 6950X) while costing significantly more.

Example 3 - 3ds Max

3ds Max is a 3D modeling and animation program that is primarily single threaded, so you would expect it to be a worst-case scenario for "aggregate frequency". You may argue that no one should use this term for these types of lightly threaded tasks, but we have started to see it pop up even in discussions of single or lightly threaded workloads, so we wanted to show just how inaccurate "aggregate frequency" can be when it is used as a catch-all term for CPU performance:

3ds Max - Overall Performance

CPU Model                  Specs                                   "Aggregate Frequency"   Expected vs. i7 6850K   Actual vs. i7 6850K
Intel Core i7 6850K        6 cores, 3.6 GHz (3.7-4.0 GHz Turbo)    21.6 GHz                100%                    100%
Intel Core i7 6950X        10 cores, 3.0 GHz (3.4-4.0 GHz Turbo)   30 GHz                  139%                    102% (off by 37%)
2x Intel Xeon E5-2690 V4   28 cores, 2.6 GHz (3.2-3.5 GHz Turbo)   72.8 GHz                337%                    89% (off by 248%)

The results in the chart above are taken from our Autodesk 3ds Max 2017 CPU Performance article, which looked at animations, viewport FPS, and scanline rendering with a variety of projects. As expected from a mostly single-threaded application, the "aggregate frequency" was very optimistic in each case. Depending on which CPU you look at, the expected performance based on the "aggregate frequency" ranged from being 37% off to being 248% off from the actual performance! On the extreme end - with a pair of high core count Xeons - this means that instead of more than a 3x increase in performance compared to an i7 6850K, in reality you would actually see a roughly 10% decrease in performance.

Example 4 - After Effects

After Effects is an interesting application because it used to be very well threaded and benefited greatly from high core count workstations. However, in the 2015 version Adobe changed its focus from multi-threading to GPU acceleration. In the long term this should greatly improve performance for AE users, but the result is that with modern hardware there is little need for a CPU with more than 6-8 cores, and higher core count or dual CPU setups often actually decrease performance. So while you might expect a single-threaded application like 3ds Max to be the worst case for the term "aggregate frequency", After Effects is even worse:

After Effects - 2D Animation

CPU Model                  Specs                                   "Aggregate Frequency"   Expected vs. i7 6850K   Actual vs. i7 6850K
Intel Core i7 6850K        6 cores, 3.6 GHz (3.7-4.0 GHz Turbo)    21.6 GHz                100%                    100%
Intel Core i7 6950X        10 cores, 3.0 GHz (3.4-4.0 GHz Turbo)   30 GHz                  139%                    96% (off by 43%)
2x Intel Xeon E5-2643 V4   12 cores, 3.4 GHz (3.6-3.7 GHz Turbo)   40.8 GHz                189%                    90% (off by 99%)
2x Intel Xeon E5-2690 V4   28 cores, 2.6 GHz (3.2-3.5 GHz Turbo)   72.8 GHz                337%                    86% (off by 251%)

The results in the chart above are taken from the 2D Animation portion of our Adobe After Effects CC 2015.3 CPU Comparison article, which tested rendering and timeline scrubbing across six different projects. As you can see, trying to use an "aggregate frequency" to estimate the difference between these CPU models is wildly inaccurate. Compared to the i7 6850K, the other CPU choices - which should be anywhere from 39% faster to over 3 times faster - are instead all slower than the Core i7 6850K. In fact, the faster the "aggregate frequency" predicted a CPU configuration to be, the slower it ended up being in reality!

Conclusion

The allure of an all-pervasive specification like "aggregate frequency" is something we completely understand. It would be great if there was an easy way to know which CPU will be faster than another and by roughly how much, but unfortunately there is no magic bullet. To be completely fair, for highly threaded tasks like rendering, the "aggregate frequency" should be close enough that you at least wouldn't end up spending more money for lower performance, but it still isn't going to be great at estimating precisely how much of a performance increase you would see with one CPU over another.

Outside of rendering and a few other highly parallel applications, however, there is no way to know whether the "aggregate frequency" is going to be accurate without detailed benchmarking. For example, simulations are often touted as being highly parallel (which should make them a perfect fit for this term), but we have found that performing simulations in SOLIDWORKS is only moderately efficient - worse in many cases than Premiere Pro! Other simulation packages like ANSYS or COMSOL should be more efficient, but without specific testing there is no way to know for sure.

So if "aggregate frequency" is not accurate, what should people use to decide which CPU to purchase? Like we said earlier, there is no magic bullet for this. If your application is CPU-bound (the GPU, HD, and RAM don't impact performance significantly), you could use Amdahl's Law which taken into account the parallel efficiency of the program to calculate the theoretical performance difference between two CPUs. If you are interested in this, we recommend reading our guide on how to use Amdahl's Law. You are still limited to CPUs of the same architecture, it doesn't take into account things like CPU cache, and you have to do a lot of testing up front to determine the parallel efficiency of the program - but this method should be much more accurate than simply multiplying together a CPU's cores and frequency.

If your application does utilize the GPU to improve performance, or if you want to compare CPUs with different architectures, there is really no easy way to estimate which CPU will be faster and by how much. In these situations, the only reliable method is good old-fashioned benchmarking. Again, we wish there was a better method that was still accurate - it would save us so much time! - but that is simply the reality. This is why we at Puget Systems have started to benchmark different CPUs in as many professional applications as we have the time and expertise to handle, to ensure that we are recommending exactly the right CPU to our customers. We unfortunately can't test every program we wish we could (or really even a majority), but keep an eye on our article list as we expand our testing across more and more applications.

Tags: Aggregate Frequency, CPU
JordanViray

Who is actually using "Aggregate CPU Frequency" as a metric these days? Any reviewer or seller doing so is a good candidate for blacklisting. It was sort-of understandable when multi-core CPUs gained traction a decade ago with the Pentium D, but definitely not nowadays where benchmarks describing core scaling are very common.

Posted on 2016-10-24 02:25:39
Ray Gralak

Actually, I partially disagree with the conclusion of this article because I think it leaves out at least one important detail in the testing... the leftover potential of each CPU. Amdahl's law only applies to a single software application, not the accumulated sum of all applications on a modern multitasking operating system. You can thus run multiple applications in parallel to effectively use all cores of a CPU. If you think about a computer in that way, then I think that the concept of aggregate CPU "bandwidth" makes sense. (Or, call it "aggregate CPU frequency" if you want.)

For example, a 4-core CPU might use 90% cumulative CPU bandwidth during a test, while an 8-core CPU might use only 50% of its CPU bandwidth. According to a benchmark it might appear that the 8-core CPU is only slightly faster (or even slower) than the 4-core CPU. But, it doesn't show that the 8-core system still had 50% potential CPU bandwidth available for other applications to be run, yet the 4-core only had 10% CPU bandwidth remaining.

It shouldn’t be a surprise that most software applications have not been designed to use all cores of a CPU, so using a single application as a benchmark is probably not the best test of aggregate CPU performance. However, that doesn't mean that an application cannot effectively utilize most cores. I think one example of such an application is Keyshot:

https://www.pugetsystems.co...

I do agree that one needs to balance the CPU choice with the type of applications that will be used. The one thing that I wish Matt would change in his future Premier Pro benchmarks is the disk subsystem. I think his test system only used a single SATA SSD. I think that could have been a bottleneck.

BTW, for disclosure purposes, I am a software engineer with over 30 years of development experience. I have designed several software applications that can utilize better than 90% of multi-core CPUs. I do not, nor have I ever, been in the business of selling computer hardware. I am just a consumer of fast hardware. :-)

Posted on 2016-10-27 04:47:48
JordanViray

From his conclusion:

"The allure of an all-pervasive specification like "aggregate frequency" is something we completely understand. It would be great if there was an easy way to know which CPU will be faster than another and by roughly how much, but unfortunately there is no magic bullet. "

This is completely correct. It's just that I haven't seen any serious reviewer or seller using aggregate frequency as a metric.

No one is arguing that more cores can't provide better performance, particularly in well multithreaded/multitasking scenarios; just that the user ultimately needs to understand their needs and software capabilities before they can make that determination. Simply knowing the "aggregate frequency" is not enough.

Posted on 2016-10-27 05:10:43
Ray Gralak

>This is completely correct. It's just that I haven't seen any serious reviewer or seller using aggregate frequency as a metric.

It's not an explicitly specified metric, but it is reflected in the price that Intel sets for their CPUs. Otherwise one could say that Intel is cheating everyone because a $4k 22-core Xeon can't run Solidworks or a game as fast as a $300 6700K.

If you could run 22 instances of Solidworks on the 22-core Xeon I don't think that any of the instances would show a slowdown unless there was another resource limitation. However, running 22 instances on the 6700K would not work very well. I think that aggregate CPU performance is a real metric, useful to those that have access to applications that can take advantage of it.

But I do agree that one must try to match up the computer specs to the set of applications that will be run. One needs to balance the entire system (CPU, memory, disk, applications) for best performance. I intentionally have multiple computers, and I run some applications on the system that tends to maximize that application's performance. Of course, for some applications the computer performance is irrelevant (e.g. most web browsing, MS Word, etc.).

Posted on 2016-10-27 15:19:13
JordanViray

>It's not explicitly specified metric value but is reflected in the price that Intel sets for their CPUs.

And as the article shows, paying for all that extra silicon doesn't guarantee extra performance; in many cases it's worse. The key is to FIRST understand what kind of usage a buyer intends. Running 22 instances of Solidworks or doing virtualization? More cores make more sense. Need the highest performance for one instance of a program that isn't optimized for multithreading (which is still the majority)? The higher frequency parts make more sense.

The point is that you can't simply compare CPU aggregate performance and make a simple determination as to which is "faster".

Posted on 2016-10-27 20:16:19
Ray Gralak

>And as the article shows, paying for all that extra silicon doesn't guarantee extra performance; in many cases it's worse.

That's obvious, and I think I even said that. But I think you are still missing the point I made. That is, even when the performance appears lower, there can be a huge amount of unused CPU bandwidth that could be used for parallel applications.

Often when I create an MP4 output file in Premiere I can in parallel compile a large application, listen to a background internet radio station, have Outlook pulling my email, have a couple virtual machines running different instances of Windows or Linux running automated software testing, etc. I can do that on my dual 10-core Xeon system, but not on my overclocked 6700K system. This kind of usage is officially called "High Performance Computing", and many people do it. For those people, I think that aggregate CPU bandwidth is an important metric.

Posted on 2016-10-28 04:31:59
JordanViray

I think you missed the entire point of the article.

To be clear: the user should understand their computing needs first in order to decide whether more cores or more frequency will give them faster performance. Many, if not most, users are unconcerned about unusually large multitasking and would prefer higher single threaded performance. Even VM users who have a baseline performance requirement aren't necessarily better served by going with more cores, e.g. the Broadwell 2699 tops out at around 2.5GHz on all-core turbo which isn't going to be great for many virtual servers. But for many VM users, 2.5GHz is perfectly adequate. It all depends on their computing needs.

Even video encoding, which is particularly well suited to multithreading, often performs worse on dual-CPU setups like your Xeon machine despite the huge core and "aggregate CPU frequency" advantage.

The point of the article isn't to show that aggregate CPU frequency isn't "an important metric" but that "in the majority of cases trying to estimate the relative performance of a CPU in this manner is simply going to give you inaccurate and wrong results."

Posted on 2016-10-28 06:03:42
Ray Gralak

>I think you missed the entire point of the article.

>To be clear: the user should understand their computing needs first in order to decide whether more cores or more frequency will give them faster performance.

No, I don't think I missed the point of the article. As I said previously, I partially disagree with the conclusion because the benchmarking doesn't take into account the amount of unused CPU bandwidth in each system. I don't think "CPU performance" is just about which CPU can finish a single threaded task, or specific application benchmark, first. I think it should include the maximum cumulative capability of the CPU. Whether or not one runs applications that can make effective use of the CPU's full cumulative capability is irrelevant. If the CPU is not at 100% utilization, then there's more potential. Would you agree with that?

>Many, if not most, users are unconcerned about unusually large multitasking and would prefer higher single threaded performance.

If you haven't noticed, the maximum stock CPU speed has not been increasing much, if at all, over the last few years. Even overclocking speeds have been declining, so if you want more aggregate CPU performance you need to increase core count.

That said, it has been well known for years that CPUs with fewer cores can usually be overclocked more and have higher single-threaded CPU performance than CPUs with a larger number of cores.

Here are the three statements made at the beginning of the article:

1. It is typically calculated using the advertised base frequency.

It's true that it's hard to find the actual steppings for some CPUs, but not impossible.

2. It does not take the rest of the CPU and system specs into account including the amount of cache, architecture, and chipset.

The CPU architecture and chipset do matter. Each generation of CPUs has generally become more efficient, requiring fewer CPU cycles to complete tasks. I absolutely agree that the system specs need to be balanced to obtain the highest performance. However, when measuring the total cumulative potential of the CPU, not all system specs should matter. For example, say there is a comparison benchmark that is resource-limited by disk write speed, and the results show all of the systems neck and neck. I do not think that result is a good measure of the CPU's full performance.

3. It assumes that programs can make perfect use of all the CPU cores.

This is where I have the biggest disagreement with the article. I think it assumes that everyone wants to run a single task and do nothing else. For example, when exporting a 4K video using Premiere Pro, I don't want to have to walk away from my system because it is totally locked up in that computation. I will accept that it might be slower than a system with fewer cores, because the few seconds or minutes extra it might take are more than made up for by the work I can do in other applications and virtual machines in parallel without "feeling" any slowdown. I admit that not everyone works like this, but I think many professionals do.

> Even VM users who have a baseline performance requirement aren't necessarily better served by going with more cores, e.g. the Broadwell 2699 tops out at around 2.5GHz on all-core turbo which isn't going to be great for many virtual servers.

Yes, but it can if you want to allocate VM's with multiple cores. I can allocate several dual or quad-core VM's.

I think a big factor for VM performance is how well the CPU supports VM's (e.g. Xeons sometimes have better VM support than Core i7 CPUs) and the speed of the drive on which the VM resides. I use an Intel DC P3700, which provides pretty good performance.

Posted on 2016-10-28 14:06:56
JordanViray

"Whether or not one runs applications that can make effective use of the CPU's full cumulative capability is irrelevant. "

This is where we disagree. The main purpose of comparing CPU speeds is to find out whether a given CPU will perform better than another *for a given workload*. Knowing the "full cumulative capability" doesn't really tell you enough.

"If the CPU is not at 100% utilization, then there's more potential. Would you agree with that?"

Revised to "If no thread is at 100% utilization, there's more potential", that I'd agree with. Easy to have an application be CPU limited (more likely in high core count lower frequency machines like your dual Xeon) even though task manager is reporting low CPU utilization.

"I think it assumes that everyone wants to run a single task and do nothing else. "

It does not make that assumption at all! You are getting worked up over a nonexistent problem.

"Yes, but it can if you want to allocate VM's with multiple cores. I can allocate several dual or quad-core VM's."

And if the server application can't use multiple cores? Allocating more cores won't help, only faster ones will. All of this, of course, assumes no bottlenecks in the subsystem which is honestly your only valid complaint, i.e., the Premiere Pro benchmark. Some of my VMs are on RAM disks because SSDs, including enterprise ones like the P3700 are too slow.

Posted on 2016-10-28 18:17:38
Ray Gralak

"This is where we disagree. The main purpose of comparing CPU speeds is

to find out whether a given CPU will perform better than another *for a
given workload*. Knowing the "full cumulative capability" doesn't really
tell you enough."

It's not a matter of "telling you enough." It's just one of many metrics. I can run more applications simultaneously with my dual E5-2687W V3 Xeons than I can with my 4.8GHz overclocked 6700K because the cumulative performance of the Xeons is far greater than the 6700K's. You don't need to take my word for it; there are many benchmark sites which indicate this.

"If no thread is at 100% utilization, there's more potential", that
I'd agree with. "

If you have a 4-core CPU and one core is at 100%, are you saying that there is no more potential for that CPU? If so, I think that's just wrong!

>And if the server application can't use multiple cores?

I've already stated that applications that I write use multiple cores and I use VM's to test the apps. If I don't need multiple cores I would just allocate a VM with a single core.

> Some of my VMs are on RAM disks because SSDs, including enterprise ones like the P3700 are too slow.

Last I checked, VM's might load from disk but run in memory like regular operating systems :-). You would have a hard time distinguishing between 2000+ MB/sec reads/writes and RAM, because the VM's mostly execute out of RAM on a system like mine with 256GB of RAM. Also, Xeons generally perform better for VM's than many Intel and AMD consumer CPUs.

Posted on 2016-10-28 18:55:55
JordanViray

"It's just one of many metrics."

Exactly, which is the point of the article! It's obvious that more cores allow more applications to run, but whether that translates to better performance for the end-user is uncertain.

"If you have a 4-core CPU and one core is 100% are you saying that there is no more potential for that CPU? If so, I think that's just wrong!"

There's no more potential for that user looking for their program to run faster. Just because Task Manager reports 10% CPU use for 1 core running at 100% and the others idling doesn't necessarily mean those other 9 cores can be used to speed up the task. If you believe otherwise, you're wrong. There's unused computing capacity that just *doesn't matter* if the user cares about single-threaded performance.

"I've already stated that applications that I write use multiple cores and I use VM's to test the apps. "

Yeah, and for the people not using applications that use multiple cores? Higher performance with faster cores, not more cores. Just because *you* use applications that use multiple cores doesn't mean everyone else does.

"Last I checked, VM's might load from disk but run in memory like regular operating systems:-). You would have a hard time distinguishing between 2000+ MB/sec Reads/Writes and RAM, because mostly the VM's execute out of RAM ona system like mine with 256GB of RAM. "

2000MB sequential. If your server application involves a lot of accesses of small files with a low QD, the performance of the P3700 is a pathetic 47MB/s compared to around 1500MB/s for a RAMdisk. Maybe your VM's don't require that sort of performance, but mine do. But you don't seem to think there are different requirements for computing performance outside your own.

Posted on 2016-10-28 20:04:16
Ray Gralak

"Exactly, which is the point of the article! It's obvious that more cores
allow more applications to run, but whether that translates to better
performance for the end-user is uncertain."

You can never be completely certain of the performance of any particular application on a given system until you test it. However, in this article I believe that even though the 6950x and Xeon were sometimes bested by lower-cored CPUs, they both still had extra CPU bandwidth available. That extra CPU bandwidth is not imaginary. Actual work could be done. The rough upper-end limit for any CPU is thus the maximum aggregate frequency across the cores.

So, by your metric the best performing CPU for users might be a dual core Intel G3258, which can be overclocked much higher than most Core i5/i7 CPUs. However, I will stick with my lineup of i7 and Xeon systems. :-)

"Yeah, and for the people not using applications that use multiple cores?
Higher performance with faster cores, not more cores. Just because
*you* use applications that use multiple cores doesn't mean everyone
else does."

Yes, I said that already. I admit that I am a power user. Most people are not, but that doesn't mean that there are not many other users like me.

"2000MB sequential. If your server application involves a lot of accesses
of small files with a low QD, the performance of the P3700 is a
pathetic 47MB/s compared to around 1500MB/s for a RAMdisk. Maybe your
VM's don't require that sort of performance, but mine do. But you don't
seem to think there are different requirements for computing performance
outside your own."

But you have to copy the entire VM to RAM first to start it, and that sum total I/O is almost certainly going to be more than executing it from the P3700. :-) I could of course allocate a few 64GB VM's in the amount of RAM I have on my main Xeon workstation. I could also increase the RAM to 768GB so I could have 10 virtual machines in RAM that each have 64GB of memory. And since the series I have has two separate memory buses, it has roughly twice the memory bandwidth of a consumer system.

And when a consumer i7 gets a blue screen because of a RAM bit error, the ECC RAM in a Xeon system would correct it and not lose the entire VM. So, I've got to ask: what kind of VM's are you running that you can stand to lose?

Posted on 2016-10-28 21:13:24
JordanViray

"You can never be completely certain of the performance of any particular application on a given system until you test it."

Bingo.

"So, by your metric the best performing CPU for users might be a dual core Intel G3258, which can be overclocked much higher than most Core i5/i7 CPUs. However, I will stick with my lineup of i7 and Xeon systems."

My personal metric is the old fashioned benchmark. And yes, a highly clocked G3258 will absolutely outperform lower clocked i5s and i7s depending on the application. I have nothing against i7s and Xeons; my daily driver is a 5 year old 6 core i7 running at 4.8GHz and my main file server uses a Xeon since ZFS works best with ECC and Xeon is where you find ECC support.

"I admit that I am power user. Most people are not, but that doesn't mean that there are not many other users like me."

And yet even for many power users, frequency is more important than having more cores. Finding that balance requires more than "aggregate CPU frequency", can you at least agree with that?

As for the VM case I was describing, the huge dataset is a shared one that is essentially read only. It has multiple backups. The bottleneck for these clients is CPU frequency and access latency for which your dual Xeon + P3700 system would be a distinct downgrade.

Posted on 2016-10-28 21:30:26
Ray Gralak

>>"You can never be completely certain of the performance of any particular application on a given system until you test it."

>"Bingo"

Ok... but as I said I don't usually run JUST one application at a time. :-) The measurement I'm interested in is a CPU's total potential performance.

"And yet even for many power users, frequency is more important that
having more cores. Finding out that balance requires more than
"aggregate CPU frequency", can you at least agree with that?"

As I said in a previous post, I handle this situation by having multiple computers. Computers are just tools. You need to use the right one for the job. My main computer is a pretty powerful tool that lets me do most of the things I need to do conveniently on one system. But I also have a number of specialized computers for other tasks, including one 4.8GHz 6700K and two 4790K's (one overclocked to 4.8). The 4.8GHz 4790K has a Passmark score just below 8000 (and yes, it is one of many systems I built myself).

"As for the VM case I was describing, the huge dataset is a shared one
that is essentially read only. It has multiple backups. The bottleneck
for these clients is CPU frequency and access latency for which your
dual Xeon + P3700 system would be a distinct downgrade."

Sorry, I am having a hard time understanding how the "huge dataset" could be contained in your VM. How much memory is "huge"? It seems that if you have to read from the host computer's storage you are no longer in memory, and the P3700 is gonna be looking pretty good, especially with the Xeon's direct IO facility within a virtual machine.

Posted on 2016-10-28 21:58:15
JordanViray

"Ok... but as I said I don't usually run JUST one application at a time. :-) The measurement I'm interested in is a CPU's total potential performance."

Great, and you can benchmark that too :-)

"As I said in a previous post, I handle this situation by having multiple computers."

As do I. Sometimes performance requirements mean that slower many-cored CPUs are just not enough even if "aggregate CPU frequency" suggests otherwise, and multiple higher frequency CPUs are the answer, e.g. game streaming. Sometimes slower many-cored CPUs do make more sense. The existence of these different scenarios is why the article is correct in stating that simply knowing "aggregate CPU frequency" is simply not enough.

"Sorry, I am having a hard time understanding how the "huge dataset" could be contained in your VM. "

The dataset isn't contained in each VM itself. The setup is 4x 3GB VMs each accessing a shared 50GB dataset. Disk traces for the client workload are mostly smallish QD1. Sure, I could try 4x 53GB VMs, but that would mean compromising memory speeds to maybe DDR1333 which is, for this application, much slower. Though even a DDR1333 RAMDisk still blows the P3700 out of the water in 4K QD1 through QD4.

Posted on 2016-10-28 22:29:10
Ray Gralak

"The existence of these difference scenarios is why the article is
correct in stating that simply knowing "aggregate CPU frequency" is
simply not enough."

But I think the tests used to "support" that conclusion left out the percentage of latent potential CPU bandwidth for each CPU. Most people pick a CPU by its clock speed and core count. It's not unreasonable to estimate cumulative CPU performance by taking a rough multiple of clock speed and core count.

Let me make an analogy... Say you need two things done as quickly as possible. The first thing is to repair a door in your house and the second is to build a house.

To repair the door you might hire a single worker. If you add another worker it might get done a little quicker. If you had eight workers then most would probably be sitting idle and the job doesn't get done any faster. In fact it might take longer if the workers end up each trying to take turns helping. :-) This is the analogy for a single threaded application.

To build the house using just one worker would take a long time. Using all eight workers allows the house to be finished much sooner. This is an analogy for a multi-threaded application.

Now say there are 8 homes, each with a repair job to be done. If there is one worker, then the time to finish all the jobs is roughly 8x the amount of time it would take if each of the eight workers went to a separate job. This is an analogy for running multiple applications simultaneously.

"The dataset isn't contained in each VM itself. The setup is 4x 3GB VMs
each accessing a shared 50GB dataset. Disk traces for the client
workload are mostly smallish QD1."

What are the host and client operating systems, and which CPU are you running? You still need to copy 3GB to RAM for each VM instance. All things being equal, that's a lot of "ground" to make up for. What is the CPU-intensive application that has to be single-threaded? It seems that maybe you need a good software engineer to redesign it to be multi-threaded.

Posted on 2016-10-28 22:58:49
JordanViray

"It's not unreasonable to estimate cumulative CPU performance by taking a rough multiple of clock speed and core count."

Sure, but better cumulative CPU performance doesn't necessarily translate to better actual performance as the article detailed. Actual performance, not analogies.

"You still need to copy 3GB to ram for each VM instance."

Sure, and the 50GB dataset also needs to be copied to RAM. It doesn't take long even with the plain SSDs they are using. Though if they constantly needed to reload the VMs and dataset from scratch, a PCI-E drive would help. But a PCI-E drive simply is too slow to replace the RAMDrive.

I don't think they'd be comfortable with me describing specifics, but let's just say that they don't have the resources to get the software maker to redesign the program for better multi-threaded support - if it's even possible. It'd be great if all applications scaled well with cores because I'd just get the 22-core and call it a night, but they don't.

Posted on 2016-10-28 23:39:31
Ray Gralak

"Sure, but better cumulative CPU performance doesn't necessarily

translate to better actual performance as the article detailed. Actual
performance, not analogies."

Ok, I guess we're going to have to agree to disagree. You say that CPU performance is measured primarily by its single threaded performance. By your definition a highly overclocked G3258 is thus much faster overall than the 6700K, 6950X, and every Xeon ever made. I can't stop you from believing that even though the CPU passmark scores don't back that up.

"Sure, and the 50GB dataset also"

What, no details about your VM/operating system/hardware/software application? :-)

BTW, 50GB would fit nicely on a RAM Disk within a VM on my System, and would be protected by ECC RAM.

"It'd be great if all applications scaled well with cores"

Some day I think that most applications will be able to take advantage of multiple CPU cores. Besides, unless there is a technology breakthrough, the maximum single core CPU speeds are not going to increase much, if at all. I think the best real choice is to use applications that can take advantage of multiple cores.

"because I'd just get the 22-core and call it a night, but they don't."

That's not the Xeon you would want to choose today (Oct. 2016). The 26xx-series Xeon with the best balance of cores vs clock speed would be the 2687W V4. And I think that the 26xx V4 series Xeon with the highest cumulative performance is the OEM E5-2679 V4.

Posted on 2016-10-29 03:53:00
JordanViray

"You say that CPU performance is measured primarily by its single threaded performance."

Not at all. CPU performance is measured by how it performs for a given user's tasks. Nowhere did I say a highly clocked G3258 is faster overall than every other Intel processor; though it *can* be in some workloads.

"What, no details about your VM/operating system/hardware/software application?"

Nope, I set that up for a customer and I'm not the sort to share customer information, even if it's fairly general.

"Some day I think that most applications will be able to take advantage of multiple CPU cores. Besides, unless there is a technology breakthrough, the maximum single core CPU speeds are not going to increase much, if at all. I think the best real choice is to use applications that can take advantage of multiple cores."

Yeah, someday, but even then it ultimately depends on what the user wants to do. And some applications are simply not available with a multi-threaded optimized analogue. Oftentimes you are constrained in the type of software you can use for several reasons and you have to optimize for that.

"That's not the Xeon you would want to choose today (Oct. 2016). "

Either way it doesn't matter, because the highest cumulative performance does not give me, and many other people, the highest actual performance.

Posted on 2016-10-29 08:56:53
Ray Gralak

Oh... so your example isn't even of your own equipment, and you can't even state the operating system or CPU?

Anyway, you don't have to listen to me. Just look at the Passmark high-end or dual CPU ratings and see where the CPUs fall. They approximately line up with aggregate frequency. There's the proof of what I have been stating. If you have any alternative evidence, please post it. Otherwise, QED :-)

Posted on 2016-10-29 22:13:10
JordanViray

Passmark doesn't indicate actual performance differences for every case and I personally don't even consider it to be a good benchmark in general. Alternative evidence? Maybe you should read the article!

But VMWare with the VMs running Windows. 5960x @ 4.6GHz, RAM running at whatever the highest OC is. They tried a Xeon Workstation CPU with significantly worse results.

I'm obviously wasting time with someone who thinks that someone looking to get higher CPU performance need only look at "aggregate CPU frequency" or Passmark (lol) without considering the actual workload. But maybe if your job involves getting the highest Passmark scores possible, you have a point.

Just tell someone looking for faster x264 video encoding to buy a $6,000 dual E5-2697 machine because "Hey, it has an aggregate CPU 'bandwidth' of 86.8 and a 29000 CPU Passmark score!"

Meanwhile a $2500 6950x build with a mild 4 GHz overclock and a measly CPU "bandwidth" of 40 and a lower Passmark score ... outperforms the dual Xeon.

Not to worry though, because at least they can "listen to a background internet radio station" and run Outlook while waiting on the Xeons. They can console themselves with the fact that, even though they wasted money on a slower performing machine, at least their Passmark scores are awesome. Because in the end, isn't that what the computers are for?

Don't bother to reply (or do, I don't care). You have the honor of being the first person on my block list :-)

Posted on 2016-10-29 22:52:41
Ray Gralak

All I have to say is that using a highly overclocked i7 to run virtual machines in RAM for commercial clients seems unbelievably risky to me. Good luck with that!

>Maybe you should read the article!

The article compares performance of a single application on each system so I don't believe that the entire capability of each CPU was being utilized. Using multiple applications or a benchmark that utilizes all cores is needed to determine the cumulative performance of each CPU.

>Passmark doesn't indicate actual performance differences for every case and I personally don't even consider it to be a good benchmark in general.

If you don't believe Passmark, then there are many other benchmark sites that show that the multi-threaded performance of the higher end Xeons exceeds that of any of the Intel consumer CPUs. It may not be a perfect measurement, but I think that aggregate frequency is usually a good way to compare the total potential performance of different CPUs.

Again, the trick to utilizing the performance of multiple cores is in the software. Unfortunately it sounds like the software that is used on that overclocked server is not capable of that. :-(

Posted on 2016-10-30 13:37:35

You make a good point, and one that definitely impacts some users. For a single-purpose system, where a computer will be tasked with one thing and one thing only, Matt's article is perfect. We get customers like that too - especially folks in scientific fields and those doing very specific types of media editing. But we also work with customers who might want to edit photos while waiting for a video to render in the background, or play a game while waiting for photos to import / process into a catalog, etc. In those situations you have to go beyond what the ideal hardware is for one application and try to look at the bigger picture of what a user will have running at the same time. There are so many potential combinations, though, that it would be hard to approach that scientifically. Even the way Windows handles threading and core affinity gets tricky. So I'm not sure how we could effectively write about that, but it is definitely something to keep in mind when spec'ing out a computer :)

Posted on 2016-10-27 18:22:32
Ray Gralak

I think that the only extra bit of useful information that could have been provided was the percentage of CPU used during each test. I think that would have given readers like me an idea of how much CPU headroom they might have when running a particular application on each system. To some, like Mr. JordanViray, it might not matter, but to others it might.

As I mentioned above, I have one computer with dual 2687W V3's. I was drawn to your site because I use some of the applications that have been tested in various articles. In particular I am interested in performance information about Xeons, as I am considering upgrading to dual 2687W V4's or 2690 V4's. I may also be interested in a system with a single 6950X if that best fits my needs. I tend to run many applications simultaneously on this system, so I want it to be rock solid. The Xeon systems I have had in the past have met that requirement, so that is why I tend to favor them. I have to prioritize system reliability over speed in the kind of work I do with my computers.

Lastly, I very much appreciate the performance articles that you guys post. Very seldom have I had a different view from most everything written, and I almost always have learned something new. Please keep up the good work, Matt, William, and others!

Posted on 2016-10-29 22:46:22