https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/883

ANSYS Mechanical & Fluent Benchmark Analysis

Written on January 6, 2017 by Matt Bach

Introduction

When purchasing a workstation for ANSYS® Mechanical™ or ANSYS® Fluent®, it can be daunting to make sure you are choosing the hardware that will give you the best performance for your money. The challenge is finding hardware information that is up to date, geared towards your type of setup (workstation vs cluster), and not made up of cherry-picked facts used for marketing purposes. ANSYS has a wealth of raw benchmark data available on their website for both ANSYS Mechanical and ANSYS Fluent, but the data can be difficult to make sense of unless you are very familiar with computer hardware. There are some gaps in the information - such as no Fluent benchmarks with a GPU - but there is still a lot of useful information to be gained by examining the results.

In this article, we want to analyze some of the benchmark data to help you to understand what kind of hardware should give you the best performance for the different ANSYS simulation software packages. Based on these benchmarks, we want to answer two main questions:

  1. How well do ANSYS Mechanical and Fluent scale across multiple CPU cores?
  2. Should you use a GPU or accelerator card for ANSYS Mechanical?

ANSYS Mechanical CPU Core Scaling

To see how well ANSYS Mechanical scales with more CPU cores, we chose two platforms from the ANSYS benchmark page that had in-depth single machine core solver rating results. Based on these results, we will be able to utilize Amdahl's Law to determine exactly how efficient ANSYS is at using a high number of CPU cores - which will in turn allow us to make educated recommendations as to which CPU models should work best. Rather than looking at every benchmark ANSYS has available, we are going to focus on the Power Supply Module (V17cg-1), Tractor Rear Axle (V17cg-2), Gear Box (V17ln-1), and Semi-Submersible (V17sp-2) benchmarks. The two platforms (as named on ANSYS.com) we will be using for this analysis are:

  • Hewlett Packard Enterprise DL380 Gen9, Intel E5-2667v4 CPU, 512GBs 16 2400MHz DIMMs GB RAM, RHEL 7.2, N/A, DMP config, 16 Total CPU Cores, 0 GPUs
  • Dell Inc. PowerEdge R730, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz CPU, 256 GB RAM, Red Hat Enterprise Linux Server release 7.2 (Maipo), NVIDIA Tesla K80, DMP config, 28 Total CPU Cores, 2 GPUs

[Chart: Dual Intel Xeon E5-2667 V4 8 Core (16 cores total), no GPU]

[Chart: Dual Intel Xeon E5-2680 V4 14 Core (28 cores total), NVIDIA Tesla K80]

For the Dual Xeon E5-2667 V4 system without a GPU, we saw good scaling, with an average multi-core efficiency of 95.5% and a maximum of 98%. In other words, each core you add is on average about 95.5% as effective as the one before it. This is fairly good from an efficiency standpoint, although at higher core counts there will still be a point where it is better to purchase a CPU with a slightly higher clock speed rather than a slightly higher core count.
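To make the clock speed versus core count trade-off concrete, here is a minimal sketch of how Amdahl's Law could be used to compare two CPUs. It assumes that the "multi-core efficiency" figure corresponds to the parallel fraction in Amdahl's Law, and that performance is simply clock speed multiplied by the predicted speedup. Both are simplifying assumptions, and the two CPU options are hypothetical values chosen only for illustration.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over a single core predicted by Amdahl's Law."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(cores: int, clock_ghz: float, parallel_fraction: float) -> float:
    """Crude estimate (assumption): clock speed times Amdahl speedup."""
    return clock_ghz * amdahl_speedup(cores, parallel_fraction)


# Average multi-core efficiency reported above for ANSYS Mechanical without a GPU,
# treated here as the Amdahl parallel fraction (an assumption).
P = 0.955

# Hypothetical CPU options, not taken from the benchmark data.
fewer_faster = relative_throughput(cores=16, clock_ghz=3.2, parallel_fraction=P)
more_slower = relative_throughput(cores=20, clock_ghz=2.6, parallel_fraction=P)

print(f"16 cores @ 3.2 GHz: {fewer_faster:.1f}")
print(f"20 cores @ 2.6 GHz: {more_slower:.1f}")
```

Under these assumptions the 16-core option comes out ahead despite having fewer cores, which is the kind of crossover described above. Real results will also depend on Turbo Boost behavior, memory bandwidth, and the specific model being solved.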

When we look at the system with an NVIDIA Tesla K80 accelerator, however, the scaling changes quite a bit. Since the Tesla card is doing a portion of the calculations, the CPU is no longer the limiting factor. Because of this, each CPU core that is added is not as effective as it otherwise would have been, resulting in an average multi-core efficiency of just 87%. You can clearly see how big a difference this makes in the charts above when you compare the performance with 16 cores against the performance with 28 cores. In most cases, there is only a very small increase in performance. The raw time to finish a calculation will still be shorter if you use an accelerator card like the Tesla K80, and we go into this in more detail in the "Should you use a GPU or accelerator card for ANSYS Mechanical?" section.
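As a rough illustration of why going from 16 to 28 cores helps so little at 87% efficiency, the sketch below compares the gain Amdahl's Law predicts at the two efficiency figures quoted in this section. It assumes the efficiency figures can be used directly as the Amdahl parallel fraction, and it applies both figures to the same 16 and 28 core counts even though they were measured on different systems.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over a single core predicted by Amdahl's Law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def gain_from_more_cores(old_cores: int, new_cores: int, parallel_fraction: float) -> float:
    """Fractional performance gain from raising the core count."""
    old = amdahl_speedup(old_cores, parallel_fraction)
    new = amdahl_speedup(new_cores, parallel_fraction)
    return new / old - 1.0


# Efficiency figures quoted in the text, treated as parallel fractions (assumption).
scenarios = [("no GPU (95.5%)", 0.955), ("Tesla K80 (87%)", 0.87)]

for label, fraction in scenarios:
    gain = gain_from_more_cores(16, 28, fraction)
    print(f"{label}: 16 -> 28 cores gives about {gain:.0%} more performance")
```

With these inputs, the predicted gain from the extra 12 cores drops to less than half of what it would be without the accelerator, which is consistent with the small increases described above.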

ANSYS Fluent CPU Core Scaling

ANSYS does not have a wide range of single workstation benchmark results available for Fluent, but there are enough that we are still able to do some analysis of how well it is able to take advantage of higher CPU core counts. Again, we are going to focus on just four benchmarks, this time the Cavity flow in a centrifugal pump (pump_2m), External Flow Over an Aircraft Wing (aircraft_2m), Boeing Landing gear analysis (landing_gear_15m), and Flow through a combustor (combustor_12m) benchmarks. There is only a single platform that uses both a modern CPU and has in-depth single machine results:

  • HP Proliant XL230 Gen9, 2.6 GHz 16 core Broadwell, 128GB RAM, RHEL 7.2 Single Node, turbo off

[Chart: Dual Intel Xeon E5-2697A V4 16 Core (32 cores total), no GPU]

For ANSYS Fluent, we saw extremely good multi-core efficiency, with an average of 98.4%. This is good enough that, unless the two CPU options you are looking at have vastly different operating frequencies, you will almost always want to go with the highest core count CPU that is within your budget.
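An efficiency figure like this can be estimated by inverting Amdahl's Law for a pair of measurements. The sketch below is one possible way to do that, not necessarily the exact method used for this article. It assumes the benchmark rating is proportional to throughput (higher is better), and the two rating values are hypothetical numbers picked for illustration rather than results from ANSYS.com.

```python
def estimate_parallel_fraction(rating_one_core: float, rating_n_cores: float, cores: int) -> float:
    """Invert Amdahl's Law: P = (1 - 1/S) / (1 - 1/n), where S is the measured speedup."""
    if cores < 2:
        raise ValueError("need a measurement with at least 2 cores")
    speedup = rating_n_cores / rating_one_core
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Hypothetical ratings (illustration only): 1 core vs 32 cores.
fraction = estimate_parallel_fraction(rating_one_core=100.0, rating_n_cores=2139.0, cores=32)
print(f"Estimated parallel fraction: {fraction:.1%}")
```

Repeating this calculation for each core count and each benchmark, then averaging the results, is one way to arrive at a single figure per application.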

Unfortunately, there are currently no results available that include the use of a GPU or accelerator card so we cannot say how that might affect CPU scaling.

Should you use a GPU or accelerator card for ANSYS Mechanical?

GPU acceleration is a very popular topic in the HPC community right now, but not every software package benefits from it to the same degree. To see whether an NVIDIA Quadro or Tesla card is necessary or beneficial for ANSYS Mechanical, we looked at the results from a variety of platforms across the majority of the Mechanical benchmarks ANSYS has available. Rather than going over the individual results for each platform, we are going to look at the overall average performance of each. The individual benchmark results, along with a link to the benchmark page and the full platform name for each, are available in the online version of this article.

ANSYS Mechanical Benchmark Comparison
To help make sense of the performance between each platform, we normalized the results to the Dual Xeon E5-2667 V4 system with no GPU. As a further guide, we also included the pricing for a workstation with the same CPU (and Tesla card when appropriate) if you were to purchase it from Puget Systems today. This should help give a sense of not only the performance difference between each system, but the cost as well.

Starting at the bottom with the two Xeon E5-1680 V4 systems, it is clear that adding a Tesla K40 gives a great boost in performance. Since the results in the chart are relative to the Dual Xeon E5-2667 V4 system, it isn't perfectly clear what the actual performance difference is between these two configurations, but it works out to about a 47% increase in performance for a 67% increase in cost.

However, you will notice that the Dual Xeon E5-2690 V4 system with no GPU is actually a bit cheaper than the E5-1680 V4 system with a Tesla K40, but is significantly faster - about 60% faster. If you compare it to the E5-1680 V4 system with no GPU, you are looking at a 2.2x increase in performance for just a 60% increase in price. That is a much more effective investment!
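The comparison above can be reduced to a simple performance-per-cost ratio. The sketch below uses only the relative figures quoted in the text, with the E5-1680 V4 system with no GPU as the 1.0 baseline for both performance and price. The configuration labels and the ratio itself are our own illustration, not something published alongside the benchmarks.

```python
# (relative performance, relative cost), both versus the E5-1680 V4 system with no GPU.
configs = {
    "E5-1680 V4, no GPU": (1.00, 1.00),
    "E5-1680 V4 + Tesla K40": (1.47, 1.67),   # 47% faster for 67% more cost
    "Dual E5-2690 V4, no GPU": (2.20, 1.60),  # 2.2x faster for 60% more cost
}

# Performance per unit of cost; values above 1.0 beat the baseline's value for money.
value = {name: perf / cost for name, (perf, cost) in configs.items()}

for name, ratio in sorted(value.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f}")

best = max(value, key=value.get)
print(f"Best performance per dollar: {best}")
```

By this measure the Tesla K40 upgrade delivers less performance per dollar than the baseline, while the Dual Xeon upgrade delivers noticeably more.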

Overall, it appears that while using a GPU or accelerator can improve performance, spending the same amount of money on higher-end CPUs should net you even greater performance gains. Based on the benchmark results available on ANSYS.com, there are really three situations in which we would advise investing significantly in a GPU or accelerator for ANSYS:

  1. You are already using the fastest Dual Xeon configuration and can't justify the cost jump to a Quad Xeon workstation.
  2. You have a Quad Xeon system already, and want to get even more performance out of a single workstation.
  3. You are space limited and need to cram the most performance possible in a small space.

Conclusion

Examining the benchmark data available on ANSYS.com provides useful insight into how different CPU and GPU options affect performance for their simulation packages. Not only have we been able to calculate an approximation of how well the software scales across multiple CPU cores, we also have a good idea of how well ANSYS simulation software is able to utilize GPU acceleration. Based on our multi-core scaling analysis, we have been able to utilize Amdahl's Law to get a good idea of which CPUs should give you the best performance:


There are plenty of other CPUs that can work very well with ANSYS Mechanical or Fluent, but these are the models we would recommend using in your workstation. Each model was chosen to ensure that as you spend more money, you also get a noticeable increase in performance. Due to how well the software scales, you will notice that the majority of our recommendations are Dual Xeon configurations. There are a couple of single-CPU options that should work well, but most of the performance gains are going to be found with multiple physical CPUs. At the same time, due to the extremely high core counts available on modern CPUs, there is really only one Quad Xeon option that we would recommend. There are other Quad Xeon CPUs available for purchase, but based on the multi-core efficiency we calculated, this is the only one that will give significantly better performance than a Dual Xeon E5-2699 V4 machine.

While we did see some good performance gains with an NVIDIA Tesla accelerator, which presumably means a Quadro card would also give some gains, the price-to-performance ratio is not nearly as good as simply upgrading to a more powerful CPU option. If your budget falls between two of our CPU recommendations (or you can afford more than the Quad Xeon offers by itself), it might be a good idea to invest in a Quadro/Tesla card, but otherwise we would advise you to focus on the CPU first.

If you are looking for an ANSYS simulation workstation, we have a number of platforms available for Single, Dual, and Quad Xeon configurations that offer the CPU options listed above as well as NVIDIA Quadro and Tesla accelerator cards:

  • Single Xeon: Peak Mini
  • Dual Xeon: Peak Tower
  • Quad Xeon: Peak Tower

Tags: ANSYS, CPU, Processor, GPU
Jairo Vindas

Excellent article. It's pretty difficult for a beginner to understand what kind of hardware is the best for physics simulations. You did some tests with Solidworks and now present the results from one of the more frequently used physics simulation packages. Most of the other hardware sites focus on synthetic benchmarks or games when testing hardware, and they usually forget to test applications that have real world use. One of the most difficult parts of learning a craft is choosing a platform that won't slow you down while learning, and in this respect, you give great insight. Keep up the good work!!!

Posted on 2017-01-10 03:37:04
Muteb

Great work. It would be interesting to see how Broadwell-E CPUs perform in these tests when compared to Xeons. Where do you think the 6950X would be placed? Higher than a single E5-2680 v4, or in between the E5-2680 v4 and E5-1660 v4?

Posted on 2017-01-21 09:47:05
Dominic Afonso

It would be great if you could add the cost to your final chart of relative performance.

Posted on 2017-03-16 17:03:05