# Mental Ray Multi Core Performance

Written on March 30, 2016 by Matt Bach

### Introduction

Mental Ray is a CPU-based rendering engine that is currently owned and developed by NVIDIA. While NVIDIA is primarily a GPU-focused company, Mental Ray does not currently have many features that are GPU accelerated. Instead, the primary hardware component that is going to impact render times is the CPU. In this article we want to benchmark Mental Ray and analyze how effective it is at using multiple CPU cores. Based on this analysis, we can then determine exactly what CPU models should give you the best possible performance for your budget.

Before we get too far, it is important to understand the two most basic CPU specifications:

1. The frequency is essentially how many clock cycles a single CPU core completes in a second (roughly, how fast it is).
2. The number of cores is how many physical cores there are within a CPU (how many operations it can run simultaneously).

If life were easy, you would be able to compare the performance of different CPUs by simply multiplying the number of cores by the frequency, creating a sort of "combined" or "aggregate" frequency. Unfortunately, that is not how it works in the real world. In reality, calculating the effective performance of a CPU is a bit more complicated and involves a formula derived from Amdahl's Law.

### Theoretical CPU Performance:

$S=F*\frac{1}{(1-P)+\frac{P}{n}}$

• $S$ is the theoretical overall speed of the CPU
• $F$ is the frequency of the CPU
• $P$ is the fraction of the task that can be made parallel (parallel efficiency)
• $n$ is the number of CPU cores
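As a rough illustration, the formula translates directly into a few lines of Python. The function name `theoretical_speed` and the two CPU configurations are hypothetical, and P = 0.90 is an arbitrary value chosen only to show how the naive "cores × frequency" number can rank CPUs the wrong way; it is not a measured Mental Ray result.

```python
def theoretical_speed(freq_ghz: float, cores: int, p: float) -> float:
    """Amdahl's Law estimate: S = F * 1 / ((1 - P) + P / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p (parallel efficiency) must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return freq_ghz / ((1.0 - p) + p / cores)


# Two hypothetical CPUs compared at an assumed efficiency of P = 0.90
for freq, cores in [(3.5, 8), (3.0, 10)]:
    naive = freq * cores  # the "aggregate" frequency
    estimate = theoretical_speed(freq, cores, p=0.90)
    print(f"{cores} cores @ {freq} GHz: naive {naive:.1f}, Amdahl {estimate:.2f}")
```

With these assumed numbers the 10-core CPU wins on the naive metric (30.0 vs 28.0), but the 8-core CPU comes out ahead once P is accounted for (about 16.5 vs 15.8).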

This doesn't take into account differences between CPU architectures (which can make a CPU one generation newer as much as 10-15% faster than an older CPU with identical specs), but what this formula says is that the theoretical overall speed of a CPU (S) depends on both the frequency it runs at (F) and an "adjusted" core count that accounts for how much of the task is actually able to run across multiple cores (P). This P value is commonly called the "parallel efficiency", and while it is very often overlooked when comparing CPU models, it is extremely important. The higher the efficiency, the more performance you gain from a higher core count. The lower the efficiency, the less effective additional CPU cores will be, which makes it more important to have a CPU that runs at a higher frequency.

What we will be doing is performing a series of benchmarks to find out the speed of Mental Ray with various numbers of cores, then using another formula from Amdahl's Law to determine its parallel efficiency (P). Once we have this efficiency number, we can leverage the formula above to estimate the relative performance of a CPU based on the number of cores and the frequency it runs at. By doing so, we can make sure you do not purchase a more expensive CPU that actually gives you less performance than a cheaper option.
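The "reverse" step can be done by rearranging the same formula. If a render runs $S_{rel}$ times faster on $n$ cores than on a single core (with frequency held constant), then $S_{rel} = 1/((1-P)+P/n)$, which solves to $P = (1 - 1/S_{rel})/(1 - 1/n)$. The sketch below assumes that simple algebraic inversion; the exact procedure in our guide may differ, and the function name and the 600 s / 50 s timings are made up for illustration.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Solve speedup = 1 / ((1 - P) + P / n) for P."""
    if cores < 2:
        raise ValueError("at least 2 cores are needed to estimate P")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Hypothetical timings: 600 s on 1 core, 50 s on 20 cores -> 12x speedup
t_single, t_multi = 600.0, 50.0
p = parallel_efficiency(t_single / t_multi, cores=20)
print(f"estimated P = {p:.2%}")
```

Here a 12x speedup on 20 cores corresponds to a P of roughly 96.5%.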

If you want a more in-depth explanation of how we use Amdahl's Law, we have a guide available for Estimating CPU Performance using Amdahls Law.

### Test Setup

To accurately benchmark Mental Ray, we will be using a pair of Xeon E5-2687W V3 3.1GHz Ten Core CPUs. This will allow us to test with up to 20 physical CPU cores across two CPUs to see exactly how well Mental Ray is able to utilize both a high number of cores and multiple physical CPUs. The full specifications for our test system are:

Testing Hardware

| Component | Specification |
|---|---|
| Motherboard | Asus Z10PE-D8 WS |
| CPU | 2x Intel Xeon E5-2687W V3 3.1GHz Ten Core |
| RAM | 8x Crucial DDR4-2133 16GB ECC Reg. (128GB total) |
| GPU | NVIDIA GeForce GTX 980 4GB |
| Hard Drive | Samsung 850 Pro 512GB SATA 6Gb/s SSD |
| OS | Windows 10 Pro 64-bit |
| PSU | Antec HCP Platinum 1000W |
| Software | 3ds Max 2016 SP3 V2 using Mental Ray |

To make sure our results are as consistent as possible, we used a custom AutoIt script to start 3ds Max, load the test scene, adjust how many CPU cores are available by setting the affinity of the process, and then time how long it took to render the scene. For our test scenes, we used three Mental Ray sample scenes from the 3ds Max 2016 sample files. We left these scenes on their preset render settings when possible, although we did modify the Arch Interior scene slightly. For that scene, we increased the resolution to 1920x1080 to make the single-frame test a bit more intensive. When we tested that scene as an animation, however, we left the resolution at 640x480 and limited the animation to just 11 frames, as otherwise it would have taken an absurdly long time to render with the lower core counts.
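Limiting a process to a given number of cores via affinity comes down to building a bitmask with one bit per logical processor. The sketch below does not reproduce the AutoIt script; it only shows the mask arithmetic in Python, under two assumptions: Hyper-Threading is either disabled or its sibling threads are numbered consecutively (logical processors 0 and 1 on core 0, and so on), and all cores fit in one Windows processor group (at most 64 logical processors). The function name `affinity_mask` is hypothetical, and which socket the low bits map to depends on how the system numbers its processors.

```python
def affinity_mask(num_cores: int, threads_per_core: int = 1) -> int:
    """Return a bitmask that selects the first `num_cores` physical cores.

    Assumes each physical core owns `threads_per_core` consecutively
    numbered logical processors (1 = Hyper-Threading off).
    """
    logical = num_cores * threads_per_core
    if not 1 <= logical <= 64:
        raise ValueError("mask must cover 1-64 logical processors")
    return (1 << logical) - 1  # set the lowest `logical` bits


for cores in (1, 2, 4, 10, 20):
    print(f"{cores:2d} cores -> 0x{affinity_mask(cores):X}")
```

The hexadecimal form is the format accepted by, for example, the `/AFFINITY` option of the Windows `start` command.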

Night Caustics
(autodesk_mrwhite_handout_night_caustics.max)
Single frame - camera1 - 640x360 - 99 objects - 43 lights

Living Room
(living.room.max)
Single frame - camera03 - 510x680 - 59 objects - 3 lights

Evermotion Arch Interior Vol1 Scene09
(evermotion.arch.interior.vol.01.scene.09.mentalray.max)
Single frame - camera10 - 1920x1080 (HDTV) - 219 objects - 3 lights
Animation - camera10 - 640x480 - 11 frames - 219 objects - 3 lights

To analyze the results of our testing, we will present them in terms of how long it took to render each scene with up to twenty cores compared to how long it took with just a single core. From these results, we will then use Amdahl's Law to estimate the parallel efficiency of the render. An efficiency of 100% is perfect, and in that case a CPU with a high core count is ideal; as the efficiency drops, having a CPU with a higher frequency (even at the cost of a lower core count) becomes more and more important.
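When times are available for several core counts, one reasonable way to turn them into a single P (not necessarily the method used for this article) is a least-squares fit. Dividing Amdahl's Law by the single-core time gives $t_n/t_1 = 1 - P(1 - 1/n)$, a straight line through the origin in $x_n = 1 - 1/n$, so the best-fit P has a closed form. The function name and the synthetic timings, generated here from an assumed P of 0.97, are illustrative only.

```python
def fit_parallel_efficiency(times_by_cores: dict[int, float]) -> float:
    """Least-squares estimate of P from render times keyed by core count.

    Model: 1 - t_n / t_1 = P * (1 - 1/n), a line through the origin.
    """
    if 1 not in times_by_cores:
        raise ValueError("a single-core baseline time is required")
    t1 = times_by_cores[1]
    num = den = 0.0
    for n, t in times_by_cores.items():
        x = 1.0 - 1.0 / n  # 0 for the single-core run, so it adds nothing
        y = 1.0 - t / t1
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one run with more than one core")
    return num / den


# Synthetic data: times a render with P = 0.97 would take (1000 s on 1 core)
times = {n: 1000.0 * (0.03 + 0.97 / n) for n in (1, 2, 4, 8, 10, 20)}
for n, t in times.items():
    print(f"{n:2d} cores: {times[1] / t:5.2f}x faster than 1 core")
print(f"fitted P = {fit_parallel_efficiency(times):.4f}")
```

Because the synthetic data follows Amdahl's Law exactly, the fit recovers P = 0.97; real measurements will scatter around the line.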

One thing we want to make very clear is that our results are only fully accurate for the files and settings we used. While this should give us a fairly accurate measurement of how well Mental Ray can use multiple CPU cores, if you want more accurate results for the scenes you tend to work with, we recommend following our Estimating CPU Performance using Amdahls Law guide. It can be a time-consuming process (even with automation and with the scenes rendered at only moderate settings, the testing for this article took a significant amount of machine time), but it is really the only way to know for sure what the parallel efficiency is for exactly what you do.

About our testing: We rely on our customers and the community at large to point out anything we may have missed in our testing. If there is some critical part of Mental Ray you think we skipped in our testing, please let us know in the comments at the bottom of the page! Especially if you are able to provide a file that we can integrate into our testing, we really want to hear from you.

### Single Frame Render Test Results

From these three renders, we saw efficiencies of 98%, 95.25%, and 97% respectively. If we combine all of these results, we get an overall multi-core efficiency of ~97% when rendering a single frame using Mental Ray. 97% may sound very high (and it is actually quite decent), but what this means is that instead of seeing a speedup of 20x with 20 CPU cores, which is what a program with 100% efficiency would have, we actually only saw an average speedup of about 13x. Because of this, in many cases a CPU with a slightly lower core count but higher frequency may outperform a CPU with a higher core count and lower frequency.
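The arithmetic behind these numbers can be checked with the same formula. The article does not say exactly how the three results were combined, so a plain arithmetic mean is assumed here; the 16-core / 20-core comparison at the end uses hypothetical all-core frequencies, not real CPU models.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores relative to 1 core for parallel efficiency p."""
    return 1.0 / ((1.0 - p) + p / n)


efficiencies = [0.98, 0.9525, 0.97]            # single-frame results above
p_avg = sum(efficiencies) / len(efficiencies)  # assumed: simple mean
print(f"mean efficiency: {p_avg:.2%}")
print(f"20 cores, P = 1.00: {amdahl_speedup(1.00, 20):.1f}x")
print(f"20 cores, P = 0.97: {amdahl_speedup(0.97, 20):.1f}x")

# Hypothetical CPUs: fewer but faster cores vs. more but slower cores
fast = 3.4 * amdahl_speedup(0.97, 16)  # 16 cores at 3.4 GHz
wide = 2.9 * amdahl_speedup(0.97, 20)  # 20 cores at 2.9 GHz
print(f"16 x 3.4 GHz: {fast:.1f}   20 x 2.9 GHz: {wide:.1f}")
```

The mean comes out at 96.75% (the ~97% quoted above), and P = 0.97 gives a 20-core speedup of about 12.7x rather than the ideal 20x. With the made-up clocks in the last lines, the 16-core part scores about 37.5 against about 36.9 for the 20-core part.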

### Animation Render Test Results

Animations have a bit more going on than just a single frame render, since the program needs to do a number of calculations between each frame. These in-between tasks are often single-threaded (only able to use one core) so the overall parallel efficiency for an animation tends to be lower compared to rendering a single frame. In fact, that is exactly what we saw in our testing.

While the single frame render for this scene had an efficiency of 97%, rendering an 11 frame animation dropped the efficiency down to 95.25%. That isn't a huge drop, but plugging both values into the formula above shows the theoretical speedup with 20 cores falling from about 12.7x to about 10.5x, which makes our 20 cores roughly 17.5% less effective than they were when rendering just a single frame. One thing to keep in mind is that the longer it takes to render each individual frame of an animation, the less impact there should be on the overall multi-core efficiency. This is because tasks like changing the camera's location in space tend to take a fairly fixed amount of time, so the smaller the share of the total time spent on those actions, the smaller their impact on the overall efficiency.
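One simple way to model that effect (an assumption for illustration, not something measured in this article) is to give each frame a fixed single-threaded overhead $O$ plus render work $R$ (in single-core seconds) with its own efficiency $P_f$. The time on $n$ cores is then $O + R(1-P_f) + R P_f/n$, so the animation as a whole behaves like a task with $P = R P_f/(O + R)$, which approaches $P_f$ as $R$ grows. The function name, overhead, and render times below are hypothetical.

```python
def animation_efficiency(p_frame: float, render_s: float, overhead_s: float) -> float:
    """Effective P for an animation with a fixed serial overhead per frame.

    Per-frame time on n cores: overhead + render * ((1 - p_frame) + p_frame / n),
    so only render * p_frame of the (overhead + render) single-core time scales.
    """
    return render_s * p_frame / (overhead_s + render_s)


# Hypothetical: 10 s of single-threaded work between frames, P_f = 0.97
for render_s in (100.0, 1_000.0, 10_000.0):
    p = animation_efficiency(0.97, render_s, overhead_s=10.0)
    print(f"{render_s:>8.0f} s/frame -> effective P = {p:.2%}")
```

Under these assumptions, longer frames push the effective efficiency back toward the single-frame value, from about 88% at 100 s per frame to about 96.9% at 10,000 s per frame.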

### Conclusion

Overall, Mental Ray has fairly good CPU scaling and is able to take advantage of a large number of CPU cores. It is not perfect, however, so to make sure you are purchasing the right CPU we need to determine the relative performance of Mental Ray on various CPUs. To do this, we will once again use Amdahl's Law, but this time in reverse, to find the theoretical performance of a CPU based on its core count, frequency, and the parallel efficiency we measured. Note that these results were calculated using the "all-core Turbo Boost" frequency (the speed at which the CPU actually runs when all cores are active) and not the base frequency that is in the product name. We've covered the difference between the advertised frequency, maximum Turbo Boost, and all-core Turbo Boost in a number of recent posts, including Xeon E5v3 All Core Turbo Boost and Amdahl's Law and Actual CPU Speeds - What You See is Not Always What You Get. Using the all-core Turbo frequency (which unfortunately is not often listed in the CPU's specifications) results in a much more accurate estimate of the real-world performance of each CPU.
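A minimal sketch of that reverse calculation, assuming the measured P of roughly 0.97, is below. The CPU names, core counts, and all-core Turbo frequencies are placeholders rather than real specifications (look those up for the models you are considering), and treating a dual-CPU system as simply doubling the core count is a simplifying assumption that ignores any cross-socket overhead.

```python
from dataclasses import dataclass

P_MENTAL_RAY = 0.97  # approximate single-frame efficiency measured above


@dataclass
class CpuConfig:
    name: str
    cores_per_cpu: int
    all_core_turbo_ghz: float  # all-core Turbo, not the base clock in the name
    sockets: int = 1

    def estimated_speed(self, p: float = P_MENTAL_RAY) -> float:
        n = self.cores_per_cpu * self.sockets
        return self.all_core_turbo_ghz / ((1.0 - p) + p / n)


configs = [  # placeholder specifications, not real CPU models
    CpuConfig("Example 8-core", 8, 3.5),
    CpuConfig("Example 10-core", 10, 3.2),
    CpuConfig("Example 10-core x2", 10, 3.2, sockets=2),
]
baseline = configs[0].estimated_speed()
for cfg in configs:
    rel = cfg.estimated_speed() / baseline
    print(f"{cfg.name:<20} {rel:6.1%} of baseline")
```

Pairing each estimate with the CPU's price is then enough to spot configurations that cost more but score lower.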

[+] Show estimated performance for all Xeon E5 V3 CPUs

If you wish to see the estimated performance for all single and dual E5 V3 CPUs (as well as the Core i7 equivalents), feel free to expand the option above. If you really dig into these numbers, you will find that it is very easy to end up spending more money on a CPU that will actually give you lower performance with Mental Ray. To help prevent that, we closely examined both the estimated performance numbers as well as the cost associated with each CPU choice and found seven CPU options (three single CPU and four dual CPU) that should give you the best performance for their price:

To give you an idea about how each of these CPUs will perform relative to each other (without having to dig through the collapsed table above) we created a graph showing the theoretical time to render a scene in Mental Ray that has the same parallel efficiency as those we tested:

As you can see, each increase in model (as well as price) gives a significant decrease in render times. The difference in render times between each of the CPUs above ranges from 8% to almost 20%, but on average the difference is about 14.5%.
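Since render time is inversely proportional to the estimated speed (for a fixed amount of work), the step-to-step differences quoted above are percentage decreases between consecutive render times. The helper name and the sample times below are hypothetical and only show the calculation.

```python
def step_decreases(render_times: list[float]) -> list[float]:
    """Percentage decrease from each render time to the next one."""
    return [
        100.0 * (prev - nxt) / prev
        for prev, nxt in zip(render_times, render_times[1:])
    ]


times = [100.0, 90.0, 72.0, 64.8]  # hypothetical minutes, slowest CPU first
drops = step_decreases(times)
print([f"{d:.1f}%" for d in drops])
print(f"average: {sum(drops) / len(drops):.1f}%")
```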

One last thing we will note is that while these CPUs should give you the best bang for your buck for Mental Ray, they may not be the best CPU for your system as a whole. Depending on what else you use the system for, it may be better to sacrifice a bit of performance in Mental Ray to substantially increase the performance of any other programs you regularly use. This can often be a tricky balance to find, so we highly recommend speaking with one of our consultants before purchasing a system that is going to be used for multiple tasks.

Phone: 425.458.0273
Email: sales@pugetsystems.com

