Read this article at https://www.pugetsystems.com/guides/786
NVIDIA Iray CPU Scaling

Written on April 13, 2016 by Matt Bach


Iray is a GPU-based rendering engine that is currently owned and developed by NVIDIA. Even though it is GPU-based (it uses the video card rather than the CPU to do most of the rendering), we discovered some interesting interactions with the CPU during the testing for our NVIDIA Iray GPU Performance Comparison article.

Before we get too far, it is important to understand the two most basic CPU specifications:

  1. The frequency is essentially how many operations a single CPU core can complete in a second (how fast it is).
  2. The number of cores is how many physical cores there are within a CPU (how many operations it can run simultaneously).

If life were easy, you would be able to compare the performance of different CPUs by simply multiplying the number of cores by the frequency, creating a sort of "combined" or "aggregate" frequency. Unfortunately, even for 100% CPU-bound tasks that is not how it works in the real world. It is even less true for Iray, since the CPU is only a relatively minor consideration compared to the video card. To see exactly how Iray reacts to different numbers of CPU cores, we need to benchmark across a range of core counts and also across different GPU configurations, since the GPU configuration is highly likely to change the results.
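As a small illustration of the naive "aggregate frequency" described above, the sketch below simply multiplies cores by clock speed for two of the CPUs used in this article. The function name is our own, and the result is exactly the kind of figure that, as the testing shows, does not predict Iray render times.

```python
def aggregate_ghz(cores: int, ghz: float) -> float:
    """Naive 'aggregate frequency': core count multiplied by clock speed."""
    return cores * ghz

# Core counts and clock speeds as quoted in this article.
cpus = {
    "2x Xeon E5-2687W V3": (20, 3.1),
    "Core i7 6700K": (4, 4.0),
}

for name, (cores, ghz) in cpus.items():
    print(f"{name}: {aggregate_ghz(cores, ghz):.1f} GHz aggregate")
```

By this measure the dual Xeon setup looks nearly four times "faster" than the 6700K, which is a useful reminder of how misleading the number can be for a GPU-based renderer.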

If you would rather skip all of the testing and simply view our conclusions, feel free to jump ahead to the conclusion section.

Test Setup

To accurately benchmark Iray, we will be using a pair of Xeon E5-2687W V3 3.1GHz Ten Core CPUs. This will allow us to test with up to 20 physical CPU cores across two CPUs to see exactly how well Iray is able to utilize both a high number of cores and multiple physical CPUs. The full specifications for our test system are:

Testing Hardware
Motherboard: Asus Z10PE-D8 WS
CPU: 2x Intel Xeon E5-2687W V3 3.1GHz Ten Core
RAM: 8x Crucial DDR4-2133 16GB ECC Reg. (128GB total)
GPU: 1-4x NVIDIA GeForce GTX 970 4GB; 1-4x NVIDIA GeForce GTX Titan X 12GB
Hard Drive: Samsung 850 Pro 512GB SATA 6Gb/s SSD
OS: Windows 10 Pro 64-bit
PSU: Antec HCP Platinum 1000W
Software: 3ds Max 2016 SP3 V2 using Iray 2016 1.0.1

To make sure our results are as consistent as possible, we used a custom AutoIt script to start 3ds Max, load the test scene, adjust how many CPU cores are available by setting the number of allocated CPUs, then time how long it took to render the scene. We used two test scenes: one from the 3ds Max 2016 sample files and one published on maxforums.org by Nik Clark as a test Iray benchmark scene.
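The general shape of such a timing loop can be sketched as follows. This is not the AutoIt script used for the testing; it is a minimal Python stand-in in which `fake_render` is a hypothetical placeholder for "launch the renderer limited to N cores and wait for it to finish".

```python
import time
from typing import Callable, Dict, Iterable

def time_renders(render: Callable[[int], None],
                 core_counts: Iterable[int]) -> Dict[int, float]:
    """Run render(cores) once per core count and record wall-clock seconds."""
    results = {}
    for cores in core_counts:
        start = time.perf_counter()
        render(cores)
        results[cores] = time.perf_counter() - start
    return results

def fake_render(cores: int) -> None:
    # Hypothetical stand-in for a real render; it just sleeps briefly.
    time.sleep(0.01)

timings = time_renders(fake_render, range(1, 5))
for cores, seconds in timings.items():
    print(f"{cores} core(s): {seconds:.3f} s")
```

In a real harness the callable would also be responsible for restricting the application to the requested number of cores before starting the render.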

Night Caustics
camera1 - 800x600 - 1000 passes - 99 objects - 43 lights

Test Scene
(by Nik Clark on Maxforums.org)
Quad 4 - 1920x1080 - 500 passes - 13 objects - 1 light

To analyze the results of our testing, we will be presenting how long it took to render each scene with between one and twenty CPU cores and between one and four video cards. One thing we want to make very clear is that our testing is only 100% accurate for the files and settings we used. While this should give us a fairly accurate measurement of how well Iray can use multiple CPU cores, it may not exactly match what you will see with your own scenes.

About our testing: We rely on our customers and the community at large to point out anything we may have missed in our testing. If there is some critical part of Iray you think we skipped in our testing, please let us know in the comments at the bottom of the page. Especially if you are able to provide a file that we can integrate into our testing, we really want to hear your feedback!

1-20 Core Testing with 1-4x GTX 970

In the chart above, we are showing the time it took to render these two scenes on either one, two, or four GTX 970 video cards with between one and twenty CPU cores being active. There are a number of very interesting observations to be made, but the most important is that the amount of speedup you get from having more cores decreases as you add more video cards. While a single GTX 970 saw an average performance gain of ~70% from one core to twenty cores, this dropped to 51% with two GTX 970s and only 14% with four GTX 970s.

What is odd about this speedup is that it is actually nearly linear. Typically, you will see diminishing returns as you add more and more cores to a system (based on the parallel efficiency curve determined by Amdahl's Law) but that is not true in the case of Iray. Instead, we found that for the first CPU, every core you add gives a 4.5% performance bump with a single GTX 970, 4% with two GTX 970s, and only .9% with four GTX 970s. Once you get to the second CPU, however, this drops to just 3%, 1.5%, and .6% respectively (likely due to the additional overhead associated with multiple CPUs).
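The per-core figures above can be cross-checked against the overall gains with a short calculation. The sketch below assumes the gains are simple additive percentages relative to the one-core result (our reading, which happens to reproduce the ~70%, 51%, and 14% totals), and it includes Amdahl's Law for contrast. The 0.9 parallel fraction is purely illustrative and is not a measured property of Iray.

```python
def total_gain_pct(first_cpu_pct: float, second_cpu_pct: float,
                   cores_per_cpu: int = 10) -> float:
    """Additive gain going from 1 core to two full CPUs.

    The first CPU contributes cores_per_cpu - 1 extra cores beyond the
    single-core baseline; the second CPU contributes cores_per_cpu more.
    """
    return (cores_per_cpu - 1) * first_cpu_pct + cores_per_cpu * second_cpu_pct

# Per-core gains (first CPU, second CPU) as reported for the GTX 970.
per_core = {
    "1x GTX 970": (4.5, 3.0),
    "2x GTX 970": (4.0, 1.5),
    "4x GTX 970": (0.9, 0.6),
}
for config, (first, second) in per_core.items():
    print(f"{config}: ~{total_gain_pct(first, second):.1f}% from 1 to 20 cores")

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup when only part of the work scales with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Illustrative only: with Amdahl's Law each added core helps less than the last.
for cores in (1, 2, 10, 20):
    print(f"Amdahl, p=0.9, {cores} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

Under Amdahl's Law the second core is worth far more than the twentieth, whereas the Iray results look like a fixed gain per core with a step down at the second CPU.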

1-20 Core Testing with 1-4x GTX Titan X

NVIDIA Iray CPU performance

Moving up from GTX 970 video cards to GTX Titan X cards, we see a similar trend in performance. The only real difference here is that the amount of performance gain per core is much smaller. Instead of seeing a 4.5%, 4%, and .9% gain in performance with the first CPU, we are only seeing a gain of 3%, 1%, and .45% for one, two, and four GTX Titan X cards. Once we started using the second CPU, this dropped even further to only 2%, .8%, and .35% respectively.

Combining these results with the ones from the GTX 970, it appears that the more GPU power you have in the system (whether it is from a higher-end GPU or simply more physical cards), the smaller the impact the number of CPU cores has. 

Does CPU Frequency Matter?

For CPU-specific tasks, it can usually be taken as an assumption that a CPU with a higher frequency will be faster than a CPU with a lower frequency in a somewhat linear fashion. Iray, however, is primarily a GPU-oriented rendering engine so this may not be accurate. We've already found that the performance gain from adding more CPU cores is a bit odd so we wanted to run a few tests to see if the CPU frequency also causes any strange behavior.

To find out, we took the Intel Core i7 6700K results from our NVIDIA Iray GPU Performance Comparison article and compared them to the results we saw in the previous two sections when we were running the Xeon E5-2687W V3 CPUs with just four cores in total. This gives us two comparison points: one with an Intel Core i7 6700K CPU that is running at 4GHz with 4 cores, and one with an Intel Xeon E5-2687W V3 CPU that is running at 3.2GHz and has been locked to 4 cores. Note that due to Turbo Boost, a CPU's frequency varies primarily based on the number of cores that are active, and 3.2GHz is the speed the 2687W V3 is actually running at when four cores are active.

NVIDIA Iray CPU performance with 4 cores

As expected, the Intel Core i7 6700K with a higher frequency is faster than the Xeon E5-2687W V3 that has been locked to just 4 cores. However, the difference in performance is much smaller than the clock speeds alone would suggest: the 6700K is running at a frequency that is 25% higher than the E5-2687W V3 (4GHz versus 3.2GHz), yet it was only 3% faster on average.
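To put the two numbers side by side, the sketch below takes the 4GHz and 3.2GHz clocks quoted earlier and the ~3% measured difference, then works out how much of the clock advantage actually showed up in render performance. The variable names are our own, and the "share" figure is just one simple way of expressing the gap.

```python
def pct_higher(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0

clock_advantage_pct = pct_higher(4.0, 3.2)   # 6700K vs. Xeon locked to 4 cores
observed_gain_pct = 3.0                      # average difference reported above

# Fraction of the clock advantage that translated into faster renders.
share_realized = observed_gain_pct / clock_advantage_pct

print(f"Clock advantage: {clock_advantage_pct:.0f}%")
print(f"Observed gain:   {observed_gain_pct:.0f}%")
print(f"Share realized:  {share_realized:.0%}")
```

If render time tracked CPU frequency directly, the share realized would be close to 100%; here it is only around an eighth of that.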

In fact, the difference is so small that it is very likely caused by the chipset the two systems are based on (Z170 vs C612) or by the architecture of the CPU rather than by the frequency the CPUs are running at. Z170 with a 6700K CPU as a platform is currently two revisions newer than C612 with a Xeon E5 V3 CPU and contains many advances that should make it clock-for-clock faster.


Conclusion

NVIDIA Iray may be a GPU-based rendering engine, but our testing has shown some very interesting behavior when it comes to the CPU as well. Our testing can be summarized with the following four points:

  1. In some situations you can achieve decently higher performance in Iray by having more CPU cores, but it is extremely cost-inefficient.
  2. The performance gain per core depends highly on the amount of GPU power you have in the system. The higher the total GPU power - through either higher-end cards or multiple physical cards - the smaller the performance gains per core.
  3. This performance gain ranges from a high of just 4.5% per core with a single GTX 970 all the way down to .35% (basically nothing) with four GTX Titan X video cards.
  4. CPU frequency has a minimal impact on the performance of Iray. It is possible that you will see a slowdown if you use a CPU with a very low frequency, but we saw only a ~3% difference between 3.2GHz and 4GHz.

Combined with the results from our NVIDIA Iray GPU Performance Comparison article, this makes it very clear that if you are designing a new system specifically for Iray you should prioritize your budget towards purchasing a high number of powerful video cards long before worrying about upgrading the CPU. A faster CPU can help with things like scene load times, but a system with plenty of GPU power will make your CPU choice basically moot when it comes to rendering.

It may seem odd that we took the time to publish an article about CPU scaling just to say that it doesn't matter, but this is extremely useful information to have. Just as an example of how important it is to prioritize the GPU for Iray, below are two systems that you could configure on our website. One puts an emphasis on the number of CPU cores and the other on having more video cards:

Puget Genesis II

2x Intel Xeon E5-2660 V3 (2.6GHz, Ten Core)
NVIDIA GeForce GTX 970

Cost: ~$6600

Puget Peak Single Xeon

Intel Xeon E5-1630 V3 (3.7GHz, 4 core)
4x NVIDIA GeForce GTX 970

Cost: ~$4500

Even though the Genesis II system is ~$2000 more expensive and has a much higher core count (20 cores vs 4 cores), it should actually be much slower than the other system. In fact, the Peak Single Xeon system with four GTX 970 video cards should be more than three times faster at only ~2/3 the cost! And if you actually had a $6600 budget, you could upgrade those GTX 970 video cards to GTX 980 Ti cards for a further performance bump and still be slightly under budget. That is a huge difference in performance and really shows how important it is to have the right hardware in your system rather than just buying whatever configuration happens to match your budget.
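The arithmetic behind that comparison is simple enough to sketch. The prices and hardware counts below are the ones listed above; the 3x speedup is an assumption taken from the "more than three times faster" estimate and treated as a lower bound, not a measured result.

```python
systems = {
    "Puget Genesis II": {"cost_usd": 6600, "cpu_cores": 20, "gpus": 1},
    "Puget Peak Single Xeon": {"cost_usd": 4500, "cpu_cores": 4, "gpus": 4},
}

genesis = systems["Puget Genesis II"]
peak = systems["Puget Peak Single Xeon"]

price_gap = genesis["cost_usd"] - peak["cost_usd"]
cost_ratio = peak["cost_usd"] / genesis["cost_usd"]

# Assumption: the four-GPU system renders 3x faster (the article's lower bound).
ASSUMED_SPEEDUP = 3.0
perf_per_dollar_ratio = ASSUMED_SPEEDUP / cost_ratio

print(f"Price gap: ${price_gap}")
print(f"Peak costs {cost_ratio:.0%} of the Genesis II")
print(f"Render performance per dollar: {perf_per_dollar_ratio:.1f}x in favor of the Peak")
```

Even with the conservative 3x figure, the GPU-heavy configuration comes out several times ahead per dollar spent.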

Tags: Iray, Rendering, CPU, Processor
Alex Taguchi

Fantastic information. SOLIDWORKS Visualize, previously known as Bunkspeed Shot/HyperShot, is an IRay based rendering engine and offers CPU, GPU, and Hybrid options. I'm wondering if your results here and from the GPU scaling article are still applicable in that Hybrid option. Since it's a SOLIDWORKS product, most of these users have 0 need for multi-GPU and will almost always have a NVIDIA Quadro and not a GeForce. This might make hardware recommendations tricky for those who are SOLIDWORKS CAD users AND Visualize users. Multi-Quadro is likely the better option in these scenarios I think to receive full certification for SOLIDWORKS, yet still have decent Iray rendering performance.

Posted on 2016-08-12 15:35:56