NVIDIA Iray GPU Performance Comparison

Written on April 13, 2016 by Matt Bach
Table of Contents:
  1. Introduction
  2. Test Setup
  3. PCI-E 3.0 x8 vs x16
  4. GPU Performance with 2x Xeon E5-2687W V3
  5. GPU Performance with Intel Core i7 6700K
  6. Conclusion
  7. Recommended Articles
  8. Recommended Systems for GPU-based Rendering

Introduction

Iray is primarily a GPU-based rendering engine, and while the CPU can impact performance in some interesting ways, spending more of your budget on video card performance rather than CPU power should generally result in much shorter rendering times. In this article, we will look at a number of video cards (both GeForce and Quadro) to determine the relative performance of each model. In addition, we will test with up to four cards to find out exactly how well Iray is able to utilize multiple video cards.

If you would rather simply view our conclusions, feel free to jump ahead to the conclusion section.

Test Setup

Since the performance of a video card can depend somewhat on the motherboard's chipset and CPU used, we will be performing our testing across two different platforms. Our main test platform will be based around a pair of Xeon E5-2687W V3 CPUs while our second platform will be based around the Intel Core i7 6700K. The Dual Xeon system will be able to provide a huge amount of CPU power and will allow us to test up to four cards at full PCI-E 3.0 x16 speeds. The Core i7 system, however, will be limited to two cards at PCI-E 3.0 x8 speeds.

Basic specifications for both machines are below:

For our test video cards, we used the following models:

You will notice that we are primarily focusing on GeForce and only testing a couple of Quadro cards. This is because there is rarely a need to use Quadro for rendering, but we wanted to include a couple of Quadro cards as comparison points to make sure there are no surprises with Iray.

To make sure our results are as consistent as possible, we used a custom AutoIt script to start 3ds Max, load the test scene, and then time how long it took to render the scene. We used two test scenes: one from the 3ds Max 2016 sample files and one published on maxforums.org by Nik Clark as an Iray benchmark scene.
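
For readers who want to time their own renders in a similar way, the idea can be sketched in a few lines of Python. This is not the AutoIt script we used; it is a minimal illustration that assumes your render can be started from a command that exits when the render finishes. The function name, the repeated runs, and the placeholder command are all assumptions made for the example.

```python
import statistics
import subprocess
import sys
import time
from typing import List


def time_command(argv: List[str], runs: int = 3) -> List[float]:
    """Run `argv` to completion `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error
        subprocess.run(argv, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command (hypothetical): replace it with whatever launches
    # your renderer and exits once the scene has finished rendering.
    command = [sys.executable, "-c", "pass"]
    results = time_command(command)
    print(f"median: {statistics.median(results):.2f}s over {len(results)} runs")
```

Timing the whole process this way includes application start-up and scene loading, so it is only comparable between runs that use the same scene and settings.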

Night Caustics
(autodesk_mrwhite_handout_night_caustics.max)
camera1 - 800x600 - 1000 passes - 99 objects - 43 lights

Test Scene
(by Nik Clark on Maxforums.org)
Quad 4 - 1920x1080 - 500 passes - 13 objects - 1 light

One thing we want to make very clear is that our results are strictly accurate only for the files and settings we used. While this should give us a good baseline for how well Iray runs on different video cards, you may see slightly different results with your own scenes.

About our testing: We rely on our customers and the community at large to point out anything we may have missed in our testing. If there is some critical part of Iray you think we skipped, please let us know in the comments at the bottom of the page. We especially want to hear from you if you are able to provide a file that we can integrate into our testing!

PCI-E 3.0 x8 vs x16

Before we get too far into our testing, the first thing we want to do is to determine if there is any performance difference between running a video card at x8 or x16 speeds with Iray. This is important because many motherboards will be limited to PCI-E 3.0 x8 speeds if you are using more than one or two video cards. While PCI-E 3.0 x16 technically has twice the bandwidth of PCI-E 3.0 x8, it is very rare for a program to fully saturate even PCI-E 3.0 x8 so we do not expect to see a difference in performance. If there is a performance difference, however, we want to find this out before we get into the majority of our testing as it could change some of our testing methodology.
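
For reference, the theoretical bandwidth behind that comparison can be worked out from the PCI-E 3.0 signaling rate. The sketch below assumes the commonly cited figures of 8 GT/s per lane and 128b/130b encoding. The function name is ours, and real-world throughput will be lower because of protocol overhead.

```python
# Theoretical one-direction PCI-E 3.0 bandwidth for a given link width.
# Assumes 8 GT/s per lane with 128b/130b line encoding.

TRANSFERS_PER_SEC = 8e9   # transfers per second, per lane (one bit each)
ENCODING = 128 / 130      # usable fraction after 128b/130b encoding


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Return theoretical bandwidth in GB/s (10^9 bytes/s) for `lanes` lanes."""
    bits_per_sec = TRANSFERS_PER_SEC * ENCODING * lanes
    return bits_per_sec / 8 / 1e9


if __name__ == "__main__":
    for lanes in (8, 16):
        print(f"x{lanes}: {pcie3_bandwidth_gbs(lanes):.2f} GB/s")
```

This works out to a little under 8 GB/s for x8 and a little under 16 GB/s for x16 in each direction.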

[Chart: PCI-E 3.0 x8 vs x16]

Both of these renders were performed on our Dual Xeon system as it allowed us to test two cards at full PCI-E 3.0 x16 speeds. To force them to run at x8 speeds we simply covered half of the pins on each video card with an insulating material (a post-it note).

As you can see from the charts above, even with the fastest NVIDIA GPU available today (a GeForce GTX Titan X) we saw less than a 1% difference between PCI-E 3.0 x8 and PCI-E 3.0 x16. In fact, while we would consider all of the results to be well within the margin of error for this type of benchmark, for some odd reason x8 actually benchmarked as being slightly faster than x16.

What this means is that you do not have to worry about whether your motherboard will run your video cards at PCI-E 3.0 x8 or x16 speeds. This is great, as it really opens up the options for what platform to use if you want more than a single video card.

GPU Performance with 2x Xeon E5-2687W V3

[Chart: NVIDIA Iray GPU Benchmarks]

In the charts above, note that we only have hard results for three and four video cards with the GTX 970 and GTX Titan X. We were a bit limited on the cards we had available, but we were able to do our triple and quad GPU testing using both the fastest and the slowest GeForce video cards. Using the performance measured from those cards, we were able to calculate the approximate amount of performance gain (or speedup) you would see with three or four cards. From this, we can estimate the performance of up to four cards for the other models.
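
The kind of extrapolation described above can be sketched as follows. This is an assumption about the general approach rather than the exact calculation behind our charts: it takes a measured single-card render time and applies the average reduction in render time for each card count. The reductions used (33%, 49%, and 60%) are the Dual Xeon averages reported in this article. The function name and the single-card time in the example are hypothetical.

```python
# Average render-time reductions versus a single card on the Dual Xeon system,
# as reported in this article (2 cards: 33%, 3 cards: 49%, 4 cards: 60%).
AVG_REDUCTION = {1: 0.0, 2: 0.33, 3: 0.49, 4: 0.60}


def estimate_render_time(single_card_seconds: float, num_cards: int) -> float:
    """Estimate a multi-card render time from a measured single-card time."""
    if num_cards not in AVG_REDUCTION:
        raise ValueError("only 1 to 4 cards are covered by the measurements")
    return single_card_seconds * (1.0 - AVG_REDUCTION[num_cards])


if __name__ == "__main__":
    single = 150.0  # hypothetical single-card time in seconds, not a measured result
    for n in (1, 2, 3, 4):
        print(f"{n} card(s): ~{estimate_render_time(single, n):.1f}s")
```

Because the averages are taken across cards, an estimate made this way will not exactly match a measured result for any one model.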

The first thing to notice is that there is clearly no advantage to using Quadro over GeForce. In fact, the Quadro M4000 was about 20% slower than the GTX 970 even though the GTX 970 is roughly a third of the cost. Second, Iray performance scales pretty well across multiple GPUs. While you don't get twice the performance with two cards compared to one, you do get a very nice bump in performance. On average, going from one card to two reduced render times by about 33% (a third), going from one card to three reduced render times by about half (49% to be exact), and going from one card to four reduced render times by a bit less than two thirds (60%). This is enough of a gain that the strategy of using multiple, more affordable cards should give you better performance than fewer, more expensive cards.
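
A reduction in render time is not the same thing as a speedup factor, so it can help to convert between the two. The sketch below uses the average reductions quoted above together with the standard definitions (speedup is old time divided by new time, and efficiency is speedup divided by the number of cards). The function names are ours.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional render-time reduction into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def scaling_efficiency(reduction: float, num_cards: int) -> float:
    """Speedup divided by card count; 1.0 would be perfect linear scaling."""
    return speedup_from_reduction(reduction) / num_cards


if __name__ == "__main__":
    for cards, reduction in ((2, 0.33), (3, 0.49), (4, 0.60)):
        s = speedup_from_reduction(reduction)
        e = scaling_efficiency(reduction, cards)
        print(f"{cards} cards: {s:.2f}x speedup, {e:.0%} efficiency")
```

In other words, the 33%, 49%, and 60% reductions correspond to roughly 1.5x, 2.0x, and 2.5x the speed of a single card.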

GPU Performance with Intel Core i7 6700K


Moving on to the results for our system based around the Intel Core i7 6700K, we came across a few surprises. Everything may look as expected at first, but there are two very interesting points hidden within this data:

First, the performance gain from multiple video cards was higher than we expected. With the Dual Xeon system, we saw a decrease in render times of about 30% going from one GPU to two, but for the Core i7 system the decrease in time was roughly 44%. This means that the scaling is much better on the Core i7 system. The reason for this brings us to our second point:

If you compare these results to the results from the previous section, you will notice that even though the scaling is better on this system, in every case the Dual Xeon system was actually much faster than the Core i7 system. Looking at just the GeForce cards, the Dual Xeon system was on average about 22% faster with a single GPU and 9% faster with two GPUs. We were under the impression that Iray was supposed to be primarily a GPU-based rendering engine, but it turns out that the number of CPU cores can have a big impact on performance. In fact, the difference is enough that we published a second article just to cover CPU scaling in NVIDIA Iray.

To summarize that article, the number of CPU cores you have can increase performance by 50% or more. However, the more GPU power you have (including multiple GPUs), the smaller the impact the CPU has on performance. With one GTX Titan X you might see a performance gain of ~30% with 20 cores compared to just 4 cores. If you add a second GPU, the gain from the extra cores drops to ~15%, and it drops further to only 4% if you have four video cards.

All this really means is that if you only have one or two video cards, you can get a decent performance boost by having a high CPU core count. On the other hand, it usually costs quite a bit to get a CPU (or two) that has a very high core count. So in general you would be much better off allocating more of your budget towards GPU power first and only worrying about the number of CPU cores after you are maxed out on GPU performance.

Conclusion

What all our testing comes down to is that you should use GeForce if possible (although mixing Quadro and GeForce should work fine if you need a primary Quadro card for tasks besides rendering) and prioritize having multiple video cards before worrying about the individual performance of each card.

While using multiple video cards did not result in a linear increase in performance, overall the scaling was pretty decent. With a high core count system (~20 cores), going from one card to two reduces the time to render a scene by about 33% (a third). Going from one card to three cards results in about half the render time (49% to be exact), and going from one card to four cards reduces the render time by a bit less than two thirds (60%). Due to how Iray utilizes the CPU, lower core count systems (~4 cores) will actually see a larger performance gain with more than one card (roughly 45% going from one GPU to two), although once you get up to four GPUs the number of CPU cores should only make a minimal difference.

To give you an idea of what cards you should consider at different budgets, we put together a small chart showing the best choice for a system that can handle one, two, three, or four video cards:

Best GPU choice (with theoretical Iray render time) on Dual Xeon Workstation (2x Xeon E5-2687W V3)

GPU Budget  | $1000                     | $1500                     | $2000                        | $3000                        | $4000
Single GPU: | GTX Titan X 12GB (120.5s) | -                         | -                            | -                            | -
Dual GPU:   | 2x GTX 980 4GB (103.6s)   | 2x GTX 980 Ti 6GB (84.5s) | 2x GTX Titan X 12GB (82.25s) | -                            | -
Triple GPU: | 3x GTX 970 4GB (76.8s)    | 3x GTX 980 4GB (76.1s)    | 3x GTX 980 Ti 6GB (66.35s)   | 3x GTX Titan X 12GB (64.25s) | -
Quad GPU:   | -                         | 4x GTX 970 4GB (62.55s)   | 4x GTX 980 4GB (58.3s)       | 4x GTX 980 Ti 6GB (50.85s)   | 4x GTX Titan X 12GB (46.15s)

In the chart above, you can see that the performance gain from going with a more expensive model of video card is never as large as the gain from simply going with a higher number of cards. In fact, if you have a CPU with a lower core count, the performance gains from having more GPUs should actually be higher than what we show in our chart.
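
To make that comparison easy to check, the chart can be written out as data and queried per budget. This is a minimal sketch: the render times are the theoretical Dual Xeon values from the chart, while the dictionary layout and function name are our own choices for the example.

```python
from typing import Tuple

# Theoretical Dual Xeon render times (seconds) from the chart above,
# keyed by GPU budget in dollars.
CHART = {
    1000: [("1x GTX Titan X 12GB", 120.5), ("2x GTX 980 4GB", 103.6),
           ("3x GTX 970 4GB", 76.8)],
    1500: [("2x GTX 980 Ti 6GB", 84.5), ("3x GTX 980 4GB", 76.1),
           ("4x GTX 970 4GB", 62.55)],
    2000: [("2x GTX Titan X 12GB", 82.25), ("3x GTX 980 Ti 6GB", 66.35),
           ("4x GTX 980 4GB", 58.3)],
    3000: [("3x GTX Titan X 12GB", 64.25), ("4x GTX 980 Ti 6GB", 50.85)],
    4000: [("4x GTX Titan X 12GB", 46.15)],
}


def fastest_option(budget: int) -> Tuple[str, float]:
    """Return the (configuration, seconds) pair with the lowest render time."""
    return min(CHART[budget], key=lambda option: option[1])


if __name__ == "__main__":
    for budget in sorted(CHART):
        name, seconds = fastest_option(budget)
        print(f"${budget}: {name} ({seconds}s)")
```

At every budget, the option with the most cards is the one that comes out fastest.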

It is not always possible to install more than one or two GPUs in your system, but if you are able to, you could potentially see a huge increase in performance. For example, if you have about $2000 to spend on video cards, you would have the choice between two GTX Titan X 12GB cards or four GTX 980 4GB cards. The cost is almost identical, but based on the chart above the four GTX 980s should finish a render in roughly 30% less time (58.3s versus 82.25s). That is a roughly 30% shorter render for essentially no difference in cost!
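
The $2000 comparison can be checked directly from the chart values. The sketch below separates two ways of stating the same result, since "time saved" and "more renders per hour" give different percentages. The function names are ours.

```python
def time_saved(baseline_seconds: float, new_seconds: float) -> float:
    """Fraction of render time saved relative to the baseline."""
    return (baseline_seconds - new_seconds) / baseline_seconds


def throughput_gain(baseline_seconds: float, new_seconds: float) -> float:
    """Fractional increase in renders completed per unit of time."""
    return baseline_seconds / new_seconds - 1.0


if __name__ == "__main__":
    two_titan_x = 82.25   # seconds, 2x GTX Titan X 12GB (from the chart)
    four_gtx_980 = 58.3   # seconds, 4x GTX 980 4GB (from the chart)
    print(f"time saved: {time_saved(two_titan_x, four_gtx_980):.1%}")
    print(f"throughput gain: {throughput_gain(two_titan_x, four_gtx_980):.1%}")
```

With these numbers, the four GTX 980s save about 29% of the render time, which is the same as completing about 41% more renders in a given amount of time.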

Of course, raw performance is often not the only consideration. Accommodating more GPUs may require a larger power supply, may not allow for additional PCI-E cards like sound cards to be used, and requires a physically larger chassis. In addition, if your renders require a large amount of VRAM (video card memory), you may need to go with a GTX 980 Ti 6GB or GTX Titan X 12GB just for the additional VRAM. That may mean you will have to give up some raw rendering performance, but it would ensure that you are able to complete the render in the first place.

Recommended Articles

If you are configuring a system for rendering with a GPU-based engine, we have a number of articles regarding the hardware requirements for various rendering engines that you may be interested in:

Recommended Hardware for GPU-Based Rendering
Summary of what you need to know when choosing hardware for a GPU-based rendering workstation.

Octane Render GPU Performance Comparison
How well does Octane perform with different models and numbers of GPUs?

NVIDIA Iray GPU Performance Comparison
How well does Iray perform with different models and numbers of GPUs?

NVIDIA Iray CPU Scaling
Does having more CPU cores give you more performance in Iray?

Recommended Systems for GPU-based Rendering

Also great for:

  • Redshift
  • Solidworks Visualize
  • Furryball 
  • Arion
  • Blender - Cycles
  • Any other GPU-based rendering engine!

Dual GPU

Compact workstation

  • Intel Core i7 CPU
    (up to 10 cores)
  • Supports up to 256GB of RAM
  • Up to two NVIDIA GeForce/Quadro video cards

Quad GPU

Maximum Rendering Performance

  • Intel Core i7 CPU
    (up to 10 cores)
  • Supports up to 512GB of RAM
  • Up to four NVIDIA GeForce/Quadro video cards

Tags: NVIDIA, Iray, GPU, Video Card