Read this article at https://www.pugetsystems.com/guides/1062

V-Ray RT 3.6 Hybrid Mode with AMD Threadripper 1950X and NVIDIA Titan Xp

Written on October 13, 2017 by William George


We recently performed testing on the latest version of V-Ray RT, which introduced a "Hybrid Mode" that utilizes both the CPU and GPUs in a system to provide extremely fast rendering. That test was run on several Core X series processors, formerly codenamed Skylake X, along with up to three GeForce GTX 1080 Ti graphics cards. The results were very interesting, with the addition of the CPU to the rendering pool definitely providing a benefit in terms of speed. However, our current Core X motherboards max out at three GPUs - yet V-Ray RT and similar rendering engines can scale to even more. As of this article's publication, our quad-GPU platform uses AMD's Threadripper processors instead... so I wanted to provide performance data showing how those processors compare to Intel's Core X in this application.

There are a few differences in the test system used here, though, which prevent direct comparisons from being made. Primarily, we did not have access to a full set of four GTX 1080 Ti cards. Those have been in short supply, so I had to use Titan Xp models instead. Those are noticeably faster than the 1080 Ti, though also a lot more expensive. That means the Threadripper CPU + 1-4 GPU results here cannot fairly be stacked up against the previous Core X data points. However, we can directly compare the CPU-only test results (when no GPUs were being used). As a bonus, we can also compare the GPU-only results (without the CPUs in the mix) to see how the 1080 Ti and Titan Xp cards stack up.

Before we dive into the test platform, methodology, and results, here is a little background for those who may not have read our previous article. With V-Ray RT 3.6, you can now use both the CPU and GPUs in a single computer! This is called Hybrid Rendering, and promises a free boost to rendering speeds without any additional complexity for users. The way it works is pretty ingenious: the folks at Chaos Group figured out a way to run CUDA code on the CPU. CUDA is the language used to perform general computation on NVIDIA graphics cards, and has been used by V-Ray RT for quite a while - but until now it could only run on GPUs. Being able to run the same code on CPUs as well was originally designed to allow for easier debugging, but it turned out to also be a nice way to get a speed boost when rendering without any additional hardware requirements. Chaos Group posted a great blog post about this, if you want more info.
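To make the concept concrete, here is a minimal sketch (not Chaos Group's actual code - the class and function names are hypothetical) of the core idea: because the CPU can execute the same CUDA kernels, hybrid mode can treat it as just one more compute device in the pool that render work gets handed to.

```python
# Illustrative sketch of the hybrid rendering device pool. This is NOT
# V-Ray's real API - names here are invented for explanation only.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    kind: str  # "gpu" or "cpu"

def build_device_pool(gpus, use_cpu=True):
    """Return the list of devices a hybrid-mode renderer would draw on."""
    pool = [Device(name, "gpu") for name in gpus]
    if use_cpu:
        # The key trick: the CPU runs the same CUDA code as the GPUs,
        # so it can join the pool as just another device.
        pool.append(Device("CPU (CUDA target)", "cpu"))
    return pool

pool = build_device_pool(["Titan Xp #1", "Titan Xp #2"])
print([d.name for d in pool])
# → ['Titan Xp #1', 'Titan Xp #2', 'CPU (CUDA target)']
```

The point of the sketch is simply that no scheduling changes are needed from the user's perspective: adding the CPU is just ticking one more device in the list.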

Test Hardware and Methodology

We used a single AMD Threadripper configuration for the new testing we performed, but since select results from our previous article are also included below, this chart shows the specs of both platforms:

The test methodology was the same as last time, so that the results would be comparable. To measure performance, we opened a complex indoor scene within 3ds Max 2017, switched the render engine to V-Ray RT, and then rendered it with the default settings.

Screenshot of finished render

What changed between runs was the mode selection within V-Ray RT and the CUDA device(s) being used. We ran first with the CPU alone, in both CPU and CUDA modes, and then also with every possible combination of 1-4 video cards.
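The resulting test matrix can be sketched in a few lines - a hypothetical enumeration, with placeholder device names, of the runs described above: two CPU-only passes (one per mode) plus every combination of one to four GPUs.

```python
# Hypothetical sketch of the test matrix described in the methodology:
# CPU alone in both modes, then every combination of 1-4 video cards.
from itertools import combinations

gpus = ["Titan Xp 1", "Titan Xp 2", "Titan Xp 3", "Titan Xp 4"]

runs = [("CPU mode", ()), ("CUDA mode, CPU only", ())]
for n in range(1, len(gpus) + 1):
    for combo in combinations(gpus, n):
        runs.append((f"CUDA mode, {n} GPU(s)", combo))

# 2 CPU-only runs + (4C1 + 4C2 + 4C3 + 4C4 = 15) GPU combinations
print(len(runs))  # → 17
```

Enumerating every combination (rather than just card counts) matters here because, as the results below show, whether the primary display card is in the mix changes the outcome.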

Results - Threadripper 1950X

First up, we have the full run of GPU combinations using the AMD Threadripper 1950X processor:

V-Ray RT 3.60.03 Rendering Performance with Threadripper 1950X

Two results are shown for each hardware combination. The blue bar is the total time the rendering process took, including pre-render steps, while the red is the main render phase alone. This lets us observe some interesting things:

  • In all of the tests that included one or more GPUs, the pre-render steps (the difference between the two results we show) are very consistent. The Threadripper 1950X processor takes 80-100 seconds to perform those functions, regardless of how many GPUs are involved in the final render.
  • Running the render on secondary video cards is substantially faster than doing so on the primary video card - the one which is handling display output. We saw this on the Core X system as well.
  • The 1950X processor alone, in CUDA mode, is comparable in rendering performance to a single Titan Xp video card (when it is the primary card in a system).
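The pre-render overhead called out in the first bullet is simply the gap between the two bars: total time minus the main render phase. A trivial sketch, using made-up sample numbers rather than measured results:

```python
# The pre-render overhead is the gap between the blue (total) and
# red (main render) bars. Sample values below are illustrative only,
# not measured results from our testing.
def pre_render_overhead(total_s, render_s):
    """Seconds spent on pre-render steps (scene prep, etc.)."""
    return total_s - render_s

# e.g. a hypothetical run: 250 s total, 160 s main render phase
print(pre_render_overhead(250, 160))  # → 90
```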

Results - Threadripper 1950X vs Core X CPUs

Next, we can compare the CPU-only results from the 1950X test above with our previous data from the Core X series processors:

V-Ray RT 3.60.03 CPU Only Rendering Performance on 1950X vs Core X CPUs

AMD's top-end Threadripper, highlighted with orange borders, lands right in the middle of the pack of Intel processors. As we have seen with Threadripper in other heavily threaded applications, it does better in terms of price:performance ratio than Intel's models... but Intel does offer more costly CPUs with higher absolute performance. There are also a couple of additional observations worth making:

  • The rendering speed difference between CPU mode and CUDA mode is greater on the AMD system than on the Intel processors, but that probably doesn't matter since the ideal software setup on both platforms is CUDA mode (even without using GPUs).
  • While I did not include the lower-end 1920X (12 core) Threadripper processor in my testing, I am pretty confident that it would have landed between the Core i7 7820X and Core i9 7900X in terms of speed (closer to the Core i9, at least in CUDA mode). That puts its $799 price point more in line with Intel's options, so it probably isn't as good of a value as the 1950X in this application.

Bonus Results - GeForce 1080 Ti vs Titan Xp GPU Comparison

We cannot fairly compare the CPU + GPU results in this test with the prior Core X findings, because of the difference in both CPU platform and the video cards used. However, just as we could look at the CPU-only results, we can also look at the GPU-only data points to see how the GTX 1080 Ti compares to its bigger sibling: the Titan Xp. We will use the main rendering times only here, so that the impact of different CPUs on the pre-render steps does not affect the GPU comparisons unfairly.

V-Ray RT 3.60.03 GPU Rendering Performance Comparison with GeForce GTX 1080 Ti vs Titan Xp

The faster Titan Xp wins every time, of course, but the margin between the cards is right around 10% most of the time. The biggest outlier was when a single, secondary card was used... but even then, it only reached about a 15% difference. In some of the other tests it dips under 10% as well, so that is a pretty safe average. And while we only have test results with up to 3 GPUs for the 1080 Ti, we can use that average to project roughly how they would perform with more cards. Those estimates are highlighted with orange borders in the chart above.
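The projection described above is straightforward arithmetic: if the Titan Xp averages roughly 10% faster, a 1080 Ti's render time can be estimated by scaling the Titan Xp time up by that margin. A quick sketch (the sample time is a placeholder, not a measured result):

```python
# Rough projection used for the orange-bordered estimates in the chart:
# scale a measured Titan Xp render time up by the ~10% average margin
# to estimate what the GTX 1080 Ti would do with the same card count.
def project_1080ti_time(titan_xp_seconds, margin=0.10):
    """Estimate a GTX 1080 Ti render time from a Titan Xp result."""
    return titan_xp_seconds * (1 + margin)

# e.g. a hypothetical 100-second Titan Xp render
print(round(project_1080ti_time(100.0), 1))  # → 110.0
```

Since the observed margin actually ranged from under 10% to about 15%, estimates made this way carry a few percent of uncertainty either direction.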

So, given the price difference ($300-600, as availability fluctuates) between the 1080 Ti and Titan Xp, is the performance increase worthwhile? That probably depends on how many GPUs you are getting. With a single card, in the grand scheme of a several-thousand-dollar workstation, it may well make sense to spend about 10% more and get a roughly similar increase. However, in such situations it may be even better to go with two more modest cards (1070s or 1080s). Likewise, as you go up to 2, 3, or 4 GPUs then the added cost to have them all be Titan Xp models instead of the 1080 Ti gets much bigger. If you want the absolute best rendering speeds in V-Ray RT (or a similar GPU based rendering engine) then it could be worthwhile, but the 1080 Ti comes very close for a much lower price.


As we saw with the Core X processors, this new Hybrid Rendering mode in V-Ray 3.6 is quite beneficial. You will already have a CPU in any workstation, so no additional hardware investment is required. Simply adding your processor to the list of CUDA devices used when rendering in V-Ray RT 3.6 will increase performance, even if you already have multiple GPUs at your disposal. The speed-up you get will depend on what CPU you have, though, so if you are buying or building a new workstation for V-Ray RT it is now worth considering a more powerful CPU than you might have in the past. We have recently updated our V-Ray recommended systems accordingly, offering Core X options on the 1-2 GPU compact system, Threadripper on the 1-4 GPU tower, and Xeon options for those wanting dual CPUs.

V-Ray Workstations

Puget Systems offers a range of powerful and reliable systems that are tailor-made for your unique workflow.

Configure a System!

Labs Consultation Service

Our Labs team is available to provide in-depth hardware recommendations based on your workflow.

Find Out More!
Tags: V-Ray RT 3.6, Chaos Group, CPU, GPU, Hybrid Rendering, CUDA, Threadripper, Skylake X, Core X, GeForce 1080 Ti, Titan Xp
Lex -

Interesting to note that while you have the 4th card installed in the Threadripper computer, it doesn't really give you the same capability that only 3 cards do. It's only slightly less in rendering time, as if the card is not doing 100% full load work. I believe that can be chalked up to a bad driver design. Could you do this same test in say 6 months time with the newer drivers and see if you get the same result, to post them with a comparison.

I'm not totally sold on the idea of using more than 2 Titan Xp because that's what is officially supported by the Nvidia drivers, however, the older Maxwell supports the use of 3 or 4 cards as well as the associated driver for it.

I'd also like you to use 128 GiB of RAM but not at 2666 MHz (2.666 GHz) but at 3866 MHz (3.866 GHz), the newer BIOSes support this speed and CAS 19. Compare the difference, should be quite noticeable. Heck, from 2666 MHz to 3200 MHz is noticeable. This isn't just an increase in frame rate for games, gains can be found for rendering and deep learning (if you can afford it, that is). Since you're Puget Systems, I know you can go for it.

Posted on 2017-11-11 17:11:56

You might want to check the graphs out again, as there are definite gains when going from 3 -> 4 GPUs. Make sure you pay close attention to the inclusion of the CPU and if one of the GPUs is being used for video output (a 'primary' card), as those will also impact performance and can make a comparison unfair.

As for higher speed RAM, we specifically focus on running memory at the speeds which CPU makers officially support. Both Intel and AMD procs can technically run with higher RAM speeds, but doing so is a form of over-clocking (pushing the memory controller past its rated speed) and can void the manufacturer warranty. Matt has written more about that, if you want insight into our reasoning:


Posted on 2017-11-16 00:38:27
Lex -

William, I understand your logic in regards to not voiding the warranty but if you think about it, there is a reason why the K and X series CPUs exist with unlocked voltages and multipliers. If Intel and AMD didn't want people to do that, we wouldn't have those options available to us.

Now, the type of memory that is super sensitive to voltage fluctuation and memory speeds is the buffered registered variety that has ECC circuitry on-board in regards to main system RAM. I wouldn't suggest trying to overclock that stuff and CPUs that support it directly in hardware, such as the Intel Xeon series.

With Nvidia's SLI or AMD CrossFire multi-card configurations, what matters is the actual usage and balance of the cards when dynamic power and clock throttling is disabled completely. While you do get much better performance when scaling across multiple heads (monitors) from each card with a scaled virtual resolution, I'm actually referring to the optimization per card vs. the power being used.

1. Single card you get 100% of the performance in real world tests.

2. Dual card you get 150 ~ 175% of the performance in real world tests instead of 200%.

I could keep going but the point is moot, adding in more cards isn't a linear increase in performance but it's linear in regards to the power it uses.

Now, I tried something different, I'm working with an Epyc CPU and I created a motherboard a while ago. I have to say that this processor allows 4 card scaling to work out nicely, the utilization of all 4 cards in SLI or Crossfire does significantly better but it's not because all the slots are PCI-e 3.0 and x16 electrical signalling as well as data mode in x16 on all cards at once. It seems to be something about the Epyc processor itself that makes all the difference, I've disabled the multi-threading and found that to increase overall speed of the system but only change the optimization of all 4 cards by -2 ~ -5% which is hardly noticeable, as this can be within the margin of error when conducting such tests. The only thing I see different is the latency of the frame rates is a tad bit unstable but far better than most standard desktop and laptop processors. You only notice it if you're doing 8k resolution, as that taxes those GP-GPUs anyhow.

I really do wish that when you have the cards installed in SLI bridge mode or CrossFire mode that you'd get linear performance and latency, which then the cost of the cards, the upkeep and of course, the power usage would then be well worth it.

However, I'm still waiting for my Titan V card set to arrive (damn, those are effing expensive, even at wholesale prices). I want to see how well those scale on a normal system and then on the Epyc motherboard. As, I'm hoping the tensor capability on these cards will scale linearly when having 4 cards installed (without the SLI bridge) to give Nvidia Deep Learning demo a try, to see what my results will be.

Posted on 2018-01-06 04:53:06
Lex -

When you read that I disabled the multi-threading that's a software controlled mode, not related to hyper-threading. It's not the traditional sense of multi-threading either. In other words, you can have multi-threaded applications running and they're not affected. This software that's running gives each core a specific job and runs on it and only on that core with two hardware threads in which to do the work. It does work but it's kind of weird.

The software mode of multi-threading doesn't stop Spectre, though.

Posted on 2018-01-06 05:02:10

Unless Chaos have improved the code in RT in this regard, I think you'll find that your CPU has to be a pretty decent one to improve on the RT GPU-only render times. It's not a case of "many hands make light work". I found this out when I added my i7 5960X (no slouch) to the 4 x 1080 Tis I had... it made my RT render times worse. Your Threadripper is helping because it's a very good, recent processor.

I did inform Chaos of this and maybe they will do something about it - I can't confirm.

Posted on 2018-02-21 14:23:23
Tuco Salamnca

Is it true that GPU rendering is less accurate than CPU rendering, which causes lower image quality?

Posted on 2018-03-25 19:44:54

It's more about the render engine type Biased vs. Unbiased

Unbiased render engines are more "realistic" due to pure raytracing algorithms.

Posted on 2018-03-26 16:12:42