NVIDIA GeForce RTX 2080 Ti PCI-Express Lane Scaling in OctaneRender and Redshift

Written on October 15, 2018 by William George

Introduction

GPU-based renderers like OctaneRender and Redshift use the video cards in a computer to process ray tracing and other calculations in order to create photo-realistic images and videos. The performance of an individual video card, or GPU, is known to impact rendering speed - as is the number of video cards installed in a single computer. But what about the connection between each video card and the rest of the system? That interconnect is called PCI Express, and it comes in a variety of speeds.


In this article, we will look at how benchmarks for these programs perform across PCI-E 3.0 and 2.0 with x1, x4, x8, and x16 lanes.

Methodology and Test Hardware

In order to use the recent GeForce RTX 2080 Ti video card in this test, we took the OctaneBench program and modified it slightly. As of this publication, OctaneBench still uses the 3.06.2 version of OTOY's rendering engine, which does not support either the Titan V or the new GeForce RTX cards. However, you can manually copy the files from 3.08 into the folder containing OctaneBench, and it will then use the newer rendering engine. We cannot redistribute the modified software, but if you download both OctaneBench 3.06.2 and the demo version of OctaneRender 3.08 it is pretty easy to copy over the necessary files. Running the Redshift demo is a lot easier, as version 2.6.22 and above works with RTX cards without any additional configuration.
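As a rough illustration of that copy step, here is a minimal sketch in Python - the folder locations are hypothetical, and the exact set of engine files that need to be copied will depend on your own downloads, so adjust accordingly:

```python
import shutil
from pathlib import Path

# Hypothetical locations - adjust to wherever you extracted OctaneBench 3.06.2
# and installed the OctaneRender 3.08 demo.
octanebench_dir = Path(r"C:\OctaneBench_3_06_2")
octane_demo_dir = Path(r"C:\OctaneRender_3_08_demo")

# Copy the demo's files over the ones shipped with OctaneBench, overwriting
# duplicates, so the benchmark loads the newer 3.08 rendering engine.
for src in octane_demo_dir.iterdir():
    if src.is_file():
        shutil.copy2(src, octanebench_dir / src.name)
```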

PCI-Express bandwidth is affected both by which generation of the interface is used (1.0, 2.0, or 3.0) and by how many lanes are available to the device. In this test, we are looking at the four common lane sizes - x1, x4, x8, and x16 - on both the current PCI-E 3.0 generation and one generation back, 2.0. We used a motherboard that allows us to set the PCI-E generation in the BIOS, and to control the number of lanes we selectively taped off contacts on the bottom of the video card with a Post-it Note. Very high tech...

How to Limit PCI-Express Lanes on a Video Card for Testing with a Post-it Note
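For context on how much bandwidth each of these configurations actually provides, here is a minimal sketch of the theoretical peak numbers, based on the published transfer rates and line encodings for each generation (one direction, ignoring higher-level protocol overhead):

```python
# Theoretical peak PCI-Express bandwidth, one direction, in GB/s.
# Gen 1.0/2.0 use 8b/10b line encoding (80% efficient); Gen 3.0 uses 128b/130b.
TRANSFER_RATE_GT_S = {"1.0": 2.5, "2.0": 5.0, "3.0": 8.0}
ENCODING_EFFICIENCY = {"1.0": 8 / 10, "2.0": 8 / 10, "3.0": 128 / 130}

def pcie_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Peak one-way bandwidth for a given PCI-E generation and lane count."""
    # Each GT/s carries one raw gigabit per second per lane; apply the
    # encoding efficiency, then divide by 8 to convert bits to bytes.
    return TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY[generation] * lanes / 8

for gen in ("2.0", "3.0"):
    for lanes in (1, 4, 8, 16):
        print(f"PCI-E {gen} x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):5.2f} GB/s")
```

That works out to roughly 0.5 GB/s for PCI-E 2.0 x1 at the low end, up to about 15.75 GB/s for PCI-E 3.0 x16 - around a 31x spread across the configurations tested here.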

If you would like full details on the hardware configuration we ran these tests on, they are available on the full article page at https://www.pugetsystems.com/guides/1259.

Results

Here are the results showing both PCI-E generation and lane count while running OctaneBench with the 3.08 engine:

OctaneBench 3.08 PCI-Express Performance Scaling on a GeForce RTX 2080 Ti

And the same type of data for the Redshift 2.6.22 Demo:

Redshift 2.6.22 Demo PCI-Express Performance Scaling on a GeForce RTX 2080 Ti

Analysis

There is a clear, but small, performance difference between the various PCI-Express lane configurations in OctaneBench. Each increase in lane count, from x1 up through x16, brings a small improvement. There is also a difference, again small, between PCI-E 2.0 and 3.0. Overall, the spread from the slowest configuration (PCI-E 2.0 x1) to the fastest (PCI-E 3.0 x16) was about 7%.

Redshift's demo showed a much larger impact from faster PCI-E lane configurations, with both generations of the technology being particularly affected at the x1 lane size. PCI-E 3.0 x16 was a full 50% faster, or took 33% less time to render the scene, compared to x1. x4 and x8 suffered less of a penalty, but it was still more pronounced than what OctaneBench showed. This may be due to the large, complex "Vulture" scene used in the Redshift demo; OctaneBench uses a series of much simpler scenes, rather than a single big one.
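As a quick arithmetic check on that speed-versus-time relationship:

```python
# A 50% speedup means 1.5x throughput; the same scene then renders in
# 1/1.5 of the original time, i.e. about 33% less time.
speedup = 1.5
time_fraction = 1 / speedup                   # ~0.667 of the original render time
print(f"{1 - time_fraction:.0%} less render time")  # -> 33% less render time
```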

Caveats

As alluded to above, I suspect the main place in GPU rendering where lower bandwidth to or from the card affects performance is when copying scene data to the card before rendering, and then moving the final image back afterward. Moving the image isn't a big deal: even very high resolution images measure only in the tens of megabytes, and take fractions of a second to move across even a PCI-E x1 connection. Uploading scene data before rendering can require moving a lot more data, though, and I suspect that comes into play with Redshift's more complex single scene compared to the several small scenes used in OctaneBench. As such, it is possible that a larger scene in OctaneRender might also show more of a difference between the various PCI-Express configurations than the charts above indicate.
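To put rough numbers on that, here is a back-of-the-envelope sketch. The link speed is the theoretical PCI-E 3.0 x1 peak from earlier, and the scene size is purely an assumed figure for illustration:

```python
# Back-of-the-envelope transfer times over a PCI-E 3.0 x1 link.
# ~0.985 GB/s is the theoretical peak; real throughput will be somewhat lower.
PCIE_3_X1_GB_S = 0.985

# Final image: a 4K frame at 4 bytes per pixel is only ~33 MB.
image_gb = 3840 * 2160 * 4 / 1e9
print(f"4K image: {image_gb * 1000:.0f} MB, ~{image_gb / PCIE_3_X1_GB_S:.3f} s")

# Scene upload: a multi-gigabyte scene is a different story.
scene_gb = 8  # assumed scene size for illustration - actual sizes vary widely
print(f"Scene upload: {scene_gb} GB, ~{scene_gb / PCIE_3_X1_GB_S:.1f} s")
```

By that math, the final image moves in a few hundredths of a second even at x1, while a large scene upload can take several seconds - which fits with the small scenes in OctaneBench being barely affected while Redshift's big "Vulture" scene suffers badly at x1.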

As one final data point, we also ran the V-Ray Benchmark with each PCI-Express setting... but it always gave the same result (52 seconds) regardless of the PCI-E generation or number of lanes. I suspect that comes back, again, to the simplicity of the scene being tested... but I am also a bit concerned about the age of the V-Ray benchmark compared to where that rendering engine is today. Hopefully Chaos Group updates their benchmark to use the latest version of V-Ray Next soon, and also updates the scene being rendered to really highlight the performance differences between hardware configurations.

Does PCI-Express Lane Count (x1 vs x4 vs x8 vs x16) Matter for GPU Rendering?

Yes - for GPU rendering you should avoid PCI-Express x1 configurations. x4 is acceptable, but x8 and x16 are really where you want to be for good performance. PCI-E 2.0 versus 3.0 matters a lot less, with results generally being within margin of error... but almost all motherboards and video cards these days use PCI-E 3.0 anyway.

Recommended Systems for OctaneRender

Recommended Systems for Redshift

Tags: Octane, Render, Redshift, GPU, Video, Graphics, Card, Rendering, PCI, Express, PCI-E, 3.0, 2.0, Lanes, Generation, Performance, Scaling
Dennis L Sørensen

Very interesting.

Posted on 2018-10-16 11:19:35
Dan_R_C

So looking at the previous Redshift benchmarks, it seems that once you're over the x1 bottleneck the CPU/RAM become another (slight) issue, in that the difference between the i9 and W-2175 is minimal but could be attributed to CPU frequency and ECC overhead in the 2080 Ti results. Redshift pegs one core/thread at 100% during rendering, so if you're not overclocking, would a 7940X get slightly better results depending upon how long or how often it is hitting its max turbo?

Posted on 2018-10-16 19:14:20

With a single GPU, yes - a processor that has better single-core maximum speeds and doesn't have as much overhead from other aspects of the system might well perform a bit better. However, it can be hard to find a motherboard for the Core X processors that supports four GPUs. The same is true of the mainstream Core processors, like the i7 8700K. The only way for those to get enough PCI-E slots is to use PLX chips (or similar) which sort of split up the lanes further, and could lead to lower performance.

Because of that, I would recommend going for a CPU and platform that allow you to use the maximum number of video cards, at the highest PCI-E speeds, first. That is why we offer the Xeon W and Threadripper platforms for quad GPU workstations :)

Posted on 2018-10-16 22:36:27
Richie

Thanks for these great PCI-E speed tests. I have a question about PLX chips. I have been reading Redshift and Octane forums saying that there have been issues with motherboards with PLX chips causing system crashes when using multiple GPUs. The latest I've read is that NVIDIA has issued a fix, but that people are still having system freeze-ups. I see that your quad GPU system uses the Asus WS C422 SAGE/10G board, which has a pair of PLX chips. Do you know if these systems have had any of the PLX-related system crashes? Many thanks. Trying to work out if Threadripper would be better.

Posted on 2018-10-26 16:06:05

I am not aware of any issues with that, at least up here in Labs. I'll ask our support folks and comment again if they have run into this.

However, regardless of PLX, Threadripper can be a solid option for multi-GPU rendering workstations. The reason we currently use the C422 based board is that it supports lower core count but higher clock speed processor options. Since the CPU doesn't have a big impact on GPU rendering, we wanted to offer a solution that would be as good as possible for rendering-related tasks like 3D modeling and animation - while still supporting quad GPU. At the time we last updated that system, Xeon W seemed to be the best option. I am going to revisit that soon with Threadripper 2, though, since it did increase clock speeds a bit. If the system you are building is more of a dedicated render box, and won't be used for other tasks, then the CPU becomes a largely moot point. If possible I would keep the clock speed as high as you can, but beyond that the main thing is fitting, powering, and cooling four GPUs :)

Posted on 2018-10-26 16:25:50
Richie

Appreciate your reply, and thanks for asking your support team about PLX, very interested to hear what they say.

Posted on 2018-10-26 16:48:45

So the one issue with multi-GPU systems that we have had in support is Windows' DPC Watchdog Violations - and I had not realized it, but apparently some NVIDIA driver notes do mention that these can be related to PLX. I believe our support folks worked out a resolution for this issue, but you can read more about it if you'd like: https://www.pugetsystems.co...

Posted on 2018-10-26 17:20:28