NVIDIA GeForce RTX 2080 Ti PCI-Express Lane Scaling in OctaneRender and Redshift

Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on October 15, 2018. For newer information, see our more recent articles.

Introduction

GPU-based renderers like OctaneRender and Redshift use the video cards in a computer to process ray tracing and other calculations in order to create photo-realistic images and videos. The performance of an individual video card, or GPU, is known to impact rendering speed, as is the number of video cards installed in a single computer. But what about the connection between each video card and the rest of the system? This interconnect is called PCI Express, and its speed depends on the generation and the number of lanes in use.

In this article, we will look at how benchmarks for these programs perform across PCI-E 3.0 and 2.0 with x1, x4, x8, and x16 lanes.

Methodology and Test Hardware

In order to use the most recent GeForce RTX 2080 Ti video card in this test, we took the OctaneBench program and modified it slightly. As of this publication, OctaneBench is still using the 3.06.2 version of OTOY's rendering engine, which does not support either the Titan V or the new GeForce RTX cards. However, you can manually copy the files from 3.08 into the folder containing OctaneBench, and it will then use the newer rendering engine. We cannot redistribute the modified software, but if you download both OctaneBench 3.06.2 and the demo version of OctaneRender 3.08 it is pretty easy to copy over the necessary files. Running the Redshift demo is a lot easier, as versions 2.6.22 and above work with RTX cards without any additional configuration.

PCI-Express bandwidth is affected by both the generation of the interface (1.0, 2.0, or 3.0) and the number of lanes available to the device. In this test, we are looking at the four common lane counts (x1, x4, x8, and x16) on both the current PCI-E 3.0 generation and the previous one, 2.0. We used a motherboard that allows us to set the PCI-E generation in the BIOS, and to control the number of lanes we selectively taped off contacts on the bottom of the video card with a Post-it Note. Very high tech…
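To put rough numbers on how generation and lane count combine, here is a minimal Python sketch of theoretical one-direction bandwidth. The per-lane transfer rates and encoding schemes are the published PCI-Express figures; the function and variable names are our own for illustration, and real-world throughput will be lower than these theoretical values because of protocol overhead.

```python
# Theoretical one-direction PCI-Express bandwidth by generation and lane count.
# These are upper bounds: real transfers lose some throughput to protocol overhead.

GENERATIONS = {
    # generation: (transfer rate per lane in GT/s, encoding efficiency)
    "1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Return theoretical bandwidth in gigabytes per second (decimal GB)."""
    rate_gt, efficiency = GENERATIONS[generation]
    # Each transfer moves one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt * efficiency / 8 * lanes


if __name__ == "__main__":
    # The eight configurations covered in this article.
    for gen in ("2.0", "3.0"):
        for lanes in (1, 4, 8, 16):
            bw = pcie_bandwidth_gb_per_s(gen, lanes)
            print(f"PCI-E {gen} x{lanes}: {bw:.2f} GB/s")
```

By these theoretical figures, the configurations tested span from about 0.5 GB/s (PCI-E 2.0 x1) to about 15.75 GB/s (PCI-E 3.0 x16), a spread of more than 30x in available bandwidth.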

Full details of the hardware configuration we ran these tests on are listed below.

Testing Hardware
Motherboard:	Asus WS C422 SAGE/10G
CPU:	Intel Xeon W-2175 2.5GHz (4.3GHz Turbo) 14 Core
RAM:	8x Kingston DDR4-2666 32GB ECC Reg (256GB total)
GPU:	NVIDIA GeForce RTX 2080 Ti 11GB
Hard Drive:	Samsung 960 Pro 1TB M.2 PCI-E x4 NVMe SSD
OS:	Windows 10 Pro 64-bit
PSU:	EVGA SuperNova 1600W P2
Software:	OctaneRender 3.08 Benchmark & Redshift 2.6.22 Demo (Age of Vultures scene)

Results

Here are the results showing both PCI-E generation and lane count while running OctaneBench 3.08:

And the same type of data for the Redshift 2.6.22 Demo:

Analysis

There is a clear, but small, performance difference between the various PCI-Express lane configurations in OctaneBench. Each increase in lane count, from x1 up through x16, marks a little improvement. There is also a difference, again small, between PCI-E 2.0 and 3.0. Overall, the spread from the slowest (PCI-E 2.0 x1) to fastest (PCI-E 3.0 x16) was about 7%.

Redshift's demo showed a much larger impact from PCI-E lane configuration, with both generations of the technology being particularly affected at x1. PCI-E 3.0 x16 was a full 50% faster than x1, which is to say it took 33% less time to render the scene. x4 and x8 suffered less of a penalty, but it was still more pronounced than what OctaneBench showed. This may be due to the large, complex "Age of Vultures" scene used in the Redshift demo; OctaneBench uses a series of much simpler scenes, rather than a single big one.
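The "50% faster" and "33% less time" figures describe the same result in two different ways, and the conversion between them is easy to get wrong. Here is a small Python sketch of that arithmetic. The function names are our own, and the render times in the example are hypothetical round numbers chosen only to illustrate the ratio, not measured results.

```python
def time_reduction_from_speedup(speedup: float) -> float:
    """Convert 'X faster' (0.50 means 50% faster) into the fraction of time saved."""
    # A render that is (1 + speedup) times as fast takes 1 / (1 + speedup) of the time.
    return 1 - 1 / (1 + speedup)


def speedup_from_times(slow_seconds: float, fast_seconds: float) -> float:
    """Return how much faster the quick run is, as a fraction of the slow run's speed."""
    return slow_seconds / fast_seconds - 1


if __name__ == "__main__":
    # 50% faster corresponds to roughly one third less render time.
    print(f"{time_reduction_from_speedup(0.50):.1%} less time")

    # Hypothetical times (not measured): 150 s at x1 versus 100 s at x16.
    print(f"{speedup_from_times(150, 100):.0%} faster")
```

Note that the two percentages are not interchangeable: a 50% reduction in render time would correspond to a configuration that is 100% faster, not 50%.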

Caveats

As alluded to above, I suspect the main place in GPU rendering where lower bandwidth to or from the card will affect performance is when copying scene data to the card before rendering, and then moving the final image back afterward. Moving the image isn't a big deal: even very high resolution images are only tens of megabytes, and take fractions of a second to move across even a PCI-E x1 connection. Uploading scene data before rendering can require moving a lot of data, though, and I suspect that may come into play with Redshift's more complex single scene compared to the several small scenes used in OctaneBench. As such, it is possible that a larger scene in OctaneRender might also result in more of a difference between the various PCI-Express configurations than is shown in the charts above.
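To give a sense of scale for that suspicion, the Python sketch below estimates how long a transfer would take at theoretical PCI-Express bandwidth. Both payload sizes are assumptions for illustration only: we did not measure how much data either benchmark actually moves, and real throughput will be below the theoretical rates used here.

```python
# Theoretical one-direction bandwidth per lane, in bytes per second.
PER_LANE_BYTES_PER_S = {
    "2.0": 5.0e9 * (8 / 10) / 8,      # 5 GT/s with 8b/10b encoding, 500 MB/s
    "3.0": 8.0e9 * (128 / 130) / 8,   # 8 GT/s with 128b/130b encoding, ~985 MB/s
}


def transfer_seconds(size_bytes: float, generation: str, lanes: int) -> float:
    """Estimate the time to move size_bytes across the given PCI-E link."""
    return size_bytes / (PER_LANE_BYTES_PER_S[generation] * lanes)


# Hypothetical payload sizes (assumptions, not measurements from our tests).
IMAGE_BYTES = 40e6   # a finished high-resolution image, tens of megabytes
SCENE_BYTES = 8e9    # a large scene that still fits in an 11GB card's memory

if __name__ == "__main__":
    for gen, lanes in (("2.0", 1), ("3.0", 1), ("3.0", 16)):
        image_t = transfer_seconds(IMAGE_BYTES, gen, lanes)
        scene_t = transfer_seconds(SCENE_BYTES, gen, lanes)
        print(f"PCI-E {gen} x{lanes}: image {image_t:.3f} s, scene {scene_t:.1f} s")
```

Under these assumptions the image transfer stays well under a tenth of a second even on the slowest link, while a multi-gigabyte scene upload could take on the order of seconds at x1, which is at least consistent with the idea that scene size drives the difference.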

For one final point of data, we also ran the V-Ray Benchmark with each PCI-Express setting… but it always gave the same result (52 seconds) regardless of the PCI-E generation or number of lanes. I suspect that comes back, again, to the simplicity of the scene being tested… but I am also a bit concerned about the age of the V-Ray benchmark compared to where that rendering engine is today. Hopefully Chaos Group updates their benchmark to use the latest version of V-Ray Next soon, and also updates the scene being rendered to really highlight the performance delta between different hardware configurations.

Do PCI-Express x1, x4, x8, vs x16 Lanes Matter for GPU Rendering?

Yes – for GPU rendering you should avoid PCI-Express x1 configurations. x4 is acceptable, but x8 and x16 are really where you want to be for good performance. PCI-E 2.0 versus 3.0 matters a lot less, with results generally being within margin of error… but almost all motherboards and video cards these days use PCI-E 3.0 anyway.