Introduction
PCI-Express is the primary connection for adding expansion cards – from powerful video cards to simple USB controllers – to a computer. It has been updated several times since its release in 2004, with PCI-E 3.0 being the current version, and it is also available in several slot sizes. These sizes are described in terms of "lanes", with x1, x4, x8, and x16 being the most common today.
Each generation and each slot / lane size brings more bandwidth for the expansion card to communicate with the rest of the computer, but sometimes a card capable of running at a higher speed (like x16) may run at a slower speed (x8, for example) because of other limitations within the computer. The question that naturally arises from this situation is simple: does that reduction in speed actually lead to lower performance? And if so, how much is lost?
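For context, the theoretical bandwidth at stake can be computed from the PCI-E 3.0 specification's 8 GT/s signaling rate and 128b/130b line encoding. A short sketch (real-world throughput is lower due to protocol overhead):

```python
# Rough theoretical PCI-Express 3.0 bandwidth per direction.
# Assumes 8 GT/s signaling and 128b/130b encoding, both part of the
# PCIe 3.0 spec; actual throughput is reduced by protocol overhead.

GT_PER_SEC = 8e9      # 8 gigatransfers per second, per lane (PCIe 3.0)
ENCODING = 128 / 130  # 128b/130b line-encoding efficiency

def bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a given lane count."""
    bits_per_sec = GT_PER_SEC * ENCODING * lanes
    return bits_per_sec / 8 / 1e9  # bits -> bytes -> GB

for lanes in (1, 4, 8, 16):
    print(f"x{lanes:<2} = {bandwidth_gb_s(lanes):.2f} GB/s")
```

By this math an x8 slot still offers roughly 7.9 GB/s each way – the question this article tests is whether GPU rendering ever actually needs the ~15.8 GB/s that x16 provides.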
The answer to this question can differ from one type of expansion card to another, and indeed from one application to another. Today we are looking specifically at the impact of dropping from x16 to x8 speed on video cards, and in particular what impact that has on GPU-accelerated rendering.
Test Setup
To answer this question, we are looking at the new X299 chipset and Skylake X processors from Intel. This is an ideal test case because different CPUs in this series support different numbers of PCI-Express lanes. The Core i7 7800X and 7820X both have 28 lanes, while the Core i9 7900X and higher models have 44 lanes. With a single video card, either CPU can provide the normal x16 speed, but when you move to 2 or 3 video cards that drops – especially on the processors that only have 28 lanes.
When the Core i7 7820X is supporting two video cards, one can run at x16 but the other is limited to x8. This uses 24 lanes of the 28 that processor has available, since it cannot provide the 32 lanes that would be needed for two cards to both operate at x16. If three video cards are used, then they all run at x8 (again, using 24 total lanes between them).
On the other hand, the Core i9 7900X with 44 lanes can support two video cards at x16, and when three are used it keeps two of them at x16 and runs the third at a slower x8 speed. If there is any performance benefit from video cards running at x16 instead of x8, it should show itself when two or three video cards are in use on these differing CPUs. Here is the hardware we tested to find out:
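The allocations described above can be captured in a small toy model. This is only an illustration of the greedy "x16 where possible, otherwise x8" behavior these boards exhibit – actual slot wiring is motherboard-specific:

```python
# Toy model of the per-card PCI-E lane allocations described above.
# Actual behavior depends on the motherboard's slot wiring; this simply
# mirrors the configurations reported for this X299 platform.

def allocate_lanes(cpu_lanes: int, num_gpus: int) -> list[int]:
    """Give each card x16 while every remaining card can still get x8."""
    allocation = []
    remaining = cpu_lanes
    for _ in range(num_gpus):
        cards_left = num_gpus - len(allocation)
        if remaining - 16 >= (cards_left - 1) * 8:
            allocation.append(16)
            remaining -= 16
        else:
            allocation.append(8)
            remaining -= 8
    return allocation

print(allocate_lanes(28, 2))  # 7820X, two cards:   [16, 8]
print(allocate_lanes(28, 3))  # 7820X, three cards: [8, 8, 8]
print(allocate_lanes(44, 3))  # 7900X, three cards: [16, 16, 8]
```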
Test Platforms
Platform:    | Intel X299 Chipset / Skylake X
Motherboard: | Gigabyte X299 AORUS Gaming 7 (rev 1.0)
CPU:         | Intel Core i7 7820X 3.6GHz & Intel Core i9 7900X 3.3GHz
RAM:         | 8x Crucial DDR4-2666 16GB (128GB Total)
GPU:         | 1-3 x NVIDIA GeForce GTX 1080 Ti 11GB
Hard Drive:  | Samsung 960 Pro M.2 PCI-E x4 NVMe SSD
OS:          | Windows 10 Pro 64-bit
Software:    | OctaneBench 3.06.2, FurryBall RT Benchmark, & V-Ray Benchmark 3.57.01
The three rendering engines we included cover a wide portion of the GPU-accelerated rendering market and should make for a representative sample. If the results from all three agree, they should be applicable to most similar applications as well.
Benchmark Results – OctaneRender
First up, here are the results from running OctaneBench across 1, 2, and 3 video cards on each CPU:
There is almost no difference in performance here. None is expected with a single video card, since both CPUs can run one card at the full x16 speed, but the fact that the gap is less than 1% (well within the margin of error) also tells us that CPU speed itself is not impacting performance. That is helpful: without knowing it, CPU speed would be another factor that could influence any differences that show up, but now we don't need to worry about it.
In the 2- and 3-card comparisons, there is again no substantial difference between the two CPUs and the ways they allocate PCI-Express lanes. The biggest gap is with 2 video cards, where the 7820X-based system came in 1.2% slower, but that is still well within the margin of error and would certainly be negligible in real-world usage.
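For clarity, the percentage gaps quoted in this article are simple relative differences between benchmark scores. The scores below are made-up placeholders to illustrate the math, not the actual OctaneBench results:

```python
# How the percentage differences in this article are computed: a simple
# relative difference between two benchmark scores. The scores below are
# hypothetical placeholders, not the real OctaneBench numbers.

def percent_slower(baseline: float, other: float) -> float:
    """How much slower 'other' is than 'baseline', as a percentage."""
    return (baseline - other) / baseline * 100

score_7900x = 400.0  # hypothetical faster-system score
score_7820x = 395.2  # hypothetical slower-system score
print(f"{percent_slower(score_7900x, score_7820x):.1f}% slower")
```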
Benchmark Results – FurryBall
Next up is FurryBall RT, which has a built-in benchmark. It provides three results for different aspects of the rendering process: Ambient Occlusion, Direct, and Indirect. We have combined all three into a single graph to keep this article from getting too long:
The results here are technically not as close as they were with Octane, with up to a 3.2% difference, but that difference favors different systems depending on the test. Because neither configuration is consistently faster, this again appears to be within the margin of error.
Benchmark Results – V-Ray
And finally, we look at the GPU portion of V-Ray Benchmark. The main V-Ray application has moved on to version 3.6, so this test is a bit dated now, but it isolates GPU rendering performance well (it also has a CPU option, which was turned off for this article):
Here there is absolutely no difference between the two CPUs, and thus no difference between x8 and x16 speeds for the video cards. That may be due in part to this benchmark's lower precision (it reports just two significant digits), but it lines up with the findings from the other two tests.
Conclusion
As the results of all three tests show, there is no measurable impact from x8 vs x16 lane speeds on GPU rendering. This is good news, as it means that less expensive CPUs can be used even in multi-GPU rendering workstations. In other articles we have found minimal difference between chipsets / motherboard platforms for GPU rendering, so for these applications you can focus on what really does matter: getting the most – and most powerful – video cards you can afford.