A number of years ago, we published the article "Impact of PCI-E Speed on Gaming Performance" investigating whether you would see a difference between a video card in a PCI-E 3.0 x8 slot and one in a x16 slot. At the time, we found that even the fastest video card available had nearly identical performance whether it was running at x8 or x16.
In the three years since that article was published, however, video cards have become much faster, higher resolution displays (including 4K surround) are more common, and video cards have become increasingly useful in professional applications. Because of these factors, we have decided to revisit this topic using the latest hardware to see whether there is now a measurable difference between PCI-E 3.0 x8 and x16.
For our test system, we used the following hardware:
| Component | Model |
|---|---|
| Motherboard | Asus X99 Deluxe II/U3.1 |
| CPU | Intel Core i7 6900K 3.2GHz (3.5-4.0GHz Turbo) Eight Core |
| RAM | 8x Samsung DDR4-2133 8GB ECC Reg. (64GB total) |
| Video Card | 1-2x NVIDIA Titan X 12GB (Pascal) |
| Hard Drive | Samsung 850 Pro 512GB SATA 6Gb/s SSD |
| OS | Windows 10 Pro 64-bit |
| PSU | EVGA SuperNOVA 1200W P2 |
To see whether there is a difference between PCI-E 3.0 x8 and x16, we tested with both a single GPU and dual GPUs (in SLI where appropriate) using three gaming benchmarks and two professional applications, each of which is highly GPU dependent: Unigine Heaven Pro 4.0, Ashes of the Singularity, GRID Autosport, Davinci Resolve, and Octane Render. To cover the different PCI-E configurations you may run into, we tested single and dual Titan X configurations at full x16 speeds as well as limited to x8, which we did by covering half of the PCI-E contacts with a piece of insulating material (paper). The PCI-E configurations we tested are:
- Single GPU in PCI-E 3.0 x16
- Single GPU in PCI-E 3.0 x8
- Dual GPU in PCI-E 3.0 x16/x16
- Dual GPU in PCI-E 3.0 x16/x8
- Dual GPU in PCI-E 3.0 x8/x8
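For reference, the theoretical gap between the two link widths can be worked out from the published PCI-E 3.0 figures (8 GT/s per lane with 128b/130b encoding). The short Python sketch below is not part of our test procedure; the function names are our own, and real-world throughput will be lower than these ceilings because of protocol overhead.

```python
# Theoretical one-direction PCI-E 3.0 bandwidth for the link widths tested.
# Figures used: 8 GT/s per lane and 128b/130b line encoding.
# These are ceilings, not measured throughput.

GT_PER_SEC = 8.0                  # PCI-E 3.0 transfer rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130   # 128 payload bits per 130 bits on the wire


def lane_bandwidth_gb_s() -> float:
    """Usable bandwidth of one PCI-E 3.0 lane in GB/s (one direction)."""
    return GT_PER_SEC * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_bandwidth_gb_s(lanes: int) -> float:
    """Usable bandwidth of a link with the given number of lanes."""
    return lanes * lane_bandwidth_gb_s()


if __name__ == "__main__":
    for lanes in (8, 16):
        print(f"x{lanes}: {link_bandwidth_gb_s(lanes):.2f} GB/s per direction")
```

This works out to roughly 7.88 GB/s for x8 and 15.75 GB/s for x16 in each direction, so covering half the contacts halves the available bandwidth on paper.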
Unigine Heaven Pro 4.0
Starting out our testing is Unigine Heaven Pro 4.0, which is a fairly standard gaming benchmark. It is beginning to show its age at this point, but it is still one of the best and most consistent DirectX 11 benchmarks we know of. To test a range of different displays, we included results for 1080p, 4K, and 4K surround with a variety of quality settings.
With a single Titan X, there was little benefit (less than 2%) in most cases to running the card at x16 instead of x8. In fact, with a 1080p display, using PCI-E 3.0 x8 was actually about 9% faster than x16! This result is odd and not what we expected, but we ran the benchmark multiple times and got the same result over and over.
Moving on to the dual GPU results in SLI, we saw effectively no difference at 1080p and 4K, but we did see some significant differences when using 4K surround. At this extremely high resolution (11520×2160), we saw no difference at ultra quality settings, but as we turned down the settings (which resulted in a higher FPS), running the cards at x16 became more and more important. When we were getting around 45 FPS, x16/x8 was still only about 1% slower than x16/x16, but x8/x8 was almost 4% slower. Lowering the settings to the point that we were getting roughly 100 FPS, we saw a 15% drop in performance with x16/x8, and a massive 30% drop in performance running at x8/x8.
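To put those percentages in terms of frame rate, the sketch below converts the drops quoted above into approximate FPS. The 45 and 100 FPS baselines are the rounded figures from the text, assumed here to be the x16/x16 results, so the output is only a rough illustration; the function name is hypothetical.

```python
def fps_after_drop(baseline_fps: float, drop_percent: float) -> float:
    """Frame rate left after losing drop_percent of baseline_fps."""
    return baseline_fps * (1.0 - drop_percent / 100.0)


# (approximate baseline FPS, configuration, drop in percent) as quoted in the
# text. "Almost 4%" is entered as 4.0, so that row is slightly pessimistic.
QUOTED_DROPS = [
    (45.0, "x16/x8", 1.0),
    (45.0, "x8/x8", 4.0),
    (100.0, "x16/x8", 15.0),
    (100.0, "x8/x8", 30.0),
]

if __name__ == "__main__":
    for baseline, config, drop in QUOTED_DROPS:
        remaining = fps_after_drop(baseline, drop)
        print(f"{config} at ~{baseline:.0f} FPS baseline: ~{remaining:.1f} FPS")
```

At the roughly 100 FPS setting, for example, x8/x8 comes out at about 70 FPS, which is a gap you would likely notice.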
Ashes of the Singularity
Ashes of the Singularity was one of the first DX12 games to include a built-in benchmarking utility and is especially interesting because the developers specifically say to take dual GPU configurations out of SLI and run them as two separate cards due to how DX12 handles multi-GPU configurations.
With a single GPU, we saw fairly consistent results across the different resolutions. In all cases, running the card at x8 was about 2-5% slower than running at x16.
With two Titan X video cards, we had a little bit of trouble getting 4K surround to work properly – in fact, we were unable to successfully take the video cards out of SLI at all with surround enabled. Even physically removing the SLI bridge did not work, as it simply caused the cards to show driver errors in Device Manager.
At 1080p, all the results were within 2% (and x8/x8 was faster than x16/x8), so we consider them all to be within our margin of error. At 4K, however, we saw about a 2.5-3.5% drop in performance with the cards running in x16/x8 and x8/x8.
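One way to make the "margin of error" call less subjective is to compare the gap between two configurations against the spread seen across repeated runs. The sketch below is a minimal illustration of that idea, not the procedure we used: the function names are hypothetical, the 2% threshold is the figure mentioned above, and the FPS values are placeholders rather than measurements from this article.

```python
from statistics import mean


def relative_difference_percent(baseline_runs, test_runs):
    """Percent change of the test configuration's mean versus the baseline mean."""
    base = mean(baseline_runs)
    return (mean(test_runs) - base) / base * 100.0


def run_to_run_spread_percent(runs):
    """Spread of repeated runs (max minus min) as a percentage of their mean."""
    return (max(runs) - min(runs)) / mean(runs) * 100.0


def exceeds_noise(baseline_runs, test_runs, margin_percent=2.0):
    """True when the gap is larger than both the margin and the observed spread."""
    diff = abs(relative_difference_percent(baseline_runs, test_runs))
    noise = max(
        margin_percent,
        run_to_run_spread_percent(baseline_runs),
        run_to_run_spread_percent(test_runs),
    )
    return diff > noise


if __name__ == "__main__":
    # Placeholder FPS values for illustration only, NOT results from this article.
    x16_runs = [60.0, 60.5, 59.5]
    x8_runs = [57.0, 57.2, 56.8]
    diff = relative_difference_percent(x16_runs, x8_runs)
    print(f"x8 vs x16: {diff:+.1f}% (exceeds noise: {exceeds_noise(x16_runs, x8_runs)})")
```

With a check like this, a gap under 2% (or under the run-to-run spread) would be reported as noise rather than as a real difference between x8 and x16.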
GRID Autosport
With a single GPU, the results in this game were surprisingly consistent. Across the different resolutions we tested, GRID showed about a 5% drop in performance when running at x8 versus x16.
Unlike the single GPU results, the results with two Titan X cards in SLI were not nearly as consistent. At 1080p, running at x16/x8 was actually the fastest, beating x16/x16 by about 10%. Similarly, x8/x8 was also faster than x16/x16, but only by about 8%.
Increasing the resolution to 4K gave us results more in line with what you would expect. At this resolution, x16/x8 was about 2% slower than x16/x16 and x8/x8 was about 8.5% slower. Finally, with 4K surround we saw about a 5% drop in performance with x16/x8 and a 6.5% drop in performance with x8/x8.
Davinci Resolve
Davinci Resolve is a non-linear video editing and color correcting suite that makes heavy use of GPU acceleration for things like live video playback and exporting of video. While this makes it a great professional application to test PCI-E speeds, the one downside is that the benchmark file we are using (Standard Candle) to test playback speed is limited to 24 FPS and the built-in FPS counter can only give results in whole numbers. This means that while we may see some performance differences, we need to keep in mind that what appears to be a 5% difference in performance may actually be less due to the whole number limitation.
The Standard Candle test file actually has 8 different Blur and TNR node tests, but since the simpler configurations gave a flat 24FPS, we decided to keep them out of our charts to help cut down on noise. Interestingly, while we saw a 1-2 FPS drop in performance with 30 Blur nodes and 4 TNR nodes, we did not see a drop in performance with 66 Blur nodes or 6 TNR nodes. This is likely due to the fact that Davinci Resolve only shows FPS in whole numbers, so we simply don't have a fine enough scale to see the difference.
With two Titan X video cards, we saw a small 1 FPS drop in performance with x8/x8 with 66 Blur nodes and 6 TNR nodes. However, with 4 TNR nodes we saw a 2 FPS drop with x16/x8 (roughly 8%) and a 3 FPS drop with x8/x8 (about 12%). While the whole number limitation keeps us from being super precise, it is still clear that there is indeed a benefit to using x16/x16 over x16/x8 or x8/x8 in Davinci Resolve.
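The effect of the whole number limitation can be made concrete with a little arithmetic. The sketch below assumes the FPS counter truncates (so a reading of 22 means anything from 22 up to just under 23); we do not know whether Resolve actually truncates or rounds, so treat that as an assumption, and the function name is hypothetical.

```python
from typing import Tuple

CAP_FPS = 24  # Standard Candle playback is limited to 24 FPS


def drop_range_percent(displayed_fps: int, cap: int = CAP_FPS) -> Tuple[float, float]:
    """Return (smallest, largest) possible drop in percent for a whole-number reading.

    ASSUMPTION: the counter truncates, so the true rate is at least
    displayed_fps and below displayed_fps + 1 (and never above the cap).
    The smallest value is a lower bound that is approached but not reached.
    """
    # Largest drop: the true rate is exactly the displayed whole number.
    largest = (cap - displayed_fps) / cap * 100.0
    # Smallest drop: the true rate is just under displayed_fps + 1.
    smallest = max(cap - (displayed_fps + 1), 0) / cap * 100.0
    return smallest, largest


if __name__ == "__main__":
    for displayed in (23, 22, 21):
        low, high = drop_range_percent(displayed)
        print(f"{displayed} FPS shown: true drop between {low:.1f}% and {high:.1f}%")
```

Under that assumption, a 2 FPS drop on the counter corresponds to a true drop anywhere from just over 4.2% up to 8.3%, and a 3 FPS drop to somewhere between 8.3% and 12.5%, which is why the percentages above should be read as upper bounds.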
Octane Render
Octane Render is one of the first GPU-based rendering engines to be developed. It uses only the GPU (not the CPU) for rendering and scales extremely well across multiple GPUs. We looked at x8 versus x16 performance in Octane only a few months ago in our Octane Render GPU Comparison article, and while we saw no difference at that time, it is possible that the new Titan X is fast enough to show one. One important thing to note is that full support for the new Pascal video cards (including the Titan X) is still not present in Octane. They work just fine and are still faster than the previous generation cards, but once CUDA 8 is released by NVIDIA, a number of optimizations are expected to increase performance even further.
With a single GPU, we interestingly saw about 2% faster performance with x8 over x16. This is odd, but is actually almost identical to what we saw in our Octane Render GPU Comparison article linked above.
Moving on to two Titan X cards, the results are all pretty much the same. There were small changes in performance, but nothing larger than 1%, so they are well within our margin of error.
Conclusion
Overall, the results of our testing are pretty mixed. With a single Titan X, we saw a wide range of results between using a PCI-E 3.0 slot at x8 and x16. Some applications (Unigine Heaven Pro and Octane Render) showed no difference, while others (Ashes of the Singularity, GRID Autosport, and Davinci Resolve) showed up to ~5% difference in performance.
With dual GPUs, the results actually got a bit more confusing. Although Unigine Heaven Pro didn't see much of a difference with a single card, with two cards in SLI driving three 4K displays in surround we saw roughly a 15% drop in performance running at x16/x8 and a massive 30% drop in performance running at x8/x8. On the other hand, Ashes of the Singularity only showed minimal differences, and GRID Autosport was actually faster at 1080p when running in x8/x8 – although it was about 8% slower at 4K and 4K surround. On the professional side, Octane Render still didn't show a difference when using two cards, but Davinci Resolve did see up to a ~10% drop in performance with both x16/x8 and x8/x8.
While our results are somewhat inconsistent, there are still a couple of conclusions we can draw:
- Whether you will see lower performance with x8 versus x16 depends heavily on the application. Some may see a difference, others won't.
- The higher the load on the card(s) – either through a higher resolution in games or a large number of accelerated effects in professional applications – the higher the chance of there being a difference in performance.
At this point, we would say that if you are using a high end video card like the Titan X (Pascal) or possibly even a GTX 1080, it is probably a good idea to try to use a PCI-E 3.0 x16 slot, especially in multi-GPU configurations. Depending on the applications you use it may not make much of a difference, but the fact that we saw a 10-30% drop in performance with x8/x8 compared to x16/x16 in a couple of our tests shows how large a difference it can make in certain situations.