Read this article at https://www.pugetsystems.com/guides/1982

PCI-Express 4.0 vs 3.0 Video Card Performance

Written on November 30, 2020 by William George

Introduction

PCI-Express has been the standard for connecting video cards and other expansion devices inside of computers for many years now, and several generations of the technology have now passed. With each of those generations, the amount of data that can be transferred over the PCIe connection has increased. How much impact does that have on modern video cards? Is there any benefit to running a PCIe 3.0 card in a 4.0 slot, or loss if using a 4.0 card in a 3.0 slot? We frequently get questions like these during the consultation process with new customers, so I thought it would be worth taking some time to test in order to better answer these queries.


Test Methodology

In order to test the impact of PCI-Express bandwidth on performance, we are going to look at two video cards in a system where we can control the PCIe slot generation within the BIOS. Why two cards? Because one is natively PCIe 3.0 (NVIDIA's Titan RTX) while the other uses PCIe 4.0 (their GeForce RTX 3090). Both have 24GB of memory, to avoid the amount of onboard VRAM affecting anything, and are effectively the top performing card from their respective GPU families. To minimize the impact of the CPU, we went with the top-end of AMD's new Ryzen 5000 series, the 5950X, installed on a Gigabyte X570 AORUS ULTRA motherboard which has a BIOS setting for selecting which PCIe version is used.

PCIe Slot Configuration Options in Gigabyte X570 AORUS ULTRA Motherboard BIOS

Here are the full specifications of the system we used for this testing:

Test Platform
CPU: AMD Ryzen 9 5950X
CPU Cooler: Noctua NH-U12S
Motherboard: Gigabyte X570 AORUS ULTRA
RAM: 4x DDR4-3200 16GB (64GB total)
Video Cards: NVIDIA Titan RTX 24GB; NVIDIA GeForce RTX 3090 24GB
Hard Drive: Samsung 960 Pro 1TB
Software: Windows 10 Pro 64-bit

With this hardware configuration, we tested each of the two video cards in each of the four PCIe Slot Configuration settings (Gen1 through Gen4). Most of the questions we get from prospective customers center around PCIe Gen3 and Gen4, but by going further back with our tests we can get a better picture of how PCIe bandwidth impacts video card performance. For example, PCIe Gen2 on a full x16 size slot (which these video cards were using) is roughly equivalent in bandwidth to PCIe Gen3 x8, and that is a common setting for motherboards to use when running multiple video cards on chipsets that don't have a massive number of PCIe lanes available. Likewise, PCIe Gen1 at x16 should be comparable to PCIe Gen3 at x4 - and PCIe Gen3 at x16 is on par with PCIe Gen4 at x8.
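These lane/generation equivalences follow directly from the per-lane bandwidth of each PCIe revision. As a rough sketch (the per-lane figures below are approximate published rates with encoding overhead included, not measurements from this article):

```python
# Approximate usable PCI-Express bandwidth per lane, in GB/s.
# Gen1/Gen2 use 8b/10b encoding; Gen3/Gen4 use 128b/130b.
PER_LANE_GBPS = {
    1: 2.5 * 8 / 10 / 8,      # 0.25 GB/s
    2: 5.0 * 8 / 10 / 8,      # 0.5 GB/s
    3: 8.0 * 128 / 130 / 8,   # ~0.985 GB/s
    4: 16.0 * 128 / 130 / 8,  # ~1.969 GB/s
}

def bandwidth(gen: int, lanes: int) -> float:
    """Total one-direction link bandwidth in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

# The equivalences described above:
print(f"Gen2 x16: {bandwidth(2, 16):.2f} GB/s  vs  Gen3 x8: {bandwidth(3, 8):.2f} GB/s")
print(f"Gen1 x16: {bandwidth(1, 16):.2f} GB/s  vs  Gen3 x4: {bandwidth(3, 4):.2f} GB/s")
print(f"Gen3 x16: {bandwidth(3, 16):.2f} GB/s  vs  Gen4 x8: {bandwidth(4, 8):.2f} GB/s")
```

Gen3 x16 and Gen4 x8 come out identical because Gen4 simply doubles the Gen3 per-lane rate; the older pairings are within a couple percent of each other.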

Finally, on the software side, we used a handful of benchmarks across three main types of applications: GPU-based rendering engines (OctaneBench, Redshift, and V-Ray Next), post production (DaVinci Resolve and NeatBench), and a game engine test (Unigine Superposition).

Benchmark Results

Here are the results of our testing, split into galleries by application type, with some analysis after each set of charts:

GPU Based Rendering Engines

None of these rendering benchmarks show much difference in performance between the various generations of PCI-Express. There is a slight curve in Redshift, with about an 8-second slowdown from PCIe Gen4 to Gen1 on the RTX 3090 and 5 seconds on the Titan RTX. V-Ray Next shows nothing outside the test's margin of error, and while there is a small drop on OctaneRender it is only around 2% (so that may well be within the margin of error too).

It is worth remembering the way that GPU rendering works, though: scene data is sent to the card over the PCIe connection, and then the processing is all done on the video card(s), then the resulting image is sent back to the system to be displayed and/or saved. The speed of the PCIe bus is going to impact how quickly the data can be moved back and forth, but won't impact the actual computations happening on the card. That probably explains why we see so little impact from the older versions of PCI-Express in this test.

There are also some notable exceptions to this, which are simply outside the purview of these benchmarks. For example, some rendering engines support "out of core memory" - which is where some of the scene data is stored in main system memory if there isn't enough dedicated video memory on the card(s) themselves. In that situation, there would be a lot more data being transmitted over PCI-Express, throughout the rendering process, and thus the speed of that connection would be a lot more important.
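To get a feel for why out-of-core rendering would be more sensitive, here is a back-of-the-envelope sketch. The 10 GB scene size is a made-up example, and the per-lane bandwidth figures are approximations (encoding overhead included), not measurements:

```python
# Rough time, in seconds, to move a given amount of scene data over a
# PCIe x16 link at each generation's approximate usable bandwidth.
PER_LANE_GBPS = {1: 0.25, 2: 0.5, 3: 0.985, 4: 1.969}  # GB/s per lane

def transfer_seconds(data_gb: float, gen: int, lanes: int = 16) -> float:
    """Idealized one-way transfer time, ignoring latency and protocol overhead."""
    return data_gb / (PER_LANE_GBPS[gen] * lanes)

# A hypothetical 10 GB of out-of-core scene data:
for gen in (1, 2, 3, 4):
    print(f"Gen{gen} x16: {transfer_seconds(10, gen):.2f} s")
```

For a one-time upload before rendering starts, even the slowest case is a few seconds; but if data is streamed back and forth continuously during the render, those per-transfer costs multiply and the link speed starts to show up in the totals.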

Post Production

Post-production makes much more interactive use of the video card than rendering does, so here we see large performance differences across the various PCI-Express generations. DaVinci Resolve shows a steady drop from PCIe Gen4 down to Gen1 on the RTX 3090, while the Titan RTX has effectively no difference between Gen4 and Gen3, but then drops when using Gen2 and Gen1.

NeatBench, which tests the Neat Video noise reduction algorithm's performance, shows an even larger reduction when using the older versions of PCI-Express... to the point where an RTX 3090 on Gen1 is only providing half the speed of the same card running on Gen4. Again, though, the Titan RTX doesn't benefit from Gen4 vs Gen3 - presumably because it is a Gen3 card itself, so even if the system it is in is capable of the newer PCIe Gen4 speeds, the Titan is stuck at Gen3.

Game Engines

We don't generally test game performance here at Puget Systems, as so many other outlets already look at that subject in great depth, but I thought I would try seeing if Unigine Superposition showed any differences across the different PCIe speeds. It did not. There is a small, sub-2% difference - similar to what we saw with OctaneBench - but even if that is being caused by the generational difference, it is so small that a variance like that would not be noticeable when playing games. Again, this is likely due to the way this benchmark works: if all of the data for the test scenes can fit in the system memory, then once it is loaded up at the start the speed of the PCIe bus will no longer matter. Real gaming would see a different usage pattern, but even there I suspect that PCIe Gen4 vs Gen3, at least, would have no measurable performance impact.

Perhaps we can revisit this with more of a focus on game development in the future, as my colleague Kelly Shipman has been doing amazing work on testing the Unreal Engine.

Conclusion

For applications where data is constantly traveling across the PCI-Express bus, we can see that the generational bandwidth differences do have a very measurable impact on real-world performance. The best examples of that in the tests we conducted for this article were those looking at post-production & video editing, which exhibited substantial gains moving up from PCIe Gen1 to Gen2, moderate gains from Gen2 to Gen3, and then a small boost from Gen3 to Gen4 on the RTX 3090 (which is, itself, a Gen4 card). The Titan RTX, a Gen3 card, did not show a difference between running on PCIe Gen3 vs Gen4.

Other programs where data is only sent across PCIe before and after a long calculation did not see that sort of difference, however. At least within the manufacturer benchmarks we utilized, there was at best a small gain when using the latest Gen3 and Gen4 speeds - but definitely nothing like what we saw with video editing.

In the end, though, the PCI-Express Gen1 and Gen2 results are mostly an academic question. Virtually all modern motherboards are going to run at PCIe Gen3 or Gen4, and if running a single video card then they pretty much all offer full x16 lane support as well.

This gets a little trickier when running multiple video cards, which is common for some of these professional workloads, because while the PCIe generation isn't going to drop, the number of lanes available per slot/card definitely can. PCIe Gen3 at x8 lanes is going to be roughly on par with PCIe Gen2 at a full x16 lanes, and Gen3 x4 is close to Gen1 at x16 - so depending on your exact motherboard and GPU configuration, it is entirely possible to end up with lower bandwidth per card. The good news is that this looks like it will have little negative impact on GPU-based rendering, which is one of the places where having a lot of video cards can really shine - but if you are working with video editing or some other application that depends on sending a lot of data back and forth to the graphics card(s), then it is a good idea to ensure that your system provides the most bandwidth possible over PCI-Express.

Does putting a PCIe Gen3 video card in a Gen4 slot improve performance?

No. If the graphics card itself is PCIe 3.0, then putting it in a faster 4.0 slot will not provide any benefit, since the link will still operate at Gen3 speed.

Does putting a PCIe Gen4 video card in a Gen3 slot reduce performance?

In some applications, yes - there can be a small performance drop when running a PCI-Express 4.0 capable card in a system/slot that is only using PCIe 3.0. We did not find any impact for gaming or GPU-based rendering, but we did measure a small decline (less than 5%) with video editing in DaVinci Resolve and a little bit larger drop (~10%) with noise reduction in Neat Video.
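Both answers above come down to PCIe link training: a link negotiates to the highest generation that both the card and the slot support. A trivial sketch of that rule:

```python
def negotiated_gen(card_gen: int, slot_gen: int) -> int:
    """A PCIe link trains to the highest generation both ends support."""
    return min(card_gen, slot_gen)

# A Gen3 card (e.g. Titan RTX) in a Gen4 slot still runs at Gen3:
print(negotiated_gen(3, 4))  # 3
# A Gen4 card (e.g. RTX 3090) in a Gen3 slot drops to Gen3:
print(negotiated_gen(4, 3))  # 3
```

On NVIDIA cards, you can check the negotiated and maximum link generation with `nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max --format=csv`.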

Tags: PCI-E, PCIe, PCI, Express, GPU, Video Card, Motherboard, CPU, Processor, AMD, AMD Ryzen 5000 Series, NVIDIA, GeForce, RTX 3090, Titan RTX
Ampere

https://www.nvidia.com/en-u...

Posted on 2020-12-01 14:00:14
任柔

waiting for PugetBench about 3060Ti

Posted on 2020-12-01 14:18:28

Me too! Hopefully we can get a card in soon and take a look!

Posted on 2020-12-01 16:59:53
Pushparaj

me too....

Posted on 2020-12-06 06:08:14
Ampere

https://www.asus.com/Mother...

Puget, here's another blower fan card.

Posted on 2020-12-02 08:36:43

Oh, nice! Thank you for pointing that out - I will pass this along to our product qualification team :)

Posted on 2020-12-02 09:31:12
Ampere

https://www.nvidia.com/en-u...
https://www.nvidia.com/down...

Puget, please install the latest Nvidia GeForce 460.79 WHQL driver when you test the RTX 3060 Ti.

Posted on 2020-12-09 14:04:19

When testing NVIDIA GPUs, we first go with whatever the latest Studio driver is that is available. But if it doesn't support a GPU because it is too new, then we fall back to the latest Game Ready driver. The only exception to this is if we are in the middle of a big round of testing and NVIDIA releases a new Studio driver. In that case, we stick with the driver we are already using, and update to the new one before the next round of testing.

We almost never see a performance difference in these kinds of applications with different drivers though. Bug fixes, absolutely, but I can't think of any time we saw more than a percent or two difference in terms of performance.

Posted on 2020-12-09 17:46:15
Ampere

https://www.nvidia.com/en-u...
https://www.nvidia.com/down...

Nvidia Studio 460.89 driver.

Posted on 2020-12-15 14:56:50
Ampere

https://www.msi.com/Graphic...

Another blower fan card for your product qualification team to test.

Posted on 2020-12-18 11:47:33

Thank you! I've passed that link along to our hardware qualification team :)

Posted on 2020-12-18 16:08:18
tpcs

Synthetic benchmarks mean absolutely crap. Show REAL WORLD....gaming...side by side. Otherwise, this was a waste of time.

Posted on 2021-01-13 00:44:57

The only synthetic test in this article is Unigine Superposition, and it was just included as an example of what game engines could be like. We don't do game testing here at Puget very often (in fact, I can't remember the last time we included that in an article) because our focus is on professional applications and workflows.

Posted on 2021-01-13 00:52:07
tpcs

Octane is another.

Sorry, but synthetic benchmarks mean nothing. They're just arbitrary numbers games. People work and play in the real world.

Handbrake to shrink (aka transcode) videos
DaVinci Resolve and others to export videos

It's interesting you mention you don't do gaming testing at Puget when I know Jayztwocents as well as Linus from Linus Media Group ( Linus Tech Tips for example) have both done videos involving systems built by you all and you test gaming systems per them. They're not the kind of guys that would just make that up. (shrugs) But whatever.

Posted on 2021-01-13 01:31:24

OctaneBench isn't synthetic - it tests how fast the video card is able to render an example scene in the OctaneRender engine. Same with Redshift. V-Ray is a little different, in that instead of seeing how long it takes to render a scene it measures how many paths are calculated per second during a render.

And I'm glad you mentioned Resolve, as that was also tested as part of this article - but our benchmark does far more than just check export time! A lot of other stuff within Resolve is impacted by GPU performance while editing, and Matt has designed our benchmark to test many different aspects of that software.

Posted on 2021-01-13 01:53:22
Michael Rada

Hi guys, I found this thread while trying to find an answer for this... am I able to run 2x RTX 3080 on a MAG X570 TOMAHAWK WIFI motherboard? Could this help me get better performance in DaVinci Resolve? Thanks a lot for an answer.

Posted on 2021-09-14 21:59:42

I'm not personally familiar with that motherboard, but looking at pictures of it online it does appear to have two PCIe x16 slots that are spaced far enough apart to allow RTX 3080 Founders Edition cards (which are two slots wide). I'd be a little more hesitant about 3rd party cards, since a lot of those are wider than two slots (and the slots on that board appear to be 3 apart, which means a wider card would fit but the second card might start to block airflow).

As for Resolve performance, if you are using the paid version (the free version only supports one GPU) then dual video cards would help with GPU-accelerated effects... it wouldn't help in other areas, like Fusion, though.

Posted on 2021-09-15 17:09:05