NVIDIA Dual-Fan GeForce RTX Coolers Ruining Multi-GPU Performance

Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on September 28, 2018. For newer information, see our more recent articles.

Introduction

The new GeForce RTX series cards perform well in GPU-based rendering as individual cards, and have great potential for the future thanks to their new RT cores. However, when we stacked them together to measure multi-GPU scaling, we ran into some serious problems.

After wrapping up our testing of the new RTX 2080 and 2080 Ti as single cards, we wanted to see how well they scale in popular rendering engines like OctaneRender, Redshift, and V-Ray. We have looked at that in the past and found that multi-GPU scaling is quite good in these applications, and many of our customers use 2, 3, or even 4 GPUs to get the fastest possible render times. We only have one 2080 Ti at the moment, so we had to go with a set of four of the Founders Edition RTX 2080 cards for this test.

UPDATE: Single-fan versions of the GeForce RTX cards are now available, which don't have the problems described in this article.

Test Setup

Normally we run each test three times, make sure the results are pretty close, and then take the best one (highest score or lowest time, depending on the situation) as the final result. This helps ensure that something going on in the background doesn't throw things off and that each hardware configuration we test gets a fair shake. Over the course of our normal three runs of the Octane Benchmark and Redshift Demo with quad RTX 2080s, though, we found very odd behavior: each of the three runs was substantially slower than the one before it. We ran it again, this time with 5 runs for OctaneRender and 8 runs for Redshift – and not only did we see the same pattern, but it continued with slower performance the longer the tests were running. This is very different from what we have seen in past multi-GPU scaling tests, so we knew something was amiss.
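As a rough illustration of that methodology, here is a minimal Python sketch of picking the best of several runs and flagging the pattern we saw, where every run is worse than the one before it. The function names and the sample render times are hypothetical, made up for this example; they are not our actual tooling or measured results.

```python
from typing import Sequence


def best_result(results: Sequence[float], higher_is_better: bool) -> float:
    """Return the best of several runs: highest score or lowest time."""
    return max(results) if higher_is_better else min(results)


def steadily_degrading(results: Sequence[float], higher_is_better: bool) -> bool:
    """Return True if every run is worse than the run before it."""
    if len(results) < 2:
        return False  # a single run cannot show a trend
    pairs = list(zip(results, results[1:]))
    if higher_is_better:
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)


# Hypothetical render times in seconds (lower is better), not measured data.
times = [100.0, 108.0, 117.0]
print(best_result(times, higher_is_better=False))
print(steadily_degrading(times, higher_is_better=False))
```

A check like the second function is only a hint that something is wrong. Normal run-to-run noise can produce a short worsening streak by chance, which is why we repeated the tests with more runs before drawing conclusions.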

Full details of the hardware configuration we ran these tests on are listed below.

Testing Hardware
Motherboard:	Asus WS C422 SAGE/10G
CPU:	Intel Xeon W-2175 2.5GHz (4.3GHz Turbo) 14 Core
RAM:	8x Kingston DDR4-2666 32GB ECC Reg (256GB total)
GPU:	4 x NVIDIA GeForce RTX 2080 8GB
Hard Drive:	Samsung 960 Pro 1TB M.2 PCI-E x4 NVMe SSD
OS:	Windows 10 Pro 64-bit
PSU:	EVGA SuperNova 1600W P2
Software:	OctaneRender 3.08 Benchmark & Redshift 2.6.22 Demo (Age of Vultures scene)

Benchmark Results & Analysis

To try to determine what was going on, we ran GPUz 2.11.0 with one instance per video card and logged the results. What we found was that over time, as the tests continued, the temperatures on three of the four cards were getting very high and then the clock speed was throttling down dramatically. Modern video cards (and CPUs too) are designed to do this as a safety precaution, to avoid hardware damage from overheating as well as software errors and crashes. Fan speeds start out slow, to stay quiet, and then ramp up as cards get warmer – and eventually, if the fans cannot keep temps in check even as they approach full speed, the clock speed of the GPU throttles down to reduce heat output. On these cards, though, we weren't seeing just a little bit of downclocking: toward the end of our test runs, some cards were running at less than half of the speed NVIDIA lists in its specs! That is a huge difference in clock speed, and it translated into a massive drop in overall rendering performance. Here is what we found, both in terms of raw results and performance drop over time – with average GPU clock speeds included so that you can see the correlation. Let's start with OctaneRender 3.08:

That second chart shows it best: over the course of five benchmark runs, Octane performance drops to 30% below what we would expect based on the performance of a single RTX 2080 and typical GPU scaling in OctaneRender. Even during the very first run, there is a big enough clock speed drop that we didn't get even one benchmark score in the range we expected. We found a single RTX 2080 to score ~179.5, and in past tests, OctaneBench has demonstrated near-perfect scaling – hence our expectation to see a score around 718 from four of them. The "expected" clock speed is based on NVIDIA's claimed spec of 1800MHz boost clock for the Founders Edition GeForce RTX 2080 cards we used. GPUz did report the cards starting off that high, and in fact a bit higher: of the four cards in our testbed, two started these runs at 1890MHz, one at 1905MHz, and another at 1950MHz. They didn't stay that fast for long, though.
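The arithmetic behind that expectation can be sketched in a few lines of Python. This is only an illustration under the assumption of perfect linear scaling; the function names and the `scaling_efficiency` parameter are hypothetical, and the "implied" score is derived from the 30% figure above rather than being a measured result.

```python
def expected_score(single_gpu_score: float, gpu_count: int,
                   scaling_efficiency: float = 1.0) -> float:
    """Score expected from gpu_count identical cards.

    scaling_efficiency=1.0 assumes perfect linear scaling.
    """
    return single_gpu_score * gpu_count * scaling_efficiency


def percent_below_expected(expected: float, observed: float) -> float:
    """How far an observed score falls short of the expectation, in percent."""
    return (expected - observed) / expected * 100.0


# Single RTX 2080 score of ~179.5, four cards, perfect scaling assumed.
expected = expected_score(179.5, 4)
print(expected)  # 718.0

# Score implied by a 30% shortfall (derived from the text, not measured).
implied = expected * (1 - 0.30)
print(round(implied, 1))  # 502.6
```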

After wrapping up five runs in OctaneBench, we let the cards cool down a bit before starting Redshift. We had already updated to the latest release of Redshift (2.6.22) in order to get the new Turing-based GeForce RTX cards to work, so once the system had cooled down we ran the Redshift Demo on a loop for about 30 minutes – resulting in eight runs, with the same GPUz clock speed tracking:

Here, again, we see a big drop in rendering speed over time, with results evening out after six runs. At that point, performance seems to stabilize at about 26% lower than expected. Redshift hasn't scaled as well as Octane in our past tests, so the expectation here was based on the 374% speedup we saw when going from one to four GTX 1080 Ti cards in one of our recent articles. The expected GPU clock speed was again based on the listed boost clock spec from NVIDIA, but as mentioned above we saw clock speeds reported even higher than that in GPUz. The first run even managed to average higher clock speeds than the spec, and performance was almost exactly what we had expected – but the second run took a big hit, and render times kept getting longer as time went on and the cards got hotter.

I should also point out here how we got the average GPU frequency. GPUz was used to track each card individually, with measurements taken of the clock speed every second. We then took those clock speeds over the course of each test run, averaged them per card, and then averaged the four cards' individual results together. That single average doesn't tell the whole story, though: we found that the bottom card – the one with its fans exposed to the air, instead of being next to another card – actually managed to keep its temps in check and its clock speed high during the entirety of both test runs. The next two cards, the ones in the middle, fared the worst in terms of temperatures and throttling… and then the top card, which had its backside (with a thermal cooling plate) exposed to open air, did better than the middle two – but still throttled pretty heavily after a while.
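For readers who want to reproduce that averaging, here is a minimal Python sketch of the two-step calculation: average each card's once-per-second clock samples, then average the per-card results. The function names, the card labels, and every sample value are hypothetical placeholders; this is not the script we used, and the numbers are not our logged data.

```python
from statistics import mean
from typing import Dict, List


def per_card_averages(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the clock samples (MHz) logged for each card over one run."""
    return {card: mean(clocks) for card, clocks in samples.items()}


def overall_average(samples: Dict[str, List[float]]) -> float:
    """Average the per-card averages into a single figure for the run."""
    return mean(list(per_card_averages(samples).values()))


# Hypothetical one-per-second clock readings in MHz, keyed by slot position.
run_samples = {
    "bottom": [1900.0, 1890.0, 1880.0],
    "middle_lower": [1850.0, 1400.0, 1000.0],
    "middle_upper": [1850.0, 1350.0, 950.0],
    "top": [1880.0, 1600.0, 1300.0],
}

for card, avg in per_card_averages(run_samples).items():
    print(f"{card}: {avg:.0f} MHz")
print(f"overall: {overall_average(run_samples):.0f} MHz")
```

Printing the per-card values alongside the overall figure makes it easier to spot a case like ours, where one card holds its clocks while its neighbors throttle.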

We did also test V-Ray, which has a GPU rendering component to its public benchmark, but with this many GPUs, it only takes 20 seconds to complete. Running it repeatedly involves a downtime of 10 to 20 seconds between tests, though, so the cards never have the chance to get as hot as they do in the longer OctaneBench and Redshift Demo runs (which also only have about 2 seconds between repeat runs).

What is Causing this Throttling?

So what is going on here? Why are these cards having such problems with heat and downclocking, when past NVIDIA GeForce cards we have tested – including former Founders Edition models – have always done so well with multi-GPU rendering?

The answer lies in the heatsink and fan layout on these new cards. Past Founders Edition and "reference" style GeForce cards have had a single fan, near the front of the card, blowing back across the heatsink and exhausting the bulk of the hot air out the rear of the case. These new models, however, have dual fans – and the fins on the heatsink are arranged vertically, rather than horizontally. That means that they do not push hot air out the back of the system, but instead vent it up into the computer. Even in our open-air testbed system, this ends up being a far poorer cooling setup when cards are installed back-to-back, which is required when putting three or four of them on a single motherboard. One card by itself does okay this way, though it will heat up the inside of a chassis more than the blower-style cooling system, and even two cards – if separated by a slot or two – seem to do fine… but when they are put next to each other, this dual fan cooling layout proves to be a huge problem for GPU intensive workloads like rendering.

Thankfully, it looks like some OEMs are going to produce single-fan, blower-style GeForce RTX cards. When we can get our hands on a set of four of them we will try this again, and hopefully be able to publish proper multi-GPU scaling results for the 2080 and 2080 Ti.

Why Are Dual Fan NVIDIA RTX GPUs Throttling?

The new dual-fan cooling setup on NVIDIA's GeForce RTX 2080 and 2080 Ti Founders Edition cards vents heat upward, into the computer, rather than out the back of the system like previous single-fan designs. This means that putting multiple cards in a computer, especially back-to-back, leads to overheating and performance throttling under heavy GPU load.

Should You Buy GeForce RTX Video Cards for OctaneRender or Redshift?

A single RTX 2080 or 2080 Ti is a fine choice for GPU-based rendering now, and may be even better in the future if engines take advantage of the new RT cores in these GPUs. However, the dual-fan Founders Edition cards are NOT good for multi-GPU configurations. Wait for single-fan, blower-style cards instead.

UPDATE: GeForce RTX Blower-Style Cards Now Available

We got a batch of ASUS Turbo RTX 2080 cards, and re-created the testing from this article. These have a single fan positioned such that it blows heat generated by the card out of the system. As such, they do not suffer from throttling in multi-GPU configurations. Check out their fantastic OctaneRender and Redshift scaling performance!