

Read this article at https://www.pugetsystems.com/guides/1253

NVLink on NVIDIA GeForce RTX 2080 & 2080 Ti in Windows 10

Written on October 5, 2018 by William George


When NVIDIA announced the GeForce RTX product line in August 2018, one of the things they pointed out was that the old SLI connector used for linking multiple video cards had been dropped. Instead, RTX 2080 and 2080 Ti cards would use the NVLink connector found on the high-end Quadro GP100 and GV100 cards. This caused much excitement, since one of the features of NVLink on Quadros is the ability to combine the video memory on both cards and share it between them. This is extremely helpful in applications that can be memory-limited, like GPU-based rendering, and having it available on GeForce cards seemed like a great boon. Afterward, though, NVIDIA only spoke of it using terms like "SLI over NVLink" - leading many to surmise that the GeForce RTX cards would not support the full NVLink feature set, and thus might not be able to pool memory at all. To clear this up, we decided to investigate...

What is NVLink?

At its core, NVLink is a high-speed interconnect designed to allow multiple video cards (GPUs) to communicate directly with each other - rather than having to send data over the slower PCI-Express bus. It debuted on the Quadro GP100 and has been featured on a few other professional NVIDIA cards like the Quadro GV100 and Tesla V100.

What Can NVLink on Quadro Cards Do?

As originally implemented on the Quadro GP100, NVLink allows bi-directional communication between two identical video cards - including access to the other card's memory buffer. With proper software support, this allows GPUs in such configurations to tackle larger projects than they could alone, or even in groups without NVLink capabilities. It required specific driver setup, though.

What Are the Requirements to Use NVLink on Quadros?

Special setup is necessary to use NVLink on Quadro GP100 and GV100 cards. Two NVLink bridges are required to connect them, and a third video card is needed to handle actual display output. Linked GPUs are then put in TCC mode, which turns off their outputs (hence the third card). Application-level support is also needed to enable memory pooling.
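The driver-model switch itself is done from an elevated command prompt with nvidia-smi. As a rough sketch (the GPU indices here are assumptions - list your GPUs first to confirm which devices are the linked Quadros):

```shell
# List installed GPUs to confirm indices (the linked Quadros are assumed
# to be devices 0 and 1 in this sketch)
nvidia-smi -L

# Switch each linked Quadro to the TCC driver model (1 = TCC, 0 = WDDM).
# Requires administrator privileges and a reboot to take effect.
nvidia-smi -i 0 -dm 1
nvidia-smi -i 1 -dm 1
```

After the reboot, the linked cards' display outputs are disabled, which is why the third card is needed to drive a monitor.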

TCC Mode Being Enabled on Quadro GP100 Video Cards

This is how TCC is enabled on Quadro GP100s via the command line in Windows 10.

Do GeForce RTX 2080 and 2080 Ti Video Cards Have NVLink Connectors?

Technically, yes: there is a single NVLink connector on both the RTX 2080 and 2080 Ti cards (compared to two on the Quadro GP100 and GV100). If you look closely, though, you will see that the connectors on the RTX cards face the opposite direction of those on the Quadro cards. Check out the pictures below:

NVIDIA GeForce RTX 2080 and Quadro GP100 Side by Side

NVIDIA GeForce RTX 2080 and Quadro GP100 NVLink Connector Comparison

Are the GeForce RTX and Quadro GP100 / GV100 NVLink Bridges the Same?

No, there are several differences between the NVLink bridges sold for the GeForce RTX cards and older ones built for Quadro GP100 and GV100 GPUs. For example, they differ in both appearance and size - with the Quadro bridges designed to connect adjacent cards while the GeForce RTX bridges require leaving a slot or two between connected video cards.

NVIDIA Quadro NVLink Bridge vs GeForce RTX NVLink Bridge (View From Top)

NVIDIA Quadro NVLink Bridge vs GeForce RTX NVLink Bridge (View From Bottom)

Are GeForce RTX and Quadro GP100 NVLink Bridges Interchangeable?

In our testing, the GP100 bridges physically fit but would not work on GeForce RTX 2080s. The GeForce bridge did work on a pair of Quadro GP100 cards, with some caveats. Due to its larger size, only one GeForce bridge could be installed on the pair of GP100s - meaning only half the potential bandwidth was available between them.

Dual NVIDIA Quadro GP100 Cards with Dual Quadro NVLink Bridges Installed

Dual NVIDIA Quadro GP100 Cards with Single GeForce RTX NVLink Bridge Installed

Dual NVIDIA GeForce RTX 2080 Cards with a Quadro NVLink Bridge Installed - Which Does Not Function

Dual NVIDIA GeForce RTX 2080 Cards with a GeForce RTX NVLink Bridge Installed

Are NVLink Bridges for Quadro GP100 and GV100 Cards the Same?

No. While we don't have any GV100 era NVLink bridges here to test, we know that they are the same size as those for the GP100 but are colored differently and sold separately by NVIDIA. Other sources are also reporting that they may work with the new RTX series video cards, but we cannot confirm that.

A Pair of Quadro GP100 Era NVLink Bridges (Silver)

A Pair of Quadro GV100 Era NVLink Bridges (Gold)

Is NVLink Setup on the GeForce RTX 2080 the Same as Quadro GP100?

After testing many different combinations of cards and NVLink bridges, we were unable to find any way to turn on TCC mode for the GeForce RTX cards. That means they cannot be set up for "peer-to-peer" communication using the same method as the GP100 and GV100 cards, and attempts to test NVLink using the 'simpleP2P.exe' CUDA sample program failed.

Chart of NVIDIA Quadro GP100 and GeForce RTX 2080 NVLink Configurations and Capabilities

The chart above shows the results we found when using different combinations of video cards and NVLink bridges, including which combinations supported SLI and whether TCC could be enabled. Click to expand and see additional notes about each configuration.

Dual Quadro GP100 Video Cards Without NVLink Bridge in Peer-to-Peer Bandwidth Test

Dual Quadro GP100 Video Cards With Single GeForce RTX NVLink Bridge in Peer-to-Peer Bandwidth Test

Dual Quadro GP100 Video Cards With Dual Quadro NVLink Bridges in Peer-to-Peer Bandwidth Test

Dual GeForce RTX 2080 Video Cards With NVLink Bridge Failing Peer-to-Peer Bandwidth Test

These screenshots from the Windows command line show peer-to-peer bandwidth across cards with different types of NVLink bridges installed. The first three are pairs of GP100s with no bridge, the GeForce RTX bridge, and then dual Quadro bridges - while the last screenshot shows that the RTX 2080 cards did not support P2P communication in this test at all, regardless of what bridge was installed.

GeForce RTX 2080 Video Cards Do Not Support TCC Mode

TCC mode cannot be enabled on the GeForce RTX 2080 video cards in Windows.

How To Configure NVLink on GeForce RTX 2080 and 2080 Ti in Windows 10

Instead of using TCC mode, and needing to have a third graphics card to handle video output, setting up NVLink on the new GeForce RTX cards is much simpler. All you need to do is mount a compatible NVLink bridge, install the latest drivers, and enable SLI mode in the NVIDIA Control Panel.
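Once SLI is enabled, the state of the link can be spot-checked from a command prompt. A sketch using nvidia-smi (the output format varies by driver version, so treat this as illustrative):

```shell
# Show the NVLink capabilities reported for each GPU (P2P, atomics, SLI, etc.)
nvidia-smi nvlink -c

# Show per-link status for each GPU
nvidia-smi nvlink -s
```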

NVIDIA Control Panel Screenshot Showing SLI Enabled on GeForce RTX 2080 Video Cards

It is not obvious that the steps above enable NVLink, as that is not mentioned anywhere in the NVIDIA Control Panel that we could see. The 'simpleP2P.exe' test we ran before also didn't detect it, likely because TCC mode is not being enabled in this process. However, another P2P bandwidth test from CUDA 10 did show the NVLink connection working properly and with the bandwidth expected for a pair of RTX 2080 cards (~25GB/s each direction):
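The CUDA 10 test referred to here matches the p2pBandwidthLatencyTest sample that ships with the CUDA Toolkit. As a sketch, assuming a default CUDA 10.0 samples install on Linux (on Windows, the samples build through the bundled Visual Studio solutions instead):

```shell
# Build and run the peer-to-peer bandwidth/latency sample from CUDA 10.0
cd ~/NVIDIA_CUDA-10.0_Samples/1_Utilities/p2pBandwidthLatencyTest
make
./p2pBandwidthLatencyTest

# With NVLink active, the "P2P=Enabled" unidirectional matrix should show
# roughly 25GB/s between a pair of RTX 2080 cards, versus single-digit GB/s
# when traffic falls back to PCI-Express
```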

NVIDIA P2P Bandwidth Test Showing NVLink Working on a Pair of GeForce RTX 2080 Video Cards

Do GeForce RTX Cards Support Memory Pooling in Windows?

Not directly. While NVLink can be enabled and peer-to-peer communication is functional, accessing memory across video cards depends on software support. If an application is written to be aware of NVLink and take advantage of that feature, then two GeForce RTX cards (or any others that support NVLink) could work together on a larger data set than they could individually.

What Benefits Does NVLink on GeForce RTX Cards Provide?

While memory pooling may not 'just work' automatically, it can be utilized if software developers choose to do so. Support is not widespread currently, but Chaos Group has it functioning in their V-Ray rendering engine. Just like the new RT and Tensor cores in the RTX cards, we will have to wait and see how developers utilize NVLink.

What About SLI Over NVLink on GeForce RTX Cards?

While memory pooling may require special software support, the single NVLink on the RTX 2080 and dual links on the 2080 Ti are still far faster than the old SLI interconnect. That seems to be a main focus on these gaming-oriented cards: implementing SLI over a faster NVLink connection. That goal is already accomplished, as shown in benchmarks elsewhere.

Will GeForce RTX Cards Gain More NVLink Functionality in the Future?

Future application and driver updates will change the situation on a program-by-program basis, as software developers learn to take advantage of NVLink. Additionally, the 2.5 Geeks Webcast interviewed an NVIDIA engineer who indicated that NVLink capabilities on these cards will be exposed via DirectX APIs - which may be different from the CUDA-based P2P code we tested here.

Does NVLink Work on GeForce RTX Cards in Linux?

My colleague Dr. Don Kinghorn conducted similar tests in Ubuntu 18.04, and he found that peer-to-peer communication over NVLink did work on RTX 2080 cards in that operating system. This functionality in Linux does not appear to depend on TCC or SLI, so with that hurdle removed the hardware link itself seems to work properly.

Tags: NVIDIA, GeForce, RTX, 2080, 2080 Ti, NVLink, SLI, Bridge, Quadro, GP100, GPU, Memory, Pooling

Memory pooling is possible for GeForce RTX according to NVIDIA's Director of Technical Marketing, Tom Petersen, during the HotHardware 2.5 Geeks podcast:

"Petersen explained that this would not be the case for GeForce RTX cards. The NVLink interface would allow such a use case, but developers would need to build their software around that function. "While it's true this is a memory to memory link, I don't think of it as magically doubling the frame buffer. It's more nuanced than that today," said Petersen. "It's going to take time for people to understand how people think of mGPU setup and maybe they will look at new techniques. NVLink is laying a foundation for future mGPU setup."

edit: link fixed

Posted on 2018-10-06 08:17:01

The link in your comment seems to have been cut off, but I found the podcast episode you are referring to. Do you happen to know what time stamp this particular quote is from? I'd like to go through and listen to the context around it, but I was hoping to avoid listening to the whole hour-long podcast :-)

Posted on 2018-10-06 14:59:27
ryan o'connor


Posted on 2018-10-07 04:01:34

Yeah, talk of NVLink starts just before the 38:00 mark and goes until about 46:30. I ended up watching all of it earlier today, but thank you for the direct link :)

I am going to listen to just that ~8 minute portion again tomorrow, and then write some thoughts.

Posted on 2018-10-07 07:12:46
ryan o'connor

No problem! Interested to hear what you think

Posted on 2018-10-07 18:30:36

Okay, here is the section that I think bears most closely on what our article above covers. It goes from about 41:56 to 44:15 in the video above and addresses two questions:

Interviewer: "NVIDIA collective communications library, the NCCL library, for developing atop NVLink, will GeForce users get access to that for playing with communications and buffers?"

NVIDIA Engineer: "I expect that the answer is 'yes' to that. So NVLink is a software visible capability, and it's gonna be exposed primarily through the DX [DirectX] APIs. I'm not sure exactly... NCCL, I'm not super familiar with that, but the DX APIs will expose NVLink."

Interviewer: "I had a question, generally speaking, in terms of when you were talking about "hey, what's in your frame buffer?" - in the way I understand the way NVLink works in machine learning and supercomputers, you know, high performance computing - you now have, let's say in the case of two 8GB frame buffer cards, you now have a contiguous 16GB frame buffer. Is that too simplified, simplifying it too much?"

NVIDIA Engineer: "I think that sets the wrong expectation, right? When people say that, they're trying to say I can now game with 16GB textures. And it's really that style of memory scaling will require app work, right? It's not just gonna magically share that memory. Now it's true that you could set it up to do that, right? You could set up a memory map so that, you know, effectively it looked like a giant frame buffer - but it would be a terrible performance thing. Because the game would really need to know that there is latency to access that second chunk of memory, and it's not at all the same. So think of it as it is true that this is a memory to memory kind of link, but I don't just think of it as magically doubling the frame buffer. It's much more nuanced than that today, and it's going to really take time for people to understand "hey, NVLink is changing the way I should think about my multi-GPU setup and, effectively, maybe I should start looking at new techniques", right? And that's why we did NVLink. NVLink is not really to make SLI a little bit better, it's to lay a foundation for the future of multi-GPU."

So it sounds to me like what is going on, for these GeForce cards, is that they are going to expose NVLink capabilities in a different way than Quadro cards have. That makes sense, in a way, since GeForce cards are aimed at a different audience (mainstream, largely gamers) and need to be accessible to game developers in ways that they are already somewhat familiar with. However, if NVIDIA only allows access to NVLink on GeForce cards through DirectX APIs, then that may interfere with using it in applications that are more focused on GPU computation.

I think I will add one more section onto the article above, talking about how just because the traditional way to test NVLink GPU communication doesn't work on the GeForce cards does not mean they will never be able to work together in a similar way. We are, of course, very early in the release of this RTX / Turing GPU generation - and both other APIs / approaches to the issue as well as future driver updates could change the situation :)

Posted on 2018-10-08 18:39:50

Amazing summary. Thank you for taking the effort to transcribe it. This was the way I understood it when OTOY talked about their implementation: it does not automagically double the VRAM, and there will be a speed hit every time assets need to be exchanged over NVLink. The hope is that this penalty is much smaller than going out-of-core to system memory to fetch assets which don't fit in the 11 GB of VRAM but might fit in a 22 GB pool of VRAM.

The good news is that GPU render engines are already actively working on using the new API as described in the post I linked on an older article here:


The odds are good that we will benefit from GeForce NVLink as well, and the 2080 Ti will have better bandwidth compared to the 2080 cards.

Posted on 2018-10-08 19:01:55

Between the potential of NVLink and RT cores, I think there will be a lot of growth room for GPU rendering on this generation of cards. I am excited to see where it goes, and to test Octane, Redshift, and V-Ray as they release updates that utilize Turing's capabilities. It may also be interesting to replicate the testing above once we have a pair of RTX 2080 Ti cards (we have only one at the moment) to see if they report a different number of links than the vanilla 2080 cards.

Posted on 2018-10-08 19:21:38

Sorry for the cut-off link. You will find the context in the article. I did listen to the interview a month ago, but I don't remember specific timestamps:


Posted on 2018-10-07 05:53:00

Thank you for sharing that! I am going to re-listen to the applicable part of the interview tomorrow, and write some of my thoughts on it here in the comments.

Posted on 2018-10-07 07:13:58

I just posted a reply above, to Ryan O'Connor, addressing the video interview you brought up.

Posted on 2018-10-08 18:40:19

Any chance you can test this on Linux, where TCC mode is not an issue?

Posted on 2018-10-07 03:41:59

That may be a little bit outside my area of expertise, but it would certainly be interesting to see if there is any different behavior on Linux.

Posted on 2018-10-07 07:54:54
Donald Kinghorn

Hi Michael, William asked if I could comment ... I've just done a bunch of NVLINK testing in Linux (Ubuntu 18.04, CUDA 10.0, and driver 410). It looks like full NVLINK, but with a bit lower performance than you would see on the V100 server hardware. I'll have a full post up at https://www.pugetsystems.co... in a couple of days. I'll be looking at TensorFlow performance along with general performance like the following testing... the following is 2 x RTX 2080 Founders Edition cards:

kinghorn@i9:~/projects/samples-10.0/bin/x86_64/linux/release$ nvidia-smi nvlink -c
GPU 0: GeForce RTX 2080 (UUID: GPU-2cac9708-1ed8-0312-ada8-ce3fb52a556c)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false

cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 22.53GB/s

Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 389.09 5.82
1 5.82 389.35
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 386.63 24.23
1 24.23 389.76
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 386.41 11.59
1 11.57 391.01
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 382.58 48.37
1 47.95 390.62

Posted on 2018-10-11 15:26:25

Thank you for doing that testing, Don! It is looking like the issue with NVLink on these GeForce RTX cards is purely because NVIDIA is not allowing TCC mode in the current Windows drivers. I will update the article text (and maybe the title too) to better reflect that.

Posted on 2018-10-11 16:58:52

Great, that's terrific news. The 2080 Ti, as a TU102 chip, should support twice the bandwidth of the 2080. I'm curious whether this results in training speedups with memory pooling. Will look forward to your writeup.

Posted on 2018-10-12 18:54:46

I couldn't test P2P bandwidth on Windows, of course, but I was able to see that the 2080 Ti cards have two links available - compared to just one on the vanilla 2080 (and none on the upcoming 2070s, as I understand it). So assuming you are using an OS and software setup that works properly with NVLink, then a pair of 2080 Ti should indeed have double the P2P bandwidth of the 2080 :)

Posted on 2018-10-12 18:57:26
Lee aste

So I can't use a GP100 or GV100 NVLink bridge for two 2080 Tis?
I use an mATX motherboard, so to do SLI I need a 2-slot NVLink bridge... but there is no 2-slot NVLink bridge except the Quadro NVLink bridges.
Is there a way?

Posted on 2018-10-08 10:26:42

The Quadro bridge did not work on GeForce RTX cards for us, so I would not expect it to work for you either. Moreover, I would be concerned about using two of these dual-fan cards right next to each other. The heatsink configuration on the NVIDIA Founders Edition cards in this generation is not built for having cards next to each other without at least one slot in-between for airflow. I think that may be why they don't offer a 2-slot NVLink SLI bridge.

Posted on 2018-10-08 18:11:30

I did some testing under Win10 1809 and 416.16 drivers, and during single-application monitoring of VRAM usage I hit 11.7GB - 700MB over (keep in mind this is a single app, not combined OS + app). This was an "SLI"-aware app that does indeed use both GPUs with a supporting NVIDIA profile under DX11. If the 700MB was swapping to main system RAM, then I would have expected to see a sharp decline in FPS, but no such decline happened at the point the app exceeded 11GB usage - FPS was very consistent. So in my real-world test case, and not the "discussion" case, it seems that memory pooling is happening. In my case the application was a flight simulator (Lockheed Martin's Prepar3D V4.x). I can probably run more tests by increasing the shadow map size to sharpen shadow quality, as this will use more VRAM and should push me further past 11GB.

Posted on 2018-10-14 16:59:55

Thank you for sharing your experience! If all that is needed is enabling SLI, in order to have memory pooling, that would be nice... but it is definitely a change from how NVLink and memory pooling worked in the past (on Quadro cards). I hope NVIDIA puts out some more official information about this, and it would be nice if they also put more details in their control panel - especially showing memory pooling and usage.

Posted on 2018-10-15 19:22:49

VRay apparently got it to work.

Posted on 2018-10-15 01:26:31

Chaos Group (V-Ray) and OTOY (OctaneRender) have both talked about it, but I haven't seen anything published with detailed information directly showing NVLink at work on GeForce RTX cards in either of those rendering engines. I would love to know more about what they have actually tested and how they got memory pooling working, if indeed they have. It would also be great if they would update their benchmarks to utilize it - both V-Ray Benchmark and OctaneBench are lagging behind their latest releases :(

Posted on 2018-10-15 19:11:22

You guys should be aware that you probably need to enable "SLI" in order for the NVLink to work on the RTX series. Memory pooling also works if implemented in the application. I'd recommend taking a look at this post:

Posted on 2018-10-15 05:58:16

That is really weird - they used bridges from Quadro cards and it worked for them, when that definitely did not work for us (not even SLI mode was available when trying to use those bridges).

Hmm... GV100 bridges? We used ones from the GP100. Maybe the bridges themselves have been updated over the Quadro GP100 -> GV100 generation change? The coloring is different - the bridges in that Facebook post look golden in color, rather than silver like the Quadro GP100 bridges we have.

It is good to see that they are using blower-style RTX cards, though - looks like the same Asus series we tested recently, though they have the 2080 Ti variants (lucky!).

I still am unsure how software like this is functioning with P2P over NVLink without being able to put the cards into TCC mode... but maybe memory pooling in this generation somehow doesn't need that? I'll play with this some more when I have a chance.

Posted on 2018-10-15 19:20:26

Here we go, they got it working on the regular RTX NVLink bridges :) https://www.chaosgroup.com/...

Would be fun to see if it works on the Quadro bridges too I guess. Thanks for testing btw :)

Posted on 2018-10-19 12:45:01

Thank you for posting that! It lines up with some things one of the other guys here (Matt) learned while talking with NVIDIA reps at Adobe Max this week. I am going to test two RTX 2080 cards again, without a third card this time, and make sure they are in SLI mode. That *might* enable P2P functionality in the test we used above, or it may be that the type of testing we were trying to do just won't play nicely with the new RTX cards no matter what. It does look like some of NVIDIA's own tools, even in CUDA 10, are not reporting things properly (like the incorrect VRAM usage mentioned in the link you posted)... which may well be part of why NVLink didn't work as we expected it to.

Posted on 2018-10-19 17:14:01

I noticed that for GeForce RTX there are only 3- and 4-slot connectors, but on the Quadro RTX pages there are 2-slot connectors too. There was no mention of the new Quadro RTX NVLink here, and I am interested in whether that 2-slot bridge for Quadro RTX would work on GeForce RTX (for space-constrained systems, less optimal thermals is tolerable). I saw a similar question below, but I think older Quadro NVLink bridges were tested since it was not specified.

Posted on 2018-11-06 05:43:04

They're not yet available to purchase and test, but I fully expect the new Quadro RTX NVLink bridges should work on the GeForce cards. The bridges from the old Quadro GP100 didn't work, but I have the feeling that's because it was an older generation of the technology. We don't have any to test here, but other sources online have indicated that bridges from the Quadro GV100 do work with the GeForce RTX cards, which if true is a good sign for future Quadro bridges working as well.

Posted on 2018-11-06 05:45:58

Thank you for the information.

Posted on 2018-11-06 05:53:32