Read this article at https://www.pugetsystems.com/guides/1268

Does DaVinci Resolve Studio 15.1.2 support NVLink?

Written on October 31, 2018 by Matt Bach

Introduction

In the past, NVLink has only been available on extremely expensive GPUs like the Quadro GV100. With the new GeForce RTX 2080 and 2080 Ti, however, NVIDIA has decided to include NVLink functionality on a more mainstream product, which opens up a number of very interesting possibilities.

We have more information on NVLink in our NVLink on NVIDIA GeForce RTX 2080 & 2080 Ti in Windows 10 article, but in a nutshell, NVLink allows for a much faster interconnect between multiple video cards. The version built into the GeForce RTX cards only supports up to two cards, but those two cards should be able to exchange data much faster than they could without NVLink. One of the more common and "simple" ways that NVLink can be utilized is memory pooling, which can let software like DaVinci Resolve combine the VRAM from multiple GPUs. Right now, Resolve has to load all the relevant data into each card's memory, so if you have two RTX 2080 Ti 11GB cards, you will still only have 11GB of usable VRAM. With memory pooling, however, the cards can share memory, which increases the effective amount of memory to 22GB (11GB + 11GB).
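The difference between mirrored and pooled memory can be sketched in a few lines of Python. This is only an illustration of the arithmetic above: the function name and its parameters are our own, and it ignores any driver or application overhead.

```python
def usable_vram_gb(per_card_gb, num_cards, pooled):
    """Return the VRAM a single job can address, in GB.

    pooled=False models the current behavior: every card holds its own
    full copy of the data, so capacity does not add up across cards.
    pooled=True models hypothetical NVLink memory pooling, where the
    cards' memory is summed. Overheads are ignored (assumption).
    """
    if num_cards < 1:
        raise ValueError("need at least one card")
    return per_card_gb * num_cards if pooled else per_card_gb


# Two RTX 2080 Ti 11GB cards, as in the example above.
print(usable_vram_gb(11, 2, pooled=False))  # 11
print(usable_vram_gb(11, 2, pooled=True))   # 22
```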

Another possibility is that we could simply see increased performance in multi-GPU configurations. Resolve is pretty good at using multiple cards already, but if Blackmagic finds a way to utilize the much faster data rates that NVLink is capable of, the scaling could improve even further.

The issue is that NVLink is very new and, as far as we know, Blackmagic has not yet added support in the current version of DaVinci Resolve (15.1.2). Still, we want to test it, especially since in order to enable NVLink, you also have to enable SLI. In the past, we've seen lots of issues with Resolve when SLI was enabled, so we are very curious to see what happens.

If you would like to skip over our test setup and benchmark result/analysis sections, feel free to jump right to the Conclusion section.

Test Setup & Methodology

Listed below is the hardware & software we will be using in our testing:

Test Hardware
Motherboard: Gigabyte X399 AORUS Xtreme
CPU: AMD Threadripper 2990WX 3.0GHz (4.2GHz Turbo) 32 Core
CPU Cooler: Corsair Hydro Series H80i v2
RAM: 8x DDR4-2666 16GB (128GB total)
Video Card: 1-2x NVIDIA GeForce RTX 2080 Ti 11GB w/ NVIDIA RTX 4-slot NVLink Bridge
Hard Drive: Samsung 960 Pro 1TB M.2 PCI-E x4 NVMe SSD
OS: Windows 10 Pro 64-bit
Software: DaVinci Resolve 15 (ver. 15.1.2.8)

Our testing for DaVinci Resolve primarily revolves around the Color tab and focuses on the minimum FPS you would see with various media and levels of grading. The lowest level of grading we test is simply a basic correction using the color wheels plus 4 Power Window nodes with motion tracking. The next level up is the same adjustments but with the addition of 3 OpenFX nodes: Lens Flare, Tilt-Shift Blur, and Sharpen. The final level has all of the previous nodes plus one TNR (Temporal Noise Reduction) node.

We kept our project timelines at Ultra HD (3840x2160) across all the tests, but changed the playback framerate to match the FPS of the media. For all the difficult RAW footage we tested (CinemaDNG & RED), we not only tested with the RAW decode quality set to "Full Res" but we also tested at "Half Res" ("Half Res Good" for the RED footage). Full resolution decoding should show the largest performance delta between the different cards, but we also want to see what kind of FPS increase you might see by running at a lower decode resolution.

Codec | Resolution | FPS | Bitrate | Clip Name | Source
H.264 | 3840x2160 | 29.97 FPS | 80 Mbps | - | Transcoded from RED 4K clip
H.264 LongGOP | 3840x2160 | 29.97 FPS | 150 Mbps | - | Provided by Neil Purcell - www.neilpurcell.com
DNxHR HQ 8-bit | 3840x2160 | 29.97 FPS | 870 Mbps | - | Transcoded from RED 4K clip
ProRes 422 HQ | 3840x2160 | 29.97 FPS | 900 Mbps | - | Transcoded from RED 4K clip
ProRes 4444 | 3840x2160 | 29.97 FPS | 1,200 Mbps | - | Transcoded from RED 4K clip
XAVC S | 3840x2160 | 29.97 FPS | 90 Mbps | - | Provided by Samuel Neff - www.neffvisuals.com
XAVC Long GOP | 3840x2160 | 29.97 FPS | 190 Mbps | - | Transcoded from RED 4K clip
Blackmagic RAW | 4608x1920 | 24 FPS | 210 Mbps | A001_08122231_C008 | Blackmagic RAW
RED (7:1) | 4096x2304 | 29.97 FPS | 300 Mbps | A004_C186_011278_001 | RED Sample R3D Files
CinemaDNG | 4608x2592 | 24 FPS | 1,900 Mbps | Interior Office | Blackmagic Design [Direct Download]
RED (7:1) | 6144x3077 | 23.976 FPS | 840 Mbps | S005_L001_0220LI_001 | RED Sample R3D Files
RED (9:1) | 8192x4320 | 25 FPS | 1,000 Mbps | B001_C096_0902AP_001 | RED Sample R3D Files
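To put the bitrates in the table above into more familiar units, the short Python sketch below converts a few of them from megabits per second to megabytes per second. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 MB = 1,000,000 bytes); the function name is just for illustration.

```python
def mbps_to_mb_per_s(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


# Bitrates copied from the test media table (Mbps).
bitrates_mbps = {
    "H.264": 80,
    "ProRes 4444": 1200,
    "CinemaDNG": 1900,
    "RED 8K (9:1)": 1000,
}

for codec, mbps in bitrates_mbps.items():
    print(f"{codec}: {mbps_to_mb_per_s(mbps):.1f} MB/s")
# H.264: 10.0 MB/s
# ProRes 4444: 150.0 MB/s
# CinemaDNG: 237.5 MB/s
# RED 8K (9:1): 125.0 MB/s
```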

With the addition of the "Fusion" tab in Resolve, we are also going to include some basic tests for that tab. At the moment these are relatively easy projects that specifically test things like particles with a turbulence node, planar tracking, compositing, and 3D text with a heavy Gaussian blur node. These projects are based on the following tutorials:

If you have suggestions on what we should test in the future, please let us know in the comments section. Especially if you are able to send us a sample project to use, we really want to hear from you!

Color Tab FPS - Raw Benchmark Results

Color Tab FPS - Benchmark Analysis

To analyze our benchmark results, we are going to break them down based on the three different levels of color grading we tested. The easiest - a basic grade with 4 power windows - is not too difficult and every GPU we tested should be able to give full playback FPS in everything but RED 8K (Full Res Premium). However, each level up should show more and more of a difference between the different cards. One thing we have come to realize is that as available hardware is getting faster and faster, we are going to have to tweak our testing at some point. We still feel that what we are doing is realistic, but we may need to bump all our test media to 60 FPS just so we can actually see a difference between the different cards.

The "Score" shown in the chart below is a representation of the average performance we saw with each configuration for that test. In essence, a score of "80" means that on average, we were able to play our project at 80% of the tested media's FPS. A perfect score would be "100" which would mean that the system gave full FPS even with the most difficult codecs and grades.
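As a rough illustration of how a score like this can be computed, here is a minimal Python sketch. It is not the exact script behind our charts: the function name is made up for this example, the sample numbers are hypothetical, and we assume each result is capped at 100% because playback never runs faster than the media's own frame rate.

```python
def playback_score(results):
    """Average percentage of the target FPS reached across tests.

    results: list of (achieved_fps, media_fps) pairs, one per clip/grade.
    Each ratio is capped at 100% (assumption: playback cannot exceed the
    frame rate of the media).
    """
    if not results:
        raise ValueError("no results to score")
    percentages = [min(achieved / target, 1.0) * 100
                   for achieved, target in results]
    return sum(percentages) / len(percentages)


# Hypothetical measurements, not taken from the charts in this article.
sample = [(24.0, 24.0), (18.0, 24.0), (15.0, 25.0)]
print(round(playback_score(sample), 1))  # 78.3
```

In this made-up example the three clips play at 100%, 75%, and 60% of their target FPS, which averages out to a score of roughly 78.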

Even a single RTX 2080 Ti is able to play our lower two levels of grading at near full FPS with all the test media we used, but we can see some pretty clear results if we look at the highest level of grading which includes Temporal Noise Reduction. Here, it is very obvious that with NVLink/SLI off we are getting a nice ~40% increase in performance with two GPUs over just one. However, as soon as we turn on NVLink/SLI, we go right back down to the performance we saw with a single GPU. Clearly, having NVLink and SLI on not only doesn't work, it makes things significantly worse!
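The relative gain quoted above is simply the change in score divided by the single-GPU score. The Python sketch below shows that arithmetic; the scores are hypothetical values chosen to give a round 40% and are not the measured results.

```python
def percent_gain(baseline_score, new_score):
    """Relative change of new_score over baseline_score, in percent."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (new_score - baseline_score) * 100 / baseline_score


# Hypothetical scores for the highest grading level (illustration only).
single_gpu = 50.0
dual_gpu_sli_off = 70.0
dual_gpu_sli_on = 50.0

print(percent_gain(single_gpu, dual_gpu_sli_off))  # 40.0
print(percent_gain(single_gpu, dual_gpu_sli_on))   # 0.0
```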

Fusion Tab FPS - Raw Benchmark Results

Fusion Tab FPS - Benchmark Analysis

Fusion is relatively new to our DaVinci Resolve testing, and so far, we haven't been too impressed with how well it takes advantage of the GPU. To be fair, we are not using media in these projects that is particularly difficult to process, but given the FPS we saw in each project, we doubt that having multiple GPUs would significantly improve performance even if you are using 8K RED media. However, it is possible that either SLI or NVLink might help, so we went ahead and did the testing to find out.

For Fusion, it looks like it really doesn't matter if you use NVLink or SLI at all. Fusion doesn't seem to use multiple GPUs very well, so things that make multi-GPU setups either better or worse don't really affect performance.

Does NVLink work in DaVinci Resolve Studio 15.1.2?

No, NVLink does not work in DaVinci Resolve 15.1.2 and will actually significantly reduce performance! Right now, in order to enable NVLink you have to enable SLI which currently means that Resolve can only effectively see and use one of the two installed GPUs.

NVIDIA GeForce RTX 2080 & 2080 Ti DaVinci Resolve 15 Benchmark

Hopefully this will change in future versions of DaVinci Resolve, but for now, having NVLink enabled is very much a bad thing. It appears that since SLI needs to be enabled for NVLink to work, Resolve is getting confused about which GPUs it can and cannot use. Rather than just ignoring SLI, it actually decides to ignore the second GPU instead. So even if you manually tell Resolve to use both GPUs, it is only able to actually utilize one of them.

We expect this to change in future versions of Resolve since even if Blackmagic decides to not implement NVLink, they still need to at least have enough support so that problems like this don't occur if a user has NVLink enabled for some other software. When they will fix this issue is currently unknown, but at least for Resolve 15.1.2 simply be aware that even if your system supports NVLink, you really should have it turned off when you use DaVinci Resolve.

Tags: DaVinci Resolve, GeForce, RTX, 2080 Ti, NVLink
Fahim Khan.

Matt,

Thank you for this much needed review.

I am scratching my head now. It is very strange that there is no increase in performance with multi RTX GPUs when Blackmagic claims that there is. How does this performance compare to a multi 1080 Ti GPU config?

Posted on 2018-11-06 16:17:56

Multiple GPUs (whether they are the RTX, GTX, Quadro, or even AMD) definitely DO increase performance. In the Color Tab Benchmark analysis https://www.pugetsystems.co... you can see that we saw up to a 38% increase in performance going from one RTX 2080 Ti to two.

What causes problems is if you enable SLI. SLI is one of many ways to make multiple video cards work together, but it is primarily used for gaming or other 3D visualization using similar methods to gaming. Resolve and most other "compute" type processing does not use SLI, and as we showed in this testing it actually breaks things, resulting in no performance gain rather than the 38% gain we saw with SLI off.

The issue is that Blackmagic needs to fix this problem if they ever want to leverage NVLink (which should allow for even better GPU scaling) since right now if you want NVLink on in Windows, you have to also enable SLI. There is a lot of hype about NVLink in the computing industry right now, which is why we wanted to go ahead and test this to see what happens. We've had plenty of customers turn on SLI since they saw some post on the internet saying that it improves performance, only to have their performance drop instead.

Posted on 2018-11-06 17:26:30
Alan Gordon

Matt Bach Has there been any testing done with RED footage with a timeline resolution matching the frame size? In particular, does the 11GB of VRAM become a limitation if you need to deliver at larger frame sizes? And does having multiple cards relieve that limitation at all even though Resolve does not pool memory at this time?

Posted on 2018-11-06 17:40:13

We haven't done too much with the timeline itself set above 3840x2160. That would be an interesting thing to look at, however, so I'll keep that in mind. As far as VRAM goes, 8GB is typically fine for up to 6K media, but you would want a 10GB+ GPU for 8K. That is only a general recommendation, however; we have some customers who use a lot of noise reduction nodes, and things like that can really eat into VRAM usage.

Unfortunately, you can't add VRAM together across multiple cards right now. If you have two GPUs with 11GB of VRAM you still only get 11GB of usable video memory. The reason for this is that in order for each GPU to do its share of the processing, it needs to have access to all of the data, so each card needs to have its own copy. This may change in the future with new technology like NVLink, which we were testing in this article, but at the moment this is not a feature available in DaVinci Resolve.

Posted on 2018-11-06 17:48:39
Alan Gordon

I'm curious about the 10GB number for 8k. Our machine with a 1080 Ti with 11GB ram routinely has issues rendering out 8k (even without NR) and even 1 frame of temporal NR completely cripples the 1080. And even 6k (with more intensive grades) I've seen renders have issues, though it is mostly OK. Neat Video, however will fail even with 1 frame radius at 6k.

Posted on 2018-11-06 18:47:13

Plugins are going to change things quite a bit, the 10GB recommendation is just for pure DaVinci Resolve. It will also change if you have other applications open, multiple displays (especially 4K displays), and other factors as well. I'm surprised that you are running out of VRAM, however. I wonder if using an 8K timeline makes a big difference. I'll definitely have to look into that.

Posted on 2018-11-06 19:22:49
Alan Gordon

Part of the problem is definitely the r3d debayer being done on the GPU. With a prores 444 xq 8k clip 3 frames of temporal NR renders just fine to 8k. The same clip as an r3d fails rendering to 8k but renders just fine to 4k. (no 4k displays, just 1 1440p and one 1080p)

So it just makes me curious whether 2 or 3 GPUs could have any effect at these large timeline/render sizes.

Posted on 2018-11-07 01:20:58

Yea, right now multiple GPUs just help with straight performance in Resolve. It is possible that NVLink will allow memory pooling, but it is impossible to know unless you can sneak it out of one of the developers somehow (although they might not even know quite yet either). In your case, the only thing you really can do right now is to move up to a Quadro card. They are a lot more expensive compared to GeForce for the performance you get, but something like the Quadro P6000 has 24GB of VRAM. Performance-wise, however, it is going to be roughly on par with a GTX 1080, maybe a GTX 1080 Ti.

Posted on 2018-11-07 01:24:27

The Quadro RTX cards are coming out soon as well, which should give performance more in line with the GeForce RTX while having more VRAM. However, they are *very* expensive. The only one on sale so far is the Quadro RTX 6000, and it is over $6k for a 24GB card :(

Posted on 2018-11-07 19:29:07
Alan Gordon

Correct me if I'm wrong, but there's no way to have Resolve use one GPU for R3D debayer and one for Resolve image processing? If you could do that, then two GPUs could distribute the load for those specific jobs.

Posted on 2018-11-09 00:05:41

The most control I know of is that you can tell Resolve not to use your primary GPU for anything other than driving your displays. I believe the option is called "Use display GPU for compute", or if you are manually selecting cards you can just not select the primary GPU. All that does is allow you to have a low-end primary GPU to drive your displays while having higher-end cards for all the heavy lifting. That used to be useful a few years back, but these days it is really not necessary and, if anything, having mixed GPUs causes more problems than it solves.

I don't believe there is any way to separate debayering or any other single type of task by GPU, however. But who knows, maybe that is the way Blackmagic will end up leveraging NVLink. It's a way I haven't considered before, but my guess is they will end up focusing on some method that applies to all users and not just those working with RAW footage.

Posted on 2018-11-09 00:19:02