
DaVinci Resolve 14 GPU Scaling Analysis (1-4x Titan Xp)

Written on October 13, 2017 by Matt Bach

Introduction

For a number of years, we have focused our video editing testing on Premiere Pro and After Effects, but we have received a large number of requests to include DaVinci Resolve as well. Resolve has actually been on our list for a while now, but since Resolve 14 was supposed to make fundamental changes to how the software runs, we decided to hold off on our testing until it was available. The launch of Resolve 14 dragged on longer than we expected, but it finally came out of beta about a month ago and we were able to begin our testing.

Today, we have published our first two articles for DaVinci Resolve. This article looks at GPU scaling performance with one to four NVIDIA Titan Xp video cards, while our second article looks at how performance changes with different CPU models. There is quite a bit more we want to test, but this is an excellent start and will allow DaVinci Resolve users to give us feedback on our test process that we can leverage in future articles. Plus, we wanted to get these articles up before we leave for the NAB Show in New York next week (Oct 17-18, 2017). Come visit us at booth N570 in the "Post + Production Pavilion" of Hall 3B if you happen to be attending!

There is a lot we could (and want to) look at in DaVinci Resolve, but today we will be focusing on export performance and, more importantly, FPS while grading in the color panel. Our testing includes 4K RED, H.264, ProRes 422, and DNxHR HQ footage as well as 6K and 8K RED footage. If you would rather skip over our analysis of the individual benchmarks, feel free to jump right to the conclusion section.

Test Setup

While GPU scaling is our primary concern in this article, we will actually be testing with two platforms and four different CPUs. The CPUs we chose should give us a wide range of processing power to see if anything changes with a more or less powerful CPU. One thing to note is that while we are testing with up to four GPUs on the Threadripper system, we are only using up to three GPUs on the Skylake-X systems. At the time of our testing, there were no quad-GPU X299 motherboards, so three GPUs was the maximum we could test. This has very recently changed with a number of quad-GPU-capable X299 boards coming out, and we will likely revisit quad GPU on an Intel platform in the future, but for now this is a limitation we have to live with.

Before getting into our testing, we want to point out that while our test platforms use a single storage drive, that is not what we would typically recommend to our customers. Normally we would recommend an SSD/NVMe drive for your OS and programs plus one or more dedicated drives for your media. However, that is largely for capacity and organizational purposes and should not have any impact on our results, so we opted to use a single drive to cut down on the number of testing variables.

Most of the media we will be using is available from the Sample R3D Files page. However, we did two things worth noting. First, we transcoded the 4K RED footage to a number of other popular codecs we wanted to test (a rough sketch of equivalent transcode commands follows the media list below). Second, we increased the project frame rate to 60 FPS for our color grading tests regardless of the original media FPS. Since many of our CPU/GPU combinations could easily play back our test footage at ~24 FPS, this was a relatively easy way to increase the spread and accuracy of our results.

4K:

  • EPIC DRAGON, 3840x2160, REDCODE 11:1
    Transcoded to: H.264 .MP4, ProRes 422 .MOV, DNxHR HQ 8-bit .MOV, and RAW .TIFF

6K:

  • WEAPON 6K, 6144x3077, REDCODE 7:1
  • EPIC DRAGON, 6144x3160, REDCODE 12:1

8K:

  • WEAPON 8K S35, 8192x3456, REDCODE 9:1
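We have not published the exact transcode settings we used, but conversions like these can be approximated with ffmpeg. The sketch below is a hypothetical pipeline, not our actual process: the file names and encoder settings are assumptions, and since stock ffmpeg builds cannot decode REDCODE (.R3D), it assumes the 4K master has already been rendered out to an intermediate format.

```python
import subprocess

# Hypothetical intermediate master exported from Resolve; stock ffmpeg
# builds cannot decode REDCODE (.R3D) directly.
SOURCE = "epic_dragon_4k_master.mov"

# Encoder arguments approximating the four transcode targets in the
# media list above. Exact settings are assumptions, not our pipeline.
TARGETS = {
    "h264_4k.mp4":      ["-c:v", "libx264", "-preset", "slow", "-crf", "18"],
    "prores422_4k.mov": ["-c:v", "prores_ks", "-profile:v", "2"],  # ProRes 422
    "dnxhr_hq_4k.mov":  ["-c:v", "dnxhd", "-profile:v", "dnxhr_hq",
                         "-pix_fmt", "yuv422p"],                   # DNxHR HQ 8-bit
    "tiff_4k_%06d.tif": ["-c:v", "tiff"],                          # RAW .TIFF sequence
}

for out_name, codec_args in TARGETS.items():
    subprocess.run(["ffmpeg", "-y", "-i", SOURCE, *codec_args, out_name],
                   check=True)
```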

To test exporting, we used a moderately complex timeline involving multiple clips, basic color grading, multicam footage, and various effects like crop, Gaussian blur, and cross dissolves. We do not yet have a video showing our exact test timeline, but it is nearly identical to what we use in our Premiere Pro testing, only without the logo and text overlay.

For our color grading tests, we applied three levels of correction which we called Basic, Advanced, and Torture:

Basic:

  • Single node adjustments

Advanced:

  • Multi-node grade following SonduckFilm's Advanced Color Grading tutorial (referenced in the comments below)

Torture:

  • All nodes from "Advanced"
  • Adds a single node of temporal noise reduction

Exported stills from our 4K testing are available for download if you wish to examine them in more detail.

Exporting

Performance while exporting is likely a secondary consideration for many users, but it is still an important factor to keep in mind. Color grading may be what Resolve is known for, but you still have to get that graded footage delivered after you are done working on it.

[Chart: DaVinci Resolve GPU Benchmark - Exporting]


Our testing was fairly extensive, with over 300 data points, so in addition to the raw results in seconds (available via the "Show Raw Results" link on each chart), we created a number of charts breaking down the average performance gain or loss as the number of video cards increased. 100% is the performance with a single GPU, so a result of 103% would be a 3% gain in performance.
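Purely to illustrate the chart math, here is a minimal sketch of that normalization, assuming raw export times in seconds keyed by GPU count (the numbers below are illustrative, not our measured data):

```python
def scaling_percentages(export_seconds):
    """Convert raw export times (seconds, keyed by GPU count) into
    relative performance, where the single-GPU result is 100%.
    Exporting is time-based, so fewer seconds means higher performance."""
    baseline = export_seconds[1]
    return {gpus: round(100 * baseline / secs, 1)
            for gpus, secs in sorted(export_seconds.items())}

# Illustrative only: if a four-GPU export took 114 seconds against a
# 100 second single-GPU baseline, the chart would read ~87.7%.
print(scaling_percentages({1: 100.0, 2: 101.0, 3: 103.0, 4: 114.0}))
# {1: 100.0, 2: 99.0, 3: 97.1, 4: 87.7}
```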

Exporting is something that should be more heavily impacted by the CPU than the GPU, but it turns out this is even more true than we expected. With the Intel CPUs, the overall net performance gain with multiple GPUs was essentially nothing. With the AMD Threadripper 1950X, however, there was a very consistent drop in performance with more GPUs when exporting to DNxHR HQ. At 4K it was only a 5% drop with four GPUs versus one, but with our 8K RED footage we saw a significant 14% drop in performance.

Overall, it is pretty safe to say that having more GPUs doesn't help when exporting - at least with the kind of timeline we used. Jumping ahead a bit into the next section, one thing we will note is that our timeline does not include any temporal noise reduction (TNR), which is where we saw the biggest performance gains with multiple GPUs. That is likely something we will add in future testing, but given that Resolve 14 makes huge improvements to the edit interface, we also want to continue testing with this type of timeline.

Color Grading

Color grading is really where we wanted to focus our efforts, so we used three different levels of grading on our test footage. With each footage type and grading level, we watched the FPS counter and recorded the lowest result after allowing a roughly 15-second "grace period," since the counter can sometimes take a bit to settle down. This test was repeated three times, with Resolve restarted between runs. Across the three runs, we logged the highest of the minimum FPS results and used that in the results below.
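The FPS readings themselves were logged by hand off Resolve's FPS counter, but the aggregation step can be expressed compactly. A minimal sketch with made-up sample numbers:

```python
def reported_fps(runs):
    """For each run, keep the lowest FPS observed after the grace
    period; report the best of those minimums across the three runs."""
    per_run_minimums = [min(samples) for samples in runs]
    return max(per_run_minimums)

# Made-up FPS samples for three runs of one footage/grade combination:
runs = [
    [24.1, 23.8, 24.0],  # run 1 -> minimum 23.8
    [24.3, 24.2, 24.5],  # run 2 -> minimum 24.2
    [23.9, 24.0, 24.1],  # run 3 -> minimum 23.9
]
print(reported_fps(runs))  # 24.2
```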

 

[Chart: DaVinci Resolve GPU Benchmark - Color Grading]


Just like in the previous section, our results are formatted as the performance gain we saw with multiple cards compared to just one. If you want to see the raw results in FPS, they are available via the "Show Raw Results" link on the chart.

Considering how often the claim is made that more GPUs are always better for DaVinci Resolve, we were a bit surprised by our results. In our "Basic" and "Advanced" color grading tests we saw the occasional benefit from two GPUs, but just as often we actually saw a drop in performance. In fact, the only time we saw a significant and consistent performance gain was when we added temporal noise reduction in our "Torture" testing. For that test, we saw an average 25-35% performance gain with two GPUs but only a further ~10% with three. On the one platform where we could use four GPUs (AMD Threadripper 1950X), we actually saw a performance drop going from three GPUs to four.

Conclusion

To be frank, our testing did not go the way we expected it to. The lack of performance gain with multiple GPUs when exporting is understandable as encoding media does not utilize the GPU much, but we expected more from our color grading tests. In fact, it was only when we added temporal noise reduction that we saw a significant benefit from even two GPUs:

[Chart: DaVinci Resolve multiple GPU benchmark - temporal noise reduction]

From what we've tested so far, if most of your work is done with the color wheels and curves and you don't use effects like TNR, multiple video cards aren't going to impact performance much. That said, with temporal noise reduction the performance gain was very respectable: an average 25-35% gain with two GPUs and a further ~10% with three. This is not quite as much as you might expect given Resolve's reputation for benefiting greatly from more GPU power, but it is still relatively good. The performance drop with four GPUs on the AMD Threadripper 1950X system is odd, and we are curious to test four GPUs on an Intel platform in the future to see if this is simply an issue with that platform.
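To put those averages in playback terms, here is a rough back-of-the-envelope projection. The scaling factors are the averages quoted above (treating the further ~10% with three GPUs as roughly +40% over a single card, which is our reading of the data), and any individual clip will vary:

```python
def projected_tnr_fps(single_gpu_fps, scaling=(1.00, 1.30, 1.40)):
    """Project TNR playback FPS for 1, 2, and 3 GPUs from the average
    scaling we measured. These are averages across footage types, not
    guarantees for any particular clip."""
    return [round(single_gpu_fps * factor, 1) for factor in scaling]

# e.g. a TNR grade that plays at 15 FPS on a single Titan Xp:
print(projected_tnr_fps(15.0))  # [15.0, 19.5, 21.0]
```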

Like we said, these results were a bit of a surprise to us, so we are very interested in any feedback you may have. They may simply reflect Blackmagic's software optimizations in Resolve 14, or perhaps our workloads are not intensive enough. We could certainly force Resolve to make greater use of multiple cards by slapping on a bunch of TNR nodes, but we strive to keep our testing relevant to real-world users. Unless there are people out there who actually use a couple dozen TNR nodes when grading, that would be counterproductive to our goal of determining the real-world benefit of different hardware configurations.

This is just the first of many articles looking at performance in DaVinci Resolve, and there is still a lot we want to test, including exporting all the color graded clips in addition to our test timeline, adding motion tracking, testing the performance impact of a Decklink display-out card, and much more. If you think there is something else we should include, we encourage you to let us know in the comments - especially if you are able to send us an example project. We have found active communication with the community to be incredibly valuable, as it helps us shape our testing to match exactly the type of work that real users do every day.

Tags: DaVinci Resolve, Titan Xp, GPU, video card, Skylake-X, Threadripper, 7960X, 7900X, 7820X, 1950X
Comments

Chip Murphy

Not sure that I agree with this test, especially since all the cards aren't operating at x16.

This test should have been conducted on a SuperMicro system that can run four PCIe 3.0 slots at x16, not consumer-level boards at x16/x8/x16/x8.

Posted on 2017-10-18 13:29:54
Chad Capeland

"The performance drop with four GPUs on the AMD Threadripper 1950X system is odd"
3x GPU should have left them at 16x, right? Adding the 4th may have dropped them down to 8x.

Posted on 2017-10-22 01:34:19

Hey Chad, the drop to x8 is probably a factor, but I don't think it is the main thing that's going on. On the X399 board we used, I'm pretty sure the PCI-E slots are always x16/x16/x8/x8 no matter whether you have two, three, or four GPUs installed. So the third GPU was already running at x8, and we saw a decent performance gain there, which makes it unlikely that having the fourth GPU at x8 is causing a drop in performance all by itself.

Also, if running at x8 was that large of a factor, then it is very strange that we saw a larger increase in performance going from one GPU to two with the Core i7 7820X (which is x16/x8) than we did with the Core i9 7900X (which is x16/x16). Until we do the same testing with quad GPU on an Intel platform, it is really hard to say if this is something that you would always see with quad GPU or if there is simply something odd that happens with the AMD Threadripper platform.

Posted on 2017-10-23 18:19:05
Rob Trombino

Thanks so much for all the time and effort in performing these tests and sharing them with us!

I do have to point out a few things though that seem rather odd.

• No mention of whether virtual memory is disabled in the OS, which it should be with that much system memory, since leaving it on is a huge bottleneck.
• Background processes that should be disabled as well: SuperFetch, drive indexing, Windows Search, drive optimization, and a few others...
• Reading and writing cache files to either the OS or storage drive is another huge bottleneck, as it would be for rendering to either of those drives as well.

All this applies to both the CPU and GPU tests.

Best regards.

Posted on 2017-10-22 15:12:15

Hey Rob. Do you mean the pagefile? If so, we do leave that on, but I've never seen a performance drop from doing so. Using the pagefile is much slower than using system RAM, but that should only ever happen if you run out of system memory. There are still some weird bugs and stability issues that can come up if you disable the pagefile completely, so we tend to leave it on unless there is a significant reason to disable it.

We do our testing with everything in the OS set the same way the majority of users have their systems. That does mean the results are probably a percent or two worse than they could be under the most ideal conditions, but we try to replicate the actual performance an end user would see rather than doing a bunch of optimizations that 99% of users don't do. However, we do repeat all our testing a minimum of three times to ensure that things like a random Windows update or drive indexing don't cause artificially low results.

Storage drive configuration is on our list of testing we want to do in the future, so rest assured that we will tackle that at some point. From all the other testing we have done, I suspect that having multiple drives might make a small to decent difference when working with RAW or lightly compressed file formats, but until we do the testing itself that is really just a guess.

Posted on 2017-10-23 18:24:59
Rakesh Malik

It sounds like you tested the playback and export performance mostly using fairly basic nodes, and also using mezzanine codecs rather than raw, which leads to an incomplete story here.

Since RED raw is decompressed on the CPU and then de-Bayered on the GPU, that will have a significant effect on how an additional GPU gets used. A lot of colorists work directly with RED raw in order to have access to as much of the original data as possible, though a lot also end up working with DPX files, which are already de-Bayered but far larger than REDCODE.

Since ArriRAW isn't compressed, it's huge... so it doesn't impose as much decompression load, but the de-Bayer is still compute intensive (admittedly not as much as REDCODE with high resolution footage); the catch is that it's more disk intensive due to its size.

The LGG color wheels are also a relatively small part of the color grading process; there are a lot of other operations that add considerably to the compute load, like tracking power windows, keying, and visual effects such as the face refinement tool. And don't forget often-used plugins like Sapphire and BorisFX that add further to the compute load.

If you use 8K Red raw (which I can help you get, since I'm in your area and have an 8K Red camera), color grade with that, and test the playback + export using some of these tools, you might see some different results from what you got on this test.

That isn't to say that this test isn't valid, just that there are lots of other situations and variations that haven't been covered yet.

Posted on 2017-10-24 19:20:31

There is definitely a lot we still want to test, and like we noted in the article, this is just the first of many articles to come. ArriRAW is on our list for round two, as well as many of the tools you listed (tracking, keying, etc.). We typically shy away from plugins simply because there is such a wide range of them out there, but that is something we may tackle a bit further in the future as well.

As for RED footage, we tested 4K 11:1, 6K 7:1, 6K 12:1, and 8K 9:1 in this article. However, one thing I do want to do next time is make a clearer division between high and minimal compression. I'm hoping to test something around 4:1 and 12:1 with 4K, 6K, and 8K footage. That should give us a pretty good idea of how different hardware performs with quite a wide range of RED footage without taking multiple days just to finish one round of testing.

Posted on 2017-10-24 19:51:08
Arshjyot Singh

Hey, can you please compare Intel CPUs with Titan Xp vs Vega 64 and let us know what difference it makes in Premiere Pro live playback and editing? There is a video on YouTube making huge claims that if you have a Vega 64 w/ Threadripper, then Premiere Pro can easily handle RED 8K footage at full resolution with LUTs and Lumetri. I know it sounds crazy, but you can check it for yourself. Search '8k video editing pc' on YouTube; the first video you will see is by Max Yuryev. In that video he compares his TR system with Linus Tech Tips' RED Rocket card, and Max's system crushes the top Intel build. Can you let us know what is happening behind the hardware? Is it the CPU or GPU responsible for live playback in Premiere Pro?

You guys tested and proved that the 14-core Intel CPU is best for Premiere Pro when compared to all others (including Threadripper). But we are eager to see Titan Xp vs Vega 64 vs 1080 Ti. And also, if possible, mix and match the CPUs and GPUs for an even better comparison.

This will give an answer to the best combination for Premiere Pro:
-Threadripper + Vega 64
-Threadripper + Titan Xp
-Threadripper + 1080 Ti
-Intel 14 Cores + Vega 64
-Intel 14 Cores + Titan Xp
-Intel 14 Cores + 1080 Ti

Posted on 2017-10-29 19:17:26
Patrick Taylor

Excellent test. Thanks for taking the time.

You mention you did all these tests at 60fps. I've been a professional colorist for over 5 years now and I have yet to have a client ask me to deliver a project at anything but 30fps or 23.98fps. So, if these are the 2 dominant speeds I work at, I'm wondering if there is any benefit at all to having more than 1 Titan GPU?

Thanks again for all the excellent testing you folks do. Much appreciated.

Posted on 2017-11-06 06:58:35

Hey Patrick, we did our testing at 60 FPS mostly because that gives us a bit of extra headroom to compare the different hardware configurations. There are some people who do need 60 FPS performance, but I would say the majority of our customers only need 30 FPS max, just like you. However, by testing up to 60 FPS we can see how far above 30 FPS different hardware is able to get. Hitting 30 FPS on the dot is OK, but I think it is extremely useful to know if you are right on the edge (where applying a few more effects would cause you to drop frames) or if you have plenty of headroom.
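To put a quick number on that headroom idea, a minimal sketch (the 30 FPS delivery rate is just an example matching what most of our customers need):

```python
def headroom_percent(measured_fps, delivery_fps=30.0):
    """How far above the delivery frame rate a configuration sits."""
    return round(100 * (measured_fps / delivery_fps - 1), 1)

print(headroom_percent(45.0))  # 50.0 -> plenty of room for more nodes
print(headroom_percent(31.0))  # 3.3  -> right on the edge
```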

We're still working out the best way to present all our data, since a percentage result is only moderately useful for working colorists (as opposed to those interested in hardware benchmarks in general), but for now you can take a look at the raw results charts to get an idea of what FPS different configurations should be able to achieve. As for 1 vs 2 Titan Xps, I would say multiple GPUs are really only useful if you use something like TNR. There were also a few times with 6K and 8K footage where a second GPU bumped us up from ~20 to ~25 FPS, so a second GPU can be very helpful at those higher resolutions. If you work with 4K footage at <30 FPS and don't use TNR or similar effects, I don't think a second GPU is going to do anything for you. Probably better to spend that money on a couple of M.2/PCIe NVMe drives or more RAM.

Posted on 2017-11-09 03:58:14
Volta

https://nvidianews.nvidia.c...

https://www.nvidia.com/en-u...

http://www.nvidia.com/downl...

Can you please benchmark the new NVIDIA TITAN V?

Posted on 2017-12-08 14:17:23
Charles Unice

I just went through the tutorial (Advanced Color Grading Tutorial from SonduckFilm) that you guys used for your "advanced" color grading, and it's pretty basic color correction. I would not call that advanced. I'd like to see a test where you do the "advanced" grading you have done but also use some power windows and the face refinement tool, and also track some of the power windows. On one shot I'll have anywhere from 2-10 power windows. The face refinement tool has reduced this number but it's not perfect. For example, I'll have tracking power windows removing blemishes, brightening eyes, and doing background relights. When I render a 4K CinemaDNG RAW file using your "advanced" coloring, I'm rendering at 17 fps with CPU load at 90-94% and GPU load at 20-50%. But when I do my own color correction with power windows, I drop down to 8 fps with CPU load at 43-53% and GPU at 82-91%. It seems to me that I could use a second GPU, but I'd like to know before dropping $800 on another GPU. Current system specs: i7 5960X OC'd to 4.5GHz, EVGA 1080 Ti FTW3. I would like to know if adding another GPU would help improve my render times, or if I need to jump to an i9 7960X? Wish I had the cash to test this myself. Thanks, you guys rock!!! I love your forum.

Posted on 2017-12-09 04:53:13