DaVinci Resolve 14 GPU Scaling Analysis (1-4x Titan Xp)

Written on October 13, 2017 by Matt Bach

Introduction

For a number of years, we have focused our video editing testing around Premiere Pro and After Effects, but we have received a large number of requests to include DaVinci Resolve as well. Resolve has actually been on our list for a while now, but since Resolve 14 was supposed to make fundamental changes to how the software runs we decided to hold off our testing until it was available. The launch of Resolve 14 dragged on longer than we expected, but it finally came out of beta about a month ago and we were able to begin our testing.

Today, we have published our first two articles for DaVinci Resolve. This article will be looking at GPU scaling performance with 1-4 NVIDIA Titan Xp video cards while our second article will be looking at how performance changes with different models of CPUs. There is quite a bit more we want to test, but this is an excellent start and will allow DaVinci Resolve users to give us feedback on our test process that we can leverage in future articles. Plus, we wanted to get these articles up before we left for the NAB Show in New York next week (Oct 17-18, 2017). Come visit us at booth N570 in the "Post + Production Pavilion" of Hall 3B if you happen to be attending!

There is a lot we could (and want to) look at in DaVinci Resolve, but today we will be focusing on performance while exporting and more importantly FPS while grading in the color panel. Our testing includes 4K RED, H.264, ProRes 422, and DNxHR HQ footage as well as 6K and 8K RED footage. If you would rather skip over our analysis of the individual benchmarks, feel free to jump right to the conclusion section.

Test Setup

While GPU scaling is our primary concern in this article, we will actually be testing with two platforms and four different CPUs. The CPUs we chose should give us a great range of processing power to see if anything changes with a more or less powerful CPU. One thing to note is that while we are going to be testing with up to four GPUs on the Threadripper system, we are only using up to three GPUs on the Skylake-X systems. At the time of our testing, there were no quad GPU X299 motherboards so three GPUs was the maximum we could test. This has very recently changed with a number of quad GPU capable X299 boards coming out and we will likely revisit quad GPU on an Intel platform in the future, but for now this is a limitation we have to live with.

Before getting into our testing, we want to point out that while our test platforms are using a single storage drive, that is not actually what we would typically recommend to our customers. Normally we would recommend having a SSD/NVMe for your OS and programs and a dedicated drive or multiple drives for your media. However, this is in large part for capacity and organizational purposes and should not have any impact on our results so we opted to use a single drive to cut down on the number of testing variables.

Most of the media we will be using is available from the Sample R3D files. However, there were two special things we did. First was to transcode the 4K RED footage to a number of other popular codecs we wanted to test. The second was that we increased the project FPS to 60 for our color grading tests regardless of the original media FPS. Since many of our CPU/GPU combinations were able to easily play back our test footage at ~24 FPS, this was a relatively easy way to increase the spread and accuracy of our results.

4K

  • EPIC DRAGON - 3840x2160 - REDCODE 11:1
    Transcoded to: H.264 .MP4, ProRes 422 .MOV, DNxHR HQ 8-bit .MOV, RAW .TIFF

6K

  • WEAPON 6K - 6144x3077 - REDCODE 7:1
  • EPIC DRAGON - 6144x3160 - REDCODE 12:1

8K

  • WEAPON 8K S35 - 8192x3456 - REDCODE 9:1

To test exporting, we used a moderately complex timeline involving multiple clips, basic color grading, multicam footage, and various effects like crop, Gaussian blur, and cross dissolves. We do not yet have a video showing our exact test timeline, but it is nearly identical to what we use in our Premiere Pro testing, only without the logo and text overlay.

For our color grading tests, we applied three levels of correction which we called Basic, Advanced, and Torture:

Basic:

  • Single node adjustments

Advanced:

Torture:

  • All nodes from "Advanced"
  • Adds a single node of temporal noise reduction

Exported stills from our 4K testing are available for download if you wish to examine them in more detail.

Exporting

Performance while exporting is likely a secondary consideration for many users, but it is still an important factor to keep in mind. Color grading may be what Resolve is known for, but you still have to get that graded footage delivered after you are done working on it.

[Chart: DaVinci Resolve GPU Benchmark - Exporting]

[+] Show Raw Results

Our testing was fairly extensive with over 300 data points, so while the actual results in seconds are available if you click on the "Show Raw Results" link, we also created a number of charts breaking down the average performance gain or loss we saw as the number of video cards was increased. 100% is the performance with a single GPU so a result of 103% would be a 3% gain in performance.
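
As a quick illustration of how those percentages are derived, here is a minimal Python sketch. This is not our actual benchmark tooling, and the example numbers are made up for illustration rather than taken from the raw results:

```python
# Minimal sketch: convert raw export times (in seconds) into the scaling
# percentages shown in the charts. 100% is the single-GPU result; since a
# shorter export time is better, relative performance is baseline / time.

def scaling_percentages(export_times_seconds):
    """export_times_seconds is ordered by GPU count; index 0 = single GPU."""
    baseline = export_times_seconds[0]
    return [round(100 * baseline / t, 1) for t in export_times_seconds]

# Made-up numbers for illustration only (not from the raw results):
print(scaling_percentages([120.0, 116.5, 118.0, 126.0]))
# -> [100.0, 103.0, 101.7, 95.2]
```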

Exporting is something that should be more heavily impacted by the CPU than the GPU, but it turns out that this is even more true than we expected. With the Intel CPUs, the overall net performance gain with multiple GPUs was essentially nothing. With the AMD Threadripper 1950X, however, there was actually a very consistent drop in performance with more GPUs when exporting to DNxHR HQ. At 4K it was only a 5% drop in performance with four GPUs versus one, but with our 8K RED footage we saw a significant 14% drop in performance.

Overall, it is pretty safe to say that having more GPUs doesn't help when exporting - at least with the kind of timeline we used. This is jumping ahead a bit into the next section, but one thing we will note is that our timeline does not include any temporal noise reduction (TNR) which is where we saw the biggest performance gains with multiple GPUs. This is likely something we will add in our future testing, but given that Resolve is making huge improvements to their edit interface we also want to continue testing with this type of timeline as well.

Color Grading

Color grading is really where we wanted to focus our efforts so we used three different levels of grading on our test footage. With each footage type and grading level, we watched the FPS counter and recorded the lowest result after giving about a 15 second "grace period" since the counter can sometimes take a bit to settle down. This test was repeated three times with Resolve being restarted in between tests. Across the three runs, we logged the highest of the minimum FPS results and used that in our results below.
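
To make that procedure concrete, here is a short Python sketch of the aggregation. It is not our actual logging script, and it assumes the FPS counter is sampled once per second; the sample data is made up:

```python
# Sketch of the aggregation described above: within each run, skip a
# ~15 second grace period and take the lowest FPS reading; across the
# three runs, report the highest of those per-run minimums.

GRACE_PERIOD_SECONDS = 15  # assumes the FPS counter is sampled once per second

def min_fps_after_grace(fps_samples, grace=GRACE_PERIOD_SECONDS):
    return min(fps_samples[grace:])

def reported_fps(runs):
    """runs: one list of FPS samples per run (three runs in this testing)."""
    return max(min_fps_after_grace(run) for run in runs)

# Made-up sample data for illustration:
runs = [
    [18, 24, 30] + [60] * 20 + [52] + [60] * 10,  # run 1 dips to 52 FPS
    [20, 40, 60] + [60] * 30 + [55],              # run 2 dips to 55 FPS
    [15, 30, 60] + [57] * 30,                     # run 3 holds 57 FPS
]
print(reported_fps(runs))  # -> 57
```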

 

[Chart: DaVinci Resolve GPU Benchmark - Color Grading]

[+] Show Raw Results

Just like in the previous section, our results are formatted based on the performance gain we saw with multiple cards compared to just one. If you want to see the raw FPS results, just click on the "Show Raw Results" link.

Considering how often the claim is made that more GPUs is always better for DaVinci Resolve, we were a bit surprised at our results. In our "basic" and "advanced" color grading tests we saw the occasional benefit with two GPUs, but just as often we actually saw a drop in performance. In fact, the only time we saw a significant and consistent performance gain was when we added temporal noise reduction in our "torture" testing. For that test, we saw an average 25-35% performance gain with two GPUs but only a further ~10% with three GPUs. On the one test platform where we could use four GPUs (AMD Threadripper 1950X), we actually saw a performance drop with four GPUs over three.

Conclusion

To be frank, our testing did not go the way we expected it to. The lack of performance gain with multiple GPUs when exporting is understandable as encoding media does not utilize the GPU much, but we expected more from our color grading tests. In fact, it was only when we added temporal noise reduction that we saw a significant benefit from even two GPUs:

[Chart: DaVinci Resolve multiple GPU benchmark - temporal noise reduction]

From what we've tested so far, if most of your work is done with the color wheels and curves and you don't use effects like TNR, multiple video cards aren't going to impact performance much. That said, with temporal noise reduction the performance gain was very respectable - we saw an average 25-35% performance gain with two GPUs and a further ~10% with three GPUs. This is not quite as much as you might expect given Resolve's reputation of benefiting greatly from more GPU power, but still relatively good. The performance drop with four GPUs on the AMD Threadripper 1950X system is odd, but we are curious to test four GPUs on an Intel platform in the future to see if this is simply an issue with that platform.

Like we said, these results were a bit of a surprise to us, so we are very interested in any feedback you may have. These results may simply be a consequence of Blackmagic's software optimizations in Resolve 14, or perhaps we are not testing workloads that are intensive enough. We could certainly force Resolve to make greater use of multiple cards by slapping on a bunch of TNR nodes, but we strive to keep our testing relevant to real-world users. Unless there are people out there who actually use a couple dozen TNR nodes when grading, this would be counterproductive to our goal of determining the real-world benefit of different hardware configurations.

This is just the first of many articles looking at performance in DaVinci Resolve, and there is still a lot we want to test, including exporting all the color graded clips in addition to our test timeline, adding motion tracking, testing the performance impact of using a Decklink display out card, and much more. If you think there is something else we should include, we encourage you to let us know in the comments - especially if you are able to send us an example project. We have found active communication with the community to be incredibly valuable as it helps us shape our testing to match exactly the type of work that real users do every day.

Tags: DaVinci Resolve, Titan Xp, GPU, video card, Skylake-X, Threadripper, 7960X, 7900X, 7820X, 1950X

Comments

editor0400

Very interesting, thanks for sharing. I'm curious if the results are a RED-specific thing? What were the debayer settings? ARRI has some great sample footage available for download in both ProRes and ARRIRAW formats.

Although that M.2 storage is very fast, I wonder if reading off one volume and rendering to another would boost performance.

Posted on 2017-10-15 22:38:49
Dylan

I just made a long reply saying the same thing but realised I missed this bit of the article -
"However, there were two special things we did. First was to transcode the 4K RED footage to a number of other popular codecs we wanted to test."

They tested RAW TIFF - which should be roughly synonymous with other uncompressed RAW formats like ArriRAW.

Posted on 2017-10-15 22:46:55

The ARRI sample footage is something we've used in the past, but we switched to transcoding the RED footage to make things easier and more consistent for us. We only used ProRes 422 in this article, but ProRes 4444 is something we often use as well. ARRIRAW is something I haven't considered, however. I'm actually not too familiar with their cameras - is that something people commonly shoot in? Or do they shoot directly to ProRes?

As for the M.2 storage, I don't think it would make a difference for what we tested - I kept a close eye on the disk load and it was really low. Different storage configurations are on our list of future testing, however, so hopefully soon(ish) I can give you a more definitive answer.

Posted on 2017-10-17 19:05:51
editor0400

Working with the native R3D codec is how most facilities are working these days, rather than making intermediates. It's also a heavily taxing codec on both the CPU and GPU so it would make for more dramatic results.

Love these posts BTW!

Posted on 2017-10-19 20:56:50
Dylan

ArriRAW is a very common codec in high end drama, film and commercials in the UK, and I believe elsewhere in the world as well.

Having been discussing this over on the Resolve forums (https://forum.blackmagicdes... ), I think it would be beneficial to have ArriRAW as a format as well for such tests. The Arri cameras are by far and away the most popular cameras in Film/TV, they absolutely dominate, and the only type of RAW they record is uncompressed.

ArriRAW is an uncompressed RAW format, with debayer happening on the GPU. So whilst in a test like the above Redcode likely saturates the CPU during the debayer/decode part of the pipeline, meaning multiple GPUs are harder to take advantage of, ArriRAW barely stresses the CPU as it is debayer only with no decoding/uncompressing - so you may actually be able to see the benefit of multiple GPUs as other parts of the hardware have not already bottlenecked, especially when choosing "full debayer" in the options.

Posted on 2017-10-22 17:27:24

Interesting, thanks for that info! I'll note down to try to include ArriRAW in our future testing. Thanks a ton for the suggestion!

Posted on 2017-10-23 18:07:40
Dylan

This is interesting, as I have seen other forum posts where an additional GPU shows an immediate performance boost; perhaps it is simply that the Titan Xp is already handling as much throughput as Resolve can throw at it.

Would be interesting to see the 'standard candle test' (http://www.carousel.hu/stan... ) as an addition to these tests as it is a widely used standard for benchmarking Resolve.

Posted on 2017-10-15 22:42:38

The "standard candle test" is something we've used in the past, but the problem with that benchmark is that it isn't a very real-world test. It is just some 1080p H.264 footage that gets a ton of blur or TNR node applied. It is really good as a stress test, but I think you are going to have a hard time finding anyone who does that in their actual work.

This is really why we do our testing the way we do rather than using the "standard candle benchmark" or something like it. The closer we can get our testing to match what actual users of Resolve do, the better and more useful that testing is. We would rather show the actual performance difference that someone might see in their day to day work rather than a huge performance bump in some unrealistic scenario that they would never actually find themselves doing.

Posted on 2017-10-17 18:59:49
Dylan

Whilst I agree with the principle, with tests like these running many thousands of dollars' worth of GPUs, the only realistic use scenario is someone who is doing a ton of GPU-heavy work.

Whilst the standard candle test isn't going to tell you how Resolve handles transcoding, it does tell you how well it scales GPU-heavy tasks. For a user running multiple nodes, noise reduction, and multiple OpenFX plugins, this is useful info to have.

What you might then end up with is a situation where you can say that Resolve can scale across 4x Titan Xp, but it is only useful for these very specific use scenarios. On the other hand it may be that other components bottleneck before the extra GPUs get to flex their muscle.

I think it seems likely the CPU was the bottleneck for most of the tests above, I'm not sure if you agree? I wonder if even in the case of RAW TIFF the CPU was getting to 100% on the encode of h264/dnx4k before the first TitanXp was maxed out.

Were you monitoring CPU/GPU load during the tests?

Posted on 2017-10-17 19:13:47

I think the difference between what we try to do and something like the candle test is that if multiple nodes, noise reduction, and OpenFX plugins are what is bogging down users, we would rather test exactly that rather than a bunch of TNR nodes. It is more difficult to test, but it is going to be much more useful in the end. That's why we asked for people to let us know if there is something else they felt we should test - the more realistic we can make our testing, the better it is for everyone.

As far as CPU being the bottleneck, it most likely is with the testing we did. But if the CPU is the bottleneck when using the fastest CPUs currently available, then that alone is incredibly valuable to know. It doesn't matter what kind of scaling might be possible from multiple GPUs if your CPU isn't able to keep up. Future CPUs will certainly help, but GPU performance is increasing at a much faster pace than CPU performance so if it is a problem now it is only going to get worse over time. Dual Xeon might be able to make a difference (we are holding off testing that until the Intel SP platform becomes useful for workstations) but from the testing we did in the past I don't think it is going to be much.

CPU/GPU load I did watch a bit, but I didn't log anything. The more testing I've done with programs that mix both CPU and GPU performance, the more unreliable I have found load levels to be. If we are running something at 100% then that is a pretty clear bottleneck, but especially in a program like Resolve that is pretty rare to have happen. Not saying it doesn't happen at times, but for something like FPS when grading I put way more stock in actual performance numbers than trying to guess anything from load percentages.

Posted on 2017-10-17 19:45:58
Dylan

Cheers Matt,

FWIW, I mostly agree that monitoring CPU/GPU usage levels can be misleading, and rarely tells you the full story.

Posted on 2017-10-17 19:49:55
Housesion Madrid

Is there a significant difference in Resolve 14 between the 1080 and 1080 Ti?

Posted on 2017-10-17 14:38:52
Jay Smith

I wouldn't think so, but for ~150 bucks you may as well? At some point in the future you'll likely max out the memory on the 1080 and have a nice big crash, so why not avoid that for a little bit.

Posted on 2017-10-17 15:33:49

We haven't tested different GPUs in Resolve quite yet, but you are probably looking at somewhere around a 10-15% performance increase with the 1080 Ti. Like Jay mentioned though, the extra VRAM is always nice.

Posted on 2017-10-17 18:49:36
Thomas Goward

Could you have a look at Vega as well? They are apparently better than NVIDIA GPUs in Resolve...

Posted on 2017-10-30 05:27:51
Chip Murphy

Not sure that I agree with this test, especially since all the cards aren't operating at 16x.

This test should have been conducted on a SuperMicro system that can handle four 16x PCIe 3.0 lanes, not consumer level boards at 16x, 8x, 16x, 8x.

Posted on 2017-10-18 13:29:54

I've received questions and comments about x16 vs x8 in the past so I figured I would get all the information out there so in the future I can just point people to this comment rather than having to re-type it out. Sorry for it being so long!

The total number of PCI-E lanes for things like GPUs is actually determined by the CPU, not the motherboard. So the Core i9 CPUs, for example, only have 44 PCI-E lanes to divide between your GPUs. This lets you do dual GPU at full x16 (since that only needs 32 lanes), but once you get up to triple GPU it isn't possible to have all three running at full x16 since that would require 48 lanes while those CPUs only have 44 in total. The Core i7 7820X is even worse since it has only 28 PCI-E lanes. So while you can do a single GPU at x16 with that CPU, the second and third GPU will be running at x8 speeds.

AMD Threadripper, however, can do triple GPU at full x16 (which is what we were running at with triple GPU on that platform), but when you get up to four GPUs only three cards can run at x16 and the fourth card will run at x8. Threadripper actually has 64 PCI-E lanes, but unlike Intel some of those lanes are reserved for things like USB controllers, SATA controller, etc. so you can't use all 64 lanes for video cards.
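
As a rough back-of-the-envelope sketch of that lane math (the Intel numbers below are the CPU lane counts mentioned above, while the Threadripper "usable for slots" figure is my assumption consistent with three cards at x16 plus a fourth at x8, not an official spec):

```python
# Back-of-the-envelope check: can N GPUs all run at full x16 within a given
# CPU's lane budget? The Intel numbers are the CPU lane counts discussed
# above; the Threadripper "usable for slots" figure is an assumption.

GPU_LANE_BUDGET = {
    "Core i7 7820X": 28,
    "Core i9 7900X/7960X": 44,
    "Threadripper 1950X (lanes usable for GPU slots)": 56,  # of 64 total
}

for cpu, lanes in GPU_LANE_BUDGET.items():
    for gpus in (2, 3, 4):
        needed = gpus * 16
        verdict = "all x16 possible" if needed <= lanes else "at least one card drops to x8"
        print(f"{cpu}: {gpus} GPUs need {needed} of {lanes} lanes -> {verdict}")
```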

The Supermicro board I'm assuming you are talking about is the C9X299, but that board can't do triple or quad GPU at full x16 either. They don't have a manual uploaded so I can't check the block diagram, but they list the specs as "4 PCI-E 3.0 x16 ... ***4xPCIe3.0 x16 Slots(8/NA/16/16 or 8/8/8/16 or NA/4/8/16(Skylake-X 28 Lanes)". Their formatting isn't great, but what I am 99.9% sure that is saying is that the slots are physically x16, but with a 44-lane CPU (Core i9) you get x8/x16/x16 with triple GPU (which is what we tested with on our X299 system) or x8/x8/x8/x16 with four GPUs. Four GPUs is something we do want to test on the Intel platform in our next round of testing, but it is not possible to have them all running at full x16 speeds with the CPUs that are currently available.

There were some boards on X99 that used what are called PLX or PEX chips that allowed multiple video cards to share PCI-E lanes to get a kind of fake quad x16 setup. On boards like the Asus X99-E WS, the first two cards shared 16 lanes while the third and fourth card shared a second set of 16 lanes. This was really useful when doing things with unequal load across the cards since if the second card is idle, the first card gets the full x16 speed. However, for things like Resolve where all the GPUs are loaded roughly equally, what you end up with is all the GPUs running at x8. Remember that there is a finite number of PCI-E lanes on the CPU and since you can't just create more out of nothing, this kind of lane sharing was an interesting attempt at a workaround. PLX/PEX isn't actually all that useful when doing things that heavily load all the cards, however, which is why I think no one has made an X299 board with PLX so far. If the actual result is that the cards are running at x8 under load anyways, then all you are doing is adding additional overhead from the PLX chip itself which likely lowers performance rather than improves it.

Today, the only way to get true x16 with four cards is with dual Xeon (or with AMD EPYC, but that doesn't actually exist as a physical product right now). There, you have two CPUs and so double the number of PCI-E lanes. There are a lot of performance hurdles with multiple physical CPUs, however, so even with the cards running at true x16 it is very likely that the performance impact of the platform itself will result in overall lower performance in Resolve. Right now, the latest Intel Xeon SP CPUs are out, but there isn't a great motherboard for them to be used in a workstation format. Once there is, we will be including dual Xeon in our testing so we can look at quad x16 as well. But again, the platform itself is likely to make a bigger difference than x8 vs x16.

Really what it comes down to is that while having everything at x16 would be ideal, that simply isn't what is possible with most current CPUs. The PCI-E lane limitations with different CPUs is a very real and valid consideration, and our testing is designed around taking that into account. We could just test with dual Xeon to get all the cards at x16 speeds, but those results are possibly very inaccurate for someone who is purchasing a Core i7 7820X or a Core i9 7980X. This is why we include a range of CPUs even in our articles like this one which is focused primarily on GPU performance. Real numbers taken from real CPU/GPU combinations are always going to be more reliable than testing with a single CPU. Just look at the scaling going from one to two GPUs with the Core i7 7820X versus the Core i9 7900X. Even though the 7820X limits the second GPU to x8 speeds, the scaling was actually better than with the 7900X where the second GPU is able to run at full x16! That alone is a very clear indication that there is a lot more to the performance equation than simply x8 versus x16.

Posted on 2017-10-19 18:04:04
Luca Pupulin

Hi,

I'll be curious to know which driver version you used on the Titan Xp,
'cause Nvidia claims a performance boost of 3x on some applications with the release of driver version 385.12
(https://blogs.nvidia.com/bl...

although I guess the main boost is pertinent to 3D OpenGL applications and CAD programs; they explicitly cite Maya,
and some benchmarks show this is true (although it's not clear whether the driver also enables some extra OpenGL extensions you find on Quadro cards),
would be nice to know if this is somehow true for post production applications as well.

Cheers

Posted on 2017-10-20 15:14:03

Hey Luca, I believe we were using driver version 387.92. We looked into those performance boost claims a bit when the 385.12 driver came out, and I believe our conclusion at the time was that they simply disabled some artificial limitations they had in place in order to encourage Quadro sales. Most of the performance gains were with engineering software or with software from Autodesk (like Maya). I don't believe software like Resolve would have seen a major difference with that driver, but we are using an even newer driver so if there is a difference we are already taking it into account in our testing.

Posted on 2017-10-23 18:29:51
Luca Pupulin

Hi Matt,
thank you for your reply and to confirm my suppositions :-)

a bit off topic here, but it would be great to see a comprehensive Maya benchmark (I discussed this subject with your colleague William George when I pointed out that the SPECapc for Maya 2017 benchmark had recently been released).

Cheers,

Luca

Posted on 2017-10-24 16:16:19
Chad Capeland

"The performance drop with four GPUs on the AMD Threadripper 1950X system is odd"
3x GPU should have left them at 16x, right? Adding the 4th may have dropped them down to 8x.

Posted on 2017-10-22 01:34:19

Hey Chad, the drop to x8 is probably a factor, but I don't think that is the main thing going on. On the X399 board we used, I'm pretty sure the PCI-E slots are always x16/x16/x8/x8 no matter whether you have two, three, or four GPUs installed. So the third GPU was already running at x8 and we saw a decent performance gain there, which makes it unlikely that having the fourth GPU at x8 is causing a drop in performance all by itself.

Also, if running at x8 was that large of a factor, then it is very strange that we saw a larger increase in performance going from one GPU to two with the Core i7 7820X (which is x16/x8) than we did with the Core i9 7900X (which is x16/x16). Until we do the same testing with quad GPU on an Intel platform, it is really hard to say if this is something that you would always see with quad GPU or if there is simply something odd that happens with the AMD Threadripper platform.

Posted on 2017-10-23 18:19:05
Rob Trombino

Thanks so much for all the time and effort in performing these tests and sharing them with us!

I do have to point out a few things though that seem rather odd.

• No mention if virtual memory is disabled on the OS, which it should be with that much system memory, since leaving it on is a huge bottleneck.
• Background processes that should be disabled as well:
SuperFetch, drive indexing, Windows search, drive optimization, and a few others . . .
• Reading and writing the cache files to either the OS or storage drive is another huge bottleneck as it would be for rendering to either of those drives as well.

All this applies to both the CPU and GPU tests.

Best regards.

Posted on 2017-10-22 15:12:15

Hey Rob. Do you mean the pagefile? If so, we do leave that on, but I've never seen a performance drop from doing so. Using the pagefile is much slower than using system RAM, but that should only ever happen if you run out of system memory. There are still some weird bugs and stability issues that can come up if you disable the pagefile completely, so we tend to leave it on unless there is a significant reason to disable it.

We do our testing with everything in the OS set the same way the majority of users have their systems. That does mean that the results are probably a percent or two worse than they could be under the most ideal of situations, but we try to replicate the actual performance an end user would see rather than doing a bunch of optimizations that 99% of users don't do. However, we do repeat all our testing a minimum of three times to ensure that things like a random Windows update or drive index doesn't cause artificially low results.

Storage drive configuration is on our list of testing we want to do in the future, so rest assured that we will tackle that at some point. From all the other testing we have done, I suspect that having multiple drives might make a small to decent difference when working with RAW or lightly compressed file formats, but until we do the testing itself that is really just a guess.

Posted on 2017-10-23 18:24:59
Janis Lionel Huber

Testing different cinema DNG formats is an absolute must in my opinion; there are a lot of people using the Ursa Mini and BMD Pocket. So uncompressed cinema DNG, 3:1, 4:1 CDNG should be included in testing.

Posted on 2017-10-23 23:48:10

Hey Janis, I'll definitely add that to the list. I looked pretty quickly online and found some source files available for download from here: http://nofilmschool.com/201... . Is this exactly what you are talking about? If not, do you know of any source footage available for download or would you be willing to send us a couple ~15-20 second clips that we could use?

Posted on 2017-10-24 00:10:42
Rakesh Malik

It sounds like you tested the playback and export performance mostly using fairly basic nodes, and also using mezzanine codecs rather than raw, which leads to an incomplete story here.

Since Red raw is decompressed on the CPU and then de-Bayered on the GPU, that will have a significant effect on how an additional GPU gets used. A lot of colorists work directly with Red raw in order to have access to as much of the original data as possible, though a lot also end up working with DPX files, which are already de-Bayered but far larger than Redcode.

Since ArriRAW isn't compressed, it's huge... so it doesn't impose as much load for decompression, but the de-Bayer is still compute intensive (not as much admittedly as Redcode with high resolution footage), the catch being that it's more disk intensive due to its size.

The LGG color wheels are a relatively small part of the color grading process also; there are a lot of other operations involved that add considerably to the compute load like tracking power windows, keying, and a lot of the visual effects like the face refinement tool, and don't forget the often used stuff like Sapphire and BorisFX that add further to the compute load.

If you use 8K Red raw (which I can help you get, since I'm in your area and have an 8K Red camera), color grade with that, and test the playback + export using some of these tools, you might see some different results from what you got on this test.

That isn't to say that this test isn't valid, just that there are lots of other situations and variations that haven't been covered yet.

Posted on 2017-10-24 19:20:31

There is definitely a lot we still want to test and, like we noted in the article, this is just the first of many articles to come. ArriRAW is on our list for round two, as well as many of the tools you listed (tracking, keying, etc.). Plugins we typically shy away from simply because there is such a wide range of plugins out there, but that is something we may tackle a bit further in the future as well.

As for RED footage, we tested 4K 11:1, 6K 7:1, 6K 12:1, and 8K 9:1 in this article. However, one thing I do want to do next time is make a bit more of a clear division between high and minimal compression. I'm hoping to test something around 4:1 and 12:1 with 4K, 6K, and 8K footage. That should give us a pretty good idea of how different hardware performs with quite a wide range of RED footage without taking multiple days just to finish one round of testing.

Posted on 2017-10-24 19:51:08
Jay Smith

I was doing some tests today with 8K Weapon footage in a UHD timeline, and I found that even if I had cached my RED footage to a fairly light codec, once I turned on a node with a big qualifier (in this case the 3D keyer) and an adjustment on that, it would throw my 1080 Ti to 100% usage and drop playback to ~12fps. This would seem to be a case where a second GPU would allow me to get realtime playback. Might move another 1080 Ti into this station and see if it helps. I know dual GPUs are a huge help in Resolve for real-world grading, just need to figure out the proper testing.

Posted on 2017-11-19 01:10:55
Arshjyot Singh

Hey, can you please compare Intel CPUs with Titan Xp vs Vega 64 & let us know what difference it makes in Premiere Pro live playback & editing. There is a video on YouTube that is making huge claims that if you have Vega 64 w/ Threadripper, then Premiere Pro can easily handle RED 8K footage at full resolution with LUTs & Lumetri. I know it sounds crazy. But you can check it for yourself. Search '8k video editing pc' on YouTube. The first video you will see is from Max Yuryev. In that video he compares his TR system with Linus Tech Tips' RED Rocket card. And Max's system crushes the top Intel build. Can you let us know what is happening behind the hardware? Is it the CPU or GPU responsible for live playback in Premiere Pro?

You guys tested & proved that the 14-core Intel CPU is best for Premiere Pro when compared to all others (including Threadripper). But we are eager to see Titan Xp vs Vega 64 vs 1080 Ti. And also, if possible, mix & match the CPUs & GPUs for an even better comparison.

This will give an answer to best combination for Premiere Pro.
-Threadripper+Vega 64
-Threadripper+Titan Xp
-Threadripper+1080 Ti
-Intel 14 Cores+Vega 64
-Intel 14 Cores+Titan Xp
-Intel 14 Cores+1080 Ti

Posted on 2017-10-29 19:17:26
Patrick Taylor

Excellent test. Thanks for taking the time.

You mention you did all these tests at 60fps. I've been a professional colorist for over 5 years now and I have yet to have a client ask me to deliver a project at anything but 30fps or 24.98fps. So, if these are the 2 dominant speeds I work at, I'm wondering if there is any benefit at all to having more than 1 Titan GPU?

Thanks again for all the excellent testing you folks do. Much appreciated.

Posted on 2017-11-06 06:58:35

Hey Patrick, we did our testing at 60 FPS mostly because that gives us a bit of extra headroom to compare the different hardware configurations. There are some people that do need 60 FPS performance, but I would say the majority of our customers only need 30 FPS max just like you. However, by testing up to 60 FPS we can see how much above 30 FPS we are able to achieve with different hardware. Hitting 30 FPS on the dot is OK, but I think it is extremely useful to know if you are right on the edge (where applying a few more effects would cause you to drop frames) or if you have plenty of headroom.

We're still working out the best way to present all our data since giving a percentage result is only moderately useful for real colorists (not just those interested in hardware benchmarks in general), but for now you can take a look at the raw results charts to get an idea of what FPS different configurations should be able to achieve. As for 1 vs 2 Titan XPs, I would say multiple is really only useful if you use something like TNR. There were also a few times with 6K and 8K footage that a second GPU bumped us up from ~20 to ~25 FPS, however, so a second GPU can be very helpful at times at those higher resolutions. If you work with 4K footage at <30FPS and don't use TNR or similar effects, I don't think a second GPU is going to do anything for you. Probably better to spend that money on a couple M.2/PCIe NVMe drives or more RAM.

Posted on 2017-11-09 03:58:14
Volta

https://nvidianews.nvidia.c...

https://www.nvidia.com/en-u...

http://www.nvidia.com/downl...

Can you please benchmark the new NVIDIA TITAN V?

Posted on 2017-12-08 14:17:23

We definitely will, and we have a couple cards on the way already. Not sure when they will actually ship and arrive, however, so I can't give an ETA on testing results.

Posted on 2017-12-08 17:13:53
Charles Unice

I just went through the tutorial (Advanced Color Grading Tutorial from SonduckFilm) that you guys have done for your "advanced" color grading, and it's pretty basic color correction. I would not call that advanced. I'd like to see a test where you do the "advanced" grading that you have done but also use some power windows and the face refinement tool, and also track some of the power windows. On one shot I'll have anywhere from 2-10 power windows. The face refinement tool has reduced this number but it's not perfect. For example I'll have tracking power windows removing blemishes, brightening eyes and doing background relights. When I render a 4K cinema DNG RAW file using your "advanced" coloring I'm rendering at 17 fps, CPU load at 90-94%, GPU load 20-50%. But when I do my own color correction with power windows I drop down to 8fps, CPU load 43-53%, and GPU 82-91%. It seems to me that I could use a second GPU but I'd like to know before dropping $800 on another GPU. Current system specs: i7 5960X OC to 4.5GHz, EVGA 1080 Ti FTW3. I would like to know if adding another GPU would help increase my render times, or if I need to jump to an i9 7960X? Wish I had the cash to test this myself. Thanks, you guys rock!!! I love your forum.

Posted on 2017-12-09 04:53:13

Yea, "Advanced" may not have been the best choice of test names. It is more advanced than the very light "Basic" test we did (which is what we were going for), but we just couldn't come up with a very clear naming scheme for the different levels of testing. Honestly, next time I might just change it to arbitrary "Level 1 grading", "Level 2 grading", etc. or something like that. Still not great, but then at least it doesn't have the same implications of calling something "Advanced".

That said, this was only our first round of testing in Resolve and there are a large number of changes we want to make for round two. I really want to include some tracking in our testing (which shouldn't be too difficult), and working with faces is another big one. The problem with faces is that most of the publicly available test footage we use doesn't include people so that wouldn't be a test we can do across all the different codecs/resolutions. Plus, I have on my to-do list adding a few more types of footage that people have recommended in the comments of our articles, and few of those have faces either which just compounds the problem. At some point, I think we're just going to have to hire someone to go rent a bunch of different cameras and get our own test footage that includes everything we want (15 second clips, colorful, slow moving background, face(s), etc) but I'm not sure realistically how close we are to doing that quite yet. So while power windows is something I hope to add soon, face refinement will probably have to wait a little bit.

For your hardware questions, that is a tough call. A second GPU should be the easiest and cheapest way to get better performance. Hard to say how much since we have not tested power windows yet, but I imagine you'll see a ~20% performance boost with a second GTX 1080 Ti. Moving to a Core i9 7960X will probably net a roughly similar performance gain for live playback, but more like a ~50% performance increase when exporting. Of course, adding a second GTX 1080 Ti is only around $800, while upgrading to a 7960X (with a new motherboard, CPU, RAM, etc.) is going to be more like $2500-3000 depending on the hardware you end up using. Again, those are all guesses on my part. I'm not sure when we'll be doing another full round of testing with the updated tests, but it likely won't be for at least several months.

Posted on 2017-12-11 20:18:06
Joe S.

Are these GPUs combined via SLI or as "singles"?
I have a GTX 970 in my current build and a 560 Ti lying around. Does this mean slotting in the 560 Ti would help NR?
I gave up on this idea, because I always had problems using Resolve 12 with the GTX 560 Ti as a secondary card. Depending on the driver, it gave me a 0x000005c error in Event Viewer and wouldn't start. Most of the time I had to disable the 560 Ti in Computer Management first.

Posted on 2017-12-13 22:30:37

You actually don't want to use SLI for Resolve or most other applications where the GPUs are being used for compute. It can cause weird issues (mostly stability, but also simply lower performance), so you want to have the GPUs set up as individual cards.

Posted on 2017-12-13 22:32:46
Joe S.

Thank you! Do you think it would help to have two different cards in it? I'm thinking of ordering my new build with a GTX 10XX, but I kind of don't want to throw away my GTX 970 if it still has a use?

Posted on 2017-12-14 13:26:02

You should be able to use both a GTX 1080 Ti and a GTX 970 to get better performance, but there might be some complications. First, you will be limited to the 4GB (I believe) of VRAM on the 970. If that isn't an issue for you right now, it might not be a problem, but be aware of that if you start using higher res footage. Second, sometimes mixing different architectures can be buggy and not give the kind of performance you would expect. If the VRAM limit isn't a deal breaker for you, you could give it a shot and see what happens. Maybe don't do any work that is critical for a few days to make sure Resolve doesn't crash or anything though. And at the first sign of trouble, take the GTX 970 out.

Posted on 2017-12-14 19:06:10
Joe S.

Thx, good hint about VRAM, because it is actually already giving me trouble with "not enough VRAM" errors. I just kind of hoped the VRAM would be added together...

Posted on 2017-12-14 19:21:52
Joe S.

But I'm also going to get a Blackmagic DeckLink for my 4K grading monitor and then only have the GUI on 1080p, so that may relax my VRAM troubles a little bit.

Posted on 2017-12-14 21:11:42

Probably not very much actually. Most of the VRAM usage in Resolve is because it is loading the actual footage into VRAM, not because it is displaying things on the screen. You might see a bit less VRAM usage, but if you are getting errors today then it probably isn't a good idea to push it. Like I noted before, however, you can always give it a shot and just see. Since you already have the GTX 970 it isn't like it will cost you anything.

Posted on 2017-12-14 21:58:56
Joe S.

True dat, I will get back with results once I've tried it! But I may first buy a Tangent Ripple instead of the GTX 1080 Ti.

Posted on 2017-12-14 22:34:19
Erkan Özgür Yılmaz

Which version of DaVinci Resolve did you use? The "Free" or the "Studio" version?

Posted on 2018-02-16 11:17:53

This testing was with DaVinci Resolve Studio 14. The free edition only supports a single GPU.

Posted on 2018-02-16 17:32:33