DaVinci Resolve 14 GPU Scaling Analysis (1-4x Titan Xp)
Written on October 13, 2017 by Matt Bach
For a number of years, we have focused our video editing testing around Premiere Pro and After Effects, but we have received a large number of requests to include DaVinci Resolve as well. Resolve has actually been on our list for a while now, but since Resolve 14 was supposed to make fundamental changes to how the software runs we decided to hold off our testing until it was available. The launch of Resolve 14 dragged on longer than we expected, but it finally came out of beta about a month ago and we were able to begin our testing.
Today, we have published our first two articles for DaVinci Resolve. This article will be looking at GPU scaling performance with 1-4 NVIDIA Titan Xp video cards while our second article will be looking at how performance changes with different models of CPUs. There is quite a bit more we want to test, but this is an excellent start and will allow DaVinci Resolve users to give us feedback on our test process that we can leverage in future articles. Plus, we wanted to get these articles up before we left for the NAB Show in New York next week (Oct 17-18, 2017). Come visit us at booth N570 in the "Post + Production Pavilion" of Hall 3B if you happen to be attending!
There is a lot we could (and want to) look at in DaVinci Resolve, but today we will be focusing on performance while exporting and more importantly FPS while grading in the color panel. Our testing includes 4K RED, H.264, ProRes 422, and DNxHR HQ footage as well as 6K and 8K RED footage. If you would rather skip over our analysis of the individual benchmarks, feel free to jump right to the conclusion section.
While GPU scaling is our primary concern in this article, we will actually be testing with two platforms and four different CPUs. The CPUs we chose should give us a great range of processing power to see if anything changes with a more or less powerful CPU. One thing to note is that while we are going to be testing with up to four GPUs on the Threadripper system, we are only using up to three GPUs on the Skylake-X systems. At the time of our testing, there were no quad GPU X299 motherboards so three GPUs was the maximum we could test. This has very recently changed with a number of quad GPU capable X299 boards coming out and we will likely revisit quad GPU on an Intel platform in the future, but for now this is a limitation we have to live with.
|Skylake-X (X299) & Threadripper (X399) Test Platforms|
|Motherboard:||Gigabyte X299 AORUS Gaming 7|Gigabyte X399 AORUS Gaming 7|
|CPU:||AMD Threadripper 1950X 3.4GHz (4.0GHz Turbo) 16 Core|
|RAM:||8x DDR4-2666 16GB|8x DDR4-2666 16GB|
|GPU:||1-4x NVIDIA Titan Xp 12GB|
|Storage Drive:||Samsung 960 Pro 1TB M.2 PCI-E x4 NVMe SSD|
|OS:||Windows 10 Pro 64-bit|
|Software:||DaVinci Resolve Studio 14|
Before getting into our testing, we want to point out that while our test platforms use a single storage drive, that is not what we would typically recommend to our customers. Normally we would recommend an SSD or NVMe drive for your OS and programs and one or more dedicated drives for your media. However, this is in large part for capacity and organizational purposes and should not have any impact on our results, so we opted to use a single drive to cut down on the number of testing variables.
Most of the media we will be using is available from the Sample R3D files. However, there were two special things we did. First was to transcode the 4K RED footage to a number of other popular codecs we wanted to test. The second was that we increased the project FPS to 60 for our color grading tests regardless of the original media FPS. Since many of our CPU/GPU combinations were able to easily play back our test footage at ~24 FPS, this was a relatively easy way to increase the spread and accuracy of our results.
To test exporting, we used a moderately complex timeline involving multiple clips, basic color grading, multicam footage, and various effects like crop, Gaussian blur, and cross dissolves. We do not yet have a video showing our exact test timeline, but it is nearly identical to what we use in our Premiere Pro testing, only without the logo and text overlay.
For our color grading tests, we applied three levels of correction which we called Basic, Advanced, and Torture:
Basic:
- Single node adjustments
Advanced:
- 18 serial & parallel nodes
- Roughly follows the Advanced Color Grading Tutorial from SonduckFilm
Torture:
- All nodes from "Advanced"
- Adds a single node of temporal noise reduction
Exported stills from our 4K testing are available for download if you wish to examine them in more detail.
Performance while exporting is likely a secondary consideration for many users, but it is still an important factor to keep in mind. Color grading may be what Resolve is known for, but you still have to get that graded footage delivered after you are done working on it.
Our testing was fairly extensive with over 300 data points, so while the actual results in seconds are available if you click on the "Show Raw Results" link, we also created a number of charts breaking down the average performance gain or loss we saw as the number of video cards was increased. 100% is the performance with a single GPU so a result of 103% would be a 3% gain in performance.
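As a rough illustration of how a chart value like 103% can be derived from export times, here is a minimal sketch. It assumes that relative performance is the single-GPU time divided by the multi-GPU time (since a shorter export is better) and that the chart averages the per-test percentages; the article does not spell out either detail. The function names and the sample times are hypothetical and are not measured data.

```python
def relative_performance(single_gpu_seconds: float, multi_gpu_seconds: float) -> float:
    """Return performance as a percentage of the single-GPU result.

    Export results are times, so lower is better: finishing in less time
    than the single-GPU run yields a value above 100.
    """
    if single_gpu_seconds <= 0 or multi_gpu_seconds <= 0:
        raise ValueError("export times must be positive")
    return 100.0 * single_gpu_seconds / multi_gpu_seconds


def average_relative_performance(single_gpu_times, multi_gpu_times):
    """Average the per-test percentages across paired export tests."""
    if not single_gpu_times or len(single_gpu_times) != len(multi_gpu_times):
        raise ValueError("need two non-empty lists of equal length")
    percentages = [
        relative_performance(single, multi)
        for single, multi in zip(single_gpu_times, multi_gpu_times)
    ]
    return sum(percentages) / len(percentages)


# Hypothetical export times in seconds, for illustration only (not measured data).
example_single = [100.0, 200.0]
example_dual = [80.0, 250.0]
example_average = average_relative_performance(example_single, example_dual)
print(f"Average relative performance: {example_average:.1f}%")
```

With these made-up numbers, one export gets faster and one gets slower, and the two nearly cancel out in the average. This is why an overall figure close to 100% can hide individual tests that moved in opposite directions.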
Exporting is something that should be more heavily impacted by the CPU than the GPU, but it turns out that this is even more true than we expected. With the Intel CPUs, the overall net performance gain with multiple GPUs was essentially nothing. With the AMD Threadripper 1950X, however, there was actually a very consistent drop in performance with more GPUs when exporting to DNxHR HQ. At 4K it was only a 5% drop in performance with four GPUs versus one, but with our 8K RED footage we saw a significant 14% drop in performance.
Overall, it is pretty safe to say that having more GPUs doesn't help when exporting - at least with the kind of timeline we used. This is jumping ahead a bit into the next section, but one thing we will note is that our timeline does not include any temporal noise reduction (TNR), which is where we saw the biggest performance gains with multiple GPUs. This is likely something we will add in our future testing, but given that Resolve is making huge improvements to its edit interface, we want to continue testing with this type of timeline as well.
Color grading is really where we wanted to focus our efforts so we used three different levels of grading on our test footage. With each footage type and grading level, we watched the FPS counter and recorded the lowest result after giving about a 15 second "grace period" since the counter can sometimes take a bit to settle down. This test was repeated three times with Resolve being restarted in between tests. Across the three runs, we logged the highest of the minimum FPS results and used that in our results below.
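The selection rule described above (ignore the first 15 seconds or so, take the lowest FPS of each run, then keep the highest of those minimums across three runs) can be sketched as follows. This is only an illustration: the article describes reading the FPS counter by eye, so the idea of logged `(elapsed_seconds, fps)` samples, the function names, and the sample readings are all hypothetical.

```python
def settled_minimum_fps(samples, grace_seconds=15.0):
    """Return the lowest FPS reading taken at or after the grace period.

    `samples` is an iterable of (elapsed_seconds, fps) pairs for one run.
    """
    settled = [fps for elapsed, fps in samples if elapsed >= grace_seconds]
    if not settled:
        raise ValueError("no samples recorded after the grace period")
    return min(settled)


def reported_fps(runs, grace_seconds=15.0):
    """Return the highest of the per-run settled minimums."""
    if not runs:
        raise ValueError("at least one run is required")
    return max(settled_minimum_fps(run, grace_seconds) for run in runs)


# Hypothetical readings for three runs, for illustration only (not measured data).
example_runs = [
    [(5.0, 12.0), (20.0, 41.0), (30.0, 44.0)],
    [(5.0, 20.0), (20.0, 46.0), (30.0, 43.0)],
    [(5.0, 18.0), (20.0, 40.0), (30.0, 42.0)],
]
example_result = reported_fps(example_runs)
print(f"Reported FPS: {example_result}")
```

Taking the minimum within a run captures the worst sustained playback rate, while taking the maximum across runs discards runs that were dragged down by a one-off hiccup.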
Just like in the previous section, our results are formatted based on the performance gain we saw with multiple cards compared to just one. If you want to see the raw results in FPS, just click on the "Show Raw Results" link.
Considering how often the claim is made that more GPUs are always better for DaVinci Resolve, we were a bit surprised at our results. In our "basic" and "advanced" color grading tests we saw the occasional benefit with two GPUs, but just as often we actually saw a drop in performance. In fact, the only time we saw a significant and consistent performance gain was when we added temporal noise reduction in our "torture" testing. For that test, we saw an average 25-35% performance gain with two GPUs but only a further ~10% with three GPUs. On the one test platform where we could use four GPUs (AMD Threadripper 1950X), we actually saw a performance drop with four GPUs over three.
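One way to put the two-GPU "torture" result in context is to compare it against ideal linear scaling, where two cards would deliver 200% of single-card performance. Scaling efficiency is not a metric used in this article; the sketch below is simply an assumed way of expressing the 25-35% gain quoted above, and the function name is hypothetical.

```python
def scaling_efficiency(relative_performance_percent: float, gpu_count: int) -> float:
    """Return the fraction of ideal linear scaling that was achieved.

    A value of 1.0 means N GPUs delivered N times the single-GPU performance.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    ideal_percent = 100.0 * gpu_count
    return relative_performance_percent / ideal_percent


# A 25-35% gain with two GPUs corresponds to 125-135% relative performance.
low_efficiency = scaling_efficiency(125.0, 2)
high_efficiency = scaling_efficiency(135.0, 2)
print(f"Two-GPU scaling efficiency: {low_efficiency:.1%} to {high_efficiency:.1%}")
```

Under this definition, the second card contributes roughly a quarter to a third of a card's worth of extra performance, even in the test that benefits most from multiple GPUs.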
To be frank, our testing did not go the way we expected it to. The lack of performance gain with multiple GPUs when exporting is understandable as encoding media does not utilize the GPU much, but we expected more from our color grading tests. In fact, it was only when we added temporal noise reduction that we saw a significant benefit from even two GPUs.
From what we've tested so far, if most of your work is done with the color wheels and curves and you don't use effects like TNR, multiple video cards aren't going to impact performance much. That said, with temporal noise reduction the performance gain was very respectable - we saw an average 25-35% performance gain with two GPUs and a further ~10% with three GPUs. This is not quite as much as you may expect given Resolve's reputation for benefiting greatly from more GPU power, but still relatively good. The performance drop with four GPUs on the AMD Threadripper 1950X system is odd, and we are curious to test four GPUs on an Intel platform in the future to see if this is simply an issue with that platform.
Like we said, these results were a bit of a surprise to us, so we are very interested in any feedback you may have. These results may simply be due to Blackmagic's software optimizations in Resolve 14, or perhaps we are not testing sufficiently intensive workloads. We could certainly force Resolve to make greater use of multiple cards by slapping on a bunch of TNR nodes, but we strive to keep our testing relevant to real-world users. Unless there are people out there who actually use a couple dozen TNR nodes when grading, this would be counterproductive to our goal of determining the real-world benefit of different hardware configurations.
This is just the first of many articles looking at performance in DaVinci Resolve, and there is still a lot we want to test, including exporting all the color graded clips in addition to our test timeline, adding motion tracking, testing the performance impact of using a Decklink display out card, and much more. If you think there is something else we should include, we encourage you to let us know in the comments, especially if you are able to send us an example project. We have found active communication with the community to be incredibly valuable, as it helps us shape our testing to match exactly the type of work that real users do every day.