
https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/1908

Premiere Pro GPU Decoding for H.264/HEVC media - is it faster?

Written on October 20, 2020 by Matt Bach

Introduction

In the last 6 months, Adobe has made a number of terrific improvements to Premiere Pro for processing H.264 and HEVC (H.265) media. Although these are the most common codecs in use today, they can be extremely difficult to process because they are typically long-GOP (inter-frame) codecs, where most frames can only be decoded by referencing other frames. Normal playback can often be acceptable, but scrubbing, reverse playback, multicam, and a host of other tasks can be nearly unusable unless your computer has some very specific hardware.

In the latest update of Premiere Pro, Adobe has added GPU-accelerated hardware decoding for H.264/HEVC media (not to be confused with the GPU encoding we got back in April). In previous versions, hardware decoding was supported, but was only available if you had a CPU that included Intel Quick Sync. That implementation works very well, but Quick Sync is only available in Intel's consumer and mobile processor families. This meant that high-end Intel systems with X-series or Xeon processors - and all AMD systems - did not have an option for hardware decoding even though those systems are often significantly more powerful overall.

Quick Sync decoding is still available if your system supports it, but with the addition of GPU decoding using AMD and NVIDIA video cards, nearly everyone will have access to technology that will greatly improve performance when working with H.264 and HEVC media.
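To make the availability rules above concrete, here is a minimal sketch of which decode paths a given system could use. The `System` type and `available_decoders` function are hypothetical names for illustration only, and the (14, 5) version cutoff is an assumption based on the Premiere Pro versions used in our testing. It is not a description of how Premiere Pro itself detects hardware.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical description of an editing workstation."""
    cpu_has_quick_sync: bool   # e.g. Intel consumer/mobile CPUs with integrated graphics
    has_discrete_gpu: bool     # an AMD or NVIDIA video card
    premiere_version: tuple    # (major, minor), e.g. (14, 5)


def available_decoders(system):
    """Return the H.264/HEVC decode paths this sketch assumes are usable."""
    paths = ["software"]  # software decoding always works
    if system.cpu_has_quick_sync:
        paths.append("quick_sync")
    # Assumption: GPU decoding requires Premiere Pro 14.5 or newer.
    if system.has_discrete_gpu and system.premiere_version >= (14, 5):
        paths.append("gpu")
    return paths


intel_test_system = System(True, True, (14, 5))    # like our Core i9 10900K platform
amd_test_system = System(False, True, (14, 5))     # like our Ryzen 9 3900X platform
amd_previous_version = System(False, True, (14, 3))

print(available_decoders(intel_test_system))
print(available_decoders(amd_test_system))
print(available_decoders(amd_previous_version))
```

The last case is the situation AMD, X-series, and Xeon users were in before this update: a powerful system with only software decoding available.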

Premiere Pro workstations

Puget Systems offers a range of powerful and reliable systems that are tailor-made for your unique workflow.

Configure a System!

Labs Consultation Service

Our Labs team is available to provide in-depth hardware recommendations based on your workflow.

Find Out More!

Test Setup

Listed below are the specifications of the systems we will be using for our testing:

Intel Z490 Test Platform
CPU: Intel Core i9 10900K
Video Card: NVIDIA GeForce RTX 3080 10GB
CPU Cooler: Noctua NH-U12S
Motherboard: Gigabyte Z490 Vision D
RAM: 4x DDR4-2933 16GB (64GB total)
Hard Drive: Samsung 960 Pro 1TB
Software: Windows 10 Pro 64-bit (Ver. 2004), Premiere Pro (Ver. 14.3 & 14.5)

AMD X570 Test Platform
CPU: AMD Ryzen 9 3900X
Video Card: NVIDIA GeForce RTX 3080 10GB
CPU Cooler: Noctua NH-U12S
Motherboard: Gigabyte X570 AORUS ULTRA
RAM: 4x DDR4-2933 16GB (64GB total)
Hard Drive: Samsung 960 Pro 1TB
Software: Windows 10 Pro 64-bit (Ver. 2004), Premiere Pro (Ver. 14.3 & 14.5)

To see how well GPU-based hardware decoding works, we will be using two different test systems - one that supports Intel Quick Sync, and one that does not. This will let us see how well the new GPU decoding works, and how it compares to hardware decoding with Intel Quick Sync.

For the testing itself, we will look at 4K H.264 and HEVC media from a GoPro 7, iPhone 11, and DJI Mavic 2 drone. These are devices that are often the source of media that gives poor performance in editing programs and should give us a great look at how well GPU decoding works for scrubbing, J/K/L playback (forward and reverse), and multicam sequences. If you want to try out this footage yourself, our test clips are all available for download:

One big difference between this testing and our usual is that we do not have any charts or graphs. Many of the tasks we are testing are not easy to measure in terms of pure FPS, so what we decided to do instead was screen record each set of tests, and compile them into a series of videos showing the performance of each decoding mode side-by-side.

If you want a summary of our results, we recommend watching our Premiere Pro H.264/HEVC GPU Decoding Performance video embedded at the top of this post.

GoPro 7 - 4K 60 Mbps H.264/HEVC media

Intel Core i9 10900K
Software, Quick Sync, & GPU Decoding

AMD Ryzen 9 3900X
Software & GPU Decoding

For the GoPro Hero7, we decided to look at a number of 4K 60 Mbps test clips recorded in both H.264 and HEVC. In the videos above, we are showing the performance between software and GPU decoding, as well as Quick Sync decoding on the Intel system. The HEVC footage is shown first, followed by the H.264 footage.

Starting with scrubbing performance, the advantage of hardware decoding (whether it is GPU or Quick Sync) is immediately apparent. Where the source window is very choppy with software decoding, it is perfectly smooth with hardware decoding even when scrubbing at a fairly fast pace.

If you are a fan of using J/K/L (forward, pause, reverse) for editing, GPU decoding also has some great benefits. With the CPUs we are using, forward playback at up to 2x speed can be just fine with software decoding, but if you want to go any faster, you really need hardware decoding in order to do so. And if you want to be able to play this type of footage in reverse... hardware decoding is a must if you want your timeline to play smoothly.

Our final test was to look at multicam performance with 4, 9, and 12 streams. Here, the main benefits of GPU decoding come into play once you have 9+ streams, which is when software decoding really starts to struggle on these systems. Interestingly enough, which type of hardware decoding (GPU or Quick Sync) is better depends on whether we are using H.264 or HEVC media. GPU decoding is a bit better with HEVC media, and Quick Sync is better with H.264.
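As a rough illustration of what those stream counts mean, the sketch below adds up the compressed data rate for 4, 9, and 12 simultaneous 60 Mbps clips. The function names are hypothetical, and bitrate is only a proxy: it shows how much data has to be read, not how expensive each stream is to decode.

```python
def aggregate_bitrate_mbps(streams, per_stream_mbps):
    """Total compressed data rate, in megabits per second."""
    return streams * per_stream_mbps


def mbps_to_megabytes_per_sec(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


for streams in (4, 9, 12):
    total = aggregate_bitrate_mbps(streams, 60)
    print(f"{streams} streams: {total} Mbps (~{mbps_to_megabytes_per_sec(total):.1f} MB/s)")
# 4 streams: 240 Mbps (~30.0 MB/s)
# 9 streams: 540 Mbps (~67.5 MB/s)
# 12 streams: 720 Mbps (~90.0 MB/s)
```

Even at 12 streams, the read rate is modest for an NVMe drive, which fits with decoding, rather than storage, being where these sequences struggle.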

iPhone 11 - 4K VFR H.264/HEVC media

Intel Core i9 10900K
Software, Quick Sync, & GPU Decoding

AMD Ryzen 9 3900X
Software & GPU Decoding

Using phones to create video content is becoming more and more popular as the cameras improve, but there are a number of issues when using something like an iPhone 11. The first issue is that they record in H.264 or HEVC, but on top of that, they usually also record in variable frame rate (VFR). This makes it even harder for applications like Premiere Pro to process, so we were very interested to see what performance benefits there are to using the GPU to decode footage from an iPhone.
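If you are not sure whether a clip is VFR, one common heuristic is to compare the nominal and average frame rates that ffprobe (part of FFmpeg, not Premiere Pro) reports for the video stream. The sketch below parses that JSON output. The function names, the 1% tolerance, and both sample strings are made up for illustration, and a mismatch only suggests VFR rather than proving it.

```python
import json
from fractions import Fraction

# The JSON is assumed to come from a command along these lines:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,r_frame_rate,avg_frame_rate -of json clip.mov


def parse_rate(text):
    """Turn an ffprobe rate such as '30000/1001' into a Fraction (None if unusable)."""
    num, _, den = text.partition("/")
    den = int(den) if den else 1
    return Fraction(int(num), den) if den else None


def looks_variable(ffprobe_json, tolerance=Fraction(1, 100)):
    """Heuristic: flag a clip when nominal and average frame rates differ by more than 1%."""
    stream = json.loads(ffprobe_json)["streams"][0]
    nominal = parse_rate(stream["r_frame_rate"])
    average = parse_rate(stream["avg_frame_rate"])
    if not nominal or average is None:
        return None  # not enough information to decide
    return abs(nominal - average) / nominal > tolerance


# Hypothetical sample outputs, not taken from our test clips.
constant_clip = '{"streams": [{"codec_name": "h264", "r_frame_rate": "30000/1001", "avg_frame_rate": "30000/1001"}]}'
phone_clip = '{"streams": [{"codec_name": "hevc", "r_frame_rate": "60/1", "avg_frame_rate": "5000/87"}]}'

print(looks_variable(constant_clip))  # False
print(looks_variable(phone_clip))     # True
```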

Once again starting with scrubbing, software decoding does better here than it did with the GoPro media, but it is still significantly worse than with GPU or Quick Sync decoding.

For J/K/L (forward, pause, reverse) editing, 2x forward playback is great with all the decoding options, but when we went up to 4x even hardware decoding struggles somewhat with this VFR media. With the HEVC media, GPU decoding does fairly well and is even better than Quick Sync decoding - although it does still drop a few frames. For H.264, however, neither GPU nor Quick Sync decoding is able to play the footage at what we would call an acceptable frame rate.

Playing in reverse, software decoding is once again unusable. Hardware decoding, on the other hand, can play at full FPS all the way up to 2x speed, although GPU decoding does drop a few more frames with the H.264 clips compared to Quick Sync.

Multicam performance with 4, 9, and 12 streams is very interesting. Hardware decoding is significantly better than software when you have 9 or more streams, but between the two hardware decoding methods, Quick Sync was a bit better with H.264 while GPU decoding was better with the HEVC clips.

DJI Mavic 2 - 4K 100 Mbps H.264/HEVC media

Intel Core i9 10900K
Software, Quick Sync, & GPU Decoding

AMD Ryzen 9 3900X
Software & GPU Decoding

Drone footage is notorious for how difficult it can be to process, yet as the price of high-quality drones continues to fall, it is more and more common to use in projects.

For scrubbing, the performance from GPU decoding is what we have come to expect and it is vastly smoother than software decoding.

J/K/L (forward, pause, reverse) editing is also very similar to the other tests, with hardware decoding being much smoother when we get up to 4x speed. Reverse playback is once again unusable with software decoding, but more than acceptable at up to 2x speed with hardware decoding.

For multicam performance, both hardware decoding methods drop far fewer frames than software decoding as we get up to 9 or 12 streams. This continues the trend from the other tests: Quick Sync has a slight edge for H.264, while GPU decoding is a bit better for HEVC.

How well does H.264/HEVC GPU Decoding work in Premiere Pro?

Overall, the new GPU-based hardware decoding for H.264/H.265 (HEVC) media in Premiere Pro works extremely well. Scrubbing is massively smoother, and the fact that you can actually play these codecs in reverse without it turning into a slideshow is terrific to see. Not to mention being able to work with more multicam streams than what is possible with software decoding.

As a part of our testing, one of the things we wanted to look at in particular was how GPU decoding compared to the Intel Quick Sync decoding that has been a part of Premiere Pro for a long time. While Quick Sync did appear to have a slight edge with H.264 media in some situations, GPU decoding had a similar edge for HEVC footage. In the end, our conclusion is that GPU and Quick Sync decoding are pretty much on par with each other.

Now, you might look at this and think "if it is no better than Quick Sync, why bother?" The reason why this is such a big deal is that Quick Sync is only available on certain Intel CPUs (primarily those with integrated graphics). If you wanted an Intel X-series or Xeon CPU, not to mention one of the terrific Ryzen and Threadripper processors from AMD, you pretty much had to accept the fact that you would be getting significantly worse performance when working with the most common codecs in use today.

With the addition of GPU decoding, you no longer are limited to the handful of mid-range Intel processors that include Quick Sync. You can get that AMD Threadripper CPU to improve performance when working with R3D footage, and still get the benefits of hardware decoding for H.264 and HEVC media.

If you want to try out this new feature, it is available in the latest version of Premiere Pro which you can update to in Creative Cloud. Feel free to download the test clips we used that are linked in the Test Setup section, and let us know what you think in the comments! How much faster was it for you, and how is it going to affect your workflow?

Tags: Premiere Pro, H.264, HEVC, Hardware Acceleration, H.265, hardware decoding
Ampere

https://www.nvidia.com/en-u...

Nvidia released the new Studio driver 456.71. It's not really new because the GeForce driver was released earlier this month, but for whatever reason the Studio driver was not released until now.

Posted on 2020-10-20 18:39:33
sp4rk

Oh great! I'm dealing with a lot of H.264 (GoPro) footage for a living, and have been really feeling the benefits of QuickSync. NVENC was a great addition recently, and was a huge timesaver too! But this means I'm free to look at AMD platforms too!

Posted on 2020-10-20 21:14:33

I am very pleased with how this is working. I just upgraded my GPU from an RX480 to a 2060 Super. That initially took my PugetBench score from 279 to 355. Then going from Premiere 14.4 to 14.5 took it to 460! Quite a big jump this week.

I think Intel must have improved QuickSync quite a bit over the last few years because for me the decoding from the 2060S is much better than the iGPU on my i7-7700K.

Will you be doing comparison tests of the decoding with different cards like you did for the encoding? Keen to see how the 2060S does against the 3080.

From watching Task Manager it looks like test 5 in the benchmark tests e.g. "4K H.264 150Mbps 8bit (59.94FPS) Lumetri Color - Export (H.264 40Mbps UHD)" is using hardware H.264 encoding - is that right? My FPS went 28 (RX480)>57 (2060S on 14.4)>67 (2060S on 14.5) which would imply the export has quickened by using the hardware decoding for me - which is what I had hoped for. And my H.264 -> ProRes has sped up from 27 to 35 FPS too which is great.

All in all this release is a big improvement for me.

Posted on 2020-10-21 13:46:53

Hey Nicholas, We should be re-doing our GPU testing when AMD launches their next GPUs, which should be somewhat soon. I don't think there will be much difference between similar cards, but I do expect NVIDIA to be quite a bit better than AMD since that is what we saw in the GPU encoding testing: https://www.pugetsystems.co...

For our benchmark, it is set to use hardware acceleration whenever it is available. That was a bit of a tough decision when hardware encoding was only available via Quick Sync (since hardware encoding is slightly lower quality than software), but now that pretty much everyone can utilize hardware encoding (and decoding), I think it just makes sense to do it that way.

Posted on 2020-10-21 16:20:38

Thanks Matt, yes that makes sense.

Posted on 2020-10-21 17:56:49

I'm also looking forward to your testing of the Ryzen 5000 series when that comes out! I really like proper testing like you do.

Posted on 2020-10-21 17:58:29
dr0

Can you guys also add a few lower end cards (like the GTX 1650 Super) to your testing? Maybe even include Intel and AMD iGPUs for reference?

Posted on 2020-10-23 18:23:22

Unfortunately, we are usually limited to testing things that apply directly to our workstation sales. We are a bit unique in that we do not make any money directly from our articles - no ad revenue, etc. Our entire Labs department is funded by our workstation sales, so we (usually) have to focus on testing things that apply directly to our customers. We can sometimes go outside that, and every once in a while get funding from different sources so that we can do special testing, but by and large, that is what we have to focus our testing on.

Posted on 2020-10-23 19:14:56
R.J. Leong

For me, my two systems with discrete Turing GPUs showed improvements over software-only decoding, especially on the H.264 4k 59.94 timeline. However, the last two or three released versions of the Intel GPU drivers broke QuickSync support with a discrete GPU installed. My results with 14.4 on my i7-7700 mini ITX system bore that out: The timeline playback behaved as if it used software-only decoding even though the integrated Intel GPU was definitely enabled.

Your score jumped from 355 to 460 because the RTX 2060 Super is significantly more powerful than that i7-7700K CPU. My systems, on the other hand, exhibited a much smaller increase in the PugetBench score: My i7-7700 with the GTX 1650 Super only went from 406 to 421, while my Ryzen 7 3800X main system with the RTX 2060 Super went from around 720 to 741 (all scores were with the Standard preset).

Posted on 2020-10-22 03:16:35

That makes a lot of sense - I did think my PC was underperforming and I had previously done a test with Intel iGPU disabled in device manager vs not and it performed the same, so thanks for clearing that up.

Posted on 2020-10-22 06:15:10
SMD79

Is this only on newer GPU's or will this work on an older GPU like the GTX 970 that is at the bottom of their recommended list...but it's still there! :)

Posted on 2020-10-23 17:47:23

I haven't tried anything that old, but a GTX 1070 did work for me. I think a 970 should work, but that might be worth asking Adobe directly.

Posted on 2020-10-23 18:04:40
SMD79

Thanks. Currently testing out my new A7sIII with all kinds of XAVCS/HS/I settings and seeing how PPro handles it. The timing of this update is great. Figured I'd run some tests before and after I update PPro using this footage. I'll be upgrading PC parts soon but milking this current system for all it's worth.

Posted on 2020-10-23 18:07:30

Yea, one of the cool things with this update is now that we have GPU decoding and encoding, it is much easier to upgrade older systems. A new CPU usually means a new mobo, RAM, and even PSU sometimes. Whereas now, you can just drop in a new GPU and see big performance gains. At a certain point you still need to upgrade the CPU, but that lets you upgrade the GPU first which is pretty easy, and upgrade the rest of the system later (if you don't want to do it all at once of course).

Posted on 2020-10-23 18:10:37
SMD79

Very true. I'm curious if this helps playback of 60p footage on a 24p timeline too. I'd imagine so. I guess I'll find out shortly. Uploading footage to test now.

Posted on 2020-10-23 18:15:08
1meter60

Thanks for this test. I think what's missing here is a look at 4K material with 10-bit 4:2:2 color sampling in H.265, which you find in more and more cameras. No NVIDIA GeForce GPUs - even the new ones - support hardware acceleration of 4:2:2 material. They support 10-bit 4:2:0 (like the Mavic 2 Pro has), or even 4:4:4, but not 4:2:2. Quick Sync also only supports H.265 4:2:2 10-bit in its latest mobile processors (Ice Lake & Tiger Lake); on the desktop it will only arrive with the Rocket Lake generation, which will come out next year. So you have to keep that in mind when planning a new editing PC. What I do not know is the situation with the professional NVIDIA GPUs and with AMD GPUs in general. Maybe you have other information about that.

Posted on 2020-10-24 12:33:04
Asaf Blasberg

Matt,
I currently have an X299 workstation with the old GTX 1080 Ti and old I7-7820X processor. I am now using the latest Studio driver and hardware decoding and encoding turned on for NVIDIA. I use tons and tons of H.264 60fps footage which all play very smoothly on a 3-year-old GPU. However, when I try to go multi-cam mode with three cameras, the results are terrible. I only get like 3 fps in half-res. Is it possible for you to do testing of H.264 60fps footage - this will help me figure out if it makes sense for me to swap out the old GPU with a 3080 RTX card.

Posted on 2020-10-27 20:44:06
Asaf Blasberg

Another question: on my old x299 workstation specs, if i were to keep the same processor (7820x) and just swap out the 1080 ti for a 3080, do you think it would improve the performance?

Posted on 2020-10-27 21:22:00

H.264 at 60FPS and multicam is going to be really, really hard to do. A completely new system with the upcoming Ryzen CPUs and an RTX 3080 might be able to handle it at half res, but it is going to be cutting it pretty close. Swapping the GPU to a 3080 alone will certainly help, but I bet you will just become immediately CPU bottlenecked if you aren't already.

In our hardware articles, one of the tests is actually a 59.94FPS H.264 multicam test with 4 streams. You might want to look in the raw results to see the kind of performance we are seeing with different hardware combinations. Our most recent CPU and GPU articles are https://www.pugetsystems.co... and https://www.pugetsystems.co... - although those are both before GPU decoding was in play. You can also look through the raw results for the public benchmark results here: https://www.pugetsystems.co... to see what we and others are seeing. The "4K H.264 150Mbps 8bit (59.94FPS) MultiCam - Live Playback (Half Res)" results are what you want to look at.

I wish hardware could solve problems like this, but you might need to switch to a proxy workflow if you need 60FPS. The good news is that with the GPU accelerated encoding/decoding they added, making lower res/bitrate H.264 proxies should be much faster than ever before, but it will still take a bit of time.

Posted on 2020-10-27 21:31:36
Asaf Blasberg

What if I were to replace the CPU with an Intel Core i9 10920X 12 Core, and leave the 1080 Ti inside my machine? According to your results, it scored a 59.67 frames per second using multicam h.264 59.94 footage according to your tests:

https://www.pugetsystems.co... (scroll all the way to the right):

The results are:

59.94 fps for single cam
2.53 fps for multi cam (full res)
59.77 fps for single cam (half res)
59.67 fps for multi cam (half res) <-- that's what i'm looking for

(and GPU acceleration would be OFF)

Posted on 2020-10-27 22:43:31

Whenever you look at CPU load, make sure you are looking at the load per individual core, not the overall load. Usually, there is one or two cores that are pegged at 100%, so even if there are a bunch of low load or idle cores, you are still CPU bottlenecked.

That said, since you are 100% GPU load, it is pretty likely that just a GPU upgrade will definitely improve performance. I'm not sure if it will make as big of an impact as you are hoping, but it will help. The nice thing is that you can go ahead and do the GPU upgrade now, and if it doesn't do as much as you hoped, you can continue to save up for a new CPU/motherboard/RAM, and when you can afford that, just move the new GPU over as well.
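The per-core check described above can be sketched in a few lines. The `pegged_cores` helper, the 95% threshold, and the sample readings are all hypothetical. On a real system you could collect per-core readings with something like the third-party psutil package (`psutil.cpu_percent(interval=1, percpu=True)`) or simply watch the per-core graphs in Task Manager.

```python
def pegged_cores(per_core_percent, threshold=95.0):
    """Return the indexes of cores at or above the threshold load."""
    return [i for i, load in enumerate(per_core_percent) if load >= threshold]


# Hypothetical readings from an 8-core CPU during playback.
sample = [100.0, 12.0, 8.0, 97.5, 5.0, 3.0, 9.0, 4.0]

overall = sum(sample) / len(sample)
print(f"Overall load: {overall:.1f}%")           # Overall load: 29.8%
print(f"Pegged cores: {pegged_cores(sample)}")   # Pegged cores: [0, 3]
```

In this made-up example the overall load is under 30%, which looks like plenty of headroom, yet two cores are maxed out and playback would still be CPU bottlenecked.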

Posted on 2020-10-27 23:09:44
Asaf Blasberg

Thanks Matt so much - i really appreciate it. I just saw that the iMac Pro (2017) model scored a PERFECT 100 in live playback, including H.264 multicam in FULL res! Wow!!! What’s the catch? How did Apple do this? Should I just go and get that model? It’s expensive but no proxies ever! :)

Posted on 2020-10-28 01:12:50

I suspect that is a bad result for the full res MultiCam test. The 2019 Mac Pro only played that test at 7FPS, and there is no way the older iMac Pro is faster.

That is one of the difficulties with benchmarking applications like Premiere Pro that are not designed for this kind of testing. Probably what happened was that the preview resolution was stuck on half or quarter res for some reason, and we didn't catch that before publishing the article. There are literally thousands of individual results, and sometimes things like that slip past us.

This is a result from the public that was uploaded (and with a newer version of Premiere Pro to boot), and it only gave 10FPS for the multicam tests: https://www.pugetsystems.co...

Posted on 2020-10-28 01:22:06
Asaf Blasberg

Thank you so much. Is it possible for you to now test H.264 150mbps 59.94 footage with GPU acceleration for decoding turned ON using version 14.5? This article uses 29.97 footage. I know there was a previous article where you tested 59.94 footage, but the problem was that you didn't have GPU acceleration ON due to some weirdness with Threadripper and a beta version of Premiere (from what I read in the article). Thanks! :)

Posted on 2020-10-28 20:56:24

Yea, we're still seeing some oddities in certain situations. The things we mostly talked about in this post/video are definitely way better with the GPU decoding vs software, but for whatever reason, multicam with the 150mbps 59.94FPS media is worse with it enabled. I'm not sure if that is a bug, or if it is just too much for a GPU to handle and it turns the GPU into a bottleneck. That is a LOT of H.264 decoding to put onto a GPU, so it might just be too much for it with the current technology.

Posted on 2020-10-28 21:13:35
Asaf Blasberg

No worries. Quick question - how is performance with 59.94 FPS multicam using DaVinci Resolve Studio? I've started to use this program and I noticed that using the *same* media, scrubbing is much better, with GPU on for both programs. The test footage I used was 100mbps 29.97FPS H.264 media. And finally, I can't find an RTX 3080 anywhere. Where did you purchase your Gigabyte model?

Posted on 2020-10-29 02:24:31

We actually don't do a lot of software comparison testing. Usually, there are multiple reasons why someone uses one app over another, so we focus on just getting people the best performance for what their current workflow is.

As for supply, I honestly have no idea. Supply chains are different for consumers and system integrators like us, and I know we work directly with a number of different distributors. It is definitely tight for us (just like everyone else), but we at least are able to get our hands on enough cards to keep our customer orders moving through at a decent pace. Hopefully a lot of the supply problems will clear up soon!

Posted on 2020-10-29 05:21:11