Read this article at https://www.pugetsystems.com/guides/1676

AMD Threadripper 3990X: Does Windows 10 for Workstations speed up photogrammetry?

Written on February 19, 2020 by William George

Introduction

When AMD launched the 64-core Threadripper 3990X, Anandtech reported that performance of this 128-thread beast was hindered by running a normal version of Windows 10 Pro - and that using Windows 10 Pro for Workstations or Windows 10 Enterprise instead gave better results. My initial round of testing with this processor was done on Windows 10 Pro, but we also offer the Windows 10 Pro for Workstations edition as an option on PCs we build here at Puget Systems... although we had never used that in our Labs testing.

My early testing of the 3990X had focused on two areas: CPU-based rendering, where it did very well, and photogrammetry. In the latter type of workload, having so many cores didn't turn out to be a benefit: models in the same processor family with fewer cores performed as well or better, and cost far less. But maybe that was due to limitations of Windows 10 Pro, rather than AMD's TR 3990X chip itself? Given the claims made by Anandtech in their article, I had to investigate further to make sure we weren't leading our customers astray.

AMD Threadripper 3990X Tested on Windows 10 Pro for Workstations in Photogrammetry

In addition to looking at Windows 10 Pro for Workstations as a potential way to improve performance, I also wanted to check on the impact of SMT (simultaneous multithreading... AMD's equivalent to Hyper-Threading) to see what turning it off might do to photogrammetry performance. Some applications are known to benefit from turning SMT / HT off, which for example I had observed to be the case in Agisoft Metashape when using processors with more than about 10-16 cores. Given that past experience, I skipped Metashape for this round of testing (knowing that it didn't like high core count processors already), but I thought it would be a good idea to look into the same question for other photogrammetry applications - especially since it was specifically raised as another issue in the Anandtech article.

As a side note, I also took this opportunity to bump up the CPU cooler from Noctua's U12S to the U14S. Our product qualification team had found that the U12S was borderline for cooling the 3990X, and in some situations could lead to very slight thermal throttling. For this reason, some of the normal Windows 10 Pro results presented here may be slightly different than they were in our older articles.

Test Hardware

Here is a list of the hardware components that made up my test configuration, along with the OS and application versions which I used:

Test Platform
CPU: AMD Threadripper 3990X
CPU Cooler: Noctua NH-U14S TR4-SP3
Motherboard: Gigabyte TRX40 AORUS Pro WiFi
RAM: 4x DDR4-2933 16GB (64GB total)
Video Card: NVIDIA GeForce RTX 2080 Ti 11GB
Hard Drive: Samsung 970 Pro 1TB
Software: Windows 10 Pro 64-bit
          Windows 10 Pro for Workstations 64-bit
          Pix4D 4.4.12
          RealityCapture 1.0.3.10393

Benchmark Details

For testing photogrammetry applications, we have four image sets that we own the rights to - covering both smaller and larger size model and map projects. All of these image sets are available in our public Pix4D and RealityCapture benchmarks, split up to allow quick or extended tests, which you can download and run if you want to compare your system's performance to what we measured here.

  • Rock Model - A small model project taken with a smartphone camera at 20 megapixels
  • School Map - A small drone mapping project using photos that are 18 megapixels each
  • School Model - A medium size model using images that are each 18 megapixels
  • Park Map - A large drone-captured map project with photos that are 18 megapixels each

Benchmark Results

Each of the four benchmarks was run on Windows 10 Pro and Windows 10 Pro for Workstations - both with SMT enabled (the default setting) and disabled. That results in either 128 or 64 threads being available to the operating system, and the claim which needed verifying was that the normal version of Windows 10 Pro would not handle >64 threads as well as the Pro for Workstations variant.
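As background on why 64 threads is the dividing line: Windows organizes logical processors into "processor groups" of at most 64 each, so a 128-thread CPU spans more than one group while a 64-thread configuration fits in one. The article does not discuss processor groups directly, and nothing here establishes that they explain any of the results below. The following is only a minimal sketch of the arithmetic; the function name and its parameters are illustrative, not part of any Windows API.

```python
import math
from typing import Tuple

# Windows places at most 64 logical processors in one processor group.
MAX_LOGICAL_PROCESSORS_PER_GROUP = 64


def processor_groups(cores: int, smt_enabled: bool,
                     threads_per_core: int = 2) -> Tuple[int, int]:
    """Return (logical_processors, minimum_group_count).

    Hypothetical helper for illustration only: it does not query the OS,
    it just applies the 64-per-group limit to a core count.
    """
    logical = cores * (threads_per_core if smt_enabled else 1)
    groups = math.ceil(logical / MAX_LOGICAL_PROCESSORS_PER_GROUP)
    return logical, groups


# The two 3990X configurations tested in this article.
for smt in (True, False):
    logical, groups = processor_groups(cores=64, smt_enabled=smt)
    state = "on" if smt else "off"
    print(f"SMT {state}: {logical} threads, at least {groups} processor group(s)")
```

With SMT disabled the 3990X presents 64 threads and fits in a single group, which is why the SMT on/off comparison and the OS edition comparison are tested together here.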

To help keep things easy to track, I have separated the two versions of Windows 10 by color: Windows 10 Pro is shown in red, while Windows 10 Pro for Workstations is in blue. I also shaded the bars showing SMT enabled darker and SMT disabled lighter. The order of the four different combinations is kept the same throughout the charts, to avoid any confusion that could arise from sorting the results.

Here is a gallery with charts showing Pix4D benchmark performance, with lower times indicating faster overall processing:

And here is a similar gallery, using the same order and color scheme, with RealityCapture benchmark results:

Analysis

Looking first at the performance difference between operating systems with SMT enabled - Windows 10 Pro in dark red and Windows 10 Pro for Workstations in dark blue - it looks like there is no substantial benefit either way. Across all eight tests, only once did the performance variance between the two OSes exceed 1%: the RealityCapture School Model test, where Windows 10 Pro for Workstations was ~2.5% faster. Even that result is still within the margin of error I've observed on these photogrammetry benchmarks, though, and since the other seven tests all came out even closer I don't think there is evidence of one version of Windows being better.

Regardless of the OS version, though, turning off SMT (the lighter colored bars) resulted in a noticeable reduction in processing time! It's not huge, but disabling SMT on the 3990X reliably increased performance in both Pix4D and RealityCapture by around 5 to 10%.

Does the version of Windows 10 Pro impact photogrammetry with the 3990X?

My findings in this round of testing contradict what Anandtech posted in their launch-day article: as far as I can tell, there is no difference between using Windows 10 Pro or Windows 10 Pro for Workstations with AMD's TR 3990X for photogrammetry.

Does turning SMT on vs off with AMD's TR 3990X affect photogrammetry?

Surprisingly, yes: turning SMT off on the Threadripper 3990X improved processing speeds in both Pix4D and RealityCapture. Given our past experience with Metashape, I am confident that it would help there as well. However, please note that the 3990X is not the most cost-effective processor for these programs: the lower core count Threadrippers and the Ryzen 9 3950X offer better value.

Closing Thoughts

Between this data, and our other articles looking at the same situation in Adobe CC programs and CPU-based rendering, it seems that something else must have been throwing off the performance measurements that Anandtech took. Other review websites, and indeed AMD themselves, have also come out with similar conclusions. This is a great example of why testing by multiple sources is important!

Looking for a Photogrammetry Workstation?

Puget Systems offers a range of workstations that are tailor-made for your unique workflow. Our goal is to provide the most effective and reliable system possible so you can concentrate on your work and not worry about your computer.


Tags: Threadripper 3990X, AMD, AMD Threadripper 3rd Gen, Windows 10, Microsoft, CPU, SMT, Multi-threading, Processor, photogrammetry, Pix4D, RealityCapture
Andrzej Poznanski

Thanks for the wealth of information!
Any idea why longest benchmark took 141sec longer this time around, despite using better cooling?
As for squeezing top bang for the buck from this beast of CPU, it would be interesting to see results of running two instances of RC in parallel (on dual GPU system, each GPU paired with different instance).

Posted on 2020-02-22 10:45:09

It took me a moment to track down which results you are referring to, but it looks like you mean those for RealityCapture - correct? I don't have a good answer for the 4207.7 this time vs 4066.6 in the previous article... other than to say that photogrammetry, by its very nature, leads to somewhat greater variance in benchmarking results than I would personally like :)

I don't know how it all works under-the-hood, but I have always found that feeding the same photos and the same parameters into these applications never results in the same processing time or the same final product. I used to see swings as wide as 10% in performance from one run to the next, in my early benchmarks where the whole process was run start to finish, but it has been a lot better since moving to fixed project files for an equal starting point on each processing step. Even now, though, I still see 2-5% variance pretty routinely. Running the tests 2 or 3 times and taking the lowest result helps a lot, but even then sometimes there can be differences like what you have noted here :)

Posted on 2020-02-24 18:20:57
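
The reply above describes two habits: taking the lowest time from repeated runs, and judging a difference against the usual 2-5% run-to-run variance. A minimal sketch of both follows, using the two RealityCapture totals quoted in the thread. The function names and the 5% threshold constant are illustrative assumptions, not part of the Puget Systems benchmark scripts.

```python
def best_of(times_sec):
    """Return the lowest (fastest) time from repeated runs of one benchmark."""
    return min(times_sec)


def percent_difference(new_sec, old_sec):
    """Percent change of new_sec relative to old_sec; positive means slower."""
    return (new_sec - old_sec) / old_sec * 100.0


# The two RealityCapture totals quoted in this thread, in seconds.
previous_total = 4066.6
current_total = 4207.7

delta = percent_difference(current_total, previous_total)
print(f"{current_total - previous_total:.1f} s slower ({delta:.2f}%)")

# Assumed noise band: the upper end of the "2-5%" variance mentioned above.
NOISE_BAND_PERCENT = 5.0
if abs(delta) <= NOISE_BAND_PERCENT:
    print("within the typical run-to-run variance")
else:
    print("outside the typical run-to-run variance")
```

A check like this only says whether a gap is plausibly noise. It cannot detect a bad run, such as one stage finishing implausibly fast, which is why a per-stage breakdown is still worth reviewing.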
Andrzej Poznanski

I think I found the culprit - if you look at the Align Images stage for largest dataset, you'll notice it went from above 300sec (3960X and 3970X) down to 17 sec for 3990X - cache files not cleared perhaps?

Posted on 2020-02-24 22:28:34

Oh wow, yeah... good catch! I'm glad I posted that chart, and now wish I'd gone over it more closely. I always check for 0s (since I've had those crop in a couple of times) but not for just unbelievably low results. Now that you mention it, though, I also see a super-low score on a different part of the Park Map for the Core i9 10900X. Bummer, I'll go back and make a note of those things in the original article :/

Posted on 2020-02-24 22:35:41

Also, whatever the cause, that means that the time for the 3990X in that image set with the smaller U12S cooler should have been ~300 seconds longer, thus resulting in the bigger U14S cooler giving better performance as expected.

Posted on 2020-02-24 23:17:01
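
The correction described in the reply above can be worked through as simple arithmetic. This sketch assumes the "~300 seconds" figure from the thread as a rough estimate of the time missing from the earlier U12S run, so the output is approximate rather than a measured result. The variable names are illustrative.

```python
# Totals quoted in this thread, in seconds.
previous_total_u12s = 4066.6   # earlier article, smaller U12S cooler
current_total_u14s = 4207.7    # this article, larger U14S cooler

# Assumption: the earlier run's Align Images stage was about 300 s too
# short, per the "~300 seconds" estimate in the reply above.
estimated_missing_sec = 300.0

corrected_previous = previous_total_u12s + estimated_missing_sec
saving = corrected_previous - current_total_u14s
saving_percent = saving / corrected_previous * 100.0

print(f"Corrected U12S total: ~{corrected_previous:.1f} s")
print(f"U14S advantage: ~{saving:.1f} s (~{saving_percent:.1f}%)")
```

Under that assumption, the run with the U14S cooler comes out ahead rather than behind, which matches the expectation stated in the reply.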
Zé Cotinha

Windows 10 Pro for Workstations is intended for CPUs with more than 64 cores or systems with 2 or more CPUs installed, and for server motherboards with large numbers of SSDs / HDDs that communicate with Windows Server / Azure.

Posted on 2020-05-03 14:46:29
Christopher

Since SMT ON resulted in lower performance, can you try with processor group extender on?
https://bitsum.com/product-...

Posted on 2020-05-17 22:25:42