Hyper-Threading may be Killing your Parallel Performance
Dr Donald Kinghorn (HPC and Scientific Computing)


Posted on July 2, 2014 by Dr Donald Kinghorn



Hyper-Threading, hyperthreading, or just HT for short, has been around on Intel processors for over a decade, and it still confuses people. I’m not going to do much to help with the confusion. I just want to point out an example, from some testing I was doing recently with the ray-tracing application POV-ray, that surprised me: Hyper-Threading dramatically lowered performance on a multi-core test system running Windows when POV-ray was run in parallel.

POV-ray

POV-ray is a ray-tracing program that has been running in parallel for many years and is well maintained and supported on Windows and Linux. It’s often used as a CPU benchmark because of its heavy computational demands and its parallel implementation. I was doing some benchmarking on our quad-socket systems when I hit an unexpected anomaly with Hyper-Threading under Windows Server 2008 R2. It was interesting enough that I thought I’d use it as an opportunity to talk about Hyper-Threading a bit.

Hyper-Threading ... not so great (in my experience)

It will not come as a surprise to anyone who has worked with computers for HPC that Hyper-Threading is usually detrimental to performance (or at best does nothing). The standard HPC sys-admin practice for Intel-based systems that will be used for serious computing or parallel workloads is to disable Hyper-Threading in the BIOS by default. In the first few years of Hyper-Threading’s existence it was not unusual to see a 10-40% performance loss on a system that left it enabled. However, in the last few years I haven’t really seen much performance loss from leaving it on, so I usually don’t bother disabling it anymore (just in case there is something that might benefit from it). I know how many real cores are in my own systems and what performance I can get out of them. Just remember: for computation-heavy workloads, Hyper-Threading doesn’t add any real compute capability.
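The real-core vs. logical-CPU distinction is easy to check programmatically. Here is a minimal sketch (the sysfs path is the standard one on Linux; the helper names are my own): the number of distinct sibling sets is the number of real cores.

```python
import os

def physical_core_count(sibling_lists):
    # Each entry names the logical CPUs that share one physical core,
    # e.g. "0,40" (or "0-40") on a 2-way SMT system. The number of
    # distinct sibling sets is the number of real cores.
    return len({frozenset(s.replace('-', ',').split(',')) for s in sibling_lists})

def read_sibling_lists():
    # Linux exposes core topology in sysfs, one file per logical CPU.
    lists, cpu = [], 0
    while True:
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        if not os.path.exists(path):
            break
        with open(path) as f:
            lists.append(f.read().strip())
        cpu += 1
    return lists

# A 40-core box with HT on reports 80 logical CPUs; a common layout
# (it varies by BIOS and kernel) pairs logical CPU n with n + 40:
ht_on = [f"{c % 40},{c % 40 + 40}" for c in range(80)]
real_cores = physical_core_count(ht_on)  # 40, not 80
```

With HT disabled in the BIOS, each sibling list collapses to a single CPU and the logical count matches the real core count.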

Hyper-Threading is really meant for “better” thread scheduling. I can see how that might be a good thing in some cases. Imagine a code structured to spawn lots of parallel work threads from a pool of “work” in a “round-robin” fashion. If you can eliminate the overhead of the thread-spawning process, you may get a speedup. I would also guess there are codes that are not well adapted for parallel execution that may get an “accidental” boost too.
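To make that structural point concrete, here is a small illustrative sketch (Python's GIL means this shows the two structures, not real parallel compute): spawning a fresh thread per work item versus farming the same items out to a fixed pool that pays the spawn cost only once per worker.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Thread

def work(x):
    # stand-in for one small unit of compute
    return sum(i * i for i in range(x))

TASKS = [2000] * 200

def per_task_threads():
    # "round-robin" spawning: one short-lived thread per work item,
    # paying the thread-creation overhead 200 times
    results = []
    threads = [Thread(target=lambda x=x: results.append(work(x))) for x in TASKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results)

def pooled():
    # a fixed pool of workers pulls items from a shared queue, so the
    # spawn overhead is paid only once per worker, not once per item
    with ThreadPoolExecutor(max_workers=8) as pool:
        return len(list(pool.map(work, TASKS)))
```

Both produce the same 200 results; the difference is purely in how much thread-management overhead the runtime has to absorb, which is the kind of overhead HT's scheduling help could plausibly hide.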

“However, I have never personally seen ANY program benefit from Hyper-Threading. I’m guessing there are some codes out there that may benefit, but I’ve never experienced it first hand. If I find one I’ll let you know …”  I found one! Cinebench; see the comments. --dbk

I don’t see much harm in leaving Hyper-Threading enabled these days, but if you are running a demanding application it is absolutely worth seeing if disabling it in the BIOS helps performance. I would most likely leave it enabled on a “consumer” processor like the Core i7 … but test! Having said that, my computer use is mostly on Linux, and modern Linux kernels are great at process scheduling and don’t seem to have any trouble with Hyper-Threading. I don't personally use Windows in a demanding way very often, and this is where the big (bad) “surprise” came in.

[Plot: POV-ray parallel scaling, showing bad scaling on Windows with Hyper-Threading enabled]

The plot above shows a lot of information. It is a plot of “speedup” vs. “number of CPU threads” for the standard benchmark scene calculation in POV-ray 3.7.0, run on our test-bench quad Xeon system (4 x E5-4624L v2, 10 cores each), which has 40 real cores. Jobs were run with 1 to 40 threads under CentOS 6.5 Linux and under Windows Server 2008 R2, with Hyper-Threading on and off. The plot shows the “parallel scaling” of POV-ray on this system. The straight diagonal line is “perfect” linear scaling. The measured performance points are fit to Amdahl’s Law to show the parallel scaling fall-off with increasing thread count.
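For reference, Amdahl’s Law models the speedup on n threads as S(n) = 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the run time. A minimal, stdlib-only sketch of this kind of fit (the data points below are illustrative, not the measured POV-ray numbers):

```python
def amdahl_speedup(n, p):
    # Amdahl's Law: p is the parallel fraction, n the thread count
    return 1.0 / ((1.0 - p) + p / n)

def fit_parallel_fraction(measured):
    # least-squares fit of p over (threads, speedup) pairs, done as a
    # coarse grid search to stay dependency-free (scipy would also work)
    best_p, best_err = 0.0, float("inf")
    for i in range(1, 10000):
        p = i / 10000.0
        err = sum((amdahl_speedup(n, p) - s) ** 2 for n, s in measured)
        if err < best_err:
            best_p, best_err = p, err
    return best_p

# Illustrative data: near-linear scaling that tails off at high thread counts
data = [(1, 1.0), (10, 9.2), (20, 17.0), (40, 29.5)]
parallel_fraction = fit_parallel_fraction(data)  # close to 0.99
```

Even a code that is 99% parallel tops out near a 29x speedup on 40 threads, which is why the measured points bend away from the perfect diagonal.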

Major Findings

  • Hyper-Threading made no difference under Linux (results were the same with it on or off).
  • Windows Server 2008 R2 with Hyper-Threading on showed very poor thread scaling. This was shockingly unexpected!
  • Disabling Hyper-Threading in the BIOS greatly improved the parallel performance under Windows Server 2008 R2.
  • Overall performance on Linux was better than on Windows. (I’ll have another post up soon with the performance data and more details about the POV-ray testing.)

Caveats

  • Remember, this is just one particular application, POV-ray 3.7. I had recently tested Zemax OpticStudio on this system with the same Windows 2008 R2 install, and the scaling was fine with Hyper-Threading on.
  • Also note that this is an older Windows Server version; things may be different on Server 2012. (You have to run a Server version of Windows to get support for a quad-socket system.)

Hope you found this interesting!

Happy computing! --dbk


Tags: Hyperthreading, HPC, Quad Xeon, POV-ray, Linux, Windows Server



Donald Kinghorn

OK, OK, I looked a little and I did find a program that benefits from Hyper-Threading. I ran Cinebench 11.5 (the Cinema 4D benchmark) with HT on and off. Here are the results:

HT on: all cores 40/80, CPU score = 3118, 1 core CPU score = 65, MP ratio = 47.62x
HT off: all cores 40/40, CPU score = 2671, 1 core CPU score = 68, MP ratio = 39.18x

Notice that with HT on the result is "super-linear", i.e. a speedup greater than 40! This looks like the case I mentioned in the post, where there is a pool of work being farmed out to threads in batches as each one finishes. There is significant thread-launch overhead in the code, and HT is helping with the thread management. I don't know if this is the nature of the problem or just an artifact of how they chose to do their parallel implementation. In any case, Hyper-Threading helps significantly with this code.
Best wishes
--dbk

Posted on 2014-07-03 22:08:05
Ken Johnson

(Not really a performance analysis person, though sometimes I have a hand in this sort of thing anyway.)

As with most performance related quandaries, it is tough to make a blanket statement about whether HT is beneficial or not. The reality is that it is going to vary quite a bit depending on your workload. The golden rule of performance analysis applies: To get an answer that really has teeth behind it, one has to measure the scenario in question and find out.

The theory behind HT is that while modern processors execute out of order, that is not always sufficient to hide various latencies and keep the execution resources of the core busy. If there is a blend of active workloads that do not compete for the same execution resources, HT allows for the potential of leveraging more of the core's otherwise idle resources (during stalls, etc.).

Now whether that is the case is, again, workload dependent. If your workload is entirely uniform in its inner-loop computation, it is possible that all threads will compete for the same execution resources anyway and HT won't help much. And, of course, there is extra work required to synchronize and control most parallel workloads, so this may outweigh the gains of HT if the gains are small in a particular scenario.

If you check the modern processor optimization manuals (e.g. from Intel, Agner Fog, etc.), there are also a couple of processor resources that are statically partitioned between threads; disabling HT allows a single thread to obtain the full allocation of those resources. Whether that is beneficial is, of course, again workload dependent - those resources may or may not be the bottleneck in any given scenario.

P.S. The scheduler code base in WS08R2 is a little bit less than 4 years behind the state of the art there nowadays. It'd be interesting to see what the numbers were on WS2012R2 as there have been a number of changes since then. Naturally, it'd be necessary to run the measurements to see how much of an impact those make to your workload.

Posted on 2014-07-04 07:26:34
Jeremy Hill

Just to mention, in case any of our customers should run across this post, as you found with Cinema, HT is beneficial with our renderer (Maxwell) as well -- coincidentally, I just happened to be testing this today on an i7-4930MX, where it yielded a 40% speedup.

Posted on 2014-07-07 04:57:11
Ken08534

Here is how I think of Hyperthreading:

Hyperthreading OFF: Imagine a carpenter making a birdhouse - all the tools are in front of him, and he can use any tool he wants without waiting.

Hyperthreading ON: Two carpenters, each making a birdhouse, sharing one set of tools - if one carpenter is using the tool the other needs, the other has to wait. Neither carpenter has priority; both are equally likely to have to wait for a given tool.

Running with Hyperthreading on will incur execution penalties, but if the penalty is less than 50%, there should be a net increase in throughput because the number of executing threads has doubled.
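That 50% break-even point is worth spelling out. Assuming (simplistically) that both sibling threads are always runnable and each suffers the same slowdown, a hypothetical helper makes the arithmetic explicit:

```python
def ht_throughput_gain(penalty):
    # Two HT sibling threads, each slowed by `penalty` (0.0-1.0),
    # versus one thread on the same core at full speed.
    # Values above 1.0 mean HT produced a net throughput win.
    return 2.0 * (1.0 - penalty)

# break-even at exactly a 50% per-thread penalty:
#   ht_throughput_gain(0.50) -> 1.0  (no change)
#   ht_throughput_gain(0.20) -> 1.6  (net win: 60% more throughput)
#   ht_throughput_gain(0.60) -> 0.8  (net loss)
```

The "different workloads" point above corresponds to pushing `penalty` down: carpenters who rarely want the same tool rarely wait.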

By altering what each carpenter is working on - say one making a birdhouse, the other a bookcase - you could minimize the execution penalties, as it is unlikely both carpenters would need the same tools at the same time, owing to their different workloads.

Of course, you typically have no control over which execution threads share the same processor core. But when you have several VMs running on the same multi-core Hyperthreading CPU, the likelihood that two threads sharing a core will both want the same processor element at the same time is comparatively lower than if you are running the exact same rendering code on each execution thread sharing the core.

Posted on 2014-07-14 14:06:44
tech.kyle

That's a rather interesting review. The only thing I was thinking is that I don't think you let HyperThreading hit its stride. With HyperThreading enabled, you had 80 logical cores at your disposal but never tested POV-ray with more than 40 threads, so you only ever used half those cores. If the OS's scheduler wasn't aware of HyperThreading, it may have put the first 40 threads on the first 40 logical processors (out of the 80 available), which would have put a workload on physical cores 1-20 while cores 21-40 stayed idle.
I've heard rumors that Linux's thread scheduler is much more aware of such things and likely would have filled up every odd logical core before it started using the extra logical cores created by HyperThreading, hence the negligible difference between HT enabled and disabled under Linux: the threads would have been spread out in the way that had the least impact on performance.

I suspect if you run the test again with POV-ray running at 80 threads with HyperThreading enabled, you'll start to see some big differences with HyperThreading pulling away.

AMD's FX lineup had a similar issue under Windows 7. Thread schedulers weren't aware that cores were paired into modules, and that sticking a two-thread workload on cores 1 and 2 would result in significantly less performance than sticking it on cores 1 and 3.

Thanks for the review!

Posted on 2014-09-04 16:44:06