

Read this article at https://www.pugetsystems.com/guides/578
Dr Donald Kinghorn (Scientific Computing Advisor)

Hyper-Threading may be Killing your Parallel Performance

Written on July 2, 2014 by Dr Donald Kinghorn

Hyper-Threading, hyperthreading, or just HT for short, has been around on Intel processors for over a decade and it still confuses people. I’m not going to do much to help with the confusion. I just want to point out an example, from some testing I was doing recently with the ray-tracing application POV-ray, that surprised me: Hyper-Threading dramatically lowered the performance on a multi-core test system running Windows when POV-ray was run in parallel.


POV-ray is a ray-tracing program that has run in parallel for many years and is well maintained and supported on Windows and Linux. It’s often used as a CPU benchmark because of its heavy computational demands and its parallel implementation. I was doing some benchmarking on our quad-socket systems when I hit an unexpected anomaly with Hyper-Threading under Windows Server 2008 R2. It was interesting enough that I thought I’d use it as an opportunity to talk about Hyper-Threading a bit.

Hyper-Threading ... not so great (in my experience)

It will not come as a surprise to anyone who has worked with computers for HPC that Hyper-Threading is usually detrimental to performance (or at best does nothing). Standard HPC sys-admin practice for Intel-based systems that will be used for serious computing or parallel workloads is to disable Hyper-Threading in the BIOS by default. In the first few years of Hyper-Threading’s existence it was not unusual to see a 10-40% performance loss on a system if it was left enabled. However, in the last few years I haven’t really seen much system performance loss from leaving it on, so I usually don’t bother disabling it anymore (just in case there is something that might benefit from it). I know how many real cores are in my own systems and what performance I can get out of them. Just remember: for computation-heavy workloads, Hyper-Threading doesn’t add any real compute capability.

Hyper-Threading is really meant for “better” thread scheduling, and I can see how that might be a good thing in some cases. Imagine a code that is structured to spawn lots of parallel work threads from a pool of “work” in a round-robin fashion. If Hyper-Threading can eliminate some of the overhead of the thread-spawning process, you may get a speedup. I would also guess there are codes that are not well adapted for parallel execution that may get an “accidental” boost too.
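
A rough sketch of that pattern in Python (the names `render_tile` and `tiles` are made up for illustration, and this is not POV-ray’s actual implementation; also note CPython’s GIL means this shows the scheduling pattern, not real compute scaling):

```python
# Sketch of the "pool of work" pattern described above: a fixed pool of
# worker threads is reused for many small tasks instead of spawning a
# new thread per task, which removes the per-task thread-creation
# overhead. All names here are illustrative.
from concurrent.futures import ThreadPoolExecutor

def render_tile(tile_id):
    """Stand-in for one unit of work, e.g. one tile of a ray-traced image."""
    s = 0
    for i in range(10_000):
        s += (tile_id * i) % 97
    return s

tiles = range(64)  # a "pool of work": many more tasks than threads

# One pool sized to the core count; tiles are handed out as each
# worker finishes its previous one (the round-robin effect).
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(render_tile, tiles))

print(f"rendered {len(results)} tiles on a pool of 8 worker threads")
```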

“However, I have never personally seen ANY program benefit from Hyper-Threading. I’m guessing there are some codes out there that may benefit, but I’ve never experienced it first hand. If I find one I’ll let you know …”  I found one! Cinebench; see the comments. --dbk

I don’t see much harm in leaving Hyper-Threading enabled these days, but if you are running a demanding application it is absolutely worth seeing whether disabling it in the BIOS helps performance. I would most likely leave it enabled on a “consumer” processor like the Core i7 … but test! Having said that, my computer use is mostly on Linux, and modern Linux kernels are great at process scheduling and don’t seem to have any trouble with Hyper-Threading. I don’t personally use Windows in a demanding way very often, and that is where the big (bad) “surprise” came in.

[Plot: POV-ray parallel scaling, showing poor scaling on Windows with Hyper-Threading enabled]

The plot above shows a lot of information. It is a plot of “speedup” vs. “number of CPU threads” for the standard benchmark scene calculation in POV-ray 3.7.0, run on our test-bench quad-Xeon system (4 x E5-4624L v2, 10 cores each), which has 40 real cores. Jobs were run with 1 to 40 threads under CentOS 6.5 Linux, and under Windows Server 2008 R2 with Hyper-Threading on and off. This plot shows the “parallel scaling” of POV-ray on this system. The straight diagonal line is “perfect” linear scaling. The measured performance points are fit to Amdahl’s Law to show the parallel-scaling fall-off with increasing thread count.
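
The Amdahl’s Law fit can be sketched in a few lines of Python. The speedup numbers below are made up for illustration; they are not the measured POV-ray data:

```python
# Fitting measured speedups to Amdahl's Law:
#   S(n) = 1 / ((1 - p) + p / n)
# where p is the parallel fraction of the work and n is the thread count.
# The (threads, speedup) pairs below are illustrative, not real data.

def amdahl(n, p):
    """Predicted speedup on n threads with parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

measured = [(1, 1.0), (8, 7.6), (16, 14.3), (24, 20.3), (32, 25.7), (40, 30.5)]

# Simple 1-D least-squares search for p (avoids needing scipy)
best_p, best_err = 0.0, float("inf")
for i in range(9000, 10001):
    p = i / 10000.0
    err = sum((amdahl(n, p) - s) ** 2 for n, s in measured)
    if err < best_err:
        best_p, best_err = p, err

print(f"fitted parallel fraction p ~ {best_p:.4f}")
print(f"predicted 40-thread speedup: {amdahl(40, best_p):.1f}x")
```

Even a code that is 99% parallel falls well short of the perfect diagonal by 40 threads, which is why the fall-off in the plot appears so early.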

Major Findings

  • Hyper-Threading made no difference under Linux (results were the same with it on or off).
  • Windows Server 2008 R2 with Hyper-Threading on showed very poor thread scaling. This was shockingly unexpected!
  • Disabling Hyper-Threading in the BIOS greatly improved the parallel performance with Windows Server 2008 R2.
  • Overall performance on Linux was better than on Windows. (I’ll have another post up soon with the performance data and more details about the POV-ray testing.)


  • Remember, this is just one particular application, POV-ray 3.7. I had recently tested Zemax OpticStudio on this system with the same Windows 2008 R2 install, and the scaling was fine with Hyper-Threading on.
  • Also note that this is an older Windows Server version; things may be different on Server 2012. (You have to run a Server version of Windows to get support for a quad-socket system.)

Hope you found this interesting!

Happy computing! --dbk

Tags: Hyperthreading, HPC, Quad Xeon, POV-ray, Linux, Windows Server
Donald Kinghorn

OK, OK, I looked a little and I did find a program that benefits from Hyper-Threading. I ran Cinebench 11.5 (the Cinema 4D benchmark) with HT on and off. Here are the results:

HT on: all cores 40/80, CPU score = 3118, 1 core CPU score = 65, MP ratio = 47.62x
HT off: all cores 40/40, CPU score = 2671, 1 core CPU score = 68, MP ratio = 39.18x

Notice that with HT on the result is "super-linear", i.e. a speedup greater than 40! This looks like the case I mentioned in the post, where there is a pool of work being farmed out to threads in batches as each one finishes. There is significant thread-launch overhead in the code, and HT is helping with the thread management. I don't know if this is the nature of the problem or just an artifact of how they chose to do their parallel implementation. In any case, Hyper-Threading helps significantly with this code.
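
As a sanity check, the MP ratio Cinebench reports is essentially the all-core score divided by the single-core score, which you can verify from the numbers above (the small differences from the reported ratios are presumably run-to-run variation):

```python
# Quick check of the Cinebench MP ratios quoted above:
#   MP ratio = all-core score / single-core score
ht_on_all, ht_on_one = 3118, 65
ht_off_all, ht_off_one = 2671, 68

mp_on = ht_on_all / ht_on_one    # ~47.97 (Cinebench reported 47.62)
mp_off = ht_off_all / ht_off_one # ~39.28 (Cinebench reported 39.18)

print(f"HT on : MP ratio ~ {mp_on:.2f}x on 40 real cores (super-linear)")
print(f"HT off: MP ratio ~ {mp_off:.2f}x on 40 real cores")
```
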
Best wishes

Posted on 2014-07-03 22:08:05
Ken Johnson

(Not really a performance analysis person, though sometimes I have a hand in this sort of thing anyway.)

As with most performance related quandaries, it is tough to make a blanket statement about whether HT is beneficial or not. The reality is that it is going to vary quite a bit depending on your workload. The golden rule of performance analysis applies: To get an answer that really has teeth behind it, one has to measure the scenario in question and find out.

The theory behind HT is that while modern processors are out-of-order, that is not always sufficient to hide various latencies and keep the execution resources of the core busy. If there is a blend of active workloads that do not compete for the same execution resources, HT allows for the potential of leveraging more of the core's otherwise idled resources (during stalls, etc.).

Now, whether that is the case is, again, workload-dependent. If your workload is entirely uniform in its inner-loop computation, it is possible that all threads are going to compete for the same execution resources anyway and HT won't help much. And, of course, there is extra work required to synchronize and control most parallel workloads, so this may outweigh the gains of HT if the gains are small in a particular scenario.

If you check the modern processor optimization manuals (e.g. from Intel, Agner, etc.), there are also a couple of processor resources that are statically partitioned between threads; disabling HT allows one thread to obtain the full allocation of those resources. Whether that is beneficial is, again, workload-dependent: those resources may or may not be the bottleneck in any given scenario.

P.S. The scheduler code base in WS08R2 is nowadays a bit less than 4 years behind the state of the art. It'd be interesting to see what the numbers were on WS2012R2, as there have been a number of changes since then. Naturally, it'd be necessary to run the measurements to see how much of an impact those make on your workload.

Posted on 2014-07-04 07:26:34
Jeremy Hill

Just to mention, in case any of our customers should run across this post: as you found with Cinema 4D, HT is beneficial with our renderer (Maxwell) as well. Coincidentally, I just happened to be testing this today on an i7-4930MX, where it yielded a 40% speedup.

Posted on 2014-07-07 04:57:11
Privat Privat

I have an i7-3960X. The speedup isn't noticeable because the game runs at a stable 60 fps, but disabling HT removed all the hiccups.

Posted on 2016-04-23 11:09:44

Here is how I think of Hyper-Threading:

Hyper-Threading OFF: Imagine a carpenter making a birdhouse. All the tools are in front of him, and he can use any tool he wants without waiting.

Hyper-Threading ON: Two carpenters, each making a birdhouse, sharing one set of tools. If one carpenter is using the tool the other needs, the other has to wait. Neither carpenter has priority over the other; both are equally likely to have to wait for a given tool.

Running with Hyper-Threading on will cause execution penalties, but if the penalty is less than 50%, there should be a net increase in throughput, because the number of executing threads has doubled.
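
That break-even argument is easy to put in numbers; the penalty values below are arbitrary examples, not measured figures:

```python
# Toy model of the throughput argument above: with HT on, each thread
# runs slower by some penalty, but twice as many threads run per core.
# The penalty values used here are arbitrary examples.

def relative_throughput(penalty):
    """Throughput of 2 HT threads vs. 1 full-speed thread per core."""
    return 2 * (1 - penalty)

print(relative_throughput(0.30))  # 1.4 -> 40% net gain
print(relative_throughput(0.50))  # 1.0 -> break-even
print(relative_throughput(0.60))  # 0.8 -> net loss with HT on
```
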

By altering what each carpenter is working on - say one making a birdhouse, the other a bookcase - you could minimize the execution penalties, as it is unlikely both carpenters would need the same tools at the same time, owing to their different workloads.

Of course, you typically have no control over which execution threads share the same processor core, but when you have several VMs running on the same multi-core Hyper-Threading CPU, the likelihood that two threads sharing the same core will both want the same processor element at the same time is comparatively lower than if you are running the exact same rendering code on each execution thread sharing the same core.

Posted on 2014-07-14 14:06:44
Privat Privat

Why not develop a CPU that has extra tools, where both carpenters have access to two of every tool, so they can use both hands!

Posted on 2016-04-23 11:11:18

That's a rather interesting review. The only thing I was thinking is that I don't think you let Hyper-Threading hit its stride. With Hyper-Threading enabled, you had 80 logical cores at your disposal, but never tested POV-ray with more than 40 threads, so you only used half those logical cores. If the OS's scheduler wasn't aware of Hyper-Threading, it may have put the first 40 threads on the first 40 logical processors (out of the 80 available), which would have put a workload on cores 1-20 only, with cores 21-40 staying idle.
I've heard rumors that Linux's thread scheduler is much more aware of such things and likely would have filled up every odd logical core before it started filling up the cores created by Hyper-Threading, hence the negligible difference between HT enabled and disabled under Linux, since it would have spread the threads out in a way that had the least impact on performance.

I suspect that if you run the test again with POV-ray at 80 threads with Hyper-Threading enabled, you'll start to see some big differences, with Hyper-Threading pulling away.

AMD's FX lineup had a similar issue under Windows 7. Thread schedulers weren't aware that cores were paired into modules, and that sticking a two-thread workload on cores 1 and 2 would result in significantly less performance than sticking it on cores 1 and 3.

Thanks for the review!

Posted on 2014-09-04 16:44:06

Just a thought: for HT to be of benefit, a couple of conditions need to be met:

* You need to have enough threads. If you don't, then HT is irrelevant.
* Because half of your "cores" are in fact only register sets without any dedicated execution hardware attached, you need code that stalls the CPU often enough that the second "execution point" is a benefit.

That's the pure theory, but there are some more aspects to this:
* By executing twice the threads, you are almost certainly increasing the pressure on the memory caches. So, to see a benefit, your code needs a memory-access pattern such that all HT threads fit well enough in the cache, size-wise. And at the same time it needs to have enough cache misses that the second set of threads gets to work when the first set of threads stalls. That's quite possible, e.g. through the structure of the cache, but it's extremely hard to predict.
* Linux has highly optimized schedulers that have known about topology for some time.

But the third point above, that the working set needs to fit in cache and at the same time produce enough cache misses, is what makes it so hard to predict whether HT will help (by improving CPU utilization on cache misses) or hurt (by making the hot-spot data spill out of the cache).

Posted on 2016-01-29 14:41:21
Privat Privat

The only reason I found this post on the internet was because Dark Souls 3 apparently hates Hyper-Threading... disabling it removes the "hiccups".

I thought at first it was the graphics card, since it's a mostly visually intensive game with not much else going on... but apparently it was Hyper-Threading's fault...

Disabled it, and increased the graphics settings... now the game runs smoother... go figure... also less noise from the CPU fan; I guess it runs cooler...

Posted on 2016-04-23 11:07:08