Keyshot Multi Core Performance
Written on April 14, 2016 by Matt Bach
Keyshot is a CPU-based rendering engine that is currently owned and developed by Luxion. As it is CPU-based, the primary hardware component that is going to impact render times is the CPU. In this article we want to benchmark Keyshot and analyze how effective it is at using multiple CPU cores. Based on this analysis, we can then determine exactly what CPU models should give you the best possible performance for your budget.
Before we get too far, it is important to understand the two most basic CPU specifications:
- The frequency is essentially how many operations a single CPU core can complete in a second (how fast it is).
- The number of cores is how many physical cores there are within a CPU (how many operations it can run simultaneously).
If life was easy, you would be able to compare the performance of different CPUs by simply multiplying the number of cores by the frequency - creating a sort of "combined" or "aggregate" frequency. Unfortunately, that is not how it works in the real world. In reality, the way to calculate the effective performance of a CPU is a bit more complicated and includes using a formula derived from Amdahl's Law:
Theoretical CPU Performance:
$$S = F \times \frac{1}{(1 - P) + \frac{P}{N}}$$
- $S$ is the theoretical overall speed of the CPU
- $F$ is the frequency of the CPU
- $P$ is the fraction of the task that can be made parallel
- $N$ is the number of CPU cores
This doesn't take into account the differences between CPU architectures - which can make a one generation newer CPU as much as 5-15% faster than an older CPU with identical specs - but what this formula is saying is that the theoretical overall speed of a CPU (S) depends on both the frequency it runs at (F) and an "adjusted" core count that takes into account how much of the task is actually able to be run across multiple cores (P). This P value is commonly called the "parallel efficiency" and while it is very often overlooked when comparing CPU models, it is actually extremely important. The higher the efficiency, the better performance you will see with a higher core count. The lower the efficiency, the less effective additional CPU cores will be, which makes it more important to have a CPU that runs at a higher frequency.
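As a rough illustration of why the simple "aggregate frequency" breaks down, the sketch below evaluates S = F / ((1 - P) + P / N) for two CPUs at several values of P. The two CPUs and the function names are hypothetical and chosen only for illustration; they are not models or code from our testing.

```python
def theoretical_speed(freq_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Amdahl-adjusted speed: S = F / ((1 - P) + P / N)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_share = 1.0 - parallel_fraction
    return freq_ghz / (serial_share + parallel_fraction / cores)


def aggregate_frequency(freq_ghz: float, cores: int) -> float:
    """Naive 'cores times frequency' figure; only matches S when P is exactly 1."""
    return freq_ghz * cores


# Two hypothetical CPUs (not models from this article): (frequency in GHz, cores).
cpus = {"A": (3.5, 6), "B": (2.5, 12)}

for p in (1.0, 0.99, 0.90, 0.50):
    scores = {name: theoretical_speed(f, n, p) for name, (f, n) in cpus.items()}
    best = max(scores, key=scores.get)
    print(f"P={p:.2f}  A={scores['A']:.2f}  B={scores['B']:.2f}  faster: {best}")
```

With these made-up numbers the twelve core CPU wins while P stays high, but the six core CPU with the higher frequency pulls ahead once P drops far enough.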
What we will be doing is performing a series of benchmarks to find out the speed of Keyshot with various numbers of cores, then using another formula from Amdahl's Law to determine its parallel efficiency (P). Once we have this efficiency number, we can then leverage the formula above to estimate the relative performance of a CPU based on the number of cores and the frequency it runs at. By doing so, we can ensure that you will not purchase a more expensive CPU that actually gives you less performance than a cheaper option.
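One way to recover P from a single measurement is to rearrange the Amdahl's Law speedup, 1 / ((1 - P) + P / N), and solve for P. The sketch below does that; the timings in the example are invented for illustration and are not results from our benchmarks, and the function name is our own.

```python
def parallel_fraction(speedup: float, cores: int) -> float:
    """Solve speedup = 1 / ((1 - P) + P / N) for P."""
    if cores < 2:
        raise ValueError("need at least 2 cores to estimate P")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Hypothetical timings (seconds), not measurements from this article.
single_core_time = 1000.0
twenty_core_time = 55.0

speedup = single_core_time / twenty_core_time
p = parallel_fraction(speedup, cores=20)
print(f"speedup {speedup:.2f}x on 20 cores -> P is about {p:.4f}")
```

In practice a single data point is noisy, so averaging the estimate over several core counts is likely to give a more stable value.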
If you want a more in-depth explanation of how we use Amdahl's Law, we have a guide available for Estimating CPU Performance using Amdahls Law.
To accurately benchmark Keyshot, we will be using a pair of Xeon E5-2687W V3 3.1GHz Ten Core CPUs. This will allow us to test with up to 20 physical CPU cores across two CPUs to see exactly how well Keyshot is able to utilize both a high number of cores as well as multiple physical CPUs. The full specifications for our test system are:
| Motherboard: | Asus Z10PE-D8 WS |
| CPU: | 2x Intel Xeon E5-2687W V3 3.1GHz Ten Core |
| RAM: | 8x Crucial DDR4-2133 16GB ECC Reg. (128GB total) |
| GPU: | NVIDIA GeForce GTX 980 4GB |
| Hard Drive: | Samsung 850 Pro 512GB SATA 6Gb/s SSD |
| OS: | Windows 10 Pro 64-bit |
| PSU: | Antec HCP Platinum 1000W |
| Software: | Keyshot 6.1 Standalone |
To make sure our results are as consistent as possible we created a custom automation script using AutoIt to start Keyshot, load the test scene, adjust how many CPU cores are available by setting the affinity of the process, then time how long it took to render the scene (or record the FPS). For our test scenes, we used three scenes to test various aspects of Keyshot.
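Process affinity is normally expressed as a bitmask with one bit per logical processor. The sketch below is not our AutoIt script; it only shows how a mask for the first N processors could be built, assuming one logical processor per physical core and processors numbered from zero.

```python
def affinity_mask(core_count: int) -> int:
    """Bitmask with the lowest `core_count` bits set (processors 0..core_count-1)."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return (1 << core_count) - 1


# Print the hexadecimal mask for a few core counts.
for cores in (1, 2, 4, 10, 20):
    print(f"{cores:2d} cores -> 0x{affinity_mask(cores):X}")
```

A hexadecimal mask like this is the form accepted by the `/affinity` switch of the Windows `start` command, which is one possible way to restrict a program to a subset of cores. If Hyper-Threading is enabled, the mapping between bits and physical cores depends on how the system enumerates logical processors, so the mask would need to be adjusted accordingly.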
(camera_benchmark.bip - included with Keyshot)
Continuous FPS - 800x554
(keyshot_Bathroom.bip - Keyshot scene download)
Single Frame - 1280x720
(animation.bip - included with Keyshot)
Animation - 1920x1079 - 126 frames
16 samples, 6 bounces, 1 Anti Aliasing, 1 Shadow, 1.5 Pixel Blur
These three scenes will let us benchmark the live FPS of the viewport (as suggested on the Keyshot forums), a single frame render, and rendering an animation. To analyze the results of our testing, we will be presenting our results in terms of the performance measured for each core count compared to the performance measured with just a single core. From these results, we will then use Amdahl's Law to estimate the parallel efficiency for the render. 100% is perfect efficiency where a high core count CPU is ideal, but as the efficiency drops having a CPU with a higher frequency (even at the cost of a lower core count) becomes more and more important.
One thing we want to make very clear is that our testing is only 100% accurate for the files and settings we used. While this should be able to give us a fairly accurate measurement for how well Keyshot can use multiple CPU cores, if you want more accurate results for the scenes you tend to work with we recommend following our Estimating CPU Performance using Amdahls Law guide. It can be a time consuming process (even with automation and rendering the scenes at only moderate settings the testing for this article took a significant amount of machine time) but it is really the only way to know for sure what the parallel efficiency is for exactly what you do.
Live FPS Results
Measuring the FPS of the camera_benchmark.bip scene is a somewhat standard measurement of Keyshot's performance and even has its own dedicated section on Keyshot's forums. Because of this, we thought we would begin our testing here before moving on to actual renders. This will also be a good opportunity to verify that simply recording the FPS of this scene is actually an accurate representation of the performance you would see when rendering a scene or animation.
For this benchmark, we saw near perfect scaling with a parallel efficiency of 99.5%. This means that every time we doubled the number of cores, the measured FPS almost doubled as well. However, while 99.5% is very close to perfect, that missing 0.5% means that instead of seeing a speedup of 20x with 20 CPU cores - which is what a program with 100% efficiency would have - we actually only saw a speedup of about 18x.
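To see how a 0.5% serial portion costs almost two cores' worth of speedup, the short sketch below evaluates the Amdahl's Law speedup, 1 / ((1 - P) + P / N), at several core counts. The function name is our own; the only input taken from the results above is the 99.5% efficiency.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core: 1 / ((1 - P) + P / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Compare the predicted speedup at 99.5% efficiency with perfect scaling.
for n in (1, 2, 4, 8, 16, 20):
    measured_like = amdahl_speedup(0.995, n)
    perfect = amdahl_speedup(1.0, n)
    print(f"{n:2d} cores: {measured_like:5.2f}x at P=0.995, {perfect:5.2f}x at P=1.0")
```

At 20 cores the formula gives roughly 18.3x for P = 0.995, in line with the speedup of about 18x reported above.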
Single Image Render Results
Similar to the previous test, rendering a single image also had a very good parallel efficiency. It wasn't quite as good as the FPS test (99% vs 99.5%), but even 99% is a very respectable result. The nice thing here is that it confirms that the camera FPS benchmark is a fairly accurate way to compare your system's rendering performance in Keyshot.
Animation Render Results
Animations have a bit more going on than just rendering a single frame since the program needs to do a number of calculations (moving the camera, adjusting the model, etc.) between each frame. These in-between tasks are often only able to use one or two cores so the overall parallel efficiency for animations tends to be lower than rendering a single frame. In fact, that is exactly what we saw in our testing.
While our single frame render had an efficiency of 99%, rendering the 126 frames from the cube animation dropped the efficiency down to 96%. A 3% drop in efficiency may not sound like much, but it basically makes our 20 cores only 65% as effective as they were when rendering just a single frame. So instead of seeing an 18x speedup with 20 cores, we are now only seeing about an 11x speedup.
One thing to keep in mind is that the longer it takes to render each individual frame of an animation, the less impact there should be on the overall multi-core efficiency. This is due to the fact that doing things like changing the camera's location in space tends to take a fairly fixed amount of time - so the less relative time is spent on those actions, the less impact there should be on the overall efficiency. In this case, the time to render each individual frame was fairly quick so this should be considered a worst-case scenario. If your animation takes a minute or longer to render each frame, you should see a parallel efficiency much closer to the 99% we saw when rendering a single frame.
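A simple way to picture this is to treat each frame as a fixed serial overhead plus a render that is itself 99% parallel. The sketch below is only a toy model under that assumption; the overhead and render times are hypothetical values, not measurements, and the times are expressed as single-core work per frame.

```python
def effective_parallel_fraction(render_seconds: float,
                                overhead_seconds: float,
                                render_parallel_fraction: float = 0.99) -> float:
    """Share of the total per-frame work that can run in parallel.

    render_seconds:   single-core time to render one frame (hypothetical)
    overhead_seconds: fixed serial time between frames (hypothetical)
    """
    total = render_seconds + overhead_seconds
    return render_parallel_fraction * render_seconds / total


# Hypothetical fixed overhead of 0.5 s per frame, with increasingly heavy frames.
for render_seconds in (15.0, 60.0, 300.0, 1200.0):
    p = effective_parallel_fraction(render_seconds, overhead_seconds=0.5)
    print(f"{render_seconds:7.1f} s of render work per frame -> P is about {p:.4f}")
```

As the render work per frame grows, the fixed overhead becomes a smaller share of the total and the effective P approaches the single frame value.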
Overall, Keyshot is very effective at utilizing a large number of CPU cores. Even the best program is not perfect, however, so in order to ensure that you are purchasing the exact right CPU we need to determine the relative performance of Keyshot for various CPUs. To do this, we will once again use Amdahl's Law - but this time in reverse to find the theoretical performance of a CPU based on its core count, frequency, and the parallel efficiency we measured. Note that these results were calculated using the "all-core Turbo Boost" frequency (the speed at which the CPU actually runs when all cores are active) and not the base frequency that is in the product name. We've covered the difference between the advertised frequency, maximum Turbo Boost and that all-core Turbo Boost in a number of recent posts including Xeon E5v3 All Core Turbo Boost and Amdahl's Law and Actual CPU Speeds - What You See is Not Always What You Get. Using the all-core Turbo (which unfortunately is not often listed in the CPU's specifications) results in a much more accurate estimation of the real-world performance for each CPU.
If you wish to see the estimated performance for all single and dual E5 V3 CPUs (as well as the Core i7 equivalents), feel free to expand the option above. If you really dig into these numbers, you will find that it is very easy to end up spending more money on a CPU that will actually give you lower performance with Keyshot. For example, a Xeon E5-2687W V3 should be about 7% slower than a Xeon E5-2680 V3 even though it is about $500 more expensive.
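The comparison above can be roughly reproduced with the theoretical performance formula. The sketch below assumes all-core Turbo frequencies of 3.2GHz for the E5-2687W V3 and 2.9GHz for the E5-2680 V3 in a single CPU configuration; those frequencies are assumptions that are not stated in this article and should be checked against Intel's specifications before relying on the result.

```python
def theoretical_speed(freq_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Amdahl-adjusted speed: S = F / ((1 - P) + P / N)."""
    return freq_ghz / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# ASSUMED all-core Turbo frequency (GHz) and core count; not taken from this article.
ASSUMED_SPECS = {
    "Xeon E5-2687W V3": (3.2, 10),
    "Xeon E5-2680 V3": (2.9, 12),
}


def relative_speed(parallel_fraction: float) -> float:
    """Speed of the E5-2687W V3 divided by the speed of the E5-2680 V3."""
    f_a, n_a = ASSUMED_SPECS["Xeon E5-2687W V3"]
    f_b, n_b = ASSUMED_SPECS["Xeon E5-2680 V3"]
    return (theoretical_speed(f_a, n_a, parallel_fraction)
            / theoretical_speed(f_b, n_b, parallel_fraction))


# Evaluate at the single frame (99%) and live FPS (99.5%) efficiencies.
for p in (0.99, 0.995):
    slower = (1.0 - relative_speed(p)) * 100.0
    print(f"P={p}: E5-2687W V3 estimated {slower:.1f}% slower than E5-2680 V3")
```

Under these assumptions the E5-2687W V3 comes out roughly 6% to 7% slower, depending on which of the measured efficiencies is used.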
To help ensure that you are choosing the right CPU, we closely examined both the estimated performance numbers as well as the cost associated with each CPU choice and found seven CPU options (three single CPU and four dual CPU) that should give you the best performance for their price:
- Intel Xeon E5-1650 V3 3.5GHz Six Core
(or Intel Core i7 5930K 3.5GHz Six Core)
- Intel Xeon E5-1660 V3 3.0GHz Eight Core
(or Intel Core i7 5960X 3.0GHz Eight Core)
- Intel Xeon E5-2680 V3 2.5GHz Twelve Core
To give you an idea about how each of these CPUs will perform relative to each other (without having to dig through the collapsed table above) we created a graph showing the theoretical time to render a scene in Keyshot that has the same parallel efficiency as the single frame render we tested:
As you can see, each increase in model (as well as price) gives a significant decrease in render times. The difference in render times between each of the CPUs above ranges from 17% to almost 23%, but on average the difference is about 18%.
One last thing we will note is that while these CPUs should give you the best bang for your buck for Keyshot, they may not be the best CPU for your system as a whole. Depending on what else you use the system for, it may be better to sacrifice a bit of performance in Keyshot to substantially increase the performance of any other programs you regularly use. This can often be a tricky balance to find, so we highly recommend speaking with one of our consultants before purchasing a system that is going to be used for multiple tasks.