Network Rendering in Solidworks 2016
Written on December 31, 2015 by Matt Bach
In a recent article, we explored the multi-core efficiency of Solidworks 2016 to determine the type of CPU (either one with a high core count or a high frequency) Solidworks works best with. Our overall conclusion was that a quad core CPU with a high operating frequency will actually give you the best overall performance for almost every task in Solidworks. However, the one exception to this rule is rendering where the efficiency is high enough that a high CPU core count can result in huge gains in performance. The problem is that higher core count CPUs tend to have lower operating frequencies so you would be giving up performance for every other Solidworks task in order to improve your render times.
Luckily, there are a few options that let you keep the best overall performance in Solidworks while also reducing the time it takes to render images. The easiest option is simply to have a second dedicated machine that is optimized for rendering. In fact, this is exactly what our Rendering Optimized Recommended System for Solidworks is designed for. It has relatively poor performance for other Solidworks tasks, but it is as good as you can get (short of a Quad Xeon system) for rendering from a single machine.
If a dedicated rendering machine doesn't make sense, another option is to take advantage of the PhotoView 360 Network Rendering Client to split up the render across multiple machines. When you install this client onto other machines on your local network, it essentially creates a cluster which allows Solidworks to offload portions of a render to these machines. For example, in the screenshot to the right the orange buckets are being rendered on the local host machine while all the blue buckets are being sent over the network to be processed on other machines.
Setting up and using network rendering is actually very easy but there are a few limitations that you should be aware of:
- The network offload only applies to the final render - the preprocessing step still needs to be performed completely on the host machine. This means that if the preprocessing step takes a significant amount of time to complete, even a large cluster may still be slower overall than a single high core count system.
- Installing the client does not require a Solidworks license, but you can only perform a network render if you are a Solidworks Subscription Services customer.
- If you do not have a high speed network (gigabit preferably), you will not get anywhere near peak performance. We will go over how important this is in the Impact of network speed on performance section.
Setting up the client machine(s)
To install the PhotoView 360 Network Rendering Client on a system, you first need to download a few files from Solidworks. To do so, follow the steps below:
1. Log into the Solidworks Customer Portal then click on the "Downloads and Updates" link under the Download section.
2. Select the version of Solidworks you have at the top of the page then click on the "SOLIDWORKS Products" link next to the version and service pack you use.
3. After accepting the Solidworks EULA, you will be given the choice to either download the Solidworks Installation Manager or download files individually. While you can install the PhotoView 360 Client with the installation manager, we recommend clicking on the link to download the files individually as installing with this method will not require you to find your Solidworks product key.
4. On the next page, select the service pack, language, and product you are using. Note that since these files are considered update files, the service pack you are using may not be listed if there is not yet a service pack available for your product. In that case, simply choose whatever is available.
5. There are two files you need in order for PhotoView 360 to function correctly. The first is "Bonjour" which is listed under "Step 2 - Required Prerequisite Downloads." The second file is the "PhotoView 360 Network Render Client" itself which is listed under "Step 4 - Required Updates".
With these two files downloaded, install both onto any machine you want to function as a network offload client. Once both are installed, simply run the PhotoView 360 Net Render Client and click on "Enter Client Mode Now" to make the machine available. If you find that the host system is not sending data to the client, ensure that Bonjour is installed and check your firewall settings.
If you would like, the client even supports setting a schedule so that you can make a machine usable as a network offload only at certain times. This can be extremely useful if you want to use a machine only when it is idle - such as when an employee is not in the office.
Another thing you may consider is to set the affinity of the PhotoView 360 client so that only a certain number of CPU cores can be used by the program. This is not natively supported by the client, but you can either manually set the affinity through task manager or create a shortcut that automatically sets the affinity when the client is launched. This can be extremely useful if you have an employee who has a quad (or more) core CPU, but only needs one or two CPU cores for what they actually do. In that case, you can take those extra unused CPU cores and make them available to speed up your render times without adversely impacting the user of the client machine.
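As a rough sketch of the shortcut approach, the snippet below builds the hexadecimal affinity mask and a command line that uses the `/affinity` switch of the Windows `start` command. The client path is a hypothetical placeholder (we are not assuming where the client is installed on your machines), and the helper names are ours for illustration only.

```python
def affinity_mask(cores):
    """Return the hex bitmask (no 0x prefix) selecting the given logical-core indices."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core  # bit N set means logical core N is allowed
    return format(mask, "X")


def shortcut_command(client_path, cores):
    """Build a cmd.exe line that launches a program restricted to the chosen cores."""
    # The empty "" is the window title argument that `start` expects before a quoted path.
    return 'cmd /c start "" /affinity {} "{}"'.format(affinity_mask(cores), client_path)


# Hypothetical path: replace with the real location of the client executable.
CLIENT = r"C:\path\to\PhotoView360NetRenderClient.exe"

# Leave logical cores 0-3 to the user and hand cores 4-7 to the render client.
print(shortcut_command(CLIENT, range(4, 8)))
```

The printed line can be pasted into the Target field of a Windows shortcut. On a hyperthreaded CPU the indices refer to logical cores (threads), so it is worth checking in Task Manager that the affinity ended up where you expected.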
Enabling network rendering in Solidworks
Once you have one or more client machines set up, all you need to do to enable network rendering in Solidworks is go to the PhotoView 360 options menu, scroll down to the bottom, and check the box for "Network Rendering". With this selected, Solidworks should automatically discover any available clients on the network and send buckets (or cells) of a render to those clients for processing.
If you want to, there are two configurable options that may further improve performance. The first is the "client workload" which determines how many buckets Solidworks will group together at a time to send to each client. 200% is the default which means that for each thread (one per core without hyperthreading, two per core with) on the client machine, Solidworks will send two buckets at a time. So a client with a quad core hyperthreaded CPU (which has eight threads) would be sent 16 buckets at a time with the default setting.
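The arithmetic behind the client workload setting can be sketched in a few lines. This is only an illustration of the rule described above; the function name is ours, and rounding down for percentages that do not divide evenly is an assumption, not documented Solidworks behavior.

```python
def buckets_per_batch(cores, hyperthreading, workload_percent=200):
    """Estimate how many buckets are sent to one client at a time."""
    # One thread per core without hyperthreading, two per core with it.
    threads = cores * (2 if hyperthreading else 1)
    # 200% means two buckets per thread; rounding down is an assumption.
    return threads * workload_percent // 100


# Quad core hyperthreaded client at the default 200% workload.
print(buckets_per_batch(4, True))        # 16
# The same client with hyperthreading disabled.
print(buckets_per_batch(4, False))       # 8
# Doubling the workload to 400% on the hyperthreaded client.
print(buckets_per_batch(4, True, 400))   # 32
```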
Increasing the client workload may reduce the load on the network, but the higher the bucket count, the larger the risk of over-allocating buckets towards the end of a render. If the last 100 buckets of a render are all sent to client machines, the host machine will start rendering those buckets anyway. This means that the host and some of the client machines end up working on the same buckets, which is a duplication of work.
The second setting is the choice to "Send data for network job". This is selected by default and basically means that the host machine will send the necessary data to the client machines directly. If you have a large number of clients, you may want to experiment with setting a shared network directory. This reduces the load on the host machine, but you will have to ensure that permissions and any cross-platform configurations are set up correctly.
Impact of network speed on performance
Offloading a calculation to a separate machine requires data to be sent over the network, so it stands to reason that the performance you can achieve depends on the speed of your network. To see how big an impact this can have with Solidworks network rendering, we tested rendering an image over a 10 Mbps, 100 Mbps, and 1000 Mbps (gigabit) network.
This testing was performed using Solidworks 2016 with six machines in total (one host and five clients), each with an Intel Core i7 6700K Quad Core 4.0GHz CPU. Note that we will only be reporting the render time and ignoring (for now) the time it takes to complete the prepass step.
|Testing Hardware (for host and client machines)| |
|---|---|
|CPU|Intel Core i7 6700K 4.0GHz Quad Core 8MB 95W|
|RAM|4x DDR4-2133 4GB (16GB total)|
|GPU|PNY Quadro M4000 8GB (host only)|
|Hard Drive|Samsung 850 Pro 512GB SSD|
|OS|Windows 10 Pro 64-bit|
|Host Software|Solidworks 2016 SP 0.1|
|Client Software|PhotoView 360 Net Render Client (ver. 88486)|
In the chart above, we are showing the speedup for between one and five clients with various network speeds. The y-axis is the amount of speedup compared to just using the host system - so a speedup of two means the render finished in half the time and a speedup of four means the render completed in a quarter of the time. The efficiencies shown were calculated using Amdahl's Law, which we typically use for testing the multi-core efficiency of programs but which works equally well for determining the efficiency of a cluster.
What we found is that using a 1000 Mbps (gigabit) network results in an overall efficiency of about 96%. This basically means that each client you use has about a 4% overhead, so there are diminishing returns as you add more and more clients. 96% is actually very impressive, but the efficiency drops to about 89% with a 100 Mbps network and all the way down to just 45% with a 10 Mbps network. In other words, you really want to have as fast a network as possible to get the full benefit from network rendering. This definitely means you should avoid using wireless if at all possible.
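To get a feel for what these efficiencies mean in practice, here is a minimal sketch of Amdahl's Law. It assumes the quoted efficiency is used as the parallel fraction in the formula and that all machines are identical; the function name is ours, and the output is an estimate rather than the measured data.

```python
def amdahl_speedup(machines, parallel_fraction):
    """Estimated speedup over one machine when `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / machines)


# Efficiencies reported above for each network speed.
networks = [("1000 Mbps", 0.96), ("100 Mbps", 0.89), ("10 Mbps", 0.45)]

for label, efficiency in networks:
    # Two to six machines in total: the host plus one to five clients.
    speedups = [round(amdahl_speedup(n, efficiency), 2) for n in range(2, 7)]
    print(label, speedups)
```

Under these assumptions, a host plus five clients would be roughly 5 times faster than the host alone on gigabit, about 3.9 times faster on 100 Mbps, and only about 1.6 times faster on 10 Mbps.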
Cluster vs Dual Xeon Performance
Now that we know the overhead associated with network rendering (about 4% with a gigabit network) we can determine how well a cluster of machines would perform compared to a dedicated Dual Xeon machine that is optimized solely for rendering in Solidworks. To do so, we need to combine the findings from our Solidworks 2016 Multi Core Performance article with the findings from the previous section.
To start, let's look at the estimated time it would take to perform just the render step (ignoring the prepass time) with various configurations:
Looking at this chart, creating a small cluster appears to be very attractive. With a 6700K host machine and just two client machines (also with 6700K CPUs) you should actually get better render performance than a dual Xeon system with two Xeon E5-2660 V3 CPUs. Adding just one more client machine would increase the performance to best a pair of Xeon E5-2680 V3 CPUs and using five client machines would roughly match the performance of a pair of Xeon E5-2699 V3 CPUs. In fact, if you work out the cost for a number of machines with 6700K CPUs versus a Dual Xeon workstation, a cluster would actually be both cheaper and faster than a Dual Xeon workstation when you ignore the prepass step.
However, once you take into account the prepass step a cluster looks much less attractive. This is due to the fact that the prepass cannot be offloaded (so it has to be done on the host machine), but it is actually fairly well threaded so a Dual Xeon system is ideal for completing that task in a timely manner. Since the amount of time the prepass takes varies based on the render file, we estimated the total time it would take to complete a render with each configuration when the prepass takes between 15 and 40% of the total render time.
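The effect of the prepass can be sketched with a simple model: the prepass always runs on the host at full cost, and only the remaining render portion is sped up by the cluster. This is our illustration of the reasoning, not the exact calculation behind the charts. It assumes identical machines, a 96% network efficiency used as the Amdahl parallel fraction, and a prepass fraction measured against the single-host total time; the function names are hypothetical.

```python
def cluster_total_time(prepass_fraction, machines, efficiency=0.96):
    """Total time as a fraction of what the host alone would need."""
    render_fraction = 1.0 - prepass_fraction
    # Speedup of the render step only (Amdahl's Law across the machines).
    render_speedup = 1.0 / ((1.0 - efficiency) + efficiency / machines)
    # The prepass is not offloaded, so it is paid in full on the host.
    return prepass_fraction + render_fraction / render_speedup


def best_case_total_time(prepass_fraction, efficiency=0.96):
    """Limit of cluster_total_time as the number of machines grows without bound."""
    return prepass_fraction + (1.0 - prepass_fraction) * (1.0 - efficiency)


for prepass in (0.15, 0.30, 0.40):
    six = cluster_total_time(prepass, machines=6)
    floor = best_case_total_time(prepass)
    print(f"prepass {prepass:.0%}: 6 machines -> {six:.2f}, limit -> {floor:.2f}")
```

With a 30% prepass, this model never gets below about a third of the single-host time no matter how many clients are added, which is why a faster host matters so much once the prepass grows.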
If your prepass time tends to be fairly short (about 15% of the total render time), a cluster may still work well for you. It would take more than the eight machines we calculated the performance of to match a Dual Xeon E5-2699 V3, but only five clients (plus the host) to beat the much more reasonably priced Dual Xeon E5-2680 V3. However, once you get up to the 30-40% prepass range, you are never going to be able to match even a Dual Xeon E5-2680 V3 no matter how many clients you have. You can, however, still beat the performance of a Dual Xeon E5-2640 V3 or lower with around two to three clients.
If you look at the chart to the right, we estimated the total cost you would need to pay for a range of Dual Xeon systems alongside a cluster with up to eight clients using Intel Core i7 6700K CPUs. These costs were based on our Recommended Systems, with the GPU removed for the client machines.
The only time you can match the performance of a Dual Xeon E5-2680 V3 system is if the prepass is only 20% or less of the total render time. Even then, it would take about 5-6 client machines, which is actually $2,650-$4,150 more than the Dual Xeon system. Even going down to the Xeon E5-2660 V3 CPUs (which would run about $6,500 for a full system), you would need to have between three and five clients to match the performance depending on the prepass percentage. This would make a cluster anywhere from $400 to $3,400 more expensive.
Unfortunately, this trend continues all the way across the board with a Dual Xeon system consistently performing better than a cluster at a specific budget. The only exception is with the lowest end Xeon CPUs - the Dual Xeon E5-2620 V3. At that point, a host and single client with 6700K CPUs should actually be a good deal faster than the Dual Xeon for roughly the same price. Overall, this means that a cluster of quad core CPU machines is great when you need a simple and cost effective way to make a big dent in the time it takes to render an image, but it is not the best (or most cost-effective) way to get the best possible render times.
As long as you have a high speed network, using the PhotoView 360 Network Render Client can be an effective way to decrease the time it takes to render images in Solidworks. In fact, it is so effective that you can cut your render times in half with only one or two clients. However, the main limitation to this feature is the fact that the prepass step (which can be anywhere from 15-40% of the total render time) cannot be offloaded which limits the overall effectiveness. This means that in most situations, a dedicated Dual Xeon machine that has been optimized for rendering (such as our Rendering Optimized Recommended System) will be more cost effective than a cluster of quad core systems as well as having a much higher performance ceiling.
Of course, even if you are using a Dual Xeon workstation to render images, that does not negate the usefulness of network rendering. If you really need to further improve your render times, you could use a combination of a Dual Xeon host machine along with a cluster of client machines for even better performance. By our estimations, a cluster of four Dual Xeon E5-2680 V3 machines should be able to complete a render with a 30% prepass in about 16% of the total time it would take a single Intel Core i7 6700K machine. You would be paying over $40,000 to do so, but it would reduce what would be a one hour render with a 6700K down to less than 10 minutes!
Overall, we view network rendering as an easy way to improve render times if you are on a tight budget or already have a number of idle machines available. If you are looking for a solution to minimize the time it takes to render an image as much as possible, however, we would suggest looking into a dedicated Dual Xeon workstation first as it should be more cost effective than purchasing a bunch of machines that are individually lower cost.