We now have our quad Opteron testbench system up and running, so I decided to continue testing with Zemax OpticStudio and run some parallel scaling and performance analysis on this new system and our quad Xeon test system. I’ve fit the measured parallel performance on the two systems to the Amdahl’s Law equation. With this analysis it’s possible to make some speculative performance estimates for processors at different clock speeds and core counts. Be sure to read the caveats section at the end of the post!
Software
 OS: Windows Server 2008 R2
 Test software: Zemax OpticStudio 14 SP2
 Test job: “Double Gauss 28 degree field.zmx” file running the "performance" option
 Reported values: Millions of Ray Surfaces Per Second
Test Systems

Puget Systems Peak Quad Opteron:
 4 x AMD Opteron 6344 @2.6GHz 12-core
 64GB DDR3 1600 Reg ECC
 …

Puget Systems Peak Quad Xeon:
 4 x Intel Xeon E5-4624L v2 @1.9GHz 10-core
 64GB DDR3 1600 Reg ECC
 …
Amdahl’s Law
Amdahl’s Law says that the speedup of a parallel code is limited by its sequential fraction. For example, if a program runs in one time unit and spends ½ of its time in a sequential code segment and ½ of its time in code that can be run in parallel, the speedup of the program can never be greater than a factor of 2, no matter how many parallel processes are used. A code that spends 99% of its runtime in parallel sections will never exceed a speedup of 100. The following equation shows this relationship in terms of the parallel fraction:
S(n) = T(1)/T(n) = 1/( (1 - P) + P/n )
Here S(n) is the speedup for n parallel processes, P is the “parallel fraction”, and T(n) is the run time with n processes.
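To make the two limits quoted above concrete, here is a minimal Python sketch of the speedup equation. The function names `amdahl_speedup` and `amdahl_limit` are made up for this illustration and are not part of any benchmark tool.

```python
def amdahl_speedup(p, n):
    """Speedup S(n) = 1 / ((1 - P) + P/n) for parallel fraction p and n processes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("need at least one process")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Upper bound on the speedup as n grows without limit: 1 / (1 - P)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


# Speedup for a half-parallel code and a 99%-parallel code at several process counts.
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.5, n), 3), round(amdahl_speedup(0.99, n), 2))

# The ceilings discussed in the text: a factor of 2 and a factor of 100.
print(amdahl_limit(0.5), amdahl_limit(0.99))
```

However large n gets, the first column of speedups stays below 2 and the second stays below 100.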
We can use measured performance of OpticStudio’s standard benchmark using n processes and then do a nonlinear least squares curve fit to the equation above to determine the effective parallel fraction (P) and then use this curve to predict performance at different processor clock speeds.
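As a rough illustration of that kind of fit, the sketch below recovers P from a set of (process count, speedup) pairs by minimizing the sum of squared errors. It makes two assumptions that are not from the measurements in this post: the data points are synthetic, generated from P = 0.99 purely for demonstration, and the minimizer is a simple golden-section search rather than whatever fitting tool was actually used. All names are hypothetical.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law speedup for parallel fraction p and n processes."""
    return 1.0 / ((1.0 - p) + p / n)


def sum_squared_error(p, samples):
    """Sum of squared differences between observed and modeled speedup."""
    return sum((s - amdahl_speedup(p, n)) ** 2 for n, s in samples)


def fit_parallel_fraction(samples, lo=0.0, hi=1.0, tol=1e-10):
    """Golden-section search for the P in [lo, hi] that minimizes the squared error."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sum_squared_error(c, samples) < sum_squared_error(d, samples):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# SYNTHETIC data (not measured): speedups generated from a known P = 0.99.
TRUE_P = 0.99
samples = [(n, amdahl_speedup(TRUE_P, n)) for n in (1, 2, 4, 8, 16, 24, 32, 40)]

p_fit = fit_parallel_fraction(samples)
print(f"fitted P = {p_fit:.6f}")
```

With real measurements you would replace `samples` with your own (n, T(1)/T(n)) pairs; noisy data will give a fitted P that only approximates the underlying behavior.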
Quad Xeon and Opteron performance with Zemax OpticStudio
The following two plots show the measured speedup of OpticStudio using 1 to 40 processes on the Xeon system and 1 to 48 processes on the Opteron system. The brown line shows what the performance would be with perfect (i.e., linear) scaling. The green (AMD) and blue (Intel) lines are the Amdahl’s Law curve fits.
The effective parallel fraction P determined from the curve fitting is as follows:
Xeon:  P = 0.994984 
Opteron:  P = 0.996589 
Now, to show the data in terms of the actual performance metric, namely “millions of ray surfaces per second”, we observe that in Amdahl’s Law the run time is inversely proportional to the rate at which work (W) is done. Thus,
W(n) = W(1)/( (1 - P) + P/n )
The following plots show the work done with n threads and the fit of the equation above.
Finally, to predict the performance of various Xeon and Opteron processors, we can add a clock scaling factor to the W(n) equation using a ratio of CPU clock speeds, C_new/C_old. (C_new is the clock speed of the processor we are interested in; C_old is the clock speed of the processors we used for the performance measurements, namely 1.9GHz for the Xeon and 2.6GHz for the Opteron.) For example, the predicted performance of a quad 8-core Xeon E5-4627 v2 @3.3GHz, with W(1) = 14.9 for the Xeon system, looks like,
W(32) = 3.3/1.9 * 14.9 / ( (1 - 0.995) + 0.995/32 ) = 717 (million ray surfaces per second)
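The same arithmetic can be written as a small Python function. This is only a sketch of the formula as stated in this post; the function and parameter names are invented for illustration, and the inputs (W(1) = 14.9, P = 0.995, 1.9GHz reference clock) are the values from the worked example above.

```python
def predicted_throughput(w1, p, n, clock_new, clock_old):
    """Clock-scaled Amdahl estimate: (C_new / C_old) * W(1) / ((1 - P) + P/n).

    w1        -- measured single-process result, millions of ray surfaces per second
    p         -- fitted parallel fraction
    n         -- total number of real cores in the target system
    clock_new -- clock speed of the CPU being estimated (GHz)
    clock_old -- clock speed of the CPU used for the measurements (GHz)
    """
    return (clock_new / clock_old) * w1 / ((1.0 - p) + p / n)


# Quad 8-core Xeon at 3.3GHz, using the Xeon numbers from the example in the text.
xeon_estimate = predicted_throughput(w1=14.9, p=0.995, n=32, clock_new=3.3, clock_old=1.9)
print(round(xeon_estimate))  # 717
```

To try it on your own hardware, measure W(1) and a reference clock on a system you have, then vary `n` and `clock_new`. Keep in mind that the estimate assumes performance scales linearly with clock speed within the same CPU family.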
The table below lists most of the currently available quad socket Xeon and Opteron CPUs and their predicted performance on this benchmark. Enjoy!
Predicted Performance for Quad Socket CPUs

| Processor | CPU Base Clock Speed | Cores (total "real" cores) | Million Ray Surf/sec (±10%) | Price for 1 CPU | Notes |
|---|---|---|---|---|---|
| Opteron 6386SE | 2.8GHz | 64 | 751 | $1392 | |
| Xeon E5-4657L v2 | 2.4GHz | 48 | 731 | $4394 | |
| Xeon E5-4627 v2 | 3.3GHz | 32 | 717 | $2108 | This is my personal recommended system CPU |
| Opteron 6380 | 2.5GHz | 64 | 671 | $1088 | |
| Opteron 6378 | 2.4GHz | 64 | 644 | $867 | |
| Xeon E5-4650 v2 | 2.4GHz | 40 | 630 | $3616 | |
| Opteron 6376 | 2.3GHz | 64 | 617 | $703 | |
| Opteron 6348 | 2.8GHz | 48 | 590 | $575 | |
| Xeon E5-4620 v2 | 2.6GHz | 32 | 566 | $1611 | |
| Opteron 6344 | 2.6GHz | 48 | 548 | $415 | This is the test Opteron system, measured performance was 552 |
| Xeon E5-4610 v2 | 2.3GHz | 32 | 500 | $1219 | |
| Xeon E5-4624L v2 | 1.9GHz | 40 | 498 | $2405 | This is the test Xeon system, measured performance was 508 |
| Opteron 6366HE | 1.8GHz | 64 | 483 | $575 | |
| Opteron 6328 | 3.2GHz | 32 | 472 | $575 | |
| Xeon E5-4607 v2 | 2.6GHz | 24 | 439 | $885 | |
| Opteron 6320 | 2.8GHz | 32 | 413 | $293 | |
| Opteron 6308 | 3.5GHz | 16 | 271 | $501 | |
| Xeon E5-4603 v2 | 2.2GHz | 16 | 257 | $551 | |
The prices for these processors vary quite a bit … We’ll leave performance per dollar as an exercise for the reader 🙂
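For anyone who wants a head start on that exercise, here is one possible way to set it up in Python. It is a sketch under loud assumptions: it counts only the cost of four CPUs (ignoring motherboard, memory, and everything else in the system), it uses just a handful of rows copied from the table above, and the names are made up for illustration.

```python
CPUS_PER_SYSTEM = 4  # assumption: quad socket, CPU cost only

# (processor, predicted million ray surfaces/sec, price for 1 CPU in dollars)
rows = [
    ("Opteron 6386SE", 751, 1392),
    ("Xeon E5-4627 v2", 717, 2108),
    ("Opteron 6344", 548, 415),
    ("Xeon E5-4624L v2", 498, 2405),
    ("Opteron 6320", 413, 293),
]


def perf_per_dollar(predicted, cpu_price, cpus=CPUS_PER_SYSTEM):
    """Predicted throughput divided by the total CPU cost of the system."""
    return predicted / (cpus * cpu_price)


# Rank the selected processors from best to worst value by this crude metric.
ranking = sorted(rows, key=lambda r: perf_per_dollar(r[1], r[2]), reverse=True)
for name, predicted, price in ranking:
    print(f"{name:18s} {perf_per_dollar(predicted, price):.3f}")
```

A ranking like this rewards cheap CPUs heavily, so it says little about which system finishes a given job fastest.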
Caveats!
 It was fun doing this post, but it is just a benchmark job for one particular program … Don't read too much into it! But I'm sure you will anyway 🙂
 This program had very good parallel thread scaling so performance was mostly dependent on CPU clock and number of cores. This will not always be the case!
 There are a lot of factors that go into finding the "best" hardware for a particular program and job type; don't oversimplify your decisions.
 If you are a Zemax OpticStudio user you are welcome to try my performance formula and run the benchmark yourself to compare. I have had feedback already and the formula seems to be reasonably good even for older hardware and dual socket systems.
Happy computing! –dbk