Quad Xeon vs Opteron, Zemax OpticStudio

We now have our quad Opteron test-bench system up and running, so I decided to continue testing with Zemax OpticStudio and run a parallel scaling and performance analysis on this new system and our quad Xeon test system. I fitted the measured parallel performance on the two systems to the Amdahl’s Law equation. With these fits it’s possible to make speculative performance estimates for processors at different clock speeds and core counts. Be sure to read the caveats section at the end of the post!

Software

  • OS: Windows Server 2008 R2
  • Test software: Zemax OpticStudio14 SP2
  • Test job: “Double Gauss 28 degree field.zmx” file running the "performance" option
  • Reported values: Millions of Ray Surfaces Per Second

Test Systems

  • Puget Systems Peak Quad Opteron:

    • 4 x AMD Opteron 6344 @2.6GHz 12-core
    • 64GB DDR3 1600 Reg ECC
  • Puget Systems Peak Quad Xeon:

    • 4 x Intel Xeon E5-4624L v2 @1.9GHz 10-core
    • 64GB DDR3 1600 Reg ECC

Amdahl’s Law

Amdahl’s Law says that the speedup of a parallel code is limited by its sequential fraction. For example, if a program runs in one time unit and spends ½ of its time in a sequential code segment and ½ of its time in code that can be run in parallel, its speedup can never be greater than a factor of 2, no matter how many parallel processes are used. A code that spends 99% of its run time in parallel code will never exceed a speedup of 100. The following equation shows this relationship in terms of the parallel fraction:

S(n) = T(1)/T(n) = 1/( ( 1-P ) + P/n )

S(n) is the speedup for n parallel processes, P is the “parallel fraction”, and T(n) is the run time with n processes.
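As a quick check of the two examples above, here is a minimal Python sketch of the speedup equation. The function name `amdahl_speedup` is just a label chosen for illustration.

```python
def amdahl_speedup(p, n):
    """Speedup S(n) for parallel fraction p on n processes (Amdahl's Law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Half of the run time is sequential: the speedup approaches 2 but never reaches it.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.5, n), 4))

# 99% parallel: the limit for very large n is 1 / (1 - 0.99) = 100.
print(round(amdahl_speedup(0.99, 1_000_000), 2))
```

In both cases the limit for large n is 1/(1-P), which is why the sequential fraction dominates once the core count gets high.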

We can measure the performance of OpticStudio’s standard benchmark using n processes, do a non-linear least squares curve fit to the equation above to determine the effective parallel fraction (P), and then use the fitted curve to predict performance at different processor clock speeds and core counts.
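To make the fitting step concrete, here is a rough sketch of a one-parameter least-squares fit in plain Python. It is not the tool used for the results in this post. The golden-section search, the function names, and the synthetic speedup data (generated from an assumed P of 0.99) are all stand-ins for illustration. In practice a library routine such as SciPy's `curve_fit` would be the more usual choice.

```python
def amdahl_speedup(p, n):
    """Speedup S(n) for parallel fraction p on n processes."""
    return 1.0 / ((1.0 - p) + p / n)

def fit_parallel_fraction(ns, speedups, tol=1e-10):
    """Estimate P by minimizing the sum of squared errors over [0, 1].

    Uses a simple golden-section search, which assumes the error has a
    single minimum in P (true for the clean synthetic data below).
    """
    def sse(p):
        return sum((amdahl_speedup(p, n) - s) ** 2 for n, s in zip(ns, speedups))

    inv_phi = (5 ** 0.5 - 1) / 2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if sse(a) < sse(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return (lo + hi) / 2

# Synthetic, noise-free speedups (NOT the measured OpticStudio data).
ns = [1, 2, 4, 8, 16, 32, 40]
true_p = 0.99
speedups = [amdahl_speedup(true_p, n) for n in ns]

p_fit = fit_parallel_fraction(ns, speedups)
print(p_fit)  # should land very close to the assumed 0.99
```

With real measurements the points will scatter around the curve, and the fitted P is the value that best balances those errors across all process counts.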

Quad Xeon and Opteron performance with Zemax OpticStudio

The following two plots show the measured speedup of OpticStudio using from 1 to 40 processes for the Xeon system and 1 to 48 processes for the Opteron system. The brown line is what the performance would be with perfect, i.e. linear, scaling. The green (AMD) and blue (Intel) lines are the Amdahl’s Law curve fits.

 

The effective parallel fraction P determined from the curve fitting is as follows:

Xeon: P = 0.994984
Opteron: P = 0.996589
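To get a feel for what these two values of P imply, the short sketch below plugs them back into Amdahl's Law at the full core count of each test system. It prints the fitted speedup, the parallel efficiency (speedup divided by core count), and the theoretical ceiling 1/(1-P). The dictionary layout and names are illustrative only.

```python
def amdahl_speedup(p, n):
    """Speedup S(n) for parallel fraction p on n processes."""
    return 1.0 / ((1.0 - p) + p / n)

# Fitted parallel fraction and total core count of each test system.
fitted = {"Xeon": (0.994984, 40), "Opteron": (0.996589, 48)}

for name, (p, cores) in fitted.items():
    s = amdahl_speedup(p, cores)
    ceiling = 1.0 / (1.0 - p)  # speedup limit for an unlimited number of cores
    print(f"{name}: S({cores}) = {s:.1f}, "
          f"efficiency = {s / cores:.0%}, ceiling = {ceiling:.0f}")
```

Each speedup is relative to a single process on the same machine, so these figures describe scaling and are not a direct Xeon versus Opteron performance comparison.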

Now, to show the data in terms of the actual performance metric, namely “millions of ray surfaces per second”, we observe that W(n), the work done per second with n processes, is inversely proportional to the run time T(n), so W(n)/W(1) = T(1)/T(n) = S(n). Thus,

W(n) = W(1)/( (1-P) + P/n )

The following plots show the work done with n threads and the fit of the equation above.

Finally, to predict the performance of various Xeon and Opteron processors, we can add a clock scaling factor to the W(n) equation using a ratio of CPU clock speeds, C_new/C_old. (C_new is the clock speed of the processor we are interested in, and C_old is the clock speed of the processors we used for the performance measurements, namely 1.9GHz for the Xeon and 2.6GHz for the Opteron.) For example, the predicted performance of a quad 8-core Xeon E5-4627v2 @3.3GHz, using the Xeon W(1) value of 14.9 and P rounded to 0.995, is

W(32) = 3.3/1.9 * 14.9 / ( (1-0.995) + 0.995/32 ) = 717 (million ray surfaces per second)
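The same calculation is easy to script. Here is a small sketch that reproduces the worked example above. The function and constant names are my own, and the inputs (W(1) = 14.9, P = 0.995, 1.9GHz) are simply the Xeon numbers that appear in that calculation.

```python
def predicted_throughput(w1, p, n, clock_new, clock_old):
    """Clock-scaled Amdahl estimate of million ray surfaces/sec on n cores."""
    return (clock_new / clock_old) * w1 / ((1.0 - p) + p / n)

# Xeon inputs taken from the worked example: W(1), rounded P, test clock in GHz.
XEON_W1, XEON_P, XEON_CLOCK = 14.9, 0.995, 1.9

# Quad 8-core E5-4627v2 at 3.3GHz, i.e. 32 cores in total.
print(round(predicted_throughput(XEON_W1, XEON_P, 32, 3.3, XEON_CLOCK)))  # 717
```

Changing only the core count and clock speed gives estimates for other Xeon models. The Opteron estimates need the Opteron values of W(1), P, and the 2.6GHz test clock instead.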

The table below lists most of the currently available quad socket Xeon and Opteron CPUs and their predicted performance on this benchmark. Enjoy!

Predicted Performance for Quad Socket CPUs

 
| Processor | CPU Base Clock Speed | Cores (total "real" cores) | Million Ray Surf/sec (±10%) | Price for 1 CPU | Notes |
|---|---|---|---|---|---|
| Opteron 6386SE | 2.8GHz | 64 | 751 | $1392 | |
| Xeon E5-4657Lv2 | 2.4GHz | 48 | 731 | $4394 | |
| Xeon E5-4627v2 | 3.3GHz | 32 | 717 | $2108 | This is my personal recommended system CPU |
| Opteron 6380 | 2.5GHz | 64 | 671 | $1088 | |
| Opteron 6378 | 2.4GHz | 64 | 644 | $867 | |
| Xeon E5-4650v2 | 2.4GHz | 40 | 630 | $3616 | |
| Opteron 6376 | 2.3GHz | 64 | 617 | $703 | |
| Opteron 6348 | 2.8GHz | 48 | 590 | $575 | |
| Xeon E5-4620v2 | 2.6GHz | 32 | 566 | $1611 | |
| Opteron 6344 | 2.6GHz | 48 | 548 | $415 | This is the test Opteron system, measured performance was 552 |
| Xeon E5-4610v2 | 2.3GHz | 32 | 500 | $1219 | |
| Xeon E5-4624Lv2 | 1.9GHz | 40 | 498 | $2405 | This is the test Xeon system, measured performance was 508 |
| Opteron 6366HE | 1.8GHz | 64 | 483 | $575 | |
| Opteron 6328 | 3.2GHz | 32 | 472 | $575 | |
| Xeon E5-4607v2 | 2.6GHz | 24 | 439 | $885 | |
| Opteron 6320 | 2.8GHz | 32 | 413 | $293 | |
| Opteron 6308 | 3.5GHz | 16 | 271 | $501 | |
| Xeon E5-4603v2 | 2.2GHz | 16 | 257 | $551 | |

The prices for these processors vary quite a bit … We’ll leave performance per dollar as an exercise for the reader 🙂
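If you want a head start on that exercise, here is one possible sketch. It takes the predicted performance and single-CPU prices from the table and ranks the processors by performance per dollar. The big assumption, which is mine, is that a quad system costs four times the listed CPU price and that the rest of the system (motherboard, memory, and so on) is ignored.

```python
# (processor, predicted million ray surfaces/sec, listed price for 1 CPU in USD)
TABLE = [
    ("Opteron 6386SE", 751, 1392), ("Xeon E5-4657Lv2", 731, 4394),
    ("Xeon E5-4627v2", 717, 2108), ("Opteron 6380", 671, 1088),
    ("Opteron 6378", 644, 867), ("Xeon E5-4650v2", 630, 3616),
    ("Opteron 6376", 617, 703), ("Opteron 6348", 590, 575),
    ("Xeon E5-4620v2", 566, 1611), ("Opteron 6344", 548, 415),
    ("Xeon E5-4610v2", 500, 1219), ("Xeon E5-4624Lv2", 498, 2405),
    ("Opteron 6366HE", 483, 575), ("Opteron 6328", 472, 575),
    ("Xeon E5-4607v2", 439, 885), ("Opteron 6320", 413, 293),
    ("Opteron 6308", 271, 501), ("Xeon E5-4603v2", 257, 551),
]

SOCKETS = 4  # assumption: four CPUs per system, all other costs ignored

def perf_per_dollar(perf, price_each, sockets=SOCKETS):
    """Predicted million ray surfaces/sec per dollar spent on CPUs."""
    return perf / (sockets * price_each)

# Highest performance per CPU dollar first.
ranked = sorted(TABLE, key=lambda row: perf_per_dollar(row[1], row[2]), reverse=True)
for name, perf, price in ranked:
    print(f"{name:<16} {perf_per_dollar(perf, price):.3f}")
```

Keep in mind that this ranking inherits the ±10% uncertainty of the predictions and says nothing about total system cost.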

Caveats!

  • It was fun doing this post, but it is just a benchmark job for one particular program. Don't read too much into it! (But I'm sure you will anyway 🙂)
  • This program had very good parallel thread scaling so performance was mostly dependent on CPU clock and number of cores. This will not always be the case!
  • There are a lot of factors that go into finding the "best" hardware for a particular program and job type, so don't oversimplify your decisions.
  • If you are a Zemax OpticStudio user you are welcome to try my performance formula and run the benchmark yourself to compare. I have had feedback already and the formula seems to be reasonably good even for older hardware and dual socket systems.
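If you do run the benchmark yourself, a simple way to compare is the relative error between the formula's prediction and your measurement. The sketch below does this for the two test systems, using the predicted and measured values from the table notes. The helper name is just for illustration.

```python
def relative_error(predicted, measured):
    """Signed error of the prediction as a fraction of the measured value."""
    return (predicted - measured) / measured

# (predicted, measured) million ray surfaces/sec for the two test systems.
checks = {"Opteron 6344": (548, 552), "Xeon E5-4624Lv2": (498, 508)}

for name, (predicted, measured) in checks.items():
    print(f"{name}: {relative_error(predicted, measured):+.1%}")
```

Both test systems come out within a few percent, well inside the ±10% quoted in the table. That is expected, since the fits were made to these same systems, so the more interesting check is on hardware that was not part of the fit.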

Happy computing! –dbk