Accelerated Parallel Computing
with NVIDIA Tesla and Intel Xeon Phi
Peak puts the highest possible compute performance into the hands of developers, scientists, and engineers to advance computing-enabled discovery and help solve the world's most challenging computational problems.
Puget Systems has over 16 years of experience designing and building high-quality, high-performance PCs. Our emphasis has always been on reliability, high performance, and quiet operation. We bring this experience to the HPC sector with our Peak family of workstations and servers. Through in-house testing, we do not blindly follow the industry; we help lead it. We offer the products below as starting points that we feel cover some of the most compelling areas where we can contribute to the HPC community. Do you have a project that needs some serious compute power, and you don't know where to turn? Let us help. It's what we do!
Minimum noise and maximum performance, reliability, and usability. Puget Peak is an evolutionary step from our custom systems experience. Genesis post-production performance, Summit server stability, Serenity silent design, Obsidian reliability, and even the diminutive Echo have influenced Peak. We've taken extra steps like developing our own custom Arduino-based thermal fan controller and fabricating custom fan shrouds.
TeraFLOPS. Using dual Intel E5-2687W CPUs, we see over 300 GFLOPS in double-precision (DP) Linpack on the base Peak system, and 765 GFLOPS when running the same Intel MKL benchmark while logged directly into a single Xeon Phi 5110 coprocessor. The well-established CUDA platform and libraries deliver similar levels of performance with NVIDIA Tesla and have been put to good use in many existing codes. There is tremendous potential for applications leveraging the computing power of the Intel Xeon Phi coprocessor and the NVIDIA Tesla and Titan GPGPUs.
Ready for use. Peak systems are installed, configured, and tested under load before they ship, and will (optionally) arrive with the setup and tools you need to get started. Our CentOS setup provides a configuration that can serve as the basis of your working environment.
Part of what makes our cooling both effective and quiet is that we specifically target the hot spots of each system. We place fans only where they are needed and only when they are needed. We then verify the final configuration with extensive testing, full load stress testing, and thermal imaging to ensure excellent cooling.
We know that these PCs are intended for heavy, long-duration workloads. Our primary design goal is long life under 24/7 load. Through targeted cooling and high-quality thermal solutions, we achieve an excellent low noise level while maintaining the cooling necessary for long-term high load. Even better, since we implement a custom cooling plan for each order, if you would like us to tune more aggressively in either direction (towards even quieter operation, or more extreme cooling), all you have to do is let us know!
NVIDIA Tesla GPU Accelerator
The NVIDIA Tesla series of GPU accelerator cards sparked intense interest in speeding up applications with algorithms that exploit high thread-count parallelism across the large number of execution cores available on GPUs. The Tesla cards, although based on GPU cores, are designed specifically for computation and forgo video output. Much to NVIDIA's credit, the strong developer ecosystem they established around their CUDA SDK has spawned many successful projects. In general, programming for Tesla requires careful consideration of the hardware and rethinking of CPU-oriented algorithms.
NVIDIA Tesla Specifications:
|Model||Tesla K20||Tesla K20X||Tesla K40||Tesla K80|
|# of CUDA Cores||2496||2688||2880||4992|
|Clock Speed||706 MHz||732 MHz||745 MHz||562 MHz|
|Memory Size (GDDR5)||5 GB||6 GB||12 GB||24 GB|
|Memory Clock||2.6 GHz||2.6 GHz||3.0 GHz||2.8 GHz|
|Memory Bandwidth (ECC off)||208 GB/s||250 GB/s||288 GB/s||480 GB/s|
|ECC Memory Supported||Yes||Yes||Yes||Yes|
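The memory bandwidth row can be roughly cross-checked against the memory clock row. The sketch below assumes GDDR5 moves two transfers per listed memory clock, and it assumes bus widths of 320 bits for the 2496-core card and 384 bits for the 2688- and 2880-core cards. Those bus widths are not stated in the table above; they are assumptions for illustration, and the function name is hypothetical. The 4992-core card is a dual-GPU board and is left out of this simple single-GPU estimate.

```python
def gddr5_bandwidth_gbs(memory_clock_mhz: int, bus_width_bits: int) -> float:
    """Estimate peak memory bandwidth in GB/s.

    Assumes two data transfers per listed memory clock and
    bus_width_bits / 8 bytes moved per transfer.
    """
    mega_transfers_per_second = memory_clock_mhz * 2
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_second * bytes_per_transfer / 1000


# Key: CUDA core count from the table.
# Value: (memory clock in MHz from the table, ASSUMED bus width in bits).
CARDS = {
    2496: (2600, 320),
    2688: (2600, 384),
    2880: (3000, 384),
}

if __name__ == "__main__":
    for cores, (clock_mhz, bus_bits) in CARDS.items():
        estimate = gddr5_bandwidth_gbs(clock_mhz, bus_bits)
        print(f"{cores} cores: ~{estimate:.1f} GB/s")
```

Under these assumptions the estimates land on or within rounding of the 208, 250, and 288 GB/s figures listed for the first three cards. These are theoretical peaks with ECC off; sustained application bandwidth will be lower.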
Intel® Xeon Phi™ Coprocessor
The Intel Xeon Phi x100 series of coprocessors offers double-precision floating-point performance approaching a teraFLOPS in a single add-in card. This performance is accessible through normal x86 instructions that leverage the high number of cores, 8 GB of high-speed shared memory, the 512-bit-wide SIMD vector unit, and 4-way hardware threading per core. Codes that have been optimized for standard Intel SSEx/AVX instructions should port readily to Phi. From a systems perspective, the card appears as an additional node on an internal network over the PCIe bus, and it does in fact run an embedded Linux uOS with a command environment provided by BusyBox. This means you can log into the card as a separate node and have a normal Linux command environment available. Booting, resetting, reconfiguring, user management, monitoring, etc. are handled by a set of commands and a kernel module that communicate with the card via a system daemon on the host. The Xeon Phi is an attractive alternative to the better-established NVIDIA Tesla CUDA environment. It provides a much more familiar programming "feel" and can take full advantage of Intel's advanced compiler suites.
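As a rough sanity check on the "approaching teraFLOPS" claim, theoretical peak double-precision throughput can be estimated as cores × clock × SIMD lanes × 2 (one fused multiply-add per lane per cycle). The 512-bit SIMD width comes from the text above; the 60 cores and 1.053 GHz clock used below are assumed values for a 5110-class card and are not stated on this page. The function names are hypothetical.

```python
def peak_dp_gflops(cores: int, clock_mhz: int, simd_bits: int, fma: bool = True) -> float:
    """Theoretical peak double-precision GFLOPS for a SIMD many-core chip."""
    lanes = simd_bits // 64                      # 64-bit doubles per vector register
    flops_per_cycle = lanes * (2 if fma else 1)  # an FMA counts as multiply + add
    return cores * clock_mhz * flops_per_cycle / 1000


def linpack_efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak achieved by a measured benchmark result."""
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # ASSUMED hardware parameters: 60 cores at 1053 MHz; 512-bit SIMD is from the text.
    peak = peak_dp_gflops(cores=60, clock_mhz=1053, simd_bits=512)
    # 765 GFLOPS is the Linpack result quoted earlier on this page.
    efficiency = linpack_efficiency(765, peak)
    print(f"peak ~{peak:.0f} GFLOPS, Linpack efficiency ~{efficiency:.0%}")
```

With these assumed parameters the estimate comes out at about 1011 GFLOPS, which would put the 765 GFLOPS Linpack result at roughly three quarters of theoretical peak.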