Accelerated Parallel Computing
with NVIDIA Tesla and GPU Compute
Peak puts the highest possible compute performance into the hands of developers, scientists, and engineers to advance computing-enabled discovery and the solution of the world's most challenging computational problems.
Puget Systems has over 19 years of experience designing and building high-quality, high-performance PCs. Our emphasis has always been on reliability, high performance, and quiet operation. We bring this experience to the HPC sector with our Peak family of workstations and servers. Through in-house testing we do not blindly follow the industry -- we help lead it. We offer the products below as starting points that cover some of the most compelling areas where we feel we can contribute to the HPC community. Do you have a project that needs some serious compute power, and you don't know where to turn? Let us help. It's what we do!
Minimum noise and maximum performance, reliability, and usability. Puget Peak is an evolutionary step from our custom systems experience. Genesis post-production performance, Summit server stability, Serenity silent design, Obsidian reliability, and even the diminutive Echo have influenced Peak.
TeraFLOPS. Using Intel Xeon CPUs with the Intel MKL, or the well-established CUDA platform and libraries, there is tremendous potential for applications that leverage the computing power of both the CPU and the GPU.
Ready for use. Peak systems are installed, configured and tested under load before they ship and will (optionally) arrive with the setup and tools you need to get started. Our CentOS setup will provide a configuration that can be the basis of your working environment.
Part of what makes our cooling both effective and quiet is that we specifically target the hot spots of each system. We place fans only where they are needed and only when they are needed. We then verify the final configuration with extensive testing, full load stress testing, and thermal imaging to ensure excellent cooling.
We know that these PCs are intended for heavy, long-duration workloads. Designing them for long life under 24/7 load is our primary design goal. Through targeted cooling and high-quality thermal solutions, we achieve a low noise level while maintaining the cooling necessary for long-term high load. Even better, since we implement a custom cooling plan for each order, if you would like us to tune more aggressively in either direction (towards even quieter operation, or more extreme cooling), all you have to do is let us know!
Ubuntu 19.04 will be released soon, so I decided to see if CUDA 10.1 could be installed on it. Yes, it can, and it seems to work fine. In this post I walk through the install and show that docker and nvidia-docker also work. I ran TensorFlow 2.0-alpha on Ubuntu 19.04 beta.
TensorFlow Performance with 1-4 GPUs -- RTX Titan, 2080Ti, 2080, 2070, GTX 1660Ti, 1070, 1080Ti, and Titan V
Written on 03/14/2019 by Dr Donald Kinghorn
I have updated my TensorFlow performance testing. This post contains up-to-date versions of all of my testing software and includes results for 1 to 4 RTX and GTX GPUs. It gives a good comparative overview of most of the GPUs that are useful in a workstation intended for machine learning and AI development work.
There are 2 recent Intel processors that are really strange: the 28-core Xeon W-3175X and the overclocked 14-core Core i9 9990XE. I was able to get a little time in on these processors. I ran a couple of numerical compute performance tests with the Intel MKL Linpack benchmark and NAMD. I used the same system image that I had used recently to look at 3 Intel 8-core processors, so I will include those results here as well. **There will be results for W-3175X, 9990XE, 9800X, W-2145, and 9900K**.
RTX Titan TensorFlow performance with 1-2 GPUs (Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti, and Titan V)
Written on 01/30/2019 by Dr Donald Kinghorn
I've done some testing with 2 NVIDIA RTX Titan GPUs running machine learning jobs with TensorFlow. The RTX Titan is a great card, but there is good news and bad news.
In this post I'll take a brief look at the numerical computing performance of three very capable 8-core processors -- the i9 9900K, i9 9800X, and Xeon W-2145. All three are great CPUs, but there are some significant differences that can cause confusion. I'll discuss these differences and see how the processors stack up when running Linpack and NAMD molecular dynamics simulations.
There has been some concern about Peer-to-Peer (P2P) on the NVIDIA RTX Turing GPUs. P2P is not available over PCIe as it was on past cards. It is available with very good performance when using NVLINK with 2 cards. I did some testing to see how the performance compared between the GTX 1080Ti and RTX 2080Ti. There were some interesting results!
In my recent testing with the AMD Threadripper 2990WX I was impressed by the CPU-based performance with the molecular dynamics program NAMD. NAMD makes a good benchmark for looking at CPU/GPU performance since it requires a balance between the two and is usually limited by the CPU. After some discussions I decided it would be good to look at multi-GPU performance with NAMD on Threadripper.
I recently wrote a post about building and running HPL Linpack on the AMD Threadripper 2990WX - a "How-To". Most of the time I had with the processor went into getting that to work. However, I did run a few other test jobs that I thought the 2990WX would do well with, and compared the results against my personal workstation with a Xeon W-2175. In this post I share those test runs with you. It's not thorough testing by any means, but it was interesting and I was surprised a couple of times by the results.
How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper -- 2990WX 32-core Performance
Written on 11/30/2018 by Dr Donald Kinghorn
The AMD Ryzen Threadripper 2990WX with 32 cores is an intriguing processor. I've been asked about its performance for numerical computing and decided to find out how well it would do with my favorite benchmark, the "High Performance Linpack" benchmark. This is the benchmark used to rank supercomputers on the Top500 list. It is not always simple to run this test since it can require building a few libraries from source. This includes the all-important BLAS library, which AMD has optimized in their BLIS package. I give you a complete How-To guide for getting this running to see what the 2990WX is capable of.
RTX 2080Ti with NVLINK - TensorFlow Performance (Includes Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti and Titan V)
Written on 10/26/2018 by Dr Donald Kinghorn
More Machine Learning testing with TensorFlow on the NVIDIA RTX GPUs. This post adds dual RTX 2080 Ti with NVLINK and the RTX 2070 to the other testing I've recently done. Performance in TensorFlow with 2 RTX 2080 Ti cards is very good! Also, the NVLINK bridge between 2 RTX 2080 Ti cards gives a bidirectional bandwidth of nearly 100 GB/sec!