We've been curious about the performance of WSL for scientific applications and decided to do a few relevant benchmarks. This is also a teaser for some hardware-specific optimized application containerization that I've been working on!
This is just a short post to announce a more usable version of the NVIDIA GPU powerlimit setup script that I released a few months ago. This update to version 0.2 uses an interactive mode to set GPU power limits and optionally set up a systemd unit file to apply these limits on subsequent reboots.
We have a new collection of GPU accelerated Molecular Dynamics benchmark packages put together for GROMACS, NAMD 2, and NAMD 3-alpha10. (The benchmark packages will be available to the public soon.) In this post we present 96 total results covering 3 applications (GROMACS, NAMD 2, and NAMD 3-alpha10), 8 MD simulations, and 12 different NVIDIA GPUs.
In this post we look at using a testing lab of Windows systems as a benchmarking platform for Linux scientific applications using network boot with nfsroot and home mounts. Linux is booted on the systems "diskless", leaving the Windows installs untouched. LTSP turned out to be a great time saver for setting up the configuration.
This post presents testing data showing that power-limit reduction on NVIDIA GPUs gives significant benefits for both high wattage and lower wattage GPUs. Power-limit vs Performance data is presented for 1-4 A5000 and 1-4 RTX3090 GPUs.
In this post I am referencing a Bash shell script I recently put together for setting up automatic NVIDIA GPU power-limit lowering at system boot. This provides a reliable way to configure and maintain multi-GPU systems for stable operation under heavy load.
In this post I'll show you how to set up isolated conda envs for Python without having a base conda install! I'll cover Linux and Windows including an example to get you started. Read on to learn about the wonderful micromamba project.
This post will guide you through the process of creating an Ubuntu 20.04 (or newer) autoinstall ISO by modifying the default installer ISO. The install configuration will be done using the cloud-init cloud-config method that is now used for the Ubuntu server installer.
The single socket version of Intel third generation Xeon SP is out, the Ice Lake Xeon W 33xx. This is a much better platform with faster large capacity 8 channel memory and PCIe v4 with plenty of lanes. The new Intel platform is very much like the AMD Threadripper Pro (single socket version of EPYC Rome) so this is the obvious comparison to make. Read on to see how the numerical computing testing went!
NVIDIA Enroot has a unique feature that will let you easily create an executable, self-contained, single-file package with a container image AND the runtime to start it up! This allows creation of a container package that will run itself on a system with or without Enroot installed on it! "Enroot Bundles".
For computing tasks like Machine Learning and some Scientific computing the RTX3080Ti is an alternative to the RTX3090 when the 12GB of GDDR6X is sufficient (compared to the 24GB available on the RTX3090). 12GB is in line with former NVIDIA GPUs that were "work horses" for ML/AI like the wonderful 2080Ti.
The NVIDIA A100 (Compute) GPU is an extraordinary computing device. It's not just for ML/AI types of workloads. General scientific computing tasks requiring high performance numerical linear algebra run exceptionally well on the A100.
Enroot is a simple and modern way to run "docker" or OCI containers. It provides an unprivileged user "sandbox" that integrates easily with a "normal" end user workflow. I like it for running development environments and especially for running NVIDIA NGC containers. In this post I'll go through steps for installing enroot and some simple usage examples including running NVIDIA NGC containers.
The new Intel Rocket Lake CPUs have been officially released. There were numerous posts and reviews before the official release date of March 30 2021, but I haven't seen anything about the numerical compute performance. I've had access to a Core-i9 11900KF 8-core CPU and have compared it with (my own) AMD 5800X system.
Threadripper Pro! AMD has released the long awaited Threadripper Pro CPUs. I was able to spend a (long) day (and night) running compute performance testing on the flagship 64-core TR Pro 3995WX. In this post I've got some HPC workload benchmark results from putting this excellent CPU through its compute paces.
I recently wrote a post introducing Intel oneAPI that included a simple installation guide of the Base Toolkit. In that post I promised a follow-up about the oneAPI AI Analytics Toolkit. This is it! I'll describe what it is and give recommendations for doing an install setup of the AI toolkits using conda with Anaconda Python.
Intel oneAPI is a massive collection of very high quality developer tools, and it's free to use! In this post I'll give you a little background on what oneAPI is and my recommendations for doing an install setup to get started exploring the collection of toolkits.
In this post I will show you how to install NVIDIA's build of TensorFlow 1.15 into an Anaconda Python conda environment. This is the same TensorFlow 1.15 that you would have in the NGC docker container, but no docker install required and no local system CUDA install needed either.
This is a follow-up post to "Quad RTX3090 GPU Wattage Limited "MaxQ" TensorFlow Performance". This post will show you a way to have GPU power limits set automatically at boot by using a simple script and a systemd service unit file.
Can you run 4 RTX3090s in a system under heavy compute load? Yes, by using nvidia-smi I was able to reduce the power limit on 4 GPUs from 350W to 280W and achieve over 95% of maximum performance. The total power load "at the wall" was reasonable for a single power supply and a modest US residential 110V, 15A power line.
The GeForce RTX3070 has been released. The RTX3070 is loaded with 8GB of memory making it less suited for compute tasks than the 3080 and 3090 GPUs. We have some preliminary results for TensorFlow, NAMD and HPCG.
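As a rough illustration of the idea (not the script from that post), here is a minimal Python sketch that builds the nvidia-smi command lines and the text of a oneshot systemd unit. The function names, the unit description, and the script path are hypothetical placeholders; only the `nvidia-smi` flags `-pm`, `-i`, and `-pl` and the standard systemd unit sections are real.

```python
# Hypothetical helper names; this is a sketch, not the script from the post.

def power_limit_commands(num_gpus, watts):
    """Return nvidia-smi command lines that cap each GPU at `watts`."""
    # Persistence mode keeps the driver initialized when no client is running.
    commands = ["nvidia-smi -pm 1"]
    for gpu_index in range(num_gpus):
        # -i selects the GPU by index, -pl sets its power limit in watts.
        commands.append(f"nvidia-smi -i {gpu_index} -pl {watts}")
    return commands


def systemd_unit(script_path):
    """Return the text of a oneshot service that runs `script_path` at boot."""
    lines = [
        "[Unit]",
        "Description=Set NVIDIA GPU power limits",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={script_path}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed values: 4 GPUs at 280W, and a placeholder script location.
    print("\n".join(power_limit_commands(num_gpus=4, watts=280)))
    print(systemd_unit("/usr/local/sbin/nv-power-limit.sh"))
```

The commands would go into a root-owned shell script, and the unit text into a `.service` file that is then enabled with `systemctl enable`. Setting a power limit requires root privileges.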
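The numbers quoted above can be checked with a little arithmetic. This is a minimal sketch using only the figures from the teaser; the function name is my own, and the GPU total deliberately ignores the CPU and the rest of the system, so it is a lower bound on the load "at the wall", not the measured value.

```python
def power_budget(num_gpus, limit_watts, default_watts, line_volts, line_amps):
    """Return (limit fraction, total GPU watts at the limit, circuit watts)."""
    fraction = limit_watts / default_watts
    gpu_total = num_gpus * limit_watts
    circuit = line_volts * line_amps
    return fraction, gpu_total, circuit


if __name__ == "__main__":
    fraction, gpu_total, circuit = power_budget(4, 280, 350, 110, 15)
    print(f"Power limit is {fraction:.0%} of the default")        # 80%
    print(f"GPUs at the limit: {gpu_total} W (vs {4 * 350} W at default)")
    print(f"Nominal circuit capacity: {circuit} W")
```

So the four GPUs drop from 1400W to 1120W combined, against a nominal 1650W for a 110V, 15A line.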
When you install Miniconda3 or Anaconda3 on Windows it adds a PowerShell shortcut that has the necessary environment setup and initialization for conda. It's listed in the Windows menu as "Anaconda Powershell Prompt (Anaconda3)". However, this opens a separate/detached PowerShell instance and it would be nice to have this as an optional shell from Windows Terminal! In this post we will add that functionality as a new shell option in Windows Terminal.
The second new NVIDIA RTX30 series card, the GeForce RTX3090, has been released. The RTX3090 is loaded with 24GB of memory making it a good replacement for the RTX Titan... at significantly less cost! The performance for Machine Learning and Molecular Dynamics on the RTX3090 is quite good, as expected.
The much anticipated NVIDIA GeForce RTX3080 has been released. How good is it with TensorFlow for machine learning? How about molecular dynamics with NAMD? I've got some preliminary numbers for you!
WSL2 offers improved performance over version 1 by providing more direct access to the host hardware drivers. Recent "Insider Dev Channel" builds of Win10 even allow WSL2 Linux applications to access the Windows NVIDIA display driver for GPU computing! The performance improvements with WSL2 are largely because this version runs as a privileged virtual machine on top of MS Hyper-V. This means that at least low level support for the Hyper-V virtualization layer needs to be enabled to use it. In particular, the Windows feature "VirtualMachinePlatform" must be enabled for WSL2. We tested to see if enabling it had any negative impact on application performance.