*** NOTE: THERE IS AN UPDATE/REPLACEMENT TO THIS POST AT ***
We (Puget Systems) recently configured a large order for machines with quad NVIDIA A6000 GPUs. Very nice machines! These are being used for Machine Learning/AI research and development work. They will be under heavy load and that means the GPUs will be running near their Max Power Limit most of the time. By default the max power limit on many of the high-end NVIDIA GPUs is 300+ Watts. This high power limit stresses the system power delivery and cooling as well as placing a high load on the circuit that the system is plugged into. We found that power limits can be lowered by 10 to 20 percent with only a small impact on overall compute performance in multi-GPU setups.
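To make the 10 to 20 percent figure concrete, here is a small Bash sketch that computes a reduced limit from a default value. The function name `scaled_limit` is made up for illustration and is not part of the setup script, and the 300 W default is just the round number mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the setup script): reduce a default power
# limit, given in watts, by a whole-number percentage using integer math.
scaled_limit() {
    local default_watts=$1 percent_cut=$2
    echo $(( default_watts * (100 - percent_cut) / 100 ))
}

# Show the range discussed above for a GPU with a 300 W default limit.
for cut in 10 20; do
    echo "300 W reduced by ${cut}% -> $(scaled_limit 300 "$cut") W"
done
```

On a real system the default limit could be read with `nvidia-smi --query-gpu=power.default_limit --format=csv,noheader,nounits` rather than hard-coded.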
In this post I am referencing a Bash shell script I recently put together for setting up automatic NVIDIA GPU power-limit lowering at system boot. This allows a reliable way to configure and maintain multi-GPU systems for stable operation under heavy load.
Initial trials with GPU power limiting were discussed in these two blog posts:
The setup script is available on GitHub for use by anyone who may benefit from it.
Note that this script is still under development but is currently in a usable state. Expect updates and of course feel free to download the script and modify it as you wish.
The following is from the GitHub repository for this project.
This script can be used to install a systemd unit file that will set the power limit for NVIDIA GPUs at system boot.
The Bash script is written for Ubuntu >= 18.04 but should be easy to adapt to other distributions.
This script will:
- make sanity checks for OS version and NVIDIA GPUs
- create a config /usr/local/etc/nv-powerlimit.conf with the powerlimit value
- create and install /usr/local/sbin/nv-power-limit.sh
- create and install /etc/systemd/system/nv-power-limit.service
- enable nv-power-limit.service
- install is under /usr/local
!!** powerlimit will be set on all NVIDIA GPUs **!!
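For readers who have not written a systemd unit before, the following is a rough sketch of what a unit like nv-power-limit.service might contain. Only the file paths come from the list above. The unit contents and the `print_unit` function are assumptions for illustration, and the file that the setup script actually generates may differ.

```shell
#!/usr/bin/env bash
# Sketch only: the unit text below is an assumption, not the setup script's
# actual output. It runs the helper script once at boot.
print_unit() {
    cat <<'EOF'
[Unit]
Description=Set NVIDIA GPU power limit at boot

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/nv-power-limit.sh

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit for review. To install it by hand (as root) one would save
# it as /etc/systemd/system/nv-power-limit.service and then run:
#   systemctl daemon-reload
#   systemctl enable nv-power-limit.service
print_unit
```

`Type=oneshot` suits a command that runs once and exits, which is the case for a script that only calls nvidia-smi.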
- Allow GPUs and power limits to be set independently, i.e. ./nv-gpu-powerlimit-setup.sh --gpus 0,1,2,3 --powerlimit 300,250,250,250
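The script's usage text only documents a single power limit value that is applied to every GPU, so the per-GPU form should be treated as hypothetical. As a sketch of how independent limits could map onto nvidia-smi, the function below pairs each GPU index with a limit and prints the commands it would run instead of executing them. The name `plan_power_limits` is an assumption. The `-i` (select GPU) and `-pl` (power limit in watts) options are standard nvidia-smi flags.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-GPU limits. This is a dry run: it prints the
# nvidia-smi commands rather than executing them.
plan_power_limits() {
    local -a gpus limits
    local i
    IFS=',' read -r -a gpus <<< "$1"     # e.g. "0,1,2,3"
    IFS=',' read -r -a limits <<< "$2"   # e.g. "300,250,250,250"
    if [ "${#gpus[@]}" -ne "${#limits[@]}" ]; then
        echo "error: need exactly one power limit per GPU" >&2
        return 1
    fi
    for i in "${!gpus[@]}"; do
        echo "nvidia-smi -i ${gpus[$i]} -pl ${limits[$i]}"
    done
}

plan_power_limits "0,1,2,3" "300,250,250,250"
```

Printing the commands first makes it easy to check the pairing before running anything as root.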
The higher-end NVIDIA RTX desktop GPUs like the RTX3090, A5000, etc., make wonderful compute devices in a multi-GPU setup. However, the default power limits are set very high, as much as 350W! Those high power limits can strain the system's power supply and cooling, and possibly even overload the circuit that the system is plugged into.
Our testing has shown that lowering the power limit to more reasonable values has very little impact on performance. https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Wattage-Limited-MaxQ-TensorFlow-Performance-1974/
    ./nv-gpu-powerlimit-setup.sh --help

    USAGE: sudo ./nv-powerlimit-setup.sh <powerlimit to set>

    <powerlimit to set> should probably be between 200 and 300
    i.e. 280 should still give approx 95 percent performance on 350W GPU

    powerlimit will be stored in /usr/local/etc/nv-powerlimit.conf

    !!** If you do not know what all this means then do not use this script **!!
    see: https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Power-Limiting-with-Systemd-and-Nvidia-smi-1983/

    This script will;
    - make sanity checks for OS version and NVIDIA GPUs
    - create a config /usr/local/etc/nv-powerlimit.conf with the powerlimit value
    - create and install /usr/local/sbin/nv-power-limit.sh
    - create and install /etc/systemd/system/nv-power-limit.service
    - enable nv-power-limit.service
    - install is under /usr/local

    !!** powerlimit will be set on all NVIDIA GPUs **!!
The setup script does the initial setup for automatically setting power limits at system boot. The script can be run again to reset the power limit. The power limit can also be changed by editing /usr/local/etc/nv-powerlimit.conf, and the new value will be applied at the next reboot.
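The boot-time step described above can be sketched roughly as follows. This is not the generated /usr/local/sbin/nv-power-limit.sh itself. It assumes, without confirmation from the post, that the config file holds a single integer number of watts, and the function name `read_power_limit` and the `CONF` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a boot-time helper. Assumption: the config file contains just
# one integer, the power limit in watts.
read_power_limit() {
    local conf=$1 watts
    # Accept a file with or without a trailing newline.
    read -r watts < "$conf" || [ -n "$watts" ]
    case $watts in
        ''|*[!0-9]*) echo "invalid power limit in $conf" >&2; return 1 ;;
    esac
    echo "$watts"
}

CONF=${CONF:-/usr/local/etc/nv-powerlimit.conf}

# Only act when the config exists and the NVIDIA driver tools are installed.
if [ -r "$CONF" ] && command -v nvidia-smi >/dev/null 2>&1; then
    watts=$(read_power_limit "$CONF") || exit 1
    nvidia-smi -pm 1          # enable persistence mode
    nvidia-smi -pl "$watts"   # no -i given, so the limit applies to all GPUs
fi
```

Validating the value before calling nvidia-smi keeps a typo in the config file from turning into a confusing failure at boot.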
I hope this post and script are useful to you! The NVIDIA Ampere GPUs are wonderful compute devices, and multi-GPU is an advantage in many applications. Having reasonable power limits on these GPUs will hopefully give you a much more stable and easy-to-manage system.
Happy computing! –dbk @dbkinghorn