Benchmarking with TensorRT-LLM

Evaluating the speed of GeForce RTX 40-Series GPUs using NVIDIA’s TensorRT-LLM tool for benchmarking GPU inference performance.

LLM Server Setup Part 1 – Base OS

This post is Part 1 in a series on how to configure a system for LLM deployments and development usage. The configuration will be suitable for multi-user deployments and also useful for smaller development systems. Part 1 is about the base Linux server setup.

NVIDIA GTC logo image

GTC23  Notes And Selected Sessions

NVIDIA GTC 2023 was outstanding! To say that about a virtual conference tells you how much I value it. This post is largely a catalog of the talks I found interesting along with titles that I think will be interesting to a larger audience and my colleagues at Puget Systems.

Problems With RTX4090 MultiGPU and AMD vs Intel vs RTX6000Ada or RTX3090

I was prompted to do some testing by a commenter on one of my recent posts. They had concerns about problems with dual NVIDIA RTX4090s on AMD Threadripper Pro platforms. I ran some applications to reproduce the problems reported above and tried to dig deeper into the issues with more extensive testing. The included table below tells all!

Ryzen 7950x Zen4 AVX512 Performance With AMD AOCCv4 HPL HPCG HPL-MxP

This post is a first-look at performance of the Ryzen7 7950x CPU using the latest AMD compiler release with support for Zen4 arch including AVX512 vector instructions. Performance is tested using the HPC standard benchmarks, HPL (High Performance Linpack), HPCG (High Performance Conjugate Gradient) and the newer HPC Top500 benchmark, HPL-MxP (formerly HPL-AI).