How I used “Vibe Coding” and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.

A brief look into using a hybrid GPU/VRAM + CPU/RAM approach to LLM inference with the KTransformers inference library.
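To make the hybrid idea concrete, here is a minimal sketch of GPU+CPU layer offloading using llama.cpp's llama-cli, a different tool than the KTransformers library covered in the post; the model path, layer count, and thread count are placeholders.

```bash
# Hybrid GPU/CPU inference sketch using llama.cpp (not the KTransformers API
# covered in the post). --n-gpu-layers keeps that many transformer layers in
# GPU VRAM; the remaining layers run on the CPU from system RAM.
# ./model.gguf and the layer/thread counts are placeholders.
llama-cli -m ./model.gguf \
    --n-gpu-layers 20 \
    --threads 16 \
    -p "Explain hybrid GPU/CPU inference in one paragraph."
```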
An introduction to NPU hardware and its growing presence outside of mobile computing devices.

Presenting local AI-powered software options for tasks such as image & text generation, automatic speech recognition, and frame interpolation.
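As one concrete example on the speech-recognition side, here is a hedged sketch of transcribing audio locally with the open-source Whisper CLI; the audio file name and model size are placeholders.

```bash
# Local automatic speech recognition sketch using the open-source Whisper CLI.
# interview.mp3 and the model size are placeholders; larger models are more
# accurate but need more VRAM.
pip install -U openai-whisper
whisper interview.mp3 --model small --language en --output_format txt
```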

This post is Part 2 in a series on how to configure a system for LLM deployments and development usage. Part 2 covers installing and configuring the container tools Docker and NVIDIA Enroot.
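As a taste of what Part 2 covers, here is a minimal sketch of verifying GPU access from both tools once Docker, the NVIDIA Container Toolkit, and Enroot are installed; the CUDA image tag is a placeholder.

```bash
# Quick sanity checks after installing the container tools (sketch; the CUDA
# image tag is a placeholder and assumes the NVIDIA Container Toolkit is set up).

# Docker: run nvidia-smi inside a CUDA container with GPU access.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Enroot: import the same image, create a container, and run nvidia-smi in it.
enroot import docker://nvidia/cuda:12.4.1-base-ubuntu22.04
enroot create --name cuda-test nvidia+cuda+12.4.1-base-ubuntu22.04.sqsh
enroot start cuda-test nvidia-smi
```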

This post is Part 1 in a series on how to configure a system for LLM deployments and development usage. The configuration will be suitable for multi-user deployments and also useful for smaller development systems. Part 1 is about the base Linux server setup.

In this post I address the question that’s been on everyone’s mind: Can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? At a reasonable cost?
This is a short post to announce a more usable version of the NVIDIA GPU power-limit setup script that I released a few months ago. Version 0.2 adds an interactive mode for setting GPU power limits and can optionally set up a systemd unit file to reapply those limits on subsequent reboots.
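For context on what the script automates, here is a hand-written sketch of a systemd unit that reapplies a power limit at boot; the 250 W value and the unit name are illustrative placeholders, not the script's actual output.

```bash
# Sketch of a systemd unit that reapplies GPU power limits at boot (hand-written
# illustration, not the script's actual output; 250 W and the unit name are
# placeholder values). Persistence mode must be on for the limit to stick.
sudo tee /etc/systemd/system/nvidia-powerlimit.service > /dev/null <<'EOF'
[Unit]
Description=Set NVIDIA GPU power limits at boot
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi -pl 250

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvidia-powerlimit.service
```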
This post presents testing data showing that power-limit reduction on NVIDIA GPUs can give significant benefits for both high-wattage and lower-wattage GPUs. Power-limit vs. performance data is presented for 1-4 A5000 and 1-4 RTX 3090 GPUs.
In this post I share a Bash shell script I recently put together for automatically lowering NVIDIA GPU power limits at system boot. This provides a reliable way to configure and maintain multi-GPU systems for stable operation under heavy load.
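The core of any such script is a pair of nvidia-smi calls; here is a minimal sketch, assuming a card being limited to 250 W on GPU index 0 (both values are placeholders).

```bash
# Minimal power-limit sketch (values are placeholders). Query the supported
# range first, then set a lower limit; persistence mode keeps it applied
# while the driver stays loaded.
nvidia-smi -q -d POWER | grep -E 'Default Power Limit|Min Power Limit|Max Power Limit'
sudo nvidia-smi -pm 1          # enable persistence mode
sudo nvidia-smi -i 0 -pl 250   # limit GPU 0 to 250 W
```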