
Large Language Model Servers

These rackmount AI servers offer the high GPU memory capacity needed for inference and training with cutting-edge large language models (LLMs).


4 GPU LLM Server

Puget's Take: Compact 2U server for large language model inference

CPU: AMD EPYC 9354P
GPU(s): 4 x NVIDIA L40S 48GB
RAM: 384GB DDR5-4800 REG ECC (12x32GB)

NVIDIA RTX Ada and L40S available!
Provides up to 192 GB of VRAM
70B model inference in fp16 with room for large context / KV cache

8 GPU LLM Server

Puget's Take: Maximum GPU power in a 4U server for LLM inference and training

CPU: 2 x AMD EPYC 9354
GPU(s): 8 x NVIDIA L40S 48GB
RAM: 768GB DDR5-4800 REG ECC (24x32GB)

NVIDIA RTX Ada, L40S, and H100 NVL available!
Provides up to 752 GB of VRAM
150B model inference in fp16 with room for large context / KV cache
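The VRAM claims above follow a common rule of thumb: fp16 weights take 2 bytes per parameter, plus extra headroom for the KV cache that grows with context length. A minimal sketch of that arithmetic is below; the layer/head counts are illustrative example values (roughly a 70B-class dense model without grouped-query attention), not a Puget Systems sizing tool.

```python
# Rough VRAM estimate for fp16 LLM inference.
# Rule of thumb: weights = 2 bytes/param; KV cache = 2 tensors (K and V)
# x 2 bytes x layers x hidden size x context length x batch size.

def fp16_weights_gb(params_billion: float) -> float:
    """fp16 stores 2 bytes per parameter, i.e. ~2 GB per billion params."""
    return params_billion * 1e9 * 2 / 1e9

def kv_cache_gb(layers: int, heads: int, head_dim: int,
                context_len: int, batch: int = 1) -> float:
    """KV cache size in GB for fp16, full multi-head attention."""
    bytes_total = 2 * 2 * layers * heads * head_dim * context_len * batch
    return bytes_total / 1e9

# Illustrative 70B-class shape (hypothetical values, not a specific model)
weights = fp16_weights_gb(70)   # ~140 GB of weights
cache = kv_cache_gb(layers=80, heads=64, head_dim=128, context_len=4096)
print(f"weights ~ {weights:.0f} GB, KV cache ~ {cache:.1f} GB")
```

Under these assumptions a 70B model needs roughly 140 GB for weights plus ~10 GB of KV cache at a 4K context, which is why it fits in the 4-GPU system's 192 GB with room to spare; models using grouped-query attention need considerably less cache.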



Equipped to Serve Customers of Any Size

Puget Systems has specialists on staff who cater to the needs of businesses and educational institutions. We are listed on numerous purchasing portals and offer optional onsite support. Click through to read more about how we can help your organization!

We specialize in building workstation PCs tailored to each of our customers, and the best way we've found to do that is to speak with you directly. There is no cost or obligation, and our no-pressure, non-commissioned consultants are experts at configuring a computer to meet your specific needs. They are happy to discuss a quote you have already saved, or to guide you through each step of the process by asking a few questions about how you'll be using your computer. There are several ways to start a conversation with us, so please pick what works best for you:


If you’d rather not wait, you can reach out to us via phone during our business hours.

Monday – Friday | 7am – 5pm (Pacific)