AI Training & Inference Server

Quad GPU On-Premises Large Language Model Training and Inference Server from Puget Systems Supporting Up to Four NVIDIA RTX 6000 Ada or RTX A6000 GPUs

Enterprise AI Workloads. Workstation Hardware.

This rackmount workstation is built to host a web-based chat interface for large language models (LLMs) and supports multiple simultaneous users.

Two SilverStone RM51 Cases in Horizontal and Vertical Positions Next to Each Other

We run this exact system at our office with a full set of four NVIDIA RTX 6000 Ada graphics cards. It runs Ubuntu 22.04 LTS and serves Meta’s Llama-2-70b-chat-hf through the Hugging Face Text Generation Inference (TGI) server, with Hugging Face Chat UI providing the web interface. This model uses approximately 130GB of video memory (VRAM), and the system should work with any other LLM that fits within the available GPU memory (192GB with four 48GB cards installed).
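As a rough cross-check of the memory figures above, the sketch below estimates the memory needed for model weights from a parameter count and a weight size. The 70 billion parameter count, the 16-bit (2-byte) weight format, and the 48GB-per-card capacity are assumptions made for illustration, and the function names are hypothetical. It also treats "GB" as GiB and ignores the KV cache and runtime overhead, so real usage will be higher.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def total_vram_gib(num_gpus: int, gib_per_gpu: int = 48) -> int:
    """Combined VRAM across identical cards (48 per card is an assumption)."""
    return num_gpus * gib_per_gpu


def weights_fit(num_params: float, num_gpus: int, bytes_per_param: float = 2.0) -> bool:
    """True if the weights alone fit in the combined VRAM (no overhead modeled)."""
    return weight_memory_gib(num_params, bytes_per_param) <= total_vram_gib(num_gpus)


if __name__ == "__main__":
    llama_70b = 70e9  # assumed parameter count for a "70b" model
    print(f"Weights: {weight_memory_gib(llama_70b):.1f} GiB")
    print(f"Available with four cards: {total_vram_gib(4)} GiB")
    print(f"Fits on four cards: {weights_fit(llama_70b, 4)}")
    print(f"Fits on two cards: {weights_fit(llama_70b, 2)}")
```

Under these assumptions the weights come to roughly 130 GiB, which is consistent with the figure quoted above and leaves the remainder of the 192GB for the KV cache and other runtime buffers.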

This configuration was also tested with Falcon-40b, which needs less memory and can run on just two RTX 6000 Ada GPUs. Besides running a chat interface, this hardware is also suitable for fine-tuning base models within the available GPU memory limits. QLoRA fine-tuning on a 4-bit quantized base model should work well.
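To illustrate why a smaller or quantized model needs fewer cards, here is a minimal sketch that estimates a GPU count from parameter count and bits per weight. The 40 billion and 70 billion parameter counts, the 48 GiB card size, and the 20% headroom reserved for KV cache and activations are all assumptions, and `min_gpus` is a hypothetical helper. It does not model framework-specific sharding constraints or the extra memory that fine-tuning needs for adapters, gradients, and optimizer state.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def min_gpus(num_params: float, bits_per_param: int,
             gpu_gib: float = 48.0, headroom: float = 0.2) -> int:
    """Estimate how many identical GPUs are needed to hold the weights.

    headroom is the fraction of each card kept free for everything that is
    not weights (an assumed value, not a measured one).
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    weights_gib = num_params * bits_per_param / 8 / GIB
    usable_gib_per_gpu = gpu_gib * (1.0 - headroom)
    return max(1, math.ceil(weights_gib / usable_gib_per_gpu))


if __name__ == "__main__":
    for name, params in (("40b model", 40e9), ("70b model", 70e9)):
        for bits in (16, 4):
            print(f"{name} at {bits}-bit: {min_gpus(params, bits)} GPU(s)")
```

With these assumed numbers, a 40b model at 16-bit lands on two cards and a 70b model on four, while 4-bit weights for either would fit on a single card before fine-tuning overhead is counted.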

Sample response times (single user):
- Validation Time = 0.59673 ms
- Queue Time = 0.17409 ms
- Time per Token = 54.558 ms

Average response times under multi-user load (stress test with multiple concurrent users, 114 prompts over 5 minutes):
- Validation Time = 3.0312 ms
- Queue Time = 4687.9 ms
- Time per Token = 68.076 ms
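
The timing figures above can be turned into more familiar numbers such as tokens per second and an estimated wait for a full reply. The sketch below assumes that validation, queue, and per-token times simply add up and that the time per token stays constant over the whole response; the 200-token reply length is a hypothetical value, and the `Timing` class is illustrative rather than part of any server API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    validation_ms: float
    queue_ms: float
    ms_per_token: float

    def tokens_per_second(self) -> float:
        """Generation rate implied by the per-token latency."""
        return 1000.0 / self.ms_per_token

    def response_seconds(self, num_tokens: int) -> float:
        """Estimated total time for a reply of num_tokens (simple additive model)."""
        total_ms = self.validation_ms + self.queue_ms + num_tokens * self.ms_per_token
        return total_ms / 1000.0


# Figures copied from the measurements listed above.
SINGLE_USER = Timing(validation_ms=0.59673, queue_ms=0.17409, ms_per_token=54.558)
MULTI_USER = Timing(validation_ms=3.0312, queue_ms=4687.9, ms_per_token=68.076)

if __name__ == "__main__":
    reply_tokens = 200  # hypothetical reply length
    for label, timing in (("single user", SINGLE_USER), ("multi-user", MULTI_USER)):
        print(f"{label}: {timing.tokens_per_second():.1f} tokens/s, "
              f"~{timing.response_seconds(reply_tokens):.1f} s for {reply_tokens} tokens")
    print(f"Stress test rate: {114 / 5:.1f} prompts per minute")
```

Read this way, the per-token speed drops only modestly under load (roughly 18 to 15 tokens per second), and most of the added wait comes from time spent in the queue.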

Try it out for yourself at the Puget Systems SIGGRAPH 2023 booth (#630)!

System Specifications

Puget Systems Quad GPU AI Training and Inference Server in SilverStone 5U Rackmount Chassis

CPU	Intel Xeon W-3400 Processor
RAM	Up to 512GB DDR5 ECC
GPUs	Up to 4 NVIDIA RTX 6000 Ada or RTX A6000
Storage	Up to 3 M.2 NVMe SSDs and 8 SATA drives
Chassis	SilverStone 5U Rackmount (convertible to tower)
Power Supply	1300W + 750W Dual PSUs (requires two outlets on a 20A 120V circuit)
Operating System	Ubuntu 22.04 LTS Server

Customize Your AI Server Today

This system is available for purchase now! Save a quote or buy now using the link below:

Quad GPU Rackmount 5U

Intel Xeon W-3400
Optimized for 4x RTX 6000 Ada
512GB DDR5 ECC

Why Choose Puget Systems?

Built Specifically for You

Rather than getting a generic workstation, our systems are designed around your unique workflow and are optimized for the work you do every day.

We’re Here, Give Us a Call!

We make sure our representatives are as accessible as possible, by phone and email. At Puget Systems, you can actually talk to a real person!

Fast Build Times

By keeping our most popular parts in inventory and maintaining a short supply line for the parts we need, we are able to offer industry-leading ship times.

Lifetime Labor & Tech Support

After your parts warranty expires, we continue to answer your questions and will fix your computer with no labor costs.

Equipped to Serve Customers of Any Size

Puget Systems has specialists on staff who cater to the needs of businesses and educational institutions. We are listed on numerous purchasing portals and offer optional onsite support. Click through to read more about how we can help your organization!

Enterprise

Government & Education

How Our Process Works

Configure

Customize your own desktop computer from scratch. You’re choosing from the best, because we only sell products we recommend and stand behind.

Refine

Let us save you money! Work with our experts to find the best choices for your needs and your budget, to give you the best bang for your buck.

Purchase

Place your order on our secure website. Buy a PC with complete confidence, backed by our case studies and testimonials.

Track

Follow your order in real time through our extensive checklist, and receive a tracking number by email. Most orders ship in 2 weeks.