AI Training & Inference Server
Quad GPU On-Premises Large Language Model Training and Inference Server from Puget Systems Supporting Up to Four NVIDIA RTX 6000 Ada or A6000 GPUs
Enterprise AI Workloads. Workstation Hardware.
This rackmount workstation is built to host a web-based chat interface for large language models (LLMs) and supports multiple simultaneous users.
We have this exact system running at our office with a full set of four NVIDIA RTX 6000 Ada graphics cards. It runs Ubuntu 22.04 LTS and serves Meta’s Llama-2-70b-chat-hf through the Hugging Face Text Generation Inference (TGI) server, with Hugging Face Chat UI providing the web interface. This model uses approximately 130GB of video memory (VRAM), and the system should work with any other LLM that fits within the available GPU memory (192GB with four cards installed).
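For scripted access alongside the web interface, a TGI server can also be queried directly over HTTP. The following is a minimal sketch, not our production setup: it assumes the server is reachable at the hypothetical address `http://localhost:8080` and uses TGI's `/generate` endpoint. The helper names `build_generate_request` and `generate` are illustrative only.

```python
import json
import urllib.request

# Hypothetical address; replace with wherever your TGI server is listening.
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=200, base_url=TGI_URL):
    """Build a POST request for TGI's /generate endpoint (no network I/O here)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=200, base_url=TGI_URL, timeout=120):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]
```

With a server running, `print(generate("What is a rackmount workstation?"))` would print the model's reply. Building the request in its own function keeps the payload easy to inspect without touching the network.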
This configuration was also tested with Falcon-40b, which needs less memory and can run on just two RTX 6000 Ada GPUs. Besides hosting a chat interface, this hardware is also suitable for fine-tuning base models within the available GPU memory limits. QLoRA fine-tuning with 4-bit quantized weights should work well.
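To judge whether another model will fit, a rough estimate of the memory needed for the weights alone is a useful first check. The sketch below makes several assumptions that are not measurements from our system: nominal parameter counts taken from the model names (70 billion and 40 billion), 16-bit weights for inference, 48GB of VRAM per card, and GB treated as GiB. It ignores the KV cache, activations, and framework overhead, so real usage will be higher.

```python
GIB = 1024 ** 3
GPU_VRAM_GIB = 48  # assumed per-card VRAM (RTX 6000 Ada / RTX A6000)


def weight_memory_gib(num_params, bits_per_param):
    """Memory for the model weights only, in GiB."""
    return num_params * bits_per_param / 8 / GIB


def weights_fit(num_params, bits_per_param, num_gpus, vram_gib=GPU_VRAM_GIB):
    """True if the weights alone fit in the combined VRAM of num_gpus cards."""
    return weight_memory_gib(num_params, bits_per_param) <= num_gpus * vram_gib


# Nominal parameter counts (assumptions based on the model names).
LLAMA_2_70B = 70e9
FALCON_40B = 40e9

print(f"Llama-2-70b, 16-bit: {weight_memory_gib(LLAMA_2_70B, 16):.1f} GiB")
print(f"Falcon-40b, 16-bit:  {weight_memory_gib(FALCON_40B, 16):.1f} GiB")
print(f"Llama-2-70b, 4-bit:  {weight_memory_gib(LLAMA_2_70B, 4):.1f} GiB")
print("Llama-2-70b on 4 GPUs:", weights_fit(LLAMA_2_70B, 16, 4))
print("Falcon-40b on 2 GPUs: ", weights_fit(FALCON_40B, 16, 2))
```

Under these assumptions, the 16-bit estimate for a 70-billion-parameter model comes to roughly 130 GiB, in line with the usage we observed. The 4-bit figure suggests why quantized fine-tuning leaves far more headroom for adapters, gradients, and optimizer state.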
- Sample response times (single user):
  - Validation Time = 0.59673 ms
  - Queue Time = 0.17409 ms
  - Time per Token = 54.558 ms
- Stress tested with multiple concurrent users
  - Data below is from a session with 114 prompts over 5 minutes
  - Average prompt response under multi-user load:
    - Validation Time = 3.0312 ms
    - Queue Time = 4687.9 ms
    - Time per Token = 68.076 ms
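To put these timings in practical terms, the sketch below converts time per token into tokens per second and estimates the total server-side time for a reply. It assumes total time is approximately validation time plus queue time plus time per token multiplied by the number of generated tokens, and it ignores network and browser overhead. The 200-token reply length is a hypothetical value chosen for illustration, not part of our test data.

```python
def tokens_per_second(time_per_token_ms):
    """Convert a per-token latency in milliseconds to generation throughput."""
    return 1000.0 / time_per_token_ms


def estimated_response_ms(validation_ms, queue_ms, time_per_token_ms, num_tokens):
    """Approximate server-side time to produce a reply of num_tokens tokens."""
    return validation_ms + queue_ms + time_per_token_ms * num_tokens


# Measurements from the lists above.
single_user = {"validation_ms": 0.59673, "queue_ms": 0.17409, "time_per_token_ms": 54.558}
multi_user = {"validation_ms": 3.0312, "queue_ms": 4687.9, "time_per_token_ms": 68.076}

REPLY_TOKENS = 200  # hypothetical reply length

for label, timing in (("single user", single_user), ("multi-user", multi_user)):
    rate = tokens_per_second(timing["time_per_token_ms"])
    total_s = estimated_response_ms(num_tokens=REPLY_TOKENS, **timing) / 1000.0
    print(f"{label}: {rate:.1f} tokens/s, ~{total_s:.1f} s for {REPLY_TOKENS} tokens")

# Average load during the stress test: 114 prompts over 5 minutes.
print(f"{114 / 5:.1f} prompts per minute")
```

Under these assumptions, generation speed drops only modestly under load (about 18.3 to 14.7 tokens per second), and most of the added delay comes from the time a prompt spends waiting in the queue.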
Try it out for yourself at the Puget Systems SIGGRAPH 2023 booth (#630)!
System Specifications
CPU | Intel Xeon W-3400 Processor |
RAM | Up to 512GB DDR5 ECC |
GPUs | Up to 4 NVIDIA RTX 6000 Ada or RTX A6000 |
Storage | Up to 3 M.2 NVMe SSDs and 8 SATA drives |
Chassis | SilverStone 5U Rackmount (convertible to tower) |
Power Supply | 1300W + 750W Dual PSUs (requires two outlets on a 20A 120V circuit) |
Operating System | Ubuntu 22.04 LTS Server |
Customize Your AI Server Today
This system is available for purchase now! Save a quote or buy now using the link below:
Why Choose Puget Systems?
Built Specifically for You
Rather than getting a generic workstation, our systems are designed around your unique workflow and are optimized for the work you do every day.
We’re Here, Give Us a Call!
We make sure our representatives are as accessible as possible, by phone and email. At Puget Systems, you can actually talk to a real person!
Fast Build Times
By keeping our most popular parts in stock and maintaining a short supply line for the other parts we need, we are able to offer industry-leading ship times.
Lifetime Labor & Tech Support
Even when your parts warranty expires, we continue to answer your questions and service your computer with no labor costs.
Click here for even more reasons!
Equipped to Serve Customers of Any Size
Puget Systems has specialists on staff who cater to the needs of businesses and educational institutions. We are listed on numerous purchasing portals and offer optional onsite support. Click through to read more about how we can help your organization!
How Our Process Works
Configure
Customize your own desktop computer from scratch. You’re choosing from the best, because we only sell products we recommend and stand behind.
Refine
Let us save you money! Work with our experts to find the best choices for your needs and your budget, to give you the best bang for your buck.
Purchase
Place your order on our secure website. Buy with complete confidence, backed by our case studies and testimonials.
Track
Follow your order in real-time through our extensive checklist, and receive a tracking number by email. Most orders ship in 2 weeks.