AI Training & Inference Server
Quad GPU On-Premises Large Language Model Training and Inference Server from Puget Systems Supporting Up to Four NVIDIA RTX 6000 Ada or RTX A6000 GPUs
Enterprise AI Workloads. Workstation Hardware.
This rackmount workstation is built to host a web-based chat interface for large language models (LLMs) and supports multiple simultaneous users.
We have this exact system running at our office with a full set of four NVIDIA RTX 6000 Ada graphics cards. It runs Ubuntu 22.04 LTS and serves Meta’s Llama-2-70b-chat-hf model through HuggingFace Text Generation Inference (TGI), with HuggingFace Chat UI providing the web interface. This model uses approximately 130GB of video memory (VRAM), and the system should work with any other LLM that fits within the available GPU memory (192GB with four 48GB cards installed).
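The ~130GB figure can be sanity-checked from the parameter count. A rough sketch, assuming a nominal 70 billion parameters stored at 2 bytes each (FP16/BF16) and ignoring the KV cache, activations, and runtime overhead that TGI also needs; the function names are illustrative only:

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions: nominal parameter counts; KV cache, activations, and
# framework overhead are NOT included.

GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Return the size of the model weights in GiB.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit.
    """
    return num_params * bytes_per_param / GIB


def total_vram_gib(num_gpus: int, vram_per_gpu_gib: float = 48.0) -> float:
    """Total memory across identical 48GB cards (RTX 6000 Ada or RTX A6000)."""
    return num_gpus * vram_per_gpu_gib


if __name__ == "__main__":
    llama2_70b = weight_memory_gib(70e9)  # FP16 weights only
    print(f"Llama-2-70b FP16 weights: {llama2_70b:.1f} GiB")
    print(f"Four-card VRAM: {total_vram_gib(4):.0f} GB")
```

Read as GiB (2^30 bytes), the FP16 weights alone come to about 130, which is consistent with the figure above; the rest of the 192GB leaves headroom for the KV cache, which grows with context length and the number of concurrent requests.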
This configuration was also tested with Falcon-40b, which needs less memory and can run on just two RTX 6000 Ada GPUs. Beyond serving a chat interface, this hardware is also suitable for fine-tuning base models within the available GPU memory. QLoRA, which trains small low-rank adapters on top of a 4-bit quantized base model, should work well.
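The same estimate can be extended to a rough GPU count per model. This is a hypothetical helper, assuming 48GB per card, nominal parameter counts, and an arbitrary 20% headroom factor for the KV cache and runtime overhead; real requirements depend on context length, batch size, and the serving or training framework:

```python
import math

GIB = 2**30  # bytes per GiB


def min_gpus_needed(
    num_params: float,
    bytes_per_param: float = 2.0,
    vram_per_gpu_gib: float = 48.0,
    headroom: float = 1.2,  # assumed 20% extra for KV cache / overhead
) -> int:
    """Smallest number of identical GPUs whose combined memory holds the
    weights plus the assumed headroom."""
    needed_gib = num_params * bytes_per_param / GIB * headroom
    return math.ceil(needed_gib / vram_per_gpu_gib)


print(min_gpus_needed(40e9))       # Falcon-40b, FP16        -> 2
print(min_gpus_needed(70e9))       # Llama-2-70b, FP16       -> 4
print(min_gpus_needed(70e9, 0.5))  # 70B weights at 4-bit    -> 1
```

The first two results line up with the configurations described above (two cards for Falcon-40b, four for Llama-2-70b). The 4-bit figure covers only the frozen base weights; QLoRA fine-tuning also needs memory for the adapters, their optimizer state, and activations.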
- Sample response times (single user):
  - Validation Time = 0.59673 ms
  - Queue Time = 0.17409 ms
  - Time per Token = 54.558 ms
- Stress tested with multiple concurrent users
  - Data below is from a session with 114 prompts over 5 minutes
  - Average prompt response under multi-user load:
    - Validation Time = 3.0312 ms
    - Queue Time = 4687.9 ms
    - Time per Token = 68.076 ms
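These per-request metrics can be combined into a rough end-to-end estimate. The sketch assumes total latency is approximately validation time plus queue time plus generated tokens times time per token (this relationship between the reported timings is an assumption, not something stated on this page), and uses a hypothetical 200-token reply:

```python
def estimated_latency_ms(validation_ms, queue_ms, per_token_ms, new_tokens):
    """Approximate wall-clock time for one request, in milliseconds."""
    return validation_ms + queue_ms + per_token_ms * new_tokens


def tokens_per_second(per_token_ms):
    """Generation speed implied by the time per token."""
    return 1000.0 / per_token_ms


REPLY_TOKENS = 200  # hypothetical reply length

# Figures from the single-user and multi-user measurements above.
single = estimated_latency_ms(0.59673, 0.17409, 54.558, REPLY_TOKENS)
multi = estimated_latency_ms(3.0312, 4687.9, 68.076, REPLY_TOKENS)

print(f"single user: {single / 1000:.1f} s, {tokens_per_second(54.558):.1f} tok/s")
print(f"multi-user:  {multi / 1000:.1f} s, {tokens_per_second(68.076):.1f} tok/s")
print(f"load test rate: {114 / (5 * 60):.2f} prompts/s")
```

Under load, generation speed drops only modestly (from about 18.3 to 14.7 tokens per second), while the average queue time of roughly 4.7 seconds becomes a large share of the wait.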
Try it out for yourself at the Puget Systems SIGGRAPH 2023 booth (#630)!
| Component | Specification |
|---|---|
| CPU | Intel Xeon W-3400 Processor |
| RAM | Up to 512GB DDR5 ECC |
| GPUs | Up to 4 NVIDIA RTX 6000 Ada or RTX A6000 |
| Storage | Up to 3 M.2 NVMe SSDs and 8 SATA drives |
| Chassis | SilverStone 5U Rackmount (convertible to tower) |
| Power Supply | 1300W + 750W Dual PSUs (requires two outlets on a 20A 120V circuit) |
| Operating System | Ubuntu 22.04 LTS Server |
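The power note can be put into numbers. A rough comparison, assuming a standard North American 120V branch circuit and the common rule of thumb that continuous loads stay at or below 80% of the breaker rating; PSU ratings are maximum output, so actual wall draw depends on the workload and PSU efficiency:

```python
def circuit_capacity_watts(amps: float, volts: float, continuous_fraction: float = 1.0) -> float:
    """Power a branch circuit can supply; pass 0.8 for the assumed
    continuous-load rule of thumb."""
    return amps * volts * continuous_fraction


psu_ratings_w = [1300, 750]  # dual PSUs from the spec table
combined_psu_w = sum(psu_ratings_w)

print(f"combined PSU rating: {combined_psu_w} W")
print(f"20A/120V circuit: {circuit_capacity_watts(20, 120):.0f} W peak, "
      f"{circuit_capacity_watts(20, 120, 0.8):.0f} W continuous")
print(f"15A/120V circuit: {circuit_capacity_watts(15, 120):.0f} W peak")
```

A 15A circuit tops out at 1800W, below the PSUs' combined 2050W rating, which is consistent with the requirement for a 20A circuit.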
Customize Your AI Server Today
This system is available for purchase now! Save a quote or buy now. Featured configuration:
- Intel Xeon W-3400
- Optimized for 4x RTX 6000 Ada
- 512GB DDR5 ECC
Why Choose Puget Systems?
Rather than getting a generic workstation, our systems are designed around your unique workflow and are optimized for the work you do every day.
We make sure our representatives are as accessible as possible, by phone and email. At Puget Systems, you can actually talk to a real person!
By keeping inventory of our most popular parts, and maintaining a short supply line to parts we need, we are able to offer an industry-leading ship time.
Equipped to Serve Customers of Any Size
Puget Systems has specialists on staff who cater to the needs of businesses and educational institutions. We are listed on numerous purchasing portals and offer optional onsite support.
How Our Process Works
Let us save you money! Work with our experts to find the best choices for your needs and your budget, to give you the best bang for your buck.