Press Release: Puget Systems Debuts “On-Premise” AI and ML Server

Puget Systems Debuts “On-Premise” Generative AI and Machine Learning Custom Server at SIGGRAPH 2023

Puget Systems’ Specialized AI Training and Inference Server Supports Up to Four NVIDIA RTX 6000 Ada Graphics Cards to Host a Web-Based Chat Interface

Auburn, WA (August 8, 2023) – Puget Systems (www.pugetsystems.com) today announced it will debut a custom Generative AI and Machine Learning server at SIGGRAPH 2023 in Los Angeles this week. At booth #630 in the Los Angeles Convention Center, the Puget Systems team will demonstrate its new specialized AI Training and Inference server, configured with four NVIDIA RTX 6000 Ada graphics cards to handle intensive generative AI and machine learning workloads as well as real-time rendering, graphics, AR/MR/VR/XR, compute, and deep learning processing.

The Puget Systems AI Training and Inference server is a rackmount workstation capable of hosting a web-based chat server for multiple simultaneous users, using state-of-the-art (SOTA) large language models (LLMs) such as Meta’s Llama-2-70b. Puget Systems Labs conducted extensive testing of this configuration with Llama-2-70b and Falcon-40b. (Falcon-40b requires less memory and can run on only two RTX 6000 Ada GPUs.) In addition to running a chat interface, this hardware is suitable for fine-tuning base models within the available GPU memory limits.

Puget Labs Testing Processes and Results

The Puget Systems Labs team conducted extensive testing of the new AI Training and Inference server using a full set of four NVIDIA RTX 6000 Ada graphics cards. Labs tested the system with Meta’s Llama-2-70b-chat-hf, served through the HuggingFace Text-Generation-Inference (TGI) server and HuggingFace ChatUI. The test model used approximately 130GB of video memory (VRAM), and Labs concluded that the system should work well with other LLMs that fit within the available GPU memory (192GB with four cards installed).
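The memory figures above follow from simple arithmetic on model size. The sketch below is an illustrative back-of-the-envelope estimate, not Puget Labs’ method: it assumes 16-bit weights (2 bytes per parameter), uses nominal parameter counts, takes 48GB per RTX 6000 Ada card, and ignores the KV cache, activations, and framework overhead that a real TGI deployment also needs.

```python
import math

# Illustrative estimate only. Assumptions (not from the release):
# 16-bit weights (2 bytes/parameter), nominal parameter counts,
# and no allowance for KV cache, activations, or runtime overhead.
GPU_VRAM_GB = 48            # NVIDIA RTX 6000 Ada: 48 GB per card
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Size of the model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, vram_per_gpu_gb: float = GPU_VRAM_GB) -> int:
    """Smallest number of cards whose combined VRAM holds the weights alone."""
    return math.ceil(weight_memory_gb(num_params) / vram_per_gpu_gb)


if __name__ == "__main__":
    models = {"Llama-2-70b": 70e9, "Falcon-40b": 40e9}  # nominal sizes
    for name, params in models.items():
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
              f"at least {min_gpus(params)} x {GPU_VRAM_GB} GB cards")
    print(f"Total VRAM with four cards: {4 * GPU_VRAM_GB} GB")
```

Under these assumptions, Falcon-40b’s roughly 80GB of weights fits on two cards (96GB), consistent with the release. Llama-2-70b’s roughly 140GB of weights (about 130 GiB) exceeds two cards; the remaining capacity in the four-card configuration is what serving overhead, such as the KV cache for concurrent users, would draw on, which this sketch does not model.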

Notable performance statistics from the testing include:

Typical usage, measured response:
- Validation time: 0.59673 ms
- Queue time: 0.17409 ms
- Time per token: 54.558 ms

Stress test with multiple concurrent users (a session of 114 prompts from 20-30 users over 5 minutes), average prompt response under multi-user load:
- Validation time: 3.0312 ms
- Queue time: 4687.9 ms
- Time per token: 68.076 ms
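To put these timings in more familiar terms, the sketch below converts time per token into tokens per second and gives a rough end-to-end latency for a single response. The 256-token response length is a hypothetical value, and modelling latency as validation + queue + tokens x time per token is a simplifying assumption (prompt processing and network time are not broken out separately).

```python
from dataclasses import dataclass

# Rough conversion of the reported per-request timings.
# Assumption: latency ~= validation + queue + n_tokens * time_per_token.
# The response length used below is hypothetical.


@dataclass
class Timing:
    validation_ms: float
    queue_ms: float
    per_token_ms: float

    def tokens_per_second(self) -> float:
        """Generation speed implied by the average time per token."""
        return 1000.0 / self.per_token_ms

    def est_latency_s(self, n_tokens: int) -> float:
        """Approximate seconds to return a response of n_tokens tokens."""
        total_ms = self.validation_ms + self.queue_ms + n_tokens * self.per_token_ms
        return total_ms / 1000.0


typical = Timing(validation_ms=0.59673, queue_ms=0.17409, per_token_ms=54.558)
loaded = Timing(validation_ms=3.0312, queue_ms=4687.9, per_token_ms=68.076)

if __name__ == "__main__":
    n = 256  # hypothetical response length
    for label, t in [("typical", typical), ("multi-user load", loaded)]:
        print(f"{label}: {t.tokens_per_second():.1f} tokens/s, "
              f"~{t.est_latency_s(n):.1f} s for {n} tokens")
```

Under these assumptions, generation slows from roughly 18 to roughly 15 tokens per second under load, while the average queue time of about 4.7 seconds becomes the largest added delay a user would notice.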

For more information on the Puget Systems AI Training and Inference server, please visit here.

Pricing and Availability

Puget Systems’ custom AI Training and Inference servers will be available to configure for a wide range of generative AI applications in the coming weeks. To learn more or to join the waitlist, please visit here. To learn more about Puget Systems’ Canadian consulting and sales operations, please visit here.

About Puget Systems

Puget Systems is based in the Seattle suburb of Auburn, WA, and specializes in high-performance, custom-built computers. We emphasize customization with a laser focus on understanding each customer’s specific workflow, and offer personal consulting and support that we believe is becoming quite rare in the industry. Our goal is to provide each client with the best possible computer for their needs and budget. For more information, or to see how Puget Systems can design a system tailored to the work that you do, please visit www.pugetsystems.com.
