Quad GPU LLM Server

Quad GPU Large Language Model Server

Compact 2U rackmount server supporting up to four NVIDIA GPUs for fine-tuning and inference with AI large language models.


Quad GPU 2U server supporting NVIDIA RTX Ada and L40S graphics cards

  • Up to 192GB of VRAM across four GPUs
  • Well suited to fp16 inference of 70B-parameter models and fine-tuning of smaller models
  • Requires two power connections on separate circuits
  • 240V power required for PSU redundancy
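As a rough sanity check on the VRAM claim above, weights alone for a 70B-parameter model in fp16 take about 140GB, which fits within the 192GB offered by four 48GB GPUs. The sketch below uses the common weights-only rule of thumb (2 bytes per parameter); it deliberately ignores KV cache, activations, and framework overhead, so treat it as a lower bound, not an exact sizing for this server.

```python
def weight_vram_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate GB needed just to hold the model weights.

    Excludes KV cache, activations, and framework overhead.
    """
    return n_params * bytes_per_param / 1e9

# 70B parameters in fp16 (2 bytes each):
needed = weight_vram_gb(70e9)   # 140.0 GB
available = 4 * 48              # four 48 GB GPUs = 192 GB
print(f"{needed:.0f} GB of weights vs {available} GB of VRAM")
```

In practice the remaining ~50GB of headroom is consumed by KV cache and batching, which is why 192GB is described as suitable for inference but not for fine-tuning a model of this size.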

Not sure what you need?

Contact us and one of our experts will reply within 1 business day to help configure the right computer for your workflow. If you don’t see what you are looking for here, check out our other systems for more options.

System Core

NVIDIA Mellanox Dual 100GbE QSFP28 PCI-E Card (Limited Supply) [add $1150.94]
Up to one OCP 3.0 card and one PCI-E card may be selected to provide high-speed networking capability.


Chassis & Cooling

For redundancy to be functional, the total power consumption of the system must be lower than the maximum output of an individual PSU module. That output depends on the input voltage.
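The rule above can be checked with simple arithmetic: redundancy only holds if one PSU module can carry the entire load by itself, and a module's rated output is typically derated at lower input voltage. The figures in this sketch are illustrative assumptions, not this chassis's actual ratings.

```python
def redundancy_ok(total_draw_w: float, single_psu_output_w: float) -> bool:
    """True if one PSU module alone can carry the whole system load,
    i.e. the system survives the failure of the other module."""
    return total_draw_w <= single_psu_output_w

# Illustrative example: a PSU rated 2000 W at 240 V input
# might be limited to 1000 W at 120 V input.
system_draw_w = 1800
print(redundancy_ok(system_draw_w, 2000))  # redundant at 240 V
print(redundancy_ok(system_draw_w, 1000))  # NOT redundant at 120 V
```

This is why 240V input is listed as a requirement for PSU redundancy: the same hardware drawing the same load may exceed a single module's derated output on a 120V circuit.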



Additional Information

Help us help you! We review each configuration to ensure you’re getting the right hardware. Any info you can provide about your workflow and software will help us provide you with a better experience.

System Cost


per unit

Typically ships in 1-2 weeks; contact us for current lead times.

Contact us for quotes for more than 100 units