Multi-headed VMware Gaming Setup

Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on July 9, 2014. For newer information, see our more recent articles.

Introduction

At Puget Systems, we are constantly trying out new (and sometimes old) technologies in order to better serve our customers. Recently, we were given the opportunity to evaluate desktop virtualization with NVIDIA GRID, which uses GPU virtualization and virtual machines to stream a virtual desktop with full GPU acceleration to a user. NVIDIA GRID is built around streaming the desktop, which requires robust network infrastructure and high quality thin clients. Even with the best equipment, there is latency, video compression, and high CPU overhead. These can be worked around for many applications, but are all big turn-offs to gamers.

With that in mind, we set out to build a PC that uses virtualization technologies to allow multiple users to game on one PC, but with no streaming and no additional latency because all of the user inputs and outputs (video, sound, keyboard and mouse) are directly connected to the PC. By creating virtual machines and using a mix of shared resources (CPU, RAM, hard drive and LAN) and dedicated resources (GPU and USB), we were able to create a PC that allows up to four users to game on it at the same time. Since gaming requires minimal input and display lag, we kept the GPUs and USB controllers outside of the shared resource pool and directly assigned them to each virtual OS, which allows the keyboard/mouse input and video output to bypass the virtualization layer. The end result is a single PC running four virtual machines, each of which behaves and feels like any other traditional PC.

Hardware Requirements

Unlike a normal PC, there are a number of hardware requirements that limit what hardware we were able to use in our multi-headed gaming PC. Since we are using virtual machines to run each virtual OS, the main requirement was that the motherboard and CPU both support virtualization, which on Intel-based systems is most commonly listed as VT-x and VT-d. Checking a CPU for virtualization support is usually easy as it is listed right in the specifications for the CPU. Motherboards are a bit trickier since virtualization support is not often listed in the specs, but typically either the BIOS or the manual will have settings for "VT-d" and/or "Intel Virtualization Technology". If those options (or different wording of the same setting) are available, then virtualization and PCI passthrough should work.
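If the machine can boot a Linux live environment, the CPU side of this check can also be done in software. The following is only a minimal sketch and was not part of our testing: it assumes a Linux system that exposes /proc/cpuinfo, and the function name virtualization_flags is made up for illustration. It looks for the "vmx" (Intel VT-x) or "svm" (AMD-V) CPU flags. It does not tell you anything about VT-d, which still has to be confirmed in the BIOS or the motherboard manual.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text.

    Hypothetical helper for illustration only.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # "vmx" is Intel VT-x; "svm" is the AMD equivalent (AMD-V).
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = virtualization_flags(handle.read())
    except OSError:
        print("Could not read /proc/cpuinfo; this sketch only works on Linux.")
    else:
        if flags:
            print("CPU virtualization flags found:", ", ".join(sorted(flags)))
        else:
            print("No vmx/svm flag found; check the CPU specs and BIOS settings.")
```

A missing flag is a reason to double check the CPU specifications and BIOS settings rather than a final verdict.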

Also, since we are passing video cards through to each virtual OS, the video card itself needs to actually support PCI passthrough. This was the most difficult hardware requirement we had to figure out since video cards do not list anywhere (at least that we could find) whether or not they support it. In our research and contact with manufacturers, we found that almost all AMD cards (Radeon and FirePro) work, but from NVIDIA officially only Quadro and GRID (not GeForce) support it.

We tried to get a number of GeForce cards to work (we tested with a GTX 780 Ti, GTX 660 Ti, and GTX 560), but no matter what we tried they always showed up in the virtual machine's device manager with a code 43 error. We scoured the internet for solutions and worked directly with NVIDIA but never found a good solution. After a lot of effort, NVIDIA eventually told us that PCI passthrough is simply not supported on GeForce cards and that they have no plans to add it in the immediate future.

Update 8/1/2014: We still don't know of a way to get NVIDIA GeForce cards to work in VMware, but we have found that you can create a multi-headed gaming PC by using Ubuntu 14.04 and KVM (Kernel-based Virtual Machine). If you are interested, check out our guide.

In addition, we had trouble with multi-GPU cards like the new AMD Radeon R9 295X2. We could pass it through without issue, but the GPU driver simply refused to install properly. Most likely this is a problem with passing through the PCI-E bridge between the two GPUs, but whatever the actual cause, the end result is that multi-GPU cards currently do not work well for PCI passthrough.

While this list is nowhere near complete, we specifically tested the following cards during this project:

Cards that work	Cards that don't work
AMD Radeon R9 280	NVIDIA GeForce GTX 780 Ti
AMD Radeon R7 250	NVIDIA GeForce GTX 660 Ti
AMD Radeon HD 7970	NVIDIA GeForce GTX 560
NVIDIA Quadro K2000	AMD Radeon R9 295X2

For our final testing configuration, we ended up using the following hardware:

Testing Hardware
Motherboard:	Asus P9X79 WS
CPU:	Intel Xeon E5-2695 v2 12 Core @ 2.4 GHz
RAM:	4x Kingston DDR3-1600 8GB ECC Reg. (32GB total)
GPU:	4x ASUS Radeon R9 280 3GB DirectCU II
Hard Drive:	Samsung 840 EVO 1TB SATA 6Gb/s SSD
PSU:	Silverstone ST1500 1500W

Four Radeon R9 280 video cards put out quite a bit of heat – especially when stacked right on top of each other – so we had to have a good amount of airflow in our chassis to keep them adequately cooled. There are a number of chassis available that have enough PCI slots and good fan placement like the Rosewill Blackhawk Ultra or Xigmatek Elysium that would work for this configuration, but for our testing we used a custom acrylic chassis since we were planning on showing this system off in our booth at PDXLAN 24.

Our test system in a custom acrylic enclosure with four Asus Radeon R9 280 DirectCU II video cards. Note the four groups of keyboard/mouse/video cables in the third picture that go to the four sets of keyboard/mouse/monitor.

A very similar system we recently built for a customer (for a completely different purpose) in a Xigmatek Elysium chassis with four XFX Radeon R9 280X video cards.

Virtual Machine Setup

Since we want to have four operating systems running at the same time, we could not simply install an OS onto the system like normal. Instead, we had to run a virtual machine hypervisor on the base PC and create multiple virtual machines inside that. Once the virtual machines were created, we were then able to install an OS on each of them and have them all run at the same time.

While there are many different hypervisors we could have used, we chose VMware ESXi 5.5 to host our virtual machines since it is the one we are most familiar with. We are not going to go through the entire setup in fine detail, but to get our four virtual machines up and running we performed the following:

Step 1
We installed the VMware ESXi 5.5 hypervisor on the system and assigned a static IP to the network adapter so we could remotely manage it.

Step 2
In Configuration -> Advanced Settings -> Edit, we selected the devices we wanted to pass through from the main system to the individual virtual machines. In our case, we passed through two USB 2.0 controllers, two USB 3.0 controllers, and the four AMD Radeon R9 280 video cards (both the GPU itself and the HDMI audio controller).

Step 3
Next we created four virtual machines and added one USB controller and one video card to each machine's PCI devices, making sure to include both the GPU itself and the HDMI audio device. Figuring out which USB controller was which on the motherboard was a matter of trial and error, but we eventually got it set so that we knew which USB ports were allocated to each virtual machine.

For each virtual machine we assigned 4 vCPUs, 7GB of RAM, and 180GB of storage space.
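To show how that allocation adds up against the test hardware, here is a small sketch of the arithmetic. The per-VM numbers are the ones listed above; the host numbers are assumptions based on the hardware list (12 physical cores and 24 threads for the Xeon E5-2695 v2, 32 GB of RAM, and a nominal 1000 GB for the 1TB SSD, although the usable datastore will be somewhat smaller). The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cpu_cores: int      # physical cores
    cpu_threads: int    # logical processors with Hyper-Threading
    ram_gb: int
    storage_gb: int     # nominal drive size; the usable datastore is smaller


@dataclass
class VmSpec:
    vcpus: int
    ram_gb: int
    storage_gb: int


def allocation_summary(host, vm, vm_count):
    """Total up what vm_count identical VMs ask for and compare it to the host."""
    total_vcpus = vm.vcpus * vm_count
    total_ram = vm.ram_gb * vm_count
    total_storage = vm.storage_gb * vm_count
    return {
        "total_vcpus": total_vcpus,
        "vcpus_per_core": total_vcpus / host.cpu_cores,
        "fits_in_threads": total_vcpus <= host.cpu_threads,
        "ram_left_gb": host.ram_gb - total_ram,
        "storage_left_gb": host.storage_gb - total_storage,
    }


# Assumed host figures (see the note above) and the per-VM settings from Step 3.
host = Host(cpu_cores=12, cpu_threads=24, ram_gb=32, storage_gb=1000)
vm = VmSpec(vcpus=4, ram_gb=7, storage_gb=180)
summary = allocation_summary(host, vm, vm_count=4)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these numbers the four machines ask for 16 vCPUs on 12 physical cores, and 4 GB of RAM is left unassigned, which presumably gives the hypervisor itself some headroom.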

Step 4
With the PCI devices added, we next changed the boot firmware from BIOS to EFI. This was really only required to get the USB 2.0 controllers to function properly on our motherboard as a passthrough device, but for the sake of consistency we changed all of the virtual machines to EFI.

Step 5
For the final configuration step, we entered the datastore in Configuration -> Storage, downloaded the .vmx file for each virtual machine, added

pciHole.start = "1200"
pciHole.end = "2200"

to the file and re-uploaded it to the datastore. This was required for the AMD cards to properly pass through to the virtual machines. If you are doing this yourself and are having trouble, you could also try adding pciPassthru0.msiEnabled = "FALSE", where "0" is the PCI passthrough device number. Add one such line for the GPU and one for the HDMI audio device, each with its own number.
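If you have several virtual machines to edit, the change to the downloaded .vmx file can be scripted. The sketch below is not what we used (we edited the files by hand); it simply treats the file as plain key = "value" lines, replaces a key if it is already present, and appends it otherwise. The function name set_vmx_options and the sample displayName value are hypothetical, and the key matching is an exact, case-sensitive comparison.

```python
def set_vmx_options(vmx_text, options):
    """Return vmx_text with every key in options set to the given value.

    Hypothetical helper: existing keys are rewritten in place,
    missing keys are appended at the end of the file.
    """
    remaining = dict(options)
    output = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            output.append(f'{key} = "{remaining.pop(key)}"')
        else:
            output.append(line)
    for key, value in remaining.items():
        output.append(f'{key} = "{value}"')
    return "\n".join(output) + "\n"


# Illustrative input only; a real .vmx file contains many more lines.
sample = 'displayName = "gaming-vm-1"\npciHole.start = "1000"\n'
updated = set_vmx_options(sample, {"pciHole.start": "1200", "pciHole.end": "2200"})
print(updated)
```

After writing the result back to a file, it would still need to be re-uploaded to the datastore as described in Step 5.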

With all this preparatory setup complete, we were able to install Windows 8.1 through the vSphere client console. Once we had the GPU driver installed, we were able to plug a monitor and keyboard/mouse into the appropriate GPU and USB ports, configure the display settings to use the physical monitor instead of the VMware console screen, and complete the setup and testing as if the virtual machine were any other normal PC.

Performance and Impressions

Since resource sharing makes it really difficult to benchmark hardware performance, we are not going to get into anything like benchmark numbers. Really, the performance of each virtual OS is going to depend entirely on what hardware you have in the system and how you have that hardware allocated to each virtual machine. Instead of FPS performance, what we are actually more concerned about is whether there is any input or display lag. Since gaming is so dependent on minimizing lag, this is also a great way to test the technology in general. If there are no problems while gaming, then less demanding tasks like web browsing, word processing, Photoshop, etc. should be no problem.

To get a subjective idea of how a multi-headed gaming system performs, we loaded four copies of Battlefield 4 onto the four virtual machines and "borrowed" some employees from our production department to test out our setup. After getting the video settings dialed in (2560×1440 with a mix of medium and high settings gave us a solid 60 FPS in 48-player servers), we simply played the game for a while to see how it felt. Universally, everyone who tried it said that they noticed absolutely no input or display lag. So from a performance standpoint, we would call this a complete success!

Screenshot of Battlefield 4 running on 1/4 of our test system (one of the four virtual machines)

2560×1440 with a mix of medium and high settings gave us >60 FPS in 48-player servers

With four people playing Battlefield 4 at the same time, neither the CPU nor the GPU was a clear bottleneck in our setup since both were running at close to 100% load. In other words, we found that the ASUS Radeon R9 280 paired almost perfectly with four shared cores from the Xeon E5-2695 v2. However, keep in mind that the Xeon E5-2695 v2 we used is pretty much the fastest Xeon CPU currently available, so if you are considering doing this yourself you may run into a CPU limitation if you want to have four people gaming at once. Of course, you could very easily use a dual CPU motherboard with a pair of more reasonably priced Xeons to get even more CPU power for your dollar than what we have in our test system.

Conclusion & Use Cases

Getting four gaming machines running on a single physical system was not quite as easy as we originally expected it to be, but it worked very well once we figured out all the little tricks. The only obstacle we were not able to overcome was the fact that NVIDIA GeForce cards do not support PCI passthrough, but the other issues we ran into were all resolved with various small configuration and setup adjustments. With those things figured out, it really was not overly difficult to get each virtual OS up and running with its own dedicated GPU and USB controller.

Performance-wise, we are very impressed with how well this configuration worked. Being a custom PC company, we have plenty of employees who enjoy gaming, and none of them noted any input or display lag. But beyond the cool factor, what is the benefit of doing something like this over using four traditional PCs with specs similar to each of the four virtual machines?

Hardware consolidation – This probably is not terribly important to many users, but having a single physical machine instead of four means that you have less hardware to maintain and a smaller physical footprint.
Shared resources – For our testing we only assigned four CPU cores to each of our virtual machines since we knew that we were going to be loading each virtual machine equally, but with virtualization you typically would over-allocate resources like CPU cores to a much greater extent. Instead of equally dividing up a CPU, with virtualization you can assign anything from a single core to all the cores on the physical CPU to each virtual machine. Modern virtualization is efficient enough that it can dynamically allocate resources to each virtual machine on the fly, which gives each machine the maximum performance possible at any given time. Especially in non-gaming environments, it is rare to use more than a small percentage of the CPU's available power the majority of the time, so why not let your neighbor use it while they use Photoshop or encode a video?
Virtual OS management – Since virtual environments are primarily made for use with servers, they have a ton of features that make them very easy to manage. For example, if you want to add a new virtual machine or replace an existing machine that is having problems, you can simply make a copy of one of the other virtual machines. You need to change things like CD keys and the PCI passthrough devices, but the new machine would be up and running in a fraction of the time it would take to install it from scratch. In addition, you can use what are called "Snapshots" to create images of a virtual machine at any point. These images can be used to revert the virtual machine back to a previous state, which is great for recovering from things like viruses. In fact, you can even set a virtual machine to revert to a specific snapshot whenever the machine is rebooted. This makes it so you don't have to worry about what a user might install on the machine since it will automatically revert to the specified snapshot when the machine is shut down.

As for specific use-cases, a multi-headed system could be useful almost any time you have multiple users in a local area. Internet and gaming cafes, libraries, schools, LAN parties, or any other place where a user is temporarily using a computer would likely love the snapshot feature of virtual machines. In fact, you could even give all users administrator privileges and just let them install whatever they want since you can have the virtual machine set to revert back to a specific snapshot automatically.

In a more professional environment, snapshots might not be as exciting (although they would certainly still be very beneficial), but the ability to share hardware resources to give extra processing power to users when they need it would be very useful. While it varies based on the profession, most employees spend the majority of their time doing work that requires little processing power, intermixed with periods where they have to wait on the computer to complete a job. By sharing resources between multiple users, you can dramatically increase the amount of processing power available to each user – especially if it is only needed in short bursts.

Overall, a multi-headed system is very interesting but is a bit of a niche technology. The average home user would probably never use something like this, but it definitely has some very intriguing real-world benefits. Would something like this be useful in either your personal or professional life? Let us know how you would use it in the comments below.

Extra: How far is too far?

For this project we used four video cards to power four gaming virtual machines because that was a very convenient number considering the PCI slot layout and the fact that the motherboard had four onboard USB controllers. However, four virtual machines is not at all the limit of this technology. So just how many virtual desktops could you run off a single PC with each desktop still having direct access to its own video card and USB controller?

The current Intel Xeon E5 CPUs have 32 available lanes that PCI-E cards can use. If you used a quad Xeon system you would get 128 PCI-E lanes which you could theoretically divide into 128 individual PCI-E x1 slots using PCI-E expanders and risers. The video cards would likely see a bit of a performance hit unless you are using very low-end cards, but by doing this you could technically get 66 virtual personal computers from a single quad Xeon system (assuming the motherboard has 4 onboard USB controllers).
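For the curious, one way to arrive at that figure is sketched below. It assumes every video card sits in its own x1 slot, every desktop beyond the onboard USB controllers needs an add-in USB controller that takes up another x1 slot, and the onboard controllers do not use any of the counted lanes. The function name max_passthrough_desktops is made up for this illustration.

```python
def max_passthrough_desktops(pcie_lanes, onboard_usb_controllers):
    """Largest number of desktops that each get one GPU and one USB controller.

    Assumes one x1 slot per GPU and one x1 slot per add-in USB controller;
    onboard USB controllers are assumed to use none of the counted lanes.
    """
    if pcie_lanes <= onboard_usb_controllers:
        # Every desktop can use an onboard USB controller, so the
        # number of GPU slots is the only limit.
        return pcie_lanes
    # desktops + (desktops - onboard) <= lanes
    return (pcie_lanes + onboard_usb_controllers) // 2


lanes = 4 * 32  # four CPUs with 32 lanes each, as described above
desktops = max_passthrough_desktops(lanes, onboard_usb_controllers=4)
print(desktops)  # 66 video cards plus 62 add-in USB controllers fill all 128 slots
```

Under these assumptions, 66 video cards and 62 add-in USB controllers use exactly 128 x1 slots, with the remaining four desktops served by the onboard USB controllers.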

Is 66 virtual machines off a single box too far? Honestly: yes. The power requirements, cooling, layout and overall complexity are pretty ridiculous at that point. Plus, how would you even fit 66 users around one PC (if it could even be called a PC at that point)? USB cables only have a maximum length of about 16 feet, so very quickly you would simply run out of space to put people. Really, at that point you should probably look into virtual desktop streaming instead of the monstrosity above that we mocked up in Photoshop.