Introduction

At Puget Systems, we are constantly trying out new (and sometimes old) technologies in order to better serve our customers. Recently, we were given the opportunity to evaluate desktop virtualization with NVIDIA GRID, which uses GPU virtualization and virtual machines to stream a virtual desktop with full GPU acceleration to a user. NVIDIA GRID is built around streaming the desktop, which requires robust network infrastructure and high-quality thin clients. Even with the best equipment, there is latency, video compression, and high CPU overhead. These can be worked around for many applications, but they are all big turn-offs to gamers. With that in mind, we set out to build a PC that uses virtualization technologies to allow multiple users to game on one PC, but with no streaming and no additional latency because all of the user I/O (video, sound, keyboard and mouse) is directly connected to the PC. By creating virtual machines and using a mix of shared resources (CPU, RAM, hard drive and LAN) and dedicated resources (GPU and USB), we were able to create a PC that allows up to four users to game on it at the same time. Since gaming requires minimal input and display lag, we kept the GPU and USB controllers outside of the shared resource pool and directly assigned them to each virtual OS, which allows the keyboard/mouse input and video output to bypass the virtualization layer. The end result is a single PC running four virtual machines, each of which behaves and feels like any other traditional PC.
Hardware Requirements

Unlike a normal PC, there are a number of hardware requirements that limit what hardware we were able to use in our multi-headed gaming PC. Since we are using virtual machines to run each virtual OS, the main requirement was that the motherboard and CPU both support virtualization, which on Intel-based systems is most commonly called VT-x and VT-d. Checking a CPU for virtualization support is usually easy, as it is listed right in the specifications for the CPU. Motherboards are a bit trickier since virtualization support is not often listed in the specs, but typically either the BIOS or the manual will have settings for "VT-d" and/or "Intel Virtualization Technology". If those options (or different wording of the same setting) are available, then virtualization and PCI passthrough should work.

Also, since we are passing video cards through to each virtual OS, the video card itself needs to actually support PCI passthrough. This was the most difficult hardware requirement to figure out, since video cards do not list anywhere (at least that we could find) whether or not they support it. In our research and contact with manufacturers, we found that almost all AMD cards (Radeon and FirePro) work, but from NVIDIA only Quadro and GRID (not GeForce) officially support it. We tried to get a number of GeForce cards to work (we tested with a GTX 780 Ti, GTX 660 Ti, and GTX 560), but no matter what we tried they always showed up in the virtual machine's Device Manager with a Code 43 error. We scoured the internet for solutions and worked directly with NVIDIA, but never found a good solution. After a lot of effort, NVIDIA eventually told us that PCI passthrough is simply not supported on GeForce cards and that they have no plans to add it in the immediate future.
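To make the CPU side of that check concrete, here is a minimal sketch for a Linux host. The helper name `has_virt` is ours, and on a real machine you would point it at /proc/cpuinfo; the file is a parameter only so the logic can be exercised anywhere:

```shell
#!/bin/sh
# Report whether the CPU flags advertise hardware virtualization:
# "vmx" is Intel VT-x, "svm" is the AMD equivalent (AMD-V).
has_virt() {
    if grep -q -E '(vmx|svm)' "$1"; then
        echo "virtualization supported"
    else
        echo "no virtualization flags found"
    fi
}

# Typical use on a Linux host:
#   has_virt /proc/cpuinfo
# For VT-d / IOMMU support (needed for PCI passthrough), the kernel log
# is the usual place to look after enabling it in the BIOS:
#   dmesg | grep -e DMAR -e IOMMU
```

Note that a positive flag check only covers the CPU; the motherboard BIOS still has to expose and enable the matching VT-d/IOMMU option for passthrough to work.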
Update 8/1/2014: We still don't know of a way to get NVIDIA GeForce cards to work in VMware, but we have found that you can create a multi-headed gaming PC by using Ubuntu 14.04 and KVM (Kernel-based Virtual Machine). If you are interested, check out our guide.

In addition, we also had trouble with multi-GPU cards like the new AMD Radeon R9 295X2. We could pass the card through OK, but the GPU driver simply refused to install properly. Most likely this is an issue with passing through the PCI-E bridge between the two GPUs, but whatever is actually causing the issue, the end result is that multi-GPU cards currently do not work well for PCI passthrough. While this list is nowhere near complete, we specifically tested the following cards during this project:
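For anyone curious what the KVM route roughly involves before reading the full guide, the usual approach is to hide the GPU from the host at boot and hand it to the guest. This is only a sketch with placeholder PCI IDs (you would substitute the IDs your own host reports from `lspci -nn`), not the guide itself:

```
# /etc/default/grub - reserve the GPU (placeholder IDs) at boot:
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on pci-stub.ids=10de:1187,10de:0e0a"

# QEMU/KVM guest option - pass the reserved device through:
-device vfio-pci,host=01:00.0
```

After editing grub you would run update-grub and reboot. The second ID in pci-stub.ids stands in for the GPU's HDMI audio function, which generally has to be reserved alongside the video function.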
Virtual Machine Setup

Since we want to have four operating systems running at the same time, we could not simply install an OS onto the system like normal. Instead, we had to run a virtual machine hypervisor on the base PC and create multiple virtual machines inside of it. Once the virtual machines were created, we were then able to install an OS on each of them and have them all run at the same time. While there are many different hypervisors we could have used, we chose VMware ESXi 5.5 to host our virtual machines since it is the one we are most familiar with. We are not going to go through the entire setup in fine detail, but to get our four virtual machines up and running we performed the following:
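As an illustration of what the GPU portion of that configuration involves, a passed-through PCI device ends up as entries in each virtual machine's .vmx file. The IDs below are placeholders, not our actual values - just a sketch of the kind of entries ESXi writes when a PCI device is assigned to a VM:

```
pciPassthru0.present = "TRUE"
pciPassthru0.id = "0000:03:00.0"
pciPassthru0.deviceId = "0x6798"
pciPassthru0.vendorId = "0x1002"
pciHole.start = "2048"
```

The pciHole.start entry is a workaround that is sometimes needed for video cards with large amounts of VRAM; the bus, device, and vendor IDs would be whatever your host reports for the card being passed through.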
With all this preparatory setup complete, we were able to install Windows 8.1 through the vSphere client console. Once we had the GPU driver installed, we were able to plug a monitor and keyboard/mouse into the appropriate GPU and USB ports, configure the display settings to use the physical monitor instead of the VMware console screen, and complete the setup and testing as if the virtual machine was any other normal PC.
Performance and Impressions

Since resource sharing makes it really difficult to benchmark hardware performance, we are not going to get into benchmark numbers. Really, the performance of each virtual OS is going to depend entirely on what hardware you have in the system and how you have that hardware allocated to each virtual machine. Instead of FPS performance, what we are actually more concerned about is whether there is any input or display lag. Since gaming is so dependent on minimizing lag, this is also a great way to test the technology in general. If there are no problems while gaming, then less demanding tasks like web browsing, word processing, Photoshop, etc. should be no problem. To get a subjective idea of how a multi-headed gaming system performs, we loaded four copies of Battlefield 4 onto the four virtual machines and "borrowed" some employees from our production department to test out our setup. After getting the video settings dialed in (2560x1440 with medium/high settings gave us a solid 60 FPS on 48-man servers), we simply played the game for a while to see how it felt. Universally, everyone who tried it said that they noticed absolutely no input or display lag. From a performance standpoint, we would call this a complete success!
Conclusion & Use Cases
Performance-wise, we are very impressed with how well this configuration worked. As a custom PC company, we have plenty of employees who enjoy gaming, and none of them noted any input or display lag. But beyond the cool factor, what is the benefit of doing something like this over using four traditional PCs with specs similar to each of the four virtual machines?
As for specific use cases, a multi-headed system could be useful almost any time you have multiple users in a local area. Internet and gaming cafes, libraries, schools, LAN parties, or any other place where a user is temporarily using a computer would likely love the snapshot feature of virtual machines. In fact, you could even give all users administrator privileges and just let them install whatever they want, since you can have the virtual machine set to revert back to a specific snapshot automatically. In a more professional environment, snapshots might not be as exciting (although they would certainly still be very beneficial), but the ability to share hardware resources to give extra processing power to users when they need it would be very useful. While it varies by profession, most employees spend the majority of their time doing work that requires little processing power, intermixed with periods where they have to wait on the computer to complete a job. By sharing resources between multiple users, you can dramatically increase the amount of processing power available to each user - especially if it is only needed in short bursts.

Overall, a multi-headed system is very interesting but is a bit of a niche technology. The average home user would probably never use something like this, but it definitely has some very intriguing real-world benefits. Would something like this be useful in either your personal or professional life? Let us know how you would use it in the comments below.
Extra: How far is too far?
For this project we used four video cards to power four gaming virtual machines because that was a very convenient number considering the PCI slot layout and the fact that the motherboard had four onboard USB controllers. However, four virtual machines is not at all the limit of this technology. So just how many virtual desktops could you run off a single PC, with each desktop still having direct access to its own video card and USB controller? The current Intel Xeon E5 CPUs have 32 available lanes that PCI-E cards can use. If you used a quad Xeon system, you would get 128 PCI-E lanes, which you could theoretically divide into 128 individual PCI-E x1 slots using PCI-E expanders and risers. The video cards would likely see a bit of a performance hit unless you are using very low-end cards, but by doing this you could technically get 66 virtual personal computers from a single quad Xeon system (assuming the motherboard has four onboard USB controllers). Is 66 virtual machines off a single box too far? Honestly: yes. The power requirements, cooling, layout, and overall complexity are pretty ridiculous at that point. Plus, how would you even fit 66 users around one PC (if it could even be called a PC at that point)? USB cables only have a maximum length of about 16 feet, so you would very quickly run out of space to put people. Really, at that point you should probably look into virtual desktop streaming instead of the monstrosity above that we mocked up in Photoshop. What do you think? How many virtual desktops do you think is right to aim for with a setup like this?
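The 66-machine figure follows from a little arithmetic: every USB controller beyond the four on the motherboard has to come from a PCI-E USB card, which competes with the GPUs for the same x1 slots. A quick sketch of that math (the variable names are ours):

```shell
#!/bin/sh
# Each VM needs one GPU (one x1 slot) and one USB controller. With u
# add-in USB cards and g GPUs, the slots must satisfy g + u = lanes,
# and matching GPUs to controllers means g = u + onboard.
lanes=128      # x1 slots from a quad Xeon system
onboard=4      # USB controllers already on the motherboard
u=$(( (lanes - onboard) / 2 ))   # add-in USB cards: 62
g=$(( u + onboard ))             # GPUs, and therefore VMs: 66
echo "$u add-in USB cards + $g GPUs -> $g virtual machines"
```

Solving the two constraints gives 62 add-in USB cards and 66 GPUs, which is where the 66-desktop ceiling comes from.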
Tags: VMWare, ESXI, PCI passthrough, virtualization, virtual machine, multi-head, gaming

What stops Bluetooth technology from being used here so you don't have to use USB cables? I know some gamers like to be hardwired but I don't notice any lag with my Logitech wireless mouse and keyboard.
I'm not sure why they told you geforce doesn't support PCI passthrough. I have it working nicely currently.
My setup is a linux host + windows guest using qemu+kvm.
The linux host is running some crappy ati card while the guest is given a gtx 760.
You can see other setups on https://bbs.archlinux.org/view...
Could use a converter for longer distances with hdmi/usb cables over cat5/6.
hdmi: http://www.hdtvsupply.com/hdto...
usb: http://tinyurl.com/pu3l3xx
Pros: Reduce the number of physical towers in the building.
Multiple roles. It could be a HTPC/kitchen/gaming/kids pc all in one.
Limiting factors: Higher initial cost as well as upgrades, Cable runs, Difficulty upgrading parts. Having things break whenever a part is upgraded would be annoying. Technically advanced configuration.
Can I please have that rig? I will donate all my gaming life to it.
How did you divide 12 cores into 4 cores each for the 4 stations? Or did you count Hyper-Threading threads as cores, in which case you had three machines on physical cores and the fourth on Hyper-Threading, which is not feasible as far as I know..? Could anybody explain, or is there a mistake?
Which would give better performance, an AMD R9 280 or an NVIDIA Quadro 4000? Not the K4000, since that's about 100 dollars more expensive, but the R9 280 and the Quadro 4000 are about the same price - anyone know which one I should get? And can I not just use an R9 290? The article only mentions the R9 290X, not the R9 290.
It really depends on what you are doing. If you are gaming then a R9-290/R9-290X (either should work) would be much better than a Quadro 4000. If you are making more of a workstation-type system (for AutoCAD, video encoding, etc) then the Quadro would be a better choice in my opinion.
From what I read at various vmware forums the 290x does not work in passthrough with esxi 5.1 or 5.5. If someone manages to make it work let me know, I went down the route of modding my 780ti to a Quadro K6000, works a treat!
Would there be anyway to take 1 graphics card, perhaps a Quadro and divide it up so that multiple guests can draw from 1 graphics card instead of having each guest having their own individual graphics card? I was hoping to maybe split a Quadro 6000 and assign about 256 mb from the gpu to multiple guests
I don't believe you can do that with a Quadro card. As far as I understand, NVIDIA GRID cards are the only cards from NVIDIA that can do something like that. Unfortunately, they are headless cards (no ports) so you have to do virtual desktop streaming. Plus, they are pretty expensive. We did an article on it if you want to read about them though: http://www.pugetsystems.com/la...
Most likely just getting multiple cheaper cards would be the best route for a situation like yours.
From what I can tell, the AMD FX processors should be able to do this, correct? Obviously the correct motherboard would be needed as well; I'm just thinking about making a 2-, maybe 3-headed system out of an FX 8-core. Do you know if it's possible to, say, have an odd number (3?) of vCPUs, with the cores being "modules" and not a true 8 cores? Thanks!
So long as it has the proper VMware support (including VT-d for PCI-passthrough), it seems to me like it should work fine. No problem with odd numbers of cores per VM. We actually ran our setup that way for a while!
VT-d is an Intel technology, so I don't think that will show up on an AMD processor / motherboard. They may have an equivalent technology, of course, but I haven't kept up on AMD chips as well the last couple of years... so you'd want to look into that beforehand.
Thanks guys! In AMD speak its referred to as AMD-Vi or IOMMU from what I can tell. Also in my forum trolling and research I've found that most AM3 and newer CPUs (Phenom II's, FX Series, Opterons) from AMD will work as long as the motherboard has the functionality. (970-990 seem to be your best bet, though some Asus boards have some issues) Once I found out that older Phenom IIs work (I've seen some people use Regor core Athlon X2s even) I'm thinking I will run an FX series (not set on which CPU) in my main system, and run just a dual headed system for server/gaming station with my current Phenom II X4 830. Will update if you guys want.