[FIXED] Watchdog DPC Violation Error with Multiple NVIDIA GPUs
Written on May 11, 2018 by Ken Colorossi
Introduction
Recently we have been seeing an issue in our support department with a specific hardware combination: the Asus X99E-10G WS motherboard and triple or quad NVIDIA GPU configurations. When running a workload that is heavy on both the CPU and GPUs, systems with this hardware can crash and give the DPC_WATCHDOG_VIOLATION error or "blue screen of death" (BSOD).
We have created an instructional video to guide users through resolving this problem by reverting to an older NVIDIA graphics driver and then preventing Windows from automatically updating to a newer version. We are also currently coordinating with NVIDIA on a permanent fix. Updates will be published to the article as they become available.
Updates
Final Update 9/10/2018 [SOLUTION]
We have identified a solution that mitigates the DPC errors. While it does not completely solve the problem, in all of our testing the systems were significantly more stable, with fewer crashes. If you need to update to the newest NVIDIA driver in order to run a particular application, this adjustment will help you. If you do NOT need to update in order to run an application, we advise against deploying the adjustment. The adjustment involves editing the registry, so at this time we are not releasing a detailed guide; however, we are releasing an executable that will perform the registry edits for you.
Please navigate HERE and download the files to your affected computer. Once downloaded, run the application to perform the adjustment.
Alternatively, if you have any questions or prefer our Puget Systems Support staff perform the adjustment please don't hesitate to reach out; we are here to help!
Update 6/13/2018
NVIDIA has stated that the watchdog BSOD with multiple GPUs on the X99-E 10G WS is now a "blocking issue", which means they cannot release a new driver unless it contains a fix.
Update 5/24/2018
With the release of new drivers, NVIDIA has removed the 382.53 driver from their list. You can access the download from the NVIDIA database here.
Update 5/15/2018
NVIDIA has been able to replicate this issue in their lab and is currently working on a solution.
How to install NVIDIA Graphics Driver v382.53
The video is embedded below but if you would prefer to read the instructions rather than watch it you can click here to see the script text.
Video Script:
Today, we will demonstrate how to find, install, and hide the older driver for the Nvidia GPU problem with the DPC_WATCHDOG_VIOLATION. The issue is with the Asus X99E-10G WS motherboard and a triple or quad Nvidia GPU stack. When running a GPU and CPU heavy workload, it will cause the system to crash and give the DPC_WATCHDOG_VIOLATION, which looks like this. (SHOW SCREENSHOT).
Step 1: Revert to Older Graphics Driver
So we are going to go to “nvidia.com”, click on “Support”, select “Download Drivers”, and then, down in the “Beta, Older drivers and more” section, select “Beta and Older Drivers.” For this system, we are going to use the defaults because they are all correct. However, if you have GTX 1080s or Titans, you’ll need to change the Product drop-down to match your system.
From there, we are going to click “Search.” We are going to download driver version “382.53” by clicking on “GeForce Game Ready Driver,” then “Download,” and finish by clicking “Agree & Download.”
Once the download is complete, close your browser, open File Explorer, and navigate to the Downloads folder.
Now, we are going to double-click on the file and select “Yes” at the “Do you want to allow this app to make changes to your device?” prompt. Once you’ve done that, select “OK” on the “Specify the folder where the driver files are to be saved” dialog; the files will unpack where they need to for installation, and the installer will start. Next, click “Agree and Continue” on the “License Agreement,” then select “Custom” and “NEXT.” We are doing this so we can select “Perform a clean installation.” This is very important. We are also going to deselect “NVIDIA GeForce Experience.” This is for gamers and isn’t needed for what we are doing.
Then click “NEXT.” This will remove all remnants of the new driver and install the older 382.53 driver. Once the removal is done, we will restart the system by clicking “Restart Now,” and when we come back the installer will start automatically. Once the driver has finished installing, we will restart again by clicking “Restart now.”
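As a quick check after the final restart, the installed driver version can be confirmed from a script. The sketch below is not part of the original video. It assumes the `nvidia-smi` tool that ships with the driver is on the PATH (on some installs its folder may need to be added first) and reads the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one version per GPU. The function names are hypothetical and chosen only for illustration.

```python
import subprocess

TARGET_VERSION = "382.53"  # the driver version this guide reverts to


def parse_driver_versions(smi_output):
    """Return the driver version reported for each GPU, one per non-empty line."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def all_gpus_on_target(smi_output, target=TARGET_VERSION):
    """True only if at least one GPU is listed and every GPU reports `target`."""
    versions = parse_driver_versions(smi_output)
    return bool(versions) and all(v == target for v in versions)


def query_driver_versions():
    """Run nvidia-smi (assumed to be on PATH) and return its raw text output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the affected machine, `all_gpus_on_target(query_driver_versions())` should return `True` once every card is running 382.53. The parsing is kept separate from the `nvidia-smi` call so it can be checked against pasted output on any computer.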
Step 2: Installing Wushowhide
Now that the driver is installed, let’s download and run a program from Microsoft to hide the NVIDIA driver from Windows Update so it doesn't try to force-install the newer driver again. We are going to do a Google search for wushowhide (Windows Update show/hide). We want the result that shows it’s from Microsoft. We are going to scroll down to the section “For Windows 10 v1607 (Anniversary Update)” and click on “Download the ‘Show or hide updates’ troubleshooter package now.”
Now we are going to go back to our Downloads folder and open the wushowhide program.
This may need to be run a few times until the NVIDIA driver update shows up. We are going to double-click on the file to run it and click “Next.” It will look for pending updates. Then we are going to click “Hide Updates,” select the NVIDIA display driver, and click “NEXT.” It will end at “Troubleshooting has completed”; click “Close” and you are done.
Conclusion
We hope you found the video above helpful!
If you do not own a Puget System, but are experiencing this issue and need assistance, please go here to submit a help request with NVIDIA directly. You may also want to contact the manufacturer of your system for additional support.
Need help with your Puget Systems PC?
If something is wrong with your Puget Systems PC, we are readily accessible, and our support team comes from a wide range of technological backgrounds to better assist you!
Looking for more support guides?
If you are looking for a solution to a problem you are having with your PC, we also have a number of other support guides that may be able to assist you with other issues.
Yeah unfortunately this is not a good fix when you use software that requires the newest Nvidia drivers.
I'm affected by this issue too. Unable to use Vray RT and Octane Render. Both need newer driver than 382.53. Very sad this problem is going on for 6 months now. Nvidia please wake up. Thanks Puget to make Nvidia acknowledge the problem.
Thank Puget Systems for taking care of this problem and giving us this update! I really hope the problem will be fixed soon !
Any update on this? New nVidia driver does not contain the fix and still lists this issue as Unresolved.
New Nvidia Driver has been released and the bug is still in the Known Issue. So apparently Nvidia didn't think it was a "blocking issue". That is really frustrating.
Hello. Nvidia did publish a new driver that did not resolve the issue. However, we have been conducting internal testing and have found that the new driver + a specific registry edit DOES resolve the issue. We are working on putting together detailed instructions for implementing this fix. Please stay tuned in to the article for an update in the next couple of days
Hi!, if you are talking about TDR Delay it didn't fix the issue for me
Thank you Ken. Looking forward to test it.
Thanks Kens. Really hope this works!
Ken, do you have any update? We're hanging on your words
Thank you for your inquiries, everyone. We are still working on our internal testing and hope to have something for you very soon. Our current solution is not yet stable enough for public release.
I need a favor from someone using an x99-e 10g ws motherboard. I have been dealing with an sli issue on this particular motherboard for awhile and I do believe it is related to the PLX chip found on the board. I have rma'd the videocards, the motherboard and the memory. At this point I am ruling the memory and the videocards out. As for my request, can someone please SLI two (no more, no less) nvidia cards on this motherboard on slots one and five (cannot be any other slots) using the MOST RECENT drivers and tell me if they get any crashes, errors etc. I only ask for this to be done as I currently have two GTX 1070s in slots one and three (liquidcooled and the linking block only spans that far) and I am wondering if I sli the two cards across slots one and five (hence each card will be operating on a different PLX chip) will the issue cease. Currently I am limited to using only nvidia driver 382.53 (its the only one that works, no crashes, errors, BSODs). Without the latest drivers installed a lot of current games will not recognize my second videocard, some games wont even start (GTA V). Also I am using Windows 10 64 bit. Thanks guys.
The registry script above did not set my GPUs correctly in MSI mode. I found the MSI utility v2 linked here doing a better job.
https://forums.guru3d.com/t...
It did not fix my problem nonetheless.
Sorry for posting in an old thread, but thank you for this, I spent so much money on my system only to have it unstable for so long and this fix worked! I only have one question if anyone knows; Once the fix is applied is it permanent or do I have to re-apply it after every nvidia driver update? Thanks again!
I'm glad this was able to help you. Unfortunately, this does need to be applied after every NVIDIA driver update. We have found that GeForce Experience will alert you of an update. That way you can install the update and apply the fix right away, instead of Windows doing it in the background without your knowledge, which lets you maintain a stable system. Side note: if you're running Quadro GPUs, I do not know if GeForce Experience will work.
I would like to take a minute and thank your entire team for finding a solution for this problem, even if it's a temporary one. This debilitating issue has plagued my 4 GPU setup for two years now, ever since I upgraded to Windows 10. I've reinstalled the system software multiple times, got a brand new set of GPUS, a new harddrive, shipped my machine out to the company that built it TWICE and the only solution anyone ever had was to roll it back to Windows 7. Sure, this worked for a year but now that Windows 7 is no longer supported, I knew i'd have to find a better solution. I was literally on the verge of giving up until I stumbled upon your post...all because I happen to add "with multiple GPUs" to my google search parameters when searching for additional clues about this dreaded "DPC_WATCHDOG_VIOLATION" bsod. I don't know a whole lot about PCs (have always been a Mac guy) and this issue almost had me throwing in the towel, but today you folks have restored a teenie tiny bit of confidence into my PC workflow. Thank you.
It is unfortunate that NVIDIA is unable to implement this fix in their driver stack, but they do say this is a proper fix to the problem for systems equipped with PLX chips. Thankfully, Intel has released CPUs that have enough lanes to make this not a problem on future systems. I'm glad that my article was able to help you make your system more stable.
hello, wow so i get the crashes all the time too since Windows 10 update 2 months ago. 4 nvidia gtx 1080ti cards and i believe it's the mobo but i'm not sure about the 10G part. anyway if i could ask a couple of questions: 1. is there a way to undo this once i run the executable? 2. what generally does it do? 3. since it has been a while, has nvidia addressed this in their driver - i have driver version 441.66 on my cards. i typically do not install geforce experience, i just try to get the driver only. thank you for this - for the first time i have HOPE!
So this fix works on any motherboard that uses PLX chips. The Asus X99E-10G WS is just one, the Asus X99E WS would be another. To try and answer your questions.
1. To undo the changes made by this executable, you would simply reinstall your GPU drivers.
2. This tool forces the drivers to run in MSI (Message Signaled Interrupts) mode.
3. Unfortunately, Nvidia has not addressed this in their driver and can't. When we were working with them on this, they said it couldn't be added into the driver.
On a side note with the release of the new TRX40 platform and the new Asus Pro WS C621-64L SAGE/10G this fix won't need to be used on newer systems due to them having enough CPU lanes to not need the PLX chips.
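For readers curious what MSI mode looks like in the registry, here is a minimal sketch. It is not the Puget Systems executable and makes no claim about what that tool actually writes. It only builds the text of a .reg file for the standard Windows location where message-signaled interrupts are enabled per device, the `MSISupported` value under `Interrupt Management\MessageSignaledInterruptProperties`. The device instance path in the example is a made-up placeholder, and the function names are hypothetical. Editing the registry carries risk, so treat this as illustration rather than a procedure.

```python
# Sketch only: builds .reg file text; it does not touch the registry itself.
ENUM_ROOT = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum"
MSI_SUBKEY = (
    r"Device Parameters\Interrupt Management"
    r"\MessageSignaledInterruptProperties"
)


def msi_key_path(device_instance_path):
    """Full registry key that holds the MSI setting for one device."""
    return "\\".join([ENUM_ROOT, device_instance_path.strip("\\"), MSI_SUBKEY])


def build_reg_file(device_instance_paths, enable=True):
    """Return .reg file text that sets MSISupported for each listed device."""
    value = 1 if enable else 0
    lines = ["Windows Registry Editor Version 5.00", ""]
    for path in device_instance_paths:
        lines.append("[" + msi_key_path(path) + "]")
        lines.append('"MSISupported"=dword:%08x' % value)
        lines.append("")
    return "\r\n".join(lines)


# Hypothetical placeholder. The real "Device instance path" for each NVIDIA
# GPU is shown in Device Manager on the device's Details tab.
EXAMPLE_GPU = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_00\0&00000000&0&0000"
```

Calling `build_reg_file([EXAMPLE_GPU])` returns text with one key section per device. Generating a file instead of writing to the registry directly leaves a record that can be reviewed before anything is changed.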
well, i think we've fixed it due to this thread/article! i and my IT guy didn't want to run the exe, but we researched it and found what needed to change. so far so good. i have two monitors and a sony 75" tv plugged into one card, the other three are not SLI'd or plugged into anything. i use for GPU rendering. so far no crashes at all since the mods were made. Ken thanks a ton!
Hello! It's been a couple years since you folks put this fix out (which worked great for my 4 GPU setup), and was wondering if it's still needed? Nvidia just released new Studio Drivers today (8/18/20) and i'm wondering if I still need to run the application to adjust? Thanks
After reading through the patch notes for Studio Drivers v452.06 (released 8/18/2020) it seems they still have not addressed the problem with Multi-GPU equipped systems that use PLX Chips. This fix will still be needed.
On a side note with the release of motherboards like the Asus PRO WS C621-64L SAGE/10G and the Gigabyte TRX40 Aorus Xtreme that have enough lanes to the PCI-E slots, PLX is no longer needed. Because of that, I do not expect NVIDIA will ever fix this problem on older systems.
Good to know. Thanks Ken