In part 1 of this article, we discussed what a GPU crash is, why it happens, how to debug it, and how to resolve one of its common causes: GPU out-of-memory (OOM). In this part, we continue troubleshooting other issues that might cause the GPU driver to crash in Unreal Engine 5.
Let’s start with what causes a TDR (Timeout Detection and Recovery) event. It occurs when the GPU takes too long to complete the work assigned to it by the CPU. By default, Windows gives the GPU two seconds to complete an operation. If it takes longer than that, Windows resets the graphics driver, which leads to a GPU crash.
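To make the timing rule concrete, here is a toy Python model of the "finish within the timeout or get reset" behavior. It is not how Windows implements TDR; it simply checks each GPU submission against a fixed limit. The function name `first_tdr` and the timings are hypothetical; only the 2-second default comes from the text above.

```python
from typing import Optional, Sequence

# Windows' default TDR timeout (the TdrDelay value), in seconds.
DEFAULT_TDR_DELAY_S = 2.0


def first_tdr(gpu_times_s: Sequence[float],
              tdr_delay_s: float = DEFAULT_TDR_DELAY_S) -> Optional[int]:
    """Return the index of the first GPU submission that would exceed the
    timeout (and so trigger a driver reset), or None if all finish in time.

    Toy model only: each submission is timed on its own against a fixed limit.
    """
    for index, seconds in enumerate(gpu_times_s):
        if seconds > tdr_delay_s:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical timings: one oversized submission vs. the same work split up.
    print(first_tdr([0.4, 3.2, 0.5]))                 # 1: the 3.2 s submission times out
    print(first_tdr([0.4, 0.8, 0.8, 0.8, 0.8, 0.5]))  # None: same total work, no timeout
```

The second call does the same 3.2 seconds of work as the oversized submission, but in four pieces, which is why splitting work is the preferred fix.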
The engine should avoid sending very large workloads to the GPU in a single submission, because doing so could trigger a TDR event. To avoid TDRs, the engine should split the work into smaller chunks. Alternatively, you can increase the time it takes for a timeout to occur by editing the Windows Registry.
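The chunking idea can be sketched as a small planning function: given an estimated cost per unit of work, pick a chunk size that keeps every submission well under the timeout. This is an illustration under stated assumptions, not engine code: cost is assumed to grow linearly with the number of items, the 0.5-second budget is an arbitrary safety margin, and `plan_chunks` is a hypothetical name.

```python
import math
from typing import List


def plan_chunks(total_items: int, seconds_per_item: float,
                budget_s: float = 0.5) -> List[int]:
    """Split `total_items` units of GPU work into chunk sizes whose estimated
    cost (chunk_size * seconds_per_item) stays within `budget_s`.

    Assumptions: cost grows linearly with the number of items, and the 0.5 s
    budget is an arbitrary safety margin below the 2 s default timeout.
    A single item that already exceeds the budget cannot be split further
    here, so it gets a chunk of its own."""
    if total_items <= 0:
        return []
    if seconds_per_item <= 0:
        return [total_items]
    # Largest chunk that fits the budget (at least one item per chunk).
    per_chunk = max(1, math.floor(budget_s / seconds_per_item))
    full, rest = divmod(total_items, per_chunk)
    return [per_chunk] * full + ([rest] if rest else [])
```

For example, `plan_chunks(10, 0.125)` returns `[4, 4, 2]`: each full chunk is estimated at 0.5 seconds, comfortably below the 2-second default.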
To change the timeout, create two new registry keys, TdrDelay and TdrDdiDelay (DWORD values, in seconds), under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. TdrDelay sets the timeout threshold: the number of seconds the GPU can delay a preempt request from the GPU scheduler, which handles processing and memory (VRAM). TdrDdiDelay sets the amount of time the operating system (OS) allows threads to leave the driver. After that time has elapsed, a timeout delay failure occurs.
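If you prefer a file you can review before importing, the Python sketch below writes the two values in the standard .reg text format. The key path and value names follow Microsoft's TDR documentation; the helper name `make_tdr_reg` is hypothetical, and the 10-second values are arbitrary examples, not recommendations. Back up your registry before importing anything.

```python
# Registry key that holds the TDR settings.
GRAPHICS_DRIVERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def make_tdr_reg(tdr_delay_s: int, tdr_ddi_delay_s: int) -> str:
    """Return the text of a .reg file that sets TdrDelay and TdrDdiDelay.

    Both values are REG_DWORD counts of seconds; .reg files write DWORDs as
    8-digit hexadecimal numbers."""
    for name, value in (("TdrDelay", tdr_delay_s), ("TdrDdiDelay", tdr_ddi_delay_s)):
        if not 0 < value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must be a positive DWORD, got {value}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        f'"TdrDelay"=dword:{tdr_delay_s:08x}',
        f'"TdrDdiDelay"=dword:{tdr_ddi_delay_s:08x}',
    ]
    # Use Windows (CRLF) line endings, as regedit's own exports do.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Example values only (10 s each), not a recommendation.
    print(make_tdr_reg(10, 10))
```

Generating the text instead of writing to the registry directly keeps the change visible: you can read the file, save it with a .reg extension, and import it only when you are sure.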
Please note that changing registry keys on Windows can have unexpected consequences and, in the worst case, may require a full reinstallation of Windows.
For instructions on how to change them, refer to Changing Registry Keys.
Although this is a good way to curb rendering-related GPU crashes, it will not resolve all of them. If you try to process too much data at once, the GPU may time out no matter how long you set the timeout delay. This workaround only gives your graphics card a little extra time.
Some technologies consume a lot of GPU resources and can easily trigger a TDR event. Hardware Ray Tracing is one of them. If you use expensive ray tracing passes, such as Ray Tracing Global Illumination or ray-traced reflections, at very large resolutions, they can take longer to render, and a TDR event may eventually occur.
You can avoid timeout detection by rendering these passes in tiles instead of in a single pass. To do that, use the following console variables:
If a pass's tile size is set to a value N greater than 0, the pass is rendered in N x N pixel tiles, and each tile is submitted as a separate GPU command buffer.
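To get a feel for how much tiling splits the work, the Python sketch below computes the tile rectangles for a given resolution and tile size. It only illustrates the N x N splitting described above and is not Unreal Engine's implementation; the function name `tile_rects` and the 4K/512 numbers are example assumptions.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_rects(width: int, height: int, tile_size: int) -> List[Rect]:
    """Split a width x height render target into tile_size x tile_size tiles.

    A tile size of 0 (or less) means the pass is rendered in one go;
    otherwise each tile would be its own submission. Tiles on the right and
    bottom edges are clipped to the image bounds."""
    if tile_size <= 0:
        return [(0, 0, width, height)]
    rects: List[Rect] = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            rects.append((x, y,
                          min(tile_size, width - x),
                          min(tile_size, height - y)))
    return rects


if __name__ == "__main__":
    tiles = tile_rects(3840, 2160, 512)
    print(len(tiles), tiles[-1])  # 40 tiles; the last one is (3584, 2048, 256, 112)
```

At 3840 x 2160 with a tile size of 512, the pass is split into 40 submissions (8 columns by 5 rows), so each command buffer carries only a fraction of the frame's work. Smaller tiles mean less work per submission but more submissions overall.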
There are many reasons for GPU crashes; they can be caused by bugs in the engine code, the drivers, or the operating system. If you have checked and confirmed that neither OOM nor TDR events are the reason, you can investigate further to determine the root cause:
- Run the engine with -gpucrashdebugging and with -d3ddebug. Please run them separately.
- Force the engine to run on a single thread by using -onethread and -forcerhibypass. This helps determine whether the underlying problem is a threading or timing issue.
- Run the engine with r.RDG.Debug=1 to get information about render passes that have not been set up properly.
- Force the Render Dependency Graph (RDG) to execute passes immediately after they are created by running the engine with r.RDG.ImmediateMode=1.
- Switch to a different RHI. For example, if you are using DirectX 12 (DX12), you can switch to DirectX 11 (DX11). However, some features only work with a specific RHI (for example, Hardware Ray Tracing is only supported by DX12).
- Use A/B testing on your scene:
  - Turn rendering passes on and off to check whether the crash still occurs. Sometimes a faulty pass is the cause of the crash.
  - Turn rendering features such as Lumen, Nanite, and Ray Tracing on and off.
  - Hide and show specific objects in the scene. This can help isolate whether the problem is related to a specific asset.
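To keep the debugging runs from the list above separate, you could script the launches. The Python sketch below only builds argument lists that you can pass to `subprocess.run`; the executable and project paths are placeholders, the helper name `build_command` and the run names are hypothetical, and `-dx11` is included as the DirectX 11 switch (worth double-checking for your engine version). The r.RDG.Debug and r.RDG.ImmediateMode console variables are left out because they are console variables rather than command-line switches; set them the way your project normally sets startup console variables.

```python
from typing import Dict, List

# Each entry is one separate debug session, using switches from the list above.
DEBUG_RUNS: Dict[str, List[str]] = {
    "gpu_crash_debugging": ["-gpucrashdebugging"],  # run on its own
    "d3d_debug_layer": ["-d3ddebug"],               # run on its own
    "single_thread": ["-onethread", "-forcerhibypass"],
    "dx11_rhi": ["-dx11"],  # assumption: DX11 switch; Hardware Ray Tracing needs DX12
}


def build_command(editor: str, project: str, run: str) -> List[str]:
    """Return the argument list for one debug session.

    `editor` and `project` are placeholders for your own UnrealEditor
    executable and .uproject path."""
    if run not in DEBUG_RUNS:
        raise KeyError(f"unknown debug run: {run}")
    return [editor, project, *DEBUG_RUNS[run]]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    editor = r"C:\path\to\UnrealEditor.exe"
    project = r"C:\path\to\MyProject.uproject"
    for name in DEBUG_RUNS:
        print(name, build_command(editor, project, name))
```

Keeping each session's switches in its own entry makes it hard to accidentally combine -gpucrashdebugging and -d3ddebug in one run.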
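When a scene has many candidate passes, features, or objects, toggling them one at a time can take many runs. A common shortcut (not something the original guide prescribes) is bisection: enable half of the candidates, check whether the crash reproduces, and keep the half that still crashes. The sketch below assumes exactly one culprit and a crash that reproduces reliably; `find_culprit` is a hypothetical name, and `crashes` is a hypothetical callback standing in for you launching the scene with only that subset enabled.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(candidates: Sequence[T],
                 crashes: Callable[[List[T]], bool]) -> T:
    """Bisect `candidates` to find the single item that causes the crash.

    `crashes(subset)` must return True if the scene crashes with only
    `subset` enabled (everything else turned off or hidden).
    Assumes exactly one culprit and a deterministic, reproducible crash."""
    items = list(candidates)
    if not items or not crashes(items):
        raise ValueError("no reproducible crash with all candidates enabled")
    while len(items) > 1:
        mid = len(items) // 2
        first, second = items[:mid], items[mid:]
        if crashes(first):
            items = first
        elif crashes(second):
            items = second
        else:
            # Neither half crashes alone: the crash needs items from both halves.
            raise ValueError("crash needs a combination of items; "
                             "bisection cannot isolate a single culprit")
    return items[0]


if __name__ == "__main__":
    features = ["Lumen", "Nanite", "Ray Tracing", "Fog", "Depth of Field"]
    # Simulated test: pretend the crash happens whenever Nanite is enabled.
    print(find_culprit(features, lambda enabled: "Nanite" in enabled))
```

With a single culprit, the number of test runs grows roughly with log2 of the number of candidates rather than linearly. If the crash only happens when two items are combined, the function stops with an error, and manual A/B testing is the fallback.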
All of the above also applies to GPU crashes related to driver or operating system issues. For the driver, make sure you use the latest one available. For the operating system, especially Windows, the Unreal Engine team strongly recommends using version 20H2. You can check your version by pressing the Windows key and typing winver.
As you may know, iRender provides high-performance, configurable server systems for 3D rendering, AI training, VR & AR, simulation, and more. On our servers, you can install any software you need, add your own licenses, and work on your projects however you like. Unreal Engine is no exception.
At iRender, we offer server 3P with a single RTX 3090 and powerful hardware: an AMD Ryzen Threadripper PRO 3955WX 3.9 GHz processor, 256 GB of RAM, and a 2 TB NVMe SSD. With these specifications, the server can handle any of your Unreal projects, helping them render and load faster and more stably.
Moreover, iRender has other useful features. NVLink is available if you want to test it (contact us for more details), and we provide a free file transfer tool, iRender drive/GPUhub sync. Our support team is available 24/7 via live chat and can help you troubleshoot any issues.
iRender has many other features that help you render faster and more easily. You can create an account via this link to try our service. Don’t hesitate to contact us via WhatsApp at (+84) 916806116 for advice and support.
Thank you & Happy Rendering!
Source + image: docs.unrealengine.com