Performance recommendations for Unity

This article builds on the discussion outlined in performance recommendations for mixed reality but focuses on learnings specific to the Unity engine environment.

Developers should also review the recommended environment settings for Unity article, which covers some of the most important scene configurations for building performant Mixed Reality apps. Some of those recommended settings are also highlighted below.

How to profile with Unity

Unity provides the built-in Unity Profiler, which is a great resource for gathering valuable performance insights for your particular app. Although one can run the profiler in-editor, these metrics do not represent the true runtime environment, so the results should be used cautiously. It is recommended to remotely profile your application while it runs on the device for the most accurate and actionable insights. Further, Unity's Frame Debugger is also a very powerful and insightful tool.

Unity provides great documentation for:

  1. How to connect the Unity profiler to UWP applications remotely
  2. How to effectively diagnose performance problems with the Unity Profiler

Note

With the Unity Profiler connected and after adding the GPU profiler (see Add Profiler in the top right corner), one can see how much time is being spent on the CPU and GPU respectively in the middle of the profiler. This gives the developer a quick approximation of whether the application is CPU bound or GPU bound.

Unity CPU vs GPU

CPU performance recommendations

The content below covers more in-depth performance practices, especially targeted for Unity & C# development.

Cache references

It is best practice to cache references to all relevant components and GameObjects at initialization, because repeated function calls such as GetComponent<T>() are significantly more expensive than the memory cost of storing a reference. This also applies to the very regularly used Camera.main. Camera.main actually just uses FindGameObjectsWithTag() underneath, which expensively searches your scene graph for a camera object with the "MainCamera" tag.

using UnityEngine;
using System.Collections;

public class ExampleClass : MonoBehaviour
{
    private Camera cam;
    private CustomComponent comp;

    void Start() 
    {
        cam = Camera.main;
        comp = GetComponent<CustomComponent>();
    }

    void Update()
    {
        // Good
        this.transform.position = cam.transform.position + cam.transform.forward * 10.0f;

        // Bad
        this.transform.position = Camera.main.transform.position + Camera.main.transform.forward * 10.0f;

        // Good
        comp.DoSomethingAwesome();

        // Bad
        GetComponent<CustomComponent>().DoSomethingAwesome();
    }
}

Note

Avoid GetComponent(string)
When using GetComponent(), there are a handful of different overloads. It is important to always use the Type-based implementations and never the string-based searching overload. Searching by string in your scene is significantly more costly than searching by Type.
(Good) Component GetComponent(Type type)
(Good) T GetComponent<T>()
(Bad) Component GetComponent(string)

Avoid expensive operations

  1. Avoid use of LINQ

    Although LINQ can be very clean and easy to read and write, it generally requires much more computation and particularly more memory allocation than writing the algorithm out manually.

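    // Example code: the same filter in LINQ method syntax and query syntax
    using System.Collections.Generic;
    using System.Linq;
    
    List<int> data = new List<int>();

    // Method syntax: true if any element is greater than 10
    bool anyLarge = data.Any(x => x > 10);
    
    // Query syntax: lazily selects every element greater than 10
    var result = from x in data
                 where x > 10
                 select x;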
    
  2. Common Unity APIs

    Certain Unity APIs, although useful, can be very expensive to execute. Most of these involve searching your entire scene graph for some matching list of GameObjects. These operations can generally be avoided by caching references or implementing a manager component for the GameObjects in question to track the references at runtime.

     GameObject.SendMessage()
     GameObject.BroadcastMessage()
     GameObject.Find()
     GameObject.FindWithTag()
     GameObject.FindGameObjectsWithTag()
     UnityEngine.Object.FindObjectOfType()
     UnityEngine.Object.FindObjectsOfType()
    

Note

SendMessage() and BroadcastMessage() should be eliminated at all costs. These functions can be on the order of 1000x slower than direct function calls.
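
Returning to the LINQ recommendation above, the following is a minimal sketch of the same two queries written out by hand. The class and method names (ThresholdQueries, AnyGreaterThan, CollectGreaterThan) are hypothetical and chosen only for illustration; the idea is that plain loops over a List<int> avoid the temporary iterator objects LINQ typically allocates.

```csharp
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative and not part of any Unity API.
public static class ThresholdQueries
{
    // Hand-written equivalent of data.Any(x => x > threshold).
    public static bool AnyGreaterThan(List<int> data, int threshold)
    {
        for (int i = 0; i < data.Count; i++)
        {
            if (data[i] > threshold)
            {
                return true;
            }
        }
        return false;
    }

    // Hand-written equivalent of the where/select query. Results are written
    // into a caller-owned list so the same buffer can be reused every frame.
    public static void CollectGreaterThan(List<int> data, int threshold, List<int> results)
    {
        results.Clear();
        for (int i = 0; i < data.Count; i++)
        {
            if (data[i] > threshold)
            {
                results.Add(data[i]);
            }
        }
    }
}
```

Passing in the results list, rather than returning a new one, is a design choice that lets a caller allocate the buffer once (for example in Start()) and reuse it in Update().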

  3. Beware of boxing

    Boxing is a core concept of the C# language and runtime. It is the process of wrapping value-typed variables such as char, int, bool, etc. into reference-typed variables. When a value-typed variable is "boxed", it is wrapped inside a System.Object, which is stored on the managed heap. Thus, memory is allocated and, once the object is no longer referenced, must be reclaimed by the garbage collector. These allocations and deallocations incur a performance cost and in many scenarios are unnecessary or can be easily replaced by a less expensive alternative.

    A common place where boxing goes unnoticed is around nullable value types. It is common to want to return null for a value type from a function, especially when the operation may fail while trying to get the value. Note that Nullable<T> is itself a value type, so returning int? does not allocate by itself. The allocation happens when the result is converted to object or to an interface type, for example when it is stored in a non-generic collection or passed to a method that takes object parameters. At that point the value is boxed on the heap and must be garbage collected later.

    Example of boxing in C#

    // boolean value type is boxed into object boxedMyVar on the heap
    bool myVar = true;
    object boxedMyVar = myVar;
    

    Example of problematic boxing via nullable value types

    This code demonstrates a dummy particle class that one may create in a Unity project. If the result of TryGetSpeed() is boxed by the caller, for example by passing it to a method that takes object arguments, each call allocates an object on the heap that must be garbage collected later. This is particularly problematic because there may be 1,000 or more particles in a scene, each being asked for its current speed every frame. Thousands of objects would then be allocated and deallocated every frame, which would greatly diminish performance. Rewriting the function to return a plain int, with a negative value such as -1 indicating failure, removes the nullable wrapper, and as long as callers keep the result in an int variable, the value stays on the stack.

        public class MyParticle
        {
            // Placeholder state, added so the example compiles
            private int speed;
            private bool hasSpeed;

            // Example of function returning nullable value type
            public int? TryGetSpeed()
            {
                // Returns current speed int value or null if fails
                return hasSpeed ? speed : (int?)null;
            }
        }
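
For the particle example above, another option besides a sentinel value such as -1 is the common C# "Try" pattern, where the method returns a bool and hands the value back through an out parameter. The sketch below is only illustrative: the class name ParticleWithSpeed, its fields, and its constructor are assumptions made for this example and are not part of the original sample.

```csharp
// Hypothetical variant of the particle class; the fields and constructor are illustrative.
public class ParticleWithSpeed
{
    private readonly int speed;
    private readonly bool isAlive;

    public ParticleWithSpeed(int speed, bool isAlive)
    {
        this.speed = speed;
        this.isAlive = isAlive;
    }

    // Reports success through the return value and the speed through an out
    // parameter, so no nullable wrapper or object is involved.
    public bool TryGetSpeed(out int currentSpeed)
    {
        if (!isAlive)
        {
            currentSpeed = 0;
            return false;
        }

        currentSpeed = speed;
        return true;
    }
}
```

A caller would typically write something like `if (particle.TryGetSpeed(out speed)) { ... }`, which keeps the failure case explicit without reserving a magic value.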
    

Repeating code paths

Any repeating Unity callback functions (e.g., Update) that are executed many times per second and/or frame should be written very carefully. Any expensive operations here will have a huge and consistent impact on performance.

  1. Empty callback functions

    Although the code below may seem innocent to leave in your application, especially since every Unity script auto-initializes with this code block, these empty callbacks can actually become very expensive. Unity operates back and forth over an unmanaged/managed code boundary, between UnityEngine code and your application code. Context switching over this bridge is fairly expensive, even if there is nothing to execute. This becomes especially problematic if your app has hundreds of GameObjects with components that have empty repeating Unity callbacks.

    void Update()
    {
    }
    

Note

Update() is the most common manifestation of this performance issue, but other repeating Unity callbacks, such as the following, can be equally bad, if not worse: FixedUpdate(), LateUpdate(), OnPostRender(), OnPreRender(), OnRenderImage(), etc.

  2. Operations to favor running once per frame

    The following Unity APIs are common operations for many Holographic Apps. Although not always possible, the results from these functions can very commonly be computed once and the results re-utilized across the application for a given frame.

    a) Generally it is good practice to have a dedicated Singleton class or service to handle your gaze Raycast into the scene and then re-use this result in all other scene components, instead of making repeated and essentially identical Raycast operations by each component. Of course, some applications may require raycasts from different origins or against different LayerMasks.

     UnityEngine.Physics.Raycast()
     UnityEngine.Physics.RaycastAll()
    

    b) Avoid GetComponent() operations in repeated Unity callbacks like Update() by caching references in Start() or Awake().

     UnityEngine.Component.GetComponent()
    

    c) It is good practice to instantiate all objects, if possible, at initialization and use object pooling to recycle and re-use GameObjects throughout runtime of your application

     UnityEngine.Object.Instantiate()
    
  3. Avoid interfaces and virtual constructs

    Invoking functions through interfaces rather than direct objects, or calling virtual functions, can often be much more expensive than using direct constructs or direct function calls. If the virtual function or interface is unnecessary, then it should be removed. However, the performance hit of these approaches is generally worth the trade-off if using them simplifies development collaboration, code readability, and code maintainability.

    Generally, the recommendation is not to mark properties and functions as virtual unless there is a clear expectation that the member needs to be overridden. One should be especially careful around high-frequency code paths that are called many times per frame or even once per frame, such as an UpdateUI() method.

  4. Avoid passing structs by value

    Unlike classes, structs are value types, and when passed directly to a function, their contents are copied into a newly created instance. This copy adds CPU cost as well as additional memory on the stack. For small structs, the effect is usually minimal and thus acceptable. However, for functions invoked repeatedly every frame, as well as functions taking large structs, modify the function definition to pass by reference where possible. Learn more here
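
As a rough sketch of the struct recommendation above, the example below contrasts passing a struct by value with passing it by reference. The ParticleState struct and ParticleMath class are hypothetical names used only for illustration. The in modifier assumes a compiler that supports C# 7.2 or later; on older toolchains, ref can be used for both cases.

```csharp
// Hypothetical 32-byte struct; the type and member names are illustrative.
public struct ParticleState
{
    public float PosX, PosY, PosZ;
    public float VelX, VelY, VelZ;
    public float Age, Lifetime;
}

public static class ParticleMath
{
    // Pass by value: the whole struct is copied on every call.
    public static float RemainingLifeByValue(ParticleState state)
    {
        return state.Lifetime - state.Age;
    }

    // Pass by read-only reference (C# 7.2+): no copy, and the callee cannot modify it.
    public static float RemainingLife(in ParticleState state)
    {
        return state.Lifetime - state.Age;
    }

    // Pass by reference when the callee needs to update the caller's struct in place.
    public static void Advance(ref ParticleState state, float deltaTime)
    {
        state.PosX += state.VelX * deltaTime;
        state.PosY += state.VelY * deltaTime;
        state.PosZ += state.VelZ * deltaTime;
        state.Age += deltaTime;
    }
}
```

For a struct this small the difference is likely negligible; the pattern matters more as structs grow or as the call moves into a per-frame loop.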

Miscellaneous

  1. Physics

    a) Generally, the easiest way to improve physics is to limit the amount of time spent on Physics or the number of iterations per second. Of course, this will reduce simulation accuracy. See TimeManager in Unity

    b) The types of colliders in Unity have widely different performance characteristics. The order below lists the most performant colliders to the least performant from left to right. It is most important to avoid Mesh Colliders, which are substantially more expensive than the primitive colliders.

     Sphere < Capsule < Box <<< Mesh (Convex) < Mesh (non-Convex)
    

    See Unity Physics Best Practices for more info

  2. Animations

    Disable idle animations by disabling the Animator component (disabling the game object won't have the same effect). Avoid design patterns where an animator sits in a loop setting a value to the same thing. There is considerable overhead for this technique, with no effect on the application. Learn more here.

  3. Complex algorithms

    If your application is using complex algorithms such as inverse kinematics, path finding, etc., look for a simpler approach or adjust the relevant settings to improve their performance.

CPU-to-GPU performance recommendations

Generally, CPU-to-GPU performance comes down to the draw calls submitted to the graphics card. To improve performance, draw calls need to be strategically a) reduced or b) restructured for optimal results. Since draw calls themselves are resource-intensive, reducing them will reduce the overall work required. Further, state changes between draw calls require costly validation and translation steps in the graphics driver, and thus restructuring your application's draw calls to limit state changes (e.g., different materials) can boost performance.

Unity has a great article that gives an overview and dives into batching draw calls for their platform.

Single pass instanced rendering

Single Pass Instanced Rendering in Unity allows for draw calls for each eye to be reduced down to one instanced draw call. Due to cache coherency between two draw calls, there is also some performance improvement on the GPU as well.

To enable this feature in your Unity Project

  1. Open Player XR Settings (go to Edit > Project Settings > Player > XR Settings)
  2. Select Single Pass Instanced from the Stereo Rendering Method drop-down menu (Virtual Reality Supported checkbox must be checked)

Read the following articles from Unity for details with this rendering approach.

Note

One common issue with Single Pass Instanced Rendering occurs if developers already have existing custom shaders not written for instancing. After enabling this feature, developers may notice some GameObjects only render in one eye. This is because the associated custom shaders do not have the appropriate properties for instancing.

See Single Pass Stereo Rendering for HoloLens from Unity for how to address this problem

Static batching

Unity is able to batch many static objects to reduce draw calls to the GPU. Static Batching works for most Renderer objects in Unity that 1) share the same material and 2) are all marked as Static (Select an object in Unity and click the checkbox in the top right of the inspector). GameObjects marked as Static cannot be moved throughout your application's runtime. Thus, static batching can be difficult to leverage on HoloLens where virtually every object needs to be placed, moved, scaled, etc. For immersive headsets, static batching can dramatically reduce draw calls and thus improve performance.

Read Static Batching under Draw Call Batching in Unity for more details.

Dynamic batching

Since it is problematic to mark objects as Static for HoloLens development, dynamic batching can be a great tool to compensate for this lacking feature. Of course, it can also be useful on immersive headsets, as well. However, dynamic batching in Unity can be difficult to enable because GameObjects must a) share the same Material and b) meet a long list of other criteria.

Read Dynamic Batching under Draw Call Batching in Unity for the full list. Most commonly, GameObjects are ineligible for dynamic batching because their mesh data exceeds the limit of 300 vertices.

Other techniques

Batching can only occur if multiple GameObjects are able to share the same material. Typically, this will be blocked by the need for GameObjects to have a unique texture for their respective Material. It is common to combine Textures into one big Texture, a method known as Texture Atlasing.

Furthermore, it is generally preferable to combine meshes into one GameObject where possible and reasonable. Each Renderer in Unity has its own associated draw call(s), whereas a combined mesh is submitted under a single Renderer.

Note

Modifying properties of Renderer.material at runtime will create a copy of the Material and thus potentially break batching. Use Renderer.sharedMaterial to modify shared material properties across GameObjects.

GPU performance recommendations

Learn more about optimizing graphics rendering in Unity

Optimize depth buffer sharing

It is generally recommended to enable Depth buffer sharing under Player XR Settings to optimize for hologram stability. When enabling depth-based late-stage reprojection with this setting however, it is recommended to select 16-bit depth format instead of 24-bit depth format. The 16-bit depth buffers will drastically reduce the bandwidth (and thus power) associated with depth buffer traffic. This can be a big win both in power reduction and performance improvement. However, there are two possible negative outcomes by using 16-bit depth format.

Z-Fighting

The reduced depth range fidelity makes z-fighting more likely to occur with 16-bit than 24-bit. To avoid these artifacts, modify the near/far clip planes of the Unity camera to account for the lower precision. For HoloLens-based applications, a far clip plane of 50m instead of the Unity default 1000m can generally eliminate any z-fighting.

Disabled Stencil Buffer

When Unity creates a Render Texture with 16-bit depth, there is no stencil buffer created. Selecting 24-bit depth format, per Unity documentation, will create a 24-bit z-buffer as well as an 8-bit stencil buffer (https://docs.unity3d.com/Manual/SL-Stencil.html), if 32-bit is applicable on a device, which is generally the case, such as on HoloLens.

Avoid full-screen effects

Techniques that operate on the full screen can be quite expensive, since they run on the order of millions of operations every frame. Thus, it is recommended to avoid post-processing effects such as anti-aliasing, bloom, and more.

Optimal lighting settings

Real-time Global Illumination in Unity can provide outstanding visual results but involves quite expensive lighting calculations. It is recommended to disable Realtime Global Illumination for every Unity scene file via Window > Rendering > Lighting Settings > Uncheck Real-time Global Illumination.

Furthermore, it is recommended to disable all shadow casting, as shadows also add expensive GPU passes to a Unity scene. Shadows can be disabled per light but can also be controlled holistically via Quality settings.

Edit > Project Settings, then select the Quality category > Select Low Quality for the UWP Platform. One can also just set the Shadows property to Disable Shadows.

Reduce poly count

Polygon count is usually reduced by either

  1. Removing objects from a scene
  2. Asset decimation which reduces the number of polygons for a given mesh
  3. Implementing a Level of Detail (LOD) System into your application, which renders far away objects with lower-polygon versions of the same geometry

Understanding shaders in Unity

An easy approximation to compare shaders in performance is to identify the average number of operations each executes at runtime. This can be done easily in Unity.

  1. Select your shader asset or select a material, then in the top right corner of the inspector window, select the gear icon followed by "Select Shader"

    Select shader in Unity

  2. With the shader asset selected, click the "Compile and show code" button under the inspector window

    Compile Shader Code in Unity

  3. After compiling, look for the statistics section in the results with the number of different operations for both the vertex and pixel shader (Note: pixel shaders are often also called fragment shaders)

    Unity Standard Shader Operations

Optimize pixel shaders

Looking at the compiled statistic results using the method above, the fragment shader will generally execute more operations than the vertex shader, on average. The fragment shader, also known as the pixel shader, is executed per pixel on the screen output while the vertex shader is only executed per-vertex of all meshes being drawn to the screen.

Thus, not only do fragment shaders have more instructions than vertex shaders because of all the lighting calculations, fragment shaders are almost always executed on a larger dataset. For example, if the screen output is a 2k by 2k image, then the fragment shader can get executed 2,000*2,000 = 4,000,000 times. If rendering two eyes, this number doubles since there are two screens. If a mixed reality application has multiple passes, uses full-screen post-processing effects, or renders multiple meshes to the same pixel, this number will increase dramatically.

Therefore, reducing the number of operations in the fragment shader can generally give far greater performance gains over optimizations in the vertex shader.

Unity Standard shader alternatives

Instead of using a physically based rendering (PBR) or another high-quality shader, look at utilizing a more performant and cheaper shader. The Mixed Reality Toolkit provides the MRTK standard shader that has been optimized for mixed reality projects.

Unity also provides an unlit, vertex lit, diffuse, and other simplified shader options that are significantly faster compared to the Unity Standard shader. See Usage and Performance of Built-in Shaders for more detailed information.

Shader preloading

Use Shader preloading and other tricks to optimize shader load time. In particular, shader preloading means you won't see any hitches due to runtime shader compilation.

Limit overdraw

In Unity, one can display overdraw for their scene, by toggling the draw mode menu in the top-left corner of the Scene view and selecting Overdraw.

Generally, overdraw can be mitigated by culling objects ahead of time before they are sent to the GPU. Unity provides details on implementing Occlusion Culling for their engine.

Memory recommendations

Excessive memory allocation & deallocation operations can have adverse effects on your holographic application, resulting in inconsistent performance, frozen frames, and other detrimental behavior. It is especially important to understand memory considerations when developing in Unity since memory management is controlled by the garbage collector.

Garbage collection

Holographic apps lose processing time to the garbage collector (GC) whenever it runs to find objects that are no longer referenced and release their memory so that it can be made available for reuse. Constant allocations and deallocations will generally require the garbage collector to run more frequently, thus hurting performance and user experience.

Unity has provided an excellent page that explains in detail how the garbage collector works and tips to write more efficient code in regards to memory management.

One of the most common practices that leads to excessive garbage collection is not caching references to components and classes in Unity development. Any references should be captured during Start() or Awake() and re-used in later functions such as Update() or LateUpdate().

Other quick tips:

  • Use the StringBuilder C# class to dynamically build complex strings at runtime
  • Remove calls to Debug.Log() when no longer needed, as they still execute in all build versions of an app
  • If your holographic app generally requires lots of memory, consider calling System.GC.Collect() during loading phases such as when presenting a loading or transition screen
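
As a small illustration of the StringBuilder tip, the sketch below reuses a single StringBuilder instance to assemble a label. The class name ScoreLabelBuilder and the label format are hypothetical; the point is only that the builder's internal buffer is kept and reused between calls instead of creating intermediate strings through repeated concatenation.

```csharp
using System.Text;

// Hypothetical helper; the class name and label format are illustrative.
public class ScoreLabelBuilder
{
    // One builder is allocated up front and reused for every label.
    private readonly StringBuilder builder = new StringBuilder(64);

    public string Build(string playerName, int score, int lives)
    {
        // Reset the content while keeping the already allocated buffer.
        builder.Length = 0;

        builder.Append(playerName)
               .Append(": ")
               .Append(score)
               .Append(" pts, ")
               .Append(lives)
               .Append(" lives");

        // Only the final string is allocated here.
        return builder.ToString();
    }
}
```

The final ToString() call still allocates the resulting string, so it is worth rebuilding the label only when one of the inputs actually changes.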

Object pooling

Object pooling is a popular technique to reduce the cost of continuous allocations & deallocations of objects. This is done by allocating a large pool of identical objects and re-using inactive, available instances from this pool instead of constantly spawning and destroying objects over time. Object pools are great for re-useable components that have variable lifetime during an app.
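
The following is a minimal, engine-agnostic sketch of the pooling idea described above. SimplePool<T> is a hypothetical class written for this example, not a Unity API. In a Unity project, the factory delegate would typically wrap a call such as Instantiate() and the pooled objects would be activated and deactivated as they leave and re-enter the pool; those details are assumptions left out here to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool; not part of Unity.
public class SimplePool<T> where T : class
{
    private readonly Stack<T> inactive = new Stack<T>();
    private readonly Func<T> factory;

    // Pre-allocates initialSize instances so that later Get() calls rarely allocate.
    public SimplePool(Func<T> factory, int initialSize)
    {
        this.factory = factory;
        for (int i = 0; i < initialSize; i++)
        {
            inactive.Push(factory());
        }
    }

    public int InactiveCount
    {
        get { return inactive.Count; }
    }

    // Hands out an inactive instance if one exists; otherwise grows the pool by one.
    public T Get()
    {
        return inactive.Count > 0 ? inactive.Pop() : factory();
    }

    // Returns an instance to the pool for later reuse instead of destroying it.
    public void Release(T item)
    {
        inactive.Push(item);
    }
}
```

Letting Get() fall back to the factory when the pool is empty trades a possible runtime allocation for never failing a request; a fixed-size pool that refuses to grow is an equally reasonable choice when memory must be bounded.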

Startup performance

You should consider starting your app with a smaller scene, then using SceneManager.LoadSceneAsync to load the rest of the scene. This allows your app to get to an interactive state as fast as possible. Be aware that there may be a large CPU spike while the new scene is being activated and that any rendered content might stutter or hitch. One way to work around this is to set the AsyncOperation.allowSceneActivation property to "false" on the scene being loaded, wait for the scene to load, clear the screen to black, and then set it back to "true" to complete the scene activation.
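
The workaround described above can be sketched as a coroutine along the following lines. This is an illustrative sketch rather than a definitive implementation: the component name MainSceneLoader, the scene name "MainScene", and the ClearScreenToBlack() helper are assumptions made for this example, and how best to blank the display will depend on the application.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.SceneManagement;

// Hypothetical component placed in the small startup scene.
public class MainSceneLoader : MonoBehaviour
{
    // Assumed scene name; the scene must be included in the build settings.
    [SerializeField] private string sceneName = "MainScene";

    private IEnumerator Start()
    {
        AsyncOperation load = SceneManager.LoadSceneAsync(sceneName);

        // Hold back activation so the expensive step does not happen mid-render.
        load.allowSceneActivation = false;

        // While activation is held back, progress stops at 0.9.
        while (load.progress < 0.9f)
        {
            yield return null;
        }

        // Hide any rendered content before the activation spike.
        ClearScreenToBlack();
        yield return null;

        // Allow the scene to activate and wait for it to finish.
        load.allowSceneActivation = true;
        while (!load.isDone)
        {
            yield return null;
        }
    }

    // One possible way to blank the view: draw nothing over a solid black background.
    private static void ClearScreenToBlack()
    {
        Camera cam = Camera.main;
        if (cam == null)
        {
            return;
        }

        cam.clearFlags = CameraClearFlags.SolidColor;
        cam.backgroundColor = Color.black;
        cam.cullingMask = 0;
    }
}
```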

Remember that while the startup scene is loading, the holographic splash screen will be displayed to the user.

See also