Mobile game optimization in Unity

Preface

Optimizing Unity demos for mobile devices is crucial to ensure the best possible user experience. In this project, the focus is on researching, analyzing, and implementing new optimizations and replacements for a demo made by Unity themselves. The goal is to identify performance bottlenecks and find new techniques to improve performance while maintaining visual quality. Once implemented, the optimized demo will be tested on a Samsung Galaxy S8 to ensure smooth performance. By following these steps, developers can create applications that run smoothly on a variety of mobile devices, providing users with an outstanding experience.

Table of Contents

Diagnosing performance problems

Unity offers an array of tools and features to aid developers in creating and optimizing their mobile games. However, before diving into the optimization process, it’s crucial to first research and understand the various analyzing techniques that Unity offers. Profiling tools such as the Profiler, Profile Analyzer, and Frame Debugger can provide valuable insights into a game’s performance and help identify bottlenecks, compatibility issues, and areas for improvement. By grasping the details of a game’s performance, developers can make smart choices about where to concentrate their optimization work, guaranteeing an enjoyable gaming experience for players on different devices.

Profiler
Game optimization starts with looking at the profiler window in Unity. Unity’s profiler is an instrumentation-based profiler that captures every process and function call by inserting markers at the beginning and end of them. This differs from a sample-based profiler, which takes snapshots of the game running but can miss certain events.

Unity’s profiler provides different views such as CPU, GPU, Memory, and Audio to help identify performance bottlenecks in specific areas of a game. Rather than focusing on FPS, it’s better to work with a per-frame budget: the number of milliseconds available for a frame at the target frame rate, calculated as 1000 milliseconds / target frames per second. It is also important to look at the overall frame time distribution to identify spikes or dips that may cause stuttering or lag. The profiler can also help identify memory leaks and excessive garbage collection (Unity, 2020).
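The budget arithmetic can be sketched in one line (illustrative, not project code):

```csharp
// Milliseconds available per frame at a given FPS target.
float FrameBudgetMs(int targetFps) => 1000f / targetFps;

// FrameBudgetMs(30) ≈ 33.3 ms, FrameBudgetMs(60) ≈ 16.7 ms.
// A frame that costs more than its budget misses the target rate.
```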

To avoid confusion, it’s recommended to profile the build instead of the editor. This can be done by checking Development Build and Autoconnect Profiler in the build settings, and unchecking ‘Run In Background’ in the Player settings (Unity, 2020).

Rendering API calls can cause some frames to have higher usage than others. When the GPU is waiting for the CPU to finish, it is called CPU bound. When the CPU is waiting, it is called GPU bound.
On mobile phones, VSync is forced on at the hardware level, so turning it off in Unity has no effect. VSync synchronizes frame presentation with the display refresh, so every displayed frame lasts a whole multiple of the refresh interval; when 60 FPS isn’t achieved, the rate drops to 30 FPS, and so on (Unity, n.d.).
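Because missed frames fall to the next VSync interval, explicitly targeting 30 FPS can give a steadier cadence than chasing an unreachable 60. A minimal sketch (targeting 30 is an assumption matching this project’s budget):

```csharp
using UnityEngine;

public class FrameRateCap : MonoBehaviour
{
    void Awake()
    {
        // On mobile, vSyncCount is effectively ignored (VSync is forced on
        // at the hardware level), but targetFrameRate still tells Unity
        // which frame interval to aim for.
        QualitySettings.vSyncCount = 0;
        Application.targetFrameRate = 30;
    }
}
```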

The garbage collector is a background process that can cause spikes in CPU usage, which can lead to frame stutters. In Unity 2019.1, there is limited control over the garbage collection process, but the experimental ‘Use Incremental GC’ setting in the Player settings can help: when enabled, Unity tries to spread the garbage collection load over multiple frames instead of doing it all in one frame, which can reduce stutters. Searching for the GC.Alloc and GC.Collect markers in the profiler reveals memory allocations and collections, and ordering by the GC Alloc column sorts the results by allocated memory, making it easier to identify areas where excessive allocations are occurring (Unity Technologies, n.d.).
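A common source of those per-frame GC.Alloc entries is allocating inside Update. A hedged sketch of the fix (the radius and buffer size are arbitrary):

```csharp
using UnityEngine;

public class AllocationFreeQuery : MonoBehaviour
{
    // Preallocated once; a `new Collider[32]` inside Update would appear
    // as a GC.Alloc every frame and eventually trigger a collection spike.
    readonly Collider[] overlapBuffer = new Collider[32];

    void Update()
    {
        // The NonAlloc variant writes into the reusable buffer instead of
        // returning a freshly allocated array each call.
        int count = Physics.OverlapSphereNonAlloc(transform.position, 5f, overlapBuffer);
        for (int i = 0; i < count; i++)
        {
            // process overlapBuffer[i] without allocating
        }
    }
}
```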

Profile Analyzer
In addition to profiling with Unity’s built-in profiler, a very useful technique in game optimization is using Unity’s Profile Analyzer tool, which allows for more in-depth analysis of profiling data. The Profile Analyzer provides visualizations and statistical analysis of profiler data, making it easier to identify performance bottlenecks and track down the root causes of problems. It can also help in comparing profiling data from different sessions or builds, allowing for tracking of progress in optimization efforts. The tool includes features such as call trees, flame graphs, and frame summary reports. By using the Profile Analyzer in conjunction with the profiler, game developers can gain a deeper understanding of their game’s performance and make informed optimization decisions (Unity, 2020).

Frame Debugger
Another helpful tool for game optimization is the Unity Frame Debugger. The Frame Debugger allows developers to step through every draw call and inspect the objects being rendered in a frame. This can help identify any unnecessary or redundant draw calls that can impact performance. It also provides insights into shader performance and can help detect any issues related to rendering, such as z-fighting or incorrect sorting. By using the Frame Debugger in conjunction with the Unity profiler and Profile Analyzer, game developers can gain a comprehensive understanding of their game’s performance and make informed decisions on optimization efforts (Unity, 2020).

Analyzing the project

This project focuses on optimizing the Unity 3D Kit Demo using a Samsung Galaxy S8, an older device that currently experiences a low framerate when running the demo. The aim is to enhance the performance of the 3D Kit Demo on this device, demonstrating the power of Unity’s optimization tools and techniques, as well as providing useful insights and lessons for developers working on similar projects.

In this particular case, the rendering process for a single frame took 36 ms. The target for this project is 30 FPS, which translates to a budget of 33 milliseconds per frame. As a result, rendering optimizations are necessary. Additionally, the physics, animation, and script processes took 11 ms, 13 ms, and 14 ms, respectively. Since the combined time of these three processes exceeds the budget, further optimizations are needed in each of these areas as well. The PlayerLoop took 86 ms per frame on average, which works out to 1000 ms / 86 ms ≈ 12 FPS.

When checking the hierarchy, it’s clear that 33% of the PlayerLoop time is spent in an API function called PostLateUpdate.FinishFrameRendering. This shows that the game relies heavily on the GPU: the CPU has to wait for the GPU to finish rendering the previous frame. In addition, 21.5% of the total cost comes from rendering OpaqueGeometry, which means the GPU struggles to handle all the objects in the scene.

Understanding mobile GPUs

Since rendering accounted for the largest portion of the profiler’s recorded cost, it’s now an appropriate time to gain a better understanding of mobile GPUs. By doing so, developers can identify the factors affecting GPU performance and make informed decisions to optimize rendering techniques, assets, and other adjustments (OpenSystems Media, n.d.).

GPUs have become essential components in mobile devices, enabling visually impressive experiences for users. They can handle many tasks simultaneously and are especially good at processing complex 3D graphics. Shader programs, such as vertex, pixel, and newer types like geometry and tessellation shaders, are used to control object properties and create visual effects (OpenSystems Media, n.d.).

The GPU is designed as a single instruction, multiple data (SIMD) processor that can handle many tasks at the same time. It is especially good at handling complex 3D graphics, since it can process billions of pixels and billions of floating-point operations (GFLOPS) every second. The GPU has shader units (SIMD units) that can work on vertices, primitives, and pixels independently (OpenSystems Media, n.d.).

Modern GPUs use “unified” shaders to optimize hardware resource management and balance workloads, which minimizes bottlenecks and stalls. GPUs have evolved beyond graphics rendering and are now used for a variety of applications, including compute, image processing, and video processing tasks (OpenSystems Media, n.d.).

The primary 3D API for mobile devices is OpenGL ES, which is optimized for mobile devices by removing redundancies and adding mobile-friendly data formats. Different GPU architectures, such as ARM Mali, Qualcomm Adreno, Nvidia Tegra, and Imagination Technologies PowerVR, are used in mobile devices, each with its own strengths and weaknesses. The specific GPU used in a mobile device depends on the manufacturer’s design choices (OpenSystems Media, n.d.).

Challenges of the mobile GPU
Mobile GPUs have a very low fill rate. The reason for this is power consumption: they are clocked at relatively low frequencies, and can therefore process fewer pixels than a similar GPU with a higher clock frequency. Meanwhile, most phone displays have a very high pixel count, which is not a great combination (Sproing Interactive Media, 2014).

Tile-based deferred rendering (TBDR) was adopted in mobile graphics as GPUs supporting the technique became widespread. Many mobile devices use TBDR as their primary rendering technique, as it delivers high-quality 3D graphics with improved performance and lower power consumption, and it remains a popular rendering technique for mobile devices today (Epic Games, 2014).

TBDR can complicate resource management because it requires the GPU to process small sections of the screen, known as tiles, independently. This creates challenges for managing resources such as textures and memory: the GPU may need to access and load different resources for each tile, while also managing the resources shared between tiles. Render target switches are also expensive, because the GPU must save and restore the state of each tile’s render targets between switches and reload the associated resources, adding significant overhead to the rendering pipeline. To mitigate this cost, developers can group similar rendering operations together and minimize the number of switches between different render targets. Post-processing, which typically forces full-screen passes and render target switches, is therefore comparatively expensive on TBDR hardware (Epic Games, 2014).

Rendering Optimization

Balancing resource usage can improve overall graphics performance in games. If the GPU is slowing down the game, reducing its load by transferring it to the CPU or memory can improve performance. Baking lightmaps can significantly reduce performance impact by precalculating static lighting for non-movable objects. Avoiding dynamic lights and reducing their usage as much as possible can also greatly improve performance, especially on low-end devices. Lightmapping should not be used for moving lights (Dickinson, 2015).

Draw calls in Unity games can create CPU overhead and slow down performance. Optimize by using draw call batching, where similar graphics are grouped together to speed up rendering. Dynamic batching is used for moving objects, transforming vertices with the CPU. Enable dynamic batching in the Player settings for automatic optimization. Static batching can be used for stationary objects marked as static, storing combined geometry in memory for faster rendering. However, it has memory usage impacts and should be reviewed before use (Dickinson, 2015).
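For objects that are spawned at runtime but never move afterwards, static batching can also be requested from script. A sketch (assumes the child objects share materials, which batching needs to be effective):

```csharp
using UnityEngine;

public class BatchStaticProps : MonoBehaviour
{
    void Start()
    {
        // Combines all child meshes under this root into static batches.
        // The combined geometry stays resident in memory, so the memory
        // cost noted above applies here too.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```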

Optimizing textures, materials, and assets is important for 3D game performance and build size. Use compressed textures with reduced resolution to free up memory and increase rendering speeds. Enable mipmaps in texture import settings for GPU optimization, except for UI elements and 2D graphics that should be rendered in their original sizes (Dickinson, 2015).

Three main causes of GPU bottleneck
The first thing to do if a game is GPU bound is to find out what is causing the GPU bottleneck. GPU performance is most often limited by fill rate, especially on mobile devices, but memory bandwidth and vertex processing can also be concerns (Unity Learn, 2022).

Fill Rate
Fill rate refers to the number of pixels that the GPU can render to the screen each second. If a game has fill rate limitations, it means that the game is attempting to draw more pixels per frame than the GPU can handle. To determine if fill rate is the problem, profile the game and note the GPU time. Then, decrease the display resolution. If performance improves, fill rate is the issue (Unity Learn, 2022).
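The resolution test can also be scripted for quick A/B profiling on-device. A sketch (the halving factor is arbitrary):

```csharp
using UnityEngine;

public class FillRateProbe : MonoBehaviour
{
    void Start()
    {
        // Render at half resolution in each dimension (a quarter of the
        // pixels). If GPU frame time drops sharply afterwards, the game
        // is fill-rate bound rather than vertex- or bandwidth-bound.
        Screen.SetResolution(Screen.width / 2, Screen.height / 2, true);
    }
}
```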

To address fill rate problems, there are a few approaches. Fragment shaders, the code responsible for drawing a single pixel, can cause performance issues if not optimized. Complex fragment shaders are a common cause of fill rate bottlenecks. To improve performance, it is recommended to use the most efficient and optimized shaders for the desired visual effect, such as the mobile shaders provided by Unity.

When using Unity’s Standard Shader, it is important to understand that the shader is compiled based on the current material settings, only including the utilized features. By removing features, such as detail maps, the complexity of the fragment shader code can be reduced, thereby improving performance.
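Feature stripping can be done per material: clearing a texture slot and its shader keyword lets Unity compile a cheaper variant. A sketch for detail maps (the keyword name is taken from the built-in Standard Shader; verify it against the Unity version in use):

```csharp
using UnityEngine;

public static class StandardShaderTrim
{
    // Remove the detail-map feature from a Standard Shader material so
    // the compiled fragment shader variant no longer pays for it.
    public static void StripDetailMaps(Material mat)
    {
        mat.SetTexture("_DetailAlbedoMap", null);
        mat.SetTexture("_DetailNormalMap", null);
        mat.DisableKeyword("_DETAIL_MULX2");
    }
}
```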

Overdraw occurs when the same pixel is drawn multiple times, leading to fill rate issues. The order in which objects are drawn in a scene is determined by the shader’s render queue, which can result in maximized overdraw if the objects are sorted back-to-front in the Transparent queue. To reduce overdraw, it is recommended to reduce the number of overlapping objects and optimize transparent materials, unoptimized particles, and overlapping UI elements. The Unity Scene view has a Draw Mode that can help identify and reduce overdraw.

The use of image effects can also impact fill rate, especially when multiple image effects are used. To address this, experiment with different settings or more optimized image effects, or combine the shader code into a single pass using Unity’s PostProcessing Stack. If optimization does not solve fill rate issues, disabling image effects may be necessary for lower-end devices.

Memory Bandwidth
Memory bandwidth refers to the rate at which the GPU can read from and write to its dedicated memory. If a game is limited by memory bandwidth, it likely means that the game is using textures that are too large for the GPU to handle quickly (Unity Learn, 2022).

Diagnosing memory bandwidth issues in a game can be done by profiling the GPU time, then reducing the Texture Quality in the Quality Settings. If there is an improvement in performance after making this change, it’s a sign that memory bandwidth might be the issue. If memory bandwidth is the issue, it is necessary to minimize the usage of textures in the game. The most effective method may vary, but there are several strategies that can be used to optimize textures.

To reduce texture memory usage and improve performance, texture compression can be implemented in a game. There are various texture compression formats and settings available in Unity, and it’s best to experiment with different options to find the best one for each texture. Using texture compression can greatly decrease the size of textures both on disk and in memory, making it a useful technique to consider if memory bandwidth is a concern. Information on different compression formats and settings can be found in the Unity Manual.

Another method is to utilize mipmaps to reduce the memory usage of textures. Mipmaps are lower-resolution versions of textures that Unity uses on objects that are far from the camera. By using mipmaps, we can ease the burden on memory bandwidth. To determine which objects in our scene could benefit from mipmaps, we can use the Mipmaps Draw Mode in Scene view. Further information on enabling mipmaps for textures can be found in the Unity Manual.
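Mipmaps are enabled per texture in the import settings; the toggle can also be scripted on the editor side. A sketch (the asset path parameter is hypothetical):

```csharp
#if UNITY_EDITOR
using UnityEditor;

public static class MipmapToggle
{
    // Enable mipmap generation for one texture asset and reimport it.
    public static void EnableMipmaps(string assetPath)
    {
        var importer = (TextureImporter)AssetImporter.GetAtPath(assetPath);
        importer.mipmapEnabled = true;
        importer.SaveAndReimport();
    }
}
#endif
```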

Vertex Processing
If a game is limited by vertex processing, it indicates that the GPU is spending a significant amount of time rendering each vertex in a mesh. The number of vertices and the number of operations executed on each vertex affect this. If the game is GPU bound and not constrained by fill rate or memory bandwidth, vertex processing is likely the root of the problem. To improve performance, reducing the number of vertices or operations executed on each vertex is the goal. Various approaches may be taken, and experimentation is recommended to determine the best fit for the game (Unity Learn, 2022).

To decrease vertex processing, one option is to eliminate unnecessary mesh complexity. Excessively detailed meshes, or meshes with too many vertices due to creation errors, waste GPU resources. To reduce vertex processing costs, meshes should be authored with lower vertex counts in the 3D art program.

The use of normal mapping to simulate complex geometry in meshes may improve performance by reducing the number of vertices the GPU needs to process. Instead of using a high vertex count mesh, normal mapping employs textures to create the illusion of geometric complexity. This results in the GPU processing fewer vertices, which can increase performance, especially on lower-end devices. Nonetheless, it’s crucial to keep in mind that normal mapping incurs a small amount of GPU overhead, so experiment and check if it provides a net performance benefit for a specific project.

To reduce the amount of data sent to the GPU for each vertex, vertex tangents for a mesh in its import settings can be turned off if normal mapping is not utilized.

The use of Level of Detail (LOD) is another approach to reducing vertex processing. This optimization technique involves simplifying the complexity of meshes far from the camera, which reduces the number of vertices the GPU needs to render. The outcome is improved performance without compromising the game’s visual quality. Refer to the LOD Group page in the Unity Manual for more information on how to implement LOD in your game.
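Setting up an LOD Group from script looks roughly like this (the transition heights are illustrative and should be tuned per asset):

```csharp
using UnityEngine;

public class LodSetup : MonoBehaviour
{
    public Renderer highDetail; // full mesh, used up close
    public Renderer lowDetail;  // simplified mesh for distance

    void Start()
    {
        var group = gameObject.AddComponent<LODGroup>();
        group.SetLODs(new[]
        {
            // Screen-relative heights: below 0.5 switch to the low-poly
            // mesh, below 0.05 cull the object entirely.
            new LOD(0.5f,  new[] { highDetail }),
            new LOD(0.05f, new[] { lowDetail }),
        });
        group.RecalculateBounds();
    }
}
```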

Another approach is to simplify the vertex shaders that control how the GPU draws each vertex. Reducing the complexity of vertex shaders may help mitigate the vertex processing problem if a game is experiencing limitations.

Furthermore, the way vertex shaders are employed can be optimized. For example, minimizing the number of operations performed in the shader or simplifying complex calculations can reduce the cost of vertex processing. This may require rearranging the code or using more efficient algorithms and data structures. Keep in mind that while these optimizations may boost performance, they can also affect the game’s visual quality, so performance gains should always be weighed against their visual impact.

If a project employs custom shaders, they should be optimized as much as possible. Optimizing shaders can be a difficult task, but the Unity Manual’s shader performance and shader optimization pages provide useful guidance for optimizing shader code.

Implementing Rendering Improvements

Having completed the research phase, it is now time to apply the gained knowledge to optimize rendering. The information gathered so far will help in identifying performance bottlenecks and making informed decisions about improvements. This next stage will concentrate on implementing optimizations, addressing rendering challenges, and enhancing the project’s overall performance. The process of optimization will demonstrate the positive impact of informed decision-making and targeted action.

First, it’s important to determine the cause of the GPU bottleneck. As mentioned earlier, the fill rate is usually the main bottleneck. To see if the fill rate is causing the issue, the resolution can be lowered. The Samsung Galaxy S8 has a DPI of 411. By reducing it to 300, some improvements may be observed if the fill rate is indeed the bottleneck. Current frames are taking 81ms to render.

Lowering the DPI clearly increased performance: on average, frames became about 20 ms cheaper to render, which is a significant gain and points to fill rate as a major bottleneck.

To address fill rate problems, a few approaches can be taken, such as optimizing fragment shaders, removing unnecessary features in Unity’s Standard Shader, reducing overdraw, and using more efficient image effects. By following these recommendations, it’s possible to overcome the fill rate bottleneck and improve the game’s performance.

There are many custom shaders in the project that use a lot of resources, especially on mobile devices. Replacing these shaders with the Legacy/Mobile shaders greatly improves performance: with the resolution scaling mode turned off again, average frame time drops by another 20 ms.

To ensure optimal performance, it’s essential to check for overdraw after optimizing shaders in the Unity Editor. However, the built-in implementation of Unity to visualize overdraw may result in false positives on opaque objects due to the additive shader without z-writing. To overcome this issue, it’s recommended to use an external tool such as RenderDoc to monitor the places in the scene with high overdraw (Bonet, 2020).

RenderDoc can be used to monitor overdraw in the Unity Editor. To access RenderDoc, right-click on the Game View or Scene View tab and choose the ‘Load RenderDoc’ option.

During play mode, capture the frame in RenderDoc and view the overdraw in the ‘Texture Viewer’ panel. This will help identify areas in the scene that require optimization to reduce overdraw and improve performance.

In the ‘Texture Viewer’ panel of RenderDoc, there is an event browser on the left side. To use this tool, open the ‘UIR.DrawChain’ and then ‘Camera.Render’ sections. Inside ‘RenderDeferred.Gbuffer’, you can view each mesh that is being drawn. To see overdraw, click on one of the events and select ‘Quad Overdraw (Draw)’. The overdraw is then painted on the screen, allowing you to identify areas in the scene with high overdraw that may require optimization.

Unfortunately, it appears that Unity has already optimized this demo for overdraw, meaning that it’s not possible to further improve the performance in this particular area, except for the post-processing. A simple post-processing effect such as color grading will probably touch every single pixel at least once. This adds one level of overdraw, as it is redrawing every pixel of your screen.

Some of the most common post-processing effects that cause overdraw are bloom, depth of field, anti-aliasing, volumetric lighting, screen space reflections, and ambient occlusion. Several of these are active in the project. Fortunately, a performance profile already exists, which turns off SSR and ambient occlusion.

Also, anti-aliasing can be completely turned off. On mobile screens with high pixel densities, the individual pixels are already so small that the jagged edges of objects are less noticeable, even without anti-aliasing.

These changes together gave a nice performance boost to around 30 FPS.

As features are removed from a Unity project to optimize performance, visual quality can suffer. To compensate for this loss, a fake bloom effect can be implemented. Bloom is a post-processing effect that adds a soft, glowing effect to bright areas of the scene, creating a more visually appealing and cinematic look. Implementing a fake bloom effect can help improve the visual quality of the scene without significantly impacting performance, as it does not require additional rendering passes or pixels.


The package downloaded for the ‘fake bloom’ effect is based on the technique described on this page: https://simonschreibt.de/gat/doom-3-volumetric-glow/

As stated before, real post-processing bloom is computationally expensive, because it involves rendering the scene multiple times and applying a blur filter to bright areas of the image, which can exceed the processing capacity of mobile GPUs with their limited fill rates. The DoomGlow classes use 3D geometry and vertex colors to create a similar glowing effect, which is more efficient as it can be optimized to reduce the number of pixels that need to be processed.

The DoomGlowManager class manages and updates a set of DoomGlow objects based on their visibility, calling their UpdateMeshVR methods to update their meshes based on the camera and light source positions. Each DoomGlow object calculates the extrusion vectors for its quad and updates its vertex positions and colors based on a gradient that fades from the quad color to transparent at the edges.
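The core idea can be sketched independently of the package: a fan of triangles whose outer vertices fade to transparent reads as a cheap glow without any extra render passes (all names and numbers below are illustrative, not the DoomGlow API):

```csharp
using UnityEngine;

[RequireComponent(typeof(MeshFilter))]
public class VertexFadeQuad : MonoBehaviour
{
    public Color glowColor = new Color(1f, 0.9f, 0.6f, 1f);

    void Start()
    {
        var edge = glowColor;
        edge.a = 0f; // fully transparent at the rim

        var mesh = new Mesh();
        mesh.vertices = new[]
        {
            new Vector3(-0.5f, -0.5f, 0f), new Vector3(0.5f, -0.5f, 0f),
            new Vector3( 0.5f,  0.5f, 0f), new Vector3(-0.5f, 0.5f, 0f),
            Vector3.zero, // opaque center vertex
        };
        // Four triangles fanning out from the center vertex.
        mesh.triangles = new[] { 0, 1, 4, 1, 2, 4, 2, 3, 4, 3, 0, 4 };
        // Interpolation from opaque center to transparent rim fakes the glow.
        mesh.colors = new[] { edge, edge, edge, edge, glowColor };
        GetComponent<MeshFilter>().mesh = mesh;
    }
}
```

The fade only shows with a material whose shader reads vertex colors and alpha-blends, such as one of the built-in particle shaders.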

With all the improvements implemented, the application can now achieve a stable frame rate of 30 fps on mobile devices. By adjusting the resolution scaling mode to a slightly lower setting, it’s possible to achieve a smoother frame rate of 60 fps without significant loss of visual quality, resulting in a better user experience on mobile devices.

Code optimization I couldn’t ignore

During profiling, it was noticed that several frames in the program were experiencing a significant performance hit, largely due to the physics. After taking a closer look, it became clear that the enemy controller was responsible for much of this slowdown, particularly because of the extensive Raycasting it employs. As a result, there are several potential solutions to improve the performance.

  • Reduce the frequency of function calls that are made every frame to improve performance.
  • Simplify collision detection methods to avoid using computationally expensive functions like Physics.Raycast and Rigidbody.SweepTest.
  • Minimize the use of m_NavMeshAgent.Warp, and only call it when necessary, such as when the player moves a significant distance.
  • Optimize the calculation in the if statement within the ForceMovement() method to avoid unnecessarily adding Physics.gravity * Time.deltaTime every frame.

Upon further examination of the code, it was discovered that falling animation is triggered by the animator rather than gravity. Consequently, when the falling condition is not met, the falling animation still plays. Because all enemies are spawned directly above the floor, they need to fall onto the ground before the grounded check can be performed. Therefore, it may be possible to reduce the frequency of the grounded check to once every few seconds after the enemies have touched the ground.
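A hedged sketch of the throttled grounded check described above (the interval and ray length are assumptions, and the real enemy controller differs):

```csharp
using UnityEngine;

public class ThrottledGroundedCheck : MonoBehaviour
{
    const float CheckInterval = 2f; // seconds between raycasts once landed
    float nextCheckTime;
    public bool IsGrounded { get; private set; }

    void Update()
    {
        // Skip the expensive raycast on most frames after landing.
        if (IsGrounded && Time.time < nextCheckTime)
            return;

        IsGrounded = Physics.Raycast(transform.position, Vector3.down, 0.2f);
        if (IsGrounded)
            nextCheckTime = Time.time + CheckInterval;
    }
}
```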

As a result of implementing this change, the program’s performance improved by approximately 4ms, which is a significant boost overall.

What if there was more time?

Given more time, the approach would be to continue identifying the next heaviest performance cost and looking for optimization techniques to address it. This would involve a continuous cycle of profiling, analysis, and implementation of optimizations to improve performance. It is important to keep in mind that a balance between optimization and visual quality must be maintained throughout the process, to ensure that the game runs smoothly while still providing a good user experience.

Sources