Sunday, 18 April 2021

Rust Game Development

Ever since I started learning Rust my intention has been to use it for writing games. In many ways it seems like the obvious choice, but the restrictions around lifetimes and ownership can make it hard to apply the usual game programming patterns. Rust is sufficiently different from all the other languages to require its own way of thinking about problems.

To learn more about how best to use Rust for game development I decided to write a Windows version of an Android game I had made a couple of years ago in Java. Doing a
rewrite meant that all my efforts were focussed on the programming problems rather than game design issues.

I wanted to make sure that this is a 'proper' game in the sense that it deals with all the ancillary issues, such as saving progress and adjusting settings, and is built in a way that could be used as a starting point for other, more complex games. I also wanted to avoid, as much as possible, using Cell or RefCell, which can be really useful but can sometimes chip away at some of the advantages of Rust.

Crates

There are a lot of game development libraries for Rust that provide a one-stop solution for sound, graphics, UI, controls, etc. I think these can be great, especially in the early stages of development. My own preference is to assemble a set of smaller, more targeted crates. Rust's dependency management makes this particularly easy to do.

Most Rust game development libraries push some kind of Entity-Component solution. I don't think there is anything wrong with them, but I wanted to 'discover' the need for one rather than assuming that I needed to use one. (Also, having spent a lot of time debugging Guice on Java, I have become wary of code that automagically wires up things behind the scenes.)

The downside of assembling your own set of crates is that sometimes they do not play nicely together and it can be tricky to find out why. I spent ages tracking down a problem caused by winit making drag and drop support the default, which did not play well with cpal because the two required incompatible Windows thread apartment models.
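
One way to avoid that clash is to turn drag and drop off when building the window. A minimal sketch, assuming a winit version that exposes the Windows-specific with_drag_and_drop builder extension; in this project the window is actually created through glium/glutin, which wrap winit, so the real call site will look a little different:

    use winit::event_loop::EventLoop;
    use winit::platform::windows::WindowBuilderExtWindows; // Windows-only extension trait
    use winit::window::WindowBuilder;

    fn main() {
        let event_loop = EventLoop::new();
        // Disabling drag and drop stops winit from initialising OLE (and with it a
        // single-threaded COM apartment) on the event loop thread, which is what
        // clashed with cpal.
        let _window = WindowBuilder::new()
            .with_title("game")
            .with_drag_and_drop(false)
            .build(&event_loop)
            .expect("window creation failed");
        // ... run the event loop as usual ...
    }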

The crates I ended up using are:

  • glium. This is one of my favorite Rust crates. It is an almost perfect wrapper for OpenGL; it handles all the tedious and error-prone state management but is close enough to the OpenGL API to keep my existing OpenGL knowledge relevant.
  • rodio. In my previous projects I used cpal, which is an excellent library, but this time I wanted something more high level that would let me just focus on what to play and when to play it. rodio definitely delivered.
  • glium-glyph. This crate does a nice job of making glium play nicely with glyph-brush, making text rendering very easy in glium.
  • nalgebra-glm. I really like what it is trying to do and it works, but the ergonomics aren't always great. It never felt as intuitive as the other crates I used, and because it uses deep multi-level traits it plays poorly with rust-analyzer.
  • serde-json. Another crate that 'just works'.
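
For reference, a dependency section pulling these together might look roughly like this; the version numbers are illustrative rather than the ones the project actually pins, and the JSON crate is published as serde_json on crates.io:

[dependencies]
glium = "0.29"         # OpenGL wrapper
rodio = "0.13"         # high-level audio playback
glium-glyph = "0.5"    # glyph-brush text rendering on top of glium
nalgebra-glm = "0.10"  # glm-style vector and matrix maths
serde_json = "1.0"     # settings and save-game serialization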

High level architecture

Rust's ownership model puts strict limits on what kinds of architecture are allowed. The biggest language restriction is that there can only be one mutable reference to a piece of data at a time. Many design choices revolve around how to manage those mutable references. My experience is that the restrictions force you to think much harder about the design, but this ultimately results in much cleaner designs.

Below is the very high level design that I ended up with:


[Architecture diagram: the Page trait, the PageManager, and PageAction messages]

  • Page. This is a trait that each page object implements. Examples of pages are the start menu, the settings menu and the game page. Only one page is active at any time and the implementation of each page keeps track of its own mutable data. The pages communicate with the rest of the system by sending PageAction messages to the PageManager.
  • PageManager. Holds mutable references to all pages and keeps track of the active page. The page manager routes calls to the active page and is also responsible for coordinating access to shared objects, such as configuration data or glium.
  • PageAction. Pages send PageAction messages to the PageManager to accomplish tasks that affect objects that are not owned by the Page. Examples of PageActions are Save Settings, Change Page and Change Screen Mode. A rough sketch of these types is shown below.
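
A minimal sketch of how these pieces might fit together, with hypothetical details filled in (PageId, ScreenMode, the exact trait methods and the PageManager fields are all assumptions; the real game almost certainly passes more context around, such as input events):

    // Hypothetical identifiers; the real game will have its own set.
    enum PageId { StartMenu, Settings, Game }
    enum ScreenMode { Windowed, Fullscreen }

    // What a page can ask the PageManager to do on its behalf.
    enum PageAction {
        None,
        ChangePage(PageId),
        SaveSettings,
        ChangeScreenMode(ScreenMode),
    }

    // Each page owns its own mutable state and only talks to the rest of the
    // system by returning PageActions.
    trait Page {
        fn enter(&mut self);
        fn update(&mut self, dt: f64) -> PageAction;
        fn draw(&mut self, frame: &mut glium::Frame);
    }

    // The PageManager holds every page and routes calls to the active one.
    struct PageManager {
        pages: Vec<Box<dyn Page>>,
        active: usize,
    }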

When the game starts up it first loads all the assets and then transfers their ownership to the PageManager.

This design works quite well and makes the ownership of any data very clear. There is a danger that PageManager could evolve into a god object as the project becomes more complex. I could foresee a situation where the number of PageActions could become hard to manage and tedious to maintain.

I don't think the design ended up being that different from the Java version. The main difference is that it is much cleaner in Rust, mainly because the ownership rules make it harder to break your own design rules. There were times when it would have been convenient in the short term to let multiple pages control the same mutable object, but doing so would have introduced longer-term code debt. (In many ways Rust makes it harder to create code debt.)

Inefficiencies

The glium-glyph crate sets glium's DrawParameters when a glyph_brush object is created. This means that if you need different DrawParameters you need to create a new glyph_brush object. Usually this would never happen, but the game uses OpenGL scissor rectangles to constrain the draw area, and the scissor rectangle is set on DrawParameters. The scissor rectangle depends on the window dimensions, so whenever the window is resized the scissor rectangle must be adjusted. This meant that the entire glyph_brush object had to be recreated just to change one value.
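
For context, the value in question is a single field on glium's DrawParameters; something along these lines, with hypothetical window dimensions and margin (the real rectangle calculation lives in the game's resize handling):

    // Hypothetical window-dependent values.
    let (window_width, window_height, margin): (u32, u32, u32) = (1280, 720, 16);

    // glium::Rect is in pixels with the origin at the bottom-left corner.
    let params = glium::DrawParameters {
        scissor: Some(glium::Rect {
            left: margin,
            bottom: margin,
            width: window_width - 2 * margin,
            height: window_height - 2 * margin,
        }),
        ..Default::default()
    };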

I had three ways to address this:

  • Fork my own version of glium-glyph that lets me override the DrawParameters. I didn't like this as the whole point of using crates is to outsource complexity.
  • Submit a PR to allow DrawParameters overrides. I liked this better, but I wasn't convinced that solving my specific use case would improve the crate for anyone else.
  • Live with the inefficiency. Although recreating the entire object every time the window size changes is horribly inefficient, it doesn't really matter. It makes no perceptible difference to the user, so there is little point in spending time changing it.

Rust Enums are awesome

Controlling all the different visual transitions can quickly turn the code into an unreadable mess. For example: when you enter a game page, it needs to control fading in, flying text in and out, dropping the movable pieces and disabling/enabling the input.

Before using Rust I would typically end up with tons of little variables that were either used only during a particular sequence or whose meaning depended on the current state. Both made it very hard to reason about the code and had a tendency to introduce some very tricky bugs.

Rust's enums let me store the current state and the data relating to that state in the same enum. This stops the code being polluted with state-dependent variables. Combined with the match functionality and its requirement for exhaustiveness, enums are incredibly powerful. The more I used them, the more uses I found for them.

enum GameState{
    ShowingNewLevel( f64 ),
    Playing,
    ShowingSolution( f64 ),
    InGameMenu,
    ChangingPage( PageAction, f64 )
}
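
Matching on the state then forces every transition to be handled. Below is a sketch of how a per-frame update might drive this; GamePage (with a state field), PageAction::None (from the earlier sketch) and the interpretation of the f64 payloads as countdown timers are all assumptions:

    impl GamePage {
        // dt is the frame time in seconds.
        fn update(&mut self, dt: f64) -> PageAction {
            // Temporarily take the state out so the match can move its payload
            // (the queued PageAction) without fighting the borrow checker.
            let state = std::mem::replace(&mut self.state, GameState::Playing);
            // The match must cover every variant, so no transition can be forgotten.
            let (next, action) = match state {
                GameState::ShowingNewLevel(t) if t > dt =>
                    (GameState::ShowingNewLevel(t - dt), PageAction::None),
                GameState::ShowingNewLevel(_) => (GameState::Playing, PageAction::None),
                GameState::Playing => (GameState::Playing, PageAction::None),
                GameState::ShowingSolution(t) if t > dt =>
                    (GameState::ShowingSolution(t - dt), PageAction::None),
                GameState::ShowingSolution(_) => (GameState::Playing, PageAction::None),
                GameState::InGameMenu => (GameState::InGameMenu, PageAction::None),
                GameState::ChangingPage(action, t) if t > dt =>
                    (GameState::ChangingPage(action, t - dt), PageAction::None),
                GameState::ChangingPage(action, _) => (GameState::Playing, action),
            };
            self.state = next;
            action
        }
    }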

Code readability

It takes a little bit of time to appreciate this. Initially all Rust code is a bit overwhelming because there are so many new concepts and the syntax is unfamiliar. Once I got used to the syntax I found that it was actually faster to read and understand Rust code. Because Rust is so strict about object ownership it is easier to build a mental model of how the code works. (This is not entirely true for code that has complex lifetime annotations, which I still struggle with.)

This project has been my pet project for some time with very infrequent opportunities to spend time on it. Usually when I don't spend time on a pet project for a couple of weeks I find it tricky to get back into it. With this project I found it easier because Rust makes the state of any code more binary; if the code compiles I can be fairly confident that there are no unresolved design issues.

Conclusion

Rust changes the way you think. Working with other languages after spending time with Rust feels like entering a wild west where anything goes. Not having well-defined ownership or rules on mutability feels irresponsible.

Code

You can get the source code for the project on github. My original Android game is here.

Sunday, 5 July 2020

Writing a winning 4K intro in Rust


I recently wrote my first 4K intro in Rust and released it at the Nova 2020 demo party, where it took first place in the new school intro competition. Writing a 4K intro is quite involved and requires you to master many different areas at the same time. Here I will focus on what I learned about making Rust code as small as possible.



You can view the demo on youtube, download the executable at pouet, or get the source code from github.

A 4K intro is a demo where the entire program (including any data) has to be 4096 bytes or less, so it is important that the code is as space efficient as possible. Rust has a bit of a reputation for creating bloated executables, so I wanted to find out whether it is possible to create very space efficient code with it.

The setup

The entire intro is written in a combination of Rust and glsl. Glsl is used for rendering everything on screen but Rust does everything else: world creation, camera and object control, creating instruments, playing music, etc.

Some of the features I depend on, such as xargo, are not yet part of stable Rust, so I use the nightly Rust toolchain. To install the nightly toolchain and use it as the default you need the following rustup commands:

rustup toolchain install nightly
rustup default nightly

I use crinkler, a compressing linker, to link and compress the object file generated by the Rust compiler.

I also used shader minifier for pre-processing the glsl shader to make it smaller and more crinkler-friendly. Shader minifier doesn't support output into .rs files, so I ended up using its raw output and manually copying it into my shader.rs file. (In hindsight, I should have written something to automate that stage, or even created a PR for shader minifier.)

The starting point was the proof of concept code I developed earlier (https://www.codeslow.com/2020/01/writing-4k-intro-in-rust.html), which I thought was pretty lean at the time. That article also goes into a bit more detail about setting up the toml file and how to use xargo for compiling a tiny executable.

Optimizing the design for code size

Many of the most effective size optimizations have nothing to do with clever hacks but are the result of rethinking the design.

My initial design had one part of the code creating the world, including placing the spheres, and another part responsible for moving the spheres. At some point I realized that the sphere placement and sphere moving code were doing very similar things and that I could merge them into one slightly more complicated function that did both. Unfortunately, this type of optimization can make the code less elegant and readable.

Looking at the assembly code

At some point you have to look at the compiled assembly code to understand what the code gets compiled into and what size optimizations are worth it. The Rust compiler has a very useful option, --emit=asm, for outputting assembly code. The following command creates a .s assembly file:

xargo rustc --release --target i686-pc-windows-msvc -- --emit=asm

It is not necessary to be an expert in assembler to benefit from studying the assembler output, but it definitely helps to have a basic understanding of assembler syntax. The release version uses opt-level = "z", which causes the compiler to optimize for the smallest possible size. This can make it a bit tricky to work out which part of the assembly code corresponds to which part of the Rust code.
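
For reference, the size-related settings live in the release profile of the toml file. A sketch along these lines; only opt-level = "z" is mentioned above, the other keys are typical size-focused settings and may not match the project's actual file:

[profile.release]
opt-level = "z"    # optimize for the smallest possible size
lto = true         # allow cross-crate inlining and dead code removal
panic = "abort"    # drop the unwinding machinery
codegen-units = 1  # give the optimizer the whole crate at once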

I discovered that the Rust compiler can be surprisingly good at minimizing code: getting rid of unused code and unnecessary parameters and folding code together. It can also do some strange things, which is why it is essential to occasionally study the resulting assembly code.

Using cargo features

I worked with two versions of the code; one version does logging and allows the viewer to manipulate the camera, which is used for creating interesting camera paths. Rust allows you to define features that you can use to optionally include bits of functionality. The toml file has a [features] section that lets you declare the available features and their dependencies. My 4K intro has the following section in the toml file:

[features]
logger = []
fullscreen = []

Neither of the optional features has dependencies, so they effectively work as conditional compilation flags. The conditional blocks of code are preceded by a #[cfg(feature)] attribute. Using features in itself does not make the code smaller, but it makes the development process much nicer when you can easily switch between different feature sets.

        #[cfg(feature = "fullscreen")]
        {
            // This code is compiled only if the full screen feature has been selected
        }

        #[cfg(not(feature = "fullscreen"))]
        {
            // This code is compiled only if the full screen feature has not been selected
        }
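
The features to compile in are then selected on the command line; assuming xargo passes the --features flag through to cargo unchanged, it would look something like this:

xargo rustc --release --features "logger fullscreen" --target i686-pc-windows-msvc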

Having inspected the compiled code I am certain that only the selected features get included in the compiled code.

One of the main uses of features was to enable logging and error checking for the debug build. The code loading and compiling the glsl shader failed frequently and without useful error messages it would have been extremely painful to find the problems.

Using get_unchecked

When putting code inside an unsafe {} block I had sort of assumed that all safety checks would be disabled within that block, but this is not true: all the usual checks are still applied, and those checks can be expensive.

By default Rust range checks all array accesses. Take the following Rust code:

    delay_counter = sequence[ play_pos ];

Before doing the table look-up the compiler inserts code that checks that play_pos is not indexing past the end of sequence and panics if it is. This adds considerable size to the code as there can be a lot of table look-ups like this.

Converting the above code into

    delay_counter = unsafe { *sequence.get_unchecked( play_pos ) };

tells the compiler not to perform any range checks and just do the table look-up. This is clearly a potentially dangerous operation, which is why it can only be performed within an unsafe code block.

Making loops space efficient

Initially all my loops used the idiomatic Rust way of doing loops, the for x in 0..10 syntax, which I just assumed would be compiled into the tightest possible loop. Surprisingly, this was not the case. The simplest case:

for x in 0..10 {
    // do code
}

would get translated into assembly code that does the following:

    setup loop variable
loop:
    check for loop condition    
    if loop finished, jump to end
    // do code inside loop
    unconditionally jump to loop
end:

whereas the following Rust code

let mut x = 0;
loop {
    // do code
    x += 1;
    if x == 10 {
        break;
    }
}

would get compiled directly into:

    setup loop variable
loop:
    // do code inside loop
    check for loop condition    
    if loop not finished, jump to loop
end:

Note that the loop condition is checked at the end of each iteration, which makes the unconditional jump unnecessary. This is a small space saving for one loop, but the savings do add up when there are 30 loops in the program.

The other, much harder to understand, problem with the idiomatic Rust loop is that in some cases the compiler would add additional iterator setup code that really bloated the executable. I never fully understood what triggered this additional iterator setup as it was always trivial to replace the for {} construct with a loop {} construct.

Using vector instructions

I spent a lot of time optimizing the glsl code, and one of the best classes of optimization (which also usually made the code run faster) was to operate on an entire vector at a time instead of one component at a time.

For example, the ray tracing code uses a fast grid traversal algorithm to check which parts of the map each ray visits. The original algorithm considers each axis separately, but it is possible to rewrite the algorithm so it considers all axes at the same time and does not need any branches. Rust doesn't really have a native vector type like glsl, but you can use intrinsics to tell it to use SIMD instructions.

To use intrinsics I would convert the following code

        global_spheres[ CAMERA_ROT_IDX ][ 0 ] += camera_rot_speed[ 0 ]*camera_speed;
        global_spheres[ CAMERA_ROT_IDX ][ 1 ] += camera_rot_speed[ 1 ]*camera_speed;
        global_spheres[ CAMERA_ROT_IDX ][ 2 ] += camera_rot_speed[ 2 ]*camera_speed;

into

        unsafe {
            // Assumes [f32; 4] arrays that are 16-byte aligned (else use the unaligned _mm_loadu_ps/_mm_storeu_ps).
            let speed = core::arch::x86::_mm_set1_ps(camera_speed);
            let src = core::arch::x86::_mm_mul_ps(core::arch::x86::_mm_load_ps(camera_rot_speed.as_ptr()), speed);
            let dst = core::arch::x86::_mm_add_ps(core::arch::x86::_mm_load_ps(global_spheres[ CAMERA_ROT_IDX ].as_ptr()), src);
            core::arch::x86::_mm_store_ps(global_spheres[ CAMERA_ROT_IDX ].as_mut_ptr(), dst);
        }

which would be quite a bit smaller (but a lot less readable). Sadly, for some reason this broke the debug build while working perfectly on the release build. Clearly, this is a problem with my intrinsics knowledge and not a problem with Rust. This is something I would spend more time on for my next 4K intro as the space savings were significant.

Using OpenGL

There are a lot of standard Rust crates for loading OpenGL functions, but by default they all load a very large set of OpenGL functions. Each loaded function takes up some space because the loader has to know its name. Crinkler does a very good job of compressing this kind of code, but it is not able to completely get rid of the overhead, so I had to create my own version of gl.rs that only includes the OpenGL functions that are used in the code.
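
As a rough illustration of the idea rather than the actual gl.rs from the intro, a hand-rolled loader only needs to name the few entry points that are really called. The type alias, static and loader function below are all hypothetical; get_proc stands for whatever platform lookup is used (e.g. wglGetProcAddress on Windows):

    // One function pointer per OpenGL call the intro actually makes.
    pub type GlUseProgramFn = unsafe extern "system" fn(u32);
    pub static mut GL_USE_PROGRAM: Option<GlUseProgramFn> = None;

    pub unsafe fn load_gl(get_proc: impl Fn(&str) -> *const core::ffi::c_void) {
        // A null result becomes None because Option<fn> has the same layout
        // as a nullable pointer.
        GL_USE_PROGRAM = core::mem::transmute(get_proc("glUseProgram"));
    }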

Conclusion

My first objective was to write a competitive, proper 4K intro to prove that the language is suitable for scenarios where every single byte counts and you really need low level control. Typically this has been the sole domain of assembler and C. The secondary objective was to write it using idiomatic Rust as much as possible.

I think I was fairly successful on the first objective. At no point during the development did I feel that Rust was holding me back in any way or that I was sacrificing performance or capabilities because I was using Rust rather than C.

I was less successful on the second objective. There is far too much unsafe code that doesn't really need to be there. Unsafe has a corrupting effect; it is very easy to use unsafe code to quickly accomplish something (like using mutable statics) but once the unsafe code is there it begets more unsafe code and suddenly it is everywhere. In the future I think I would be far more cautious about using unsafe and only use it when there really is no alternative.
