Neurons For Coders Part 2 – Back propagation

In this series I’m working through some of the basics from Michael Nielsen’s excellent book on neural networks, and intend to proceed to some more advanced topics afterwards. Along the way, I’m going to attempt to translate some of the more math-heavy aspects of neural networks into a form more targeted at engineers.

The GitHub repo for this series is here: https://github.com/chriscummings100/neuralfun

So, back propagation – the magic of neural networks, and the process that makes deep neural networks (i.e. lots of layers) practical. Michael refers to the equations involved as quite beautiful, and I wholeheartedly agree, but even without the actual maths, the elegant simplicity of the process of training a neural network is something quite special!

As with my previous post, before going any further, here’s a screenshot to get you incredibly excited!

The Only Bit Of Maths

There is one mathematical concept that you really need to be a little comfortable with in the world of back propagation, and that is the concept of a gradient, otherwise known (roughly) as a derivative, or rate of change. I don’t want to dig too deep into explaining math principles here – there are plenty of books about that – but this is so important that I’ll at least have a go!

A gradient (or derivative) of a function can be thought of in lots of different ways – the steepness of a graph, its rate of change, a tangent or a slope. The way that matters to back propagation is that it defines how the parameters of a function are affecting its output at a given point. For example, with the function y=2x, a change in x always results in double the change in y.

With the multi-variable function z=y+2x we can say that a change in y always results in the same change in z, but a change in x always results in double the change in z. As such, we’ve got 2 different ‘gradient values’, one for x and one for y. Mathematically these are known as the partial derivatives. In the case of that simple function:

  • The partial derivative of z=y+2x with respect to y is 1.
  • The partial derivative of z=y+2x with respect to x is 2.

In these 2 examples the partial derivatives are fairly intuitive and also constant – they are the same regardless of the values of x and y. However, in most cases it’s more complex:

  • The partial derivative of z=y²+2x with respect to y is 2y.
  • The partial derivative of z=y²+2x with respect to x is 2.

Proofs for the above reside in basic calculus, but we don’t need to worry about that here. What matters to us is that we can still work out constant values at specific points:

  • The partial derivative of z=y²+2x with respect to y when y=2 is 4.
  • The partial derivative of z=y²+2x with respect to x when x=anything is 2.

There is an important point to note here. In the case of y in the above equation, the gradient is continually changing. When y=2, the derivative of z with respect to y is 4, suggesting a change in y would result in 4 times the change in z. Whilst this is clearly not universally true (just try adding a big number to y and see what it does to the result), calculus tells us that for tiny changes in y, the gradient can be used as a good approximation to calculating corresponding changes in z.
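To make that concrete in code rather than calculus, here’s a tiny sketch (my own example, not from the repo) that estimates the partial derivative of z=y²+2x numerically and compares it to the analytic value 2y:

using System;

class GradientDemo
{
    static double F(double x, double y) => y * y + 2 * x;

    static void Main()
    {
        double x = 1.0, y = 2.0, h = 1e-6;

        //finite-difference approximation of dz/dy at (x, y):
        //nudge y by a tiny amount and see how much z changes
        double dz_dy = (F(x, y + h) - F(x, y)) / h;

        //both print approximately 4
        Console.WriteLine($"numeric: {dz_dy:F4}, analytic: {2 * y:F4}");
    }
}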

A neural network can actually be thought of as a function – one where every weight and bias value is a parameter. However, where you might have been happy picturing slopes or triangles in the above examples, with a neural network that has around 24,000 weights and 40 biases (our earlier network) you’re into a space with tens of thousands of dimensions. My advice is don’t try to picture that – it’ll hurt your head and it won’t get you anywhere. However, you can still ask, for a given set of inputs, how much did any specific bias or weight anywhere in the network affect the output?

The reason this question is important is because of a technique known as gradient descent. The idea is that if we know how a given parameter (e.g. weight 7 of neuron 2 in layer 1) is affecting the output of the network, and we know how wrong the output is, we can adjust the parameter to make the output less wrong! This is what happens when we train with back propagation. We fire loads of different inputs into the network for which we know the desired output. For each one we calculate how wrong the output is, work out the gradients (or derivatives) with respect to every weight and bias, and then nudge them slightly in the opposite direction, resulting in an output that is less wrong. Rinse/repeat a few million times and you have a trained neural network!
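Before we get to the real thing, here’s a toy illustration of that nudging process (again my own example, not from the repo): gradient descent on the single-parameter function y=(x-3)², whose gradient is 2(x-3).

double x = 0.0;
double learning_rate = 0.1;
for (int step = 0; step < 100; step++)
{
    //gradient (dy/dx) of y=(x-3)^2 at the current x
    double gradient = 2.0 * (x - 3.0);

    //nudge x slightly in the opposite direction to the gradient
    x -= learning_rate * gradient;
}
//x has now descended to very nearly 3, the minimum of the function

Swap x for thousands of weights and biases, and the gradient for the ones back propagation calculates, and that loop is essentially what the training code below does.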

Now obviously there’s a load of maths behind the above – especially if you want to prove it. The principles can be written with a few fairly simple equations, but in my experience, an extremely skilled but not-math-trained brain is easily thrown off by a new symbol, or a T in an unexpected place, or a new version of multiplication that’s named after somebody… Every game developer has been there at some point! Fortunately, the rest is fairly intuitive, and can be well understood with good old code. So let’s begin…

The Training Loop

I think this is all clearest starting from the outside and working in to the meaty bit, so we’ll start with a function called ‘DoTrainEpoch’, which does a single pass (or epoch) of training the network.

public IEnumerator DoTrainEpoch(Dataset dataset)
{
    m_epoch_progress = 0;
    
    //get randomly shuffled training data
    var training_data = GetShuffledTrainingData(dataset);
    
    //divide into batches, and train network off each batch 1 at a time
    int first_idx = 0;
    int batch_size = 10;
    while (first_idx < training_data.Length)
    {
        var batch = new ArraySegment<Dataset.Data>(
            training_data, first_idx, batch_size);
        TrainBatch(batch);
        first_idx += batch_size;
        m_epoch_progress = first_idx / (float)training_data.Length;
        yield return null;
    }
    
    //evaluate network after all batches done
    Evaluate(dataset);
}

This calls a function to get the training data from the dataset in a randomly shuffled array, then simply walks over the data in batches of 10. Each batch is passed to the TrainBatch function, which modifies the network to get better at identifying that batch. The reason the data is shuffled is so that if we run another ‘epoch’, the batches passed in are different, so the network doesn’t just get very good at specific batches.
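GetShuffledTrainingData isn’t shown in this post. A plausible sketch (my own, assuming a standard Fisher-Yates shuffle over a copy of the training array – the real implementation is in the repo and may differ) would be:

Dataset.Data[] GetShuffledTrainingData(Dataset dataset)
{
    //copy the training data so we don't disturb the original ordering
    var data = (Dataset.Data[])dataset.m_training_data.Clone();

    //Fisher-Yates shuffle
    var rng = new System.Random();
    for (int i = data.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        var tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }
    return data;
}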

Note: the reason the above is in an enumerator with a yield is simply so we can get a pretty progress bar in Unity using a coroutine, and the call to Evaluate simply counts and logs how many records in the test data the network gets correct.

Whilst our app does have a button to train a single ‘epoch’, to wrap up the process of doing multiple ones, we also have another coroutine enumerator to do a full ‘training pass’:

public IEnumerator DoTrain(Dataset dataset, int epochs)
{
    m_train_progress = 0;
    for (int i = 0; i < epochs; i++)
    {
        yield return DoTrainEpoch(dataset);
        m_train_progress = (i+1) / (float)(epochs);
    }
}

As you can probably see, all we’re doing here is calling the training function multiple times.

Training a single batch

TrainBatch is the gradient descent bit I mentioned earlier. It doesn’t actually calculate the gradients (we’ll do those next). Instead, its job is as follows:

  • Calculate the weight and bias gradients for every entry in the batch and add them all up.
  • The summed gradients are then scaled by a magic number (the learning rate) and divided by the batch size.
  • Finally, the scaled gradients are subtracted from their corresponding weights and biases.

We’ll cover TrainBatch in a few chunks as it’s simple but big!

//these are gradients of the cost function with respect to the weights and biases
//they all start out at 0, and are accumulated from the back propagation results for each data point
var dweight_by_dcost = new double[NumLayers][][];
var dbias_by_dcost = new double[NumLayers][];
for (int layer_idx = 1; layer_idx < NumLayers; layer_idx++)
{
    int layersize = LayerSize(layer_idx);
    dbias_by_dcost[layer_idx] = new double[layersize];
    dweight_by_dcost[layer_idx] = new double[layersize][];
    int numweights = LayerSize(layer_idx - 1);
    for(int neuron = 0; neuron < layersize; neuron++)
        dweight_by_dcost[layer_idx][neuron] = new double[numweights];
}

This first bit is nice and simple! All we’re doing here is creating some arrays to contain the summed up weight and bias gradients, all initialized to 0.

Next is the meat of the function, summing up the gradients:

foreach (var datapoint in batch)
{
    //run back prop for this data point, which gives us
    //a dweight_by_dcost and dbias_by_dcost just for this data point
    CalculateBackPropGradients(datapoint, 
        out double[][][] datapoint_dweight_by_dcost, 
        out double[][] datapoint_dbias_by_dcost);
    
    //now add the gradients for this data point to the running total
    for (int layer_idx = 1; layer_idx < NumLayers; layer_idx++)
    {
        for(int neuron = 0; neuron < LayerSize(layer_idx); neuron++)
        {
            dbias_by_dcost[layer_idx][neuron] += 
                datapoint_dbias_by_dcost[layer_idx][neuron];
            
            for (int weight_idx = 0; 
                weight_idx < dweight_by_dcost[layer_idx][neuron].Length; 
                weight_idx++)
            {
                dweight_by_dcost[layer_idx][neuron][weight_idx] += 
                    datapoint_dweight_by_dcost[layer_idx][neuron][weight_idx];
            }
        }
    }
}

So long as you trust my CalculateBackPropGradients function, this code is doing a pretty simple job. It’s getting the gradients for a specific data point, then simply adding them to the end result.

By the time the above is done, we’ll have all our weight and bias gradients summed up. For example, dbias_by_dcost[1][7] will contain the sum of all layer 1, neuron 7 bias gradients for each datapoint in the batch.

And finally, the weights and biases for the whole network are adjusted accordingly (the gradient descent step):

//finally, we can apply 'gradient descent' to the
//weights and biases - i.e. adjust them in the 
//opposite direction to the gradient, scaled
//by constant learning rate over batch size
{
    double batch_learning_rate = 3.0 / (double) batch.Count;
    
    //loop over every layer, adjusting its weights and biases
    for (int layer_idx = 1; layer_idx < NumLayers; layer_idx++)
    {
        for (int neuron = 0; neuron < LayerSize(layer_idx); neuron++)
        {
            m_biases[layer_idx][neuron] -= 
                batch_learning_rate * dbias_by_dcost[layer_idx][neuron];
            
            for (int weight_idx = 0; 
                weight_idx < m_weights[layer_idx][neuron].Length; 
                weight_idx++)
            {
                m_weights[layer_idx][neuron][weight_idx] -= 
                    batch_learning_rate * dweight_by_dcost[layer_idx][neuron][weight_idx];
            }
        }
    }
}

Here, once again we’re looping over all neuron biases and weights, this time subtracting the calculated gradients, scaled by a learning rate (a magic number) and divided by the batch size. Why subtract? The gradients are derivatives with respect to the cost – in other words, they represent how to adjust the weights and biases if we want the network to get worse! We want to go in the opposite direction, so we subtract.

Modifying The Feed Forward

Before diving into the actual back propagation function, we need to modify the FeedForward process from part 1. Back propagation works by running the neural network forward, evaluating the result, and then using it to tune the network’s parameters. However, to do so, it needs to know about the state of all the neurons in the network.

In this modified version, we’re outputting 2 lists – one is the activations of all neurons in each layer, and another is a new concept – the weighted inputs:

public void FeedForward(
    Dataset.Data data, 
    out List<double[]> weighted_inputs, 
    out List<double[]> activations)
{
    weighted_inputs = new List<double[]>();
    activations = new List<double[]>();
    
    double[] a = data.m_inputs;
    
    //add activations of input layer, and a null set of weighted
    //inputs, as it had none!
    weighted_inputs.Add(null);
    activations.Add(a);
    
    //feed forward through the network, outputting weighted
    //inputs and activations at each step
    for (int layer = 1; layer < m_sizes.Length; layer++)
    {
        double[] weighted_input = new double[m_sizes[layer]];
        double[] biases = m_biases[layer];
        double[][] weights = m_weights[layer];
        for (int neuron = 0; neuron < m_sizes[layer]; neuron++)
        {
            weighted_input[neuron] = biases[neuron] + Dot(weights[neuron], a);
        }
        a = Sigmoid(weighted_input);
        
        weighted_inputs.Add(weighted_input);
        activations.Add(a);
    }
}

This isn’t too different from the original; however, we have now added:

  • A new concept – the weighted inputs, which are simply the values generated by combining a neuron’s inputs with its weights and bias, before being fed into the sigmoid activation function
  • At every layer in the network, we’re outputting the full set of weighted inputs and activations

This version literally just outputs more info than the old version. In fact, to help maintain the old functionality, I’ve wrapped the above in a version with the same interface as the original:

public double[] FeedForward(Dataset.Data data)
{
    FeedForward(data, out _, out List<double[]> activations);
    return activations[activations.Count - 1];
}

All we’re doing is running the complicated one and returning the activations from the final layer.

Back Propagating!

Ok! It’s time for the back propagation function, whose job it is to calculate gradients for all the weights and biases in the network. To do so, first a nice easy bit – we’re going to run the feed forward process. I’ve included the function signature here too so you can see it:

public void CalculateBackPropGradients(Dataset.Data data, out double[][][] dweight_by_dcost, out double[][] dbias_by_dcost)
{
    //these are gradients of the cost function with respect to the weights and biases
    dweight_by_dcost = new double[NumLayers][][];
    dbias_by_dcost = new double[NumLayers][];
    
    //do feed forward step and record all zs and activations
    FeedForward(data, out List<double[]> weighted_inputs, out List<double[]> activations);

At this point we now have 2 lists – the weighted inputs for each layer, and the activations for each layer.

If you read about training neural networks, a term you’ll hear a lot is the cost function. This is ultimately a single measure of how right or wrong the output of your network was. As in Michael’s book, we’re going to start with what’s known as the ‘quadratic error’ cost function – the sum of the squares of the errors for each output neuron.
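For reference, the quadratic cost for a single data point could be computed like this (a sketch of my own – the training code never actually evaluates the cost itself, only its gradients, and the names here are illustrative):

//quadratic ('sum of squared errors') cost for one data point, where 'output'
//is the network's final activations and 'desired' is the vectorized label
public static double QuadraticCost(double[] output, double[] desired)
{
    double cost = 0;
    for (int i = 0; i < output.Length; i++)
    {
        double err = output[i] - desired[i];
        //the conventional factor of 0.5 makes the derivative simply (output - desired)
        cost += 0.5 * err * err;
    }
    return cost;
}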

Up until now I’ve talked vaguely about gradients for weights and biases etc. But what I’m really referring to when I say the gradient for the bias of neuron 3 in layer 2 is the derivative of the cost function with respect to the bias of neuron 3 in layer 2. In other words, if the cost function is a measure of how wrong our network was, we’re looking to calculate how a given parameter contributed to that wrongness.

Important note: saying ‘derivative with respect to’ repeatedly is annoying and very mathy, so from here on I’m just going to refer to weight gradients or activation gradients etc. When I say weight gradient, I mean the derivative of the cost function with respect to a weight value.

We’ve also only talked about gradients for weights and biases, but we can ask ‘how much did it affect the result?’ for any aspect of the network. How much did the final output of neuron 2 in the output layer affect the result? How much did the weighted input of neuron 4 in layer 1 affect the result? With back propagation we start by asking this question about the activations of the output layer, then use some equations rooted in calculus to work the results backwards through the network.

So let’s start working backwards!

int layer_idx = NumLayers - 1;
double[] weighted_input_gradient;
{
    double[] activation_gradient = Sub(activations[layer_idx], 
                                       data.m_vectorized_output);

This first step calculates the activation gradient for each of the output neurons. It’s at this point that our choice of the ‘quadratic error’ cost function comes in handy – a little calculus shows that the derivative of the cost function with respect to a given output activation (aka the activation gradient) is just the actual result minus the desired result. More complex cost functions result in a little more complex math, but the code wouldn’t change a great deal.

Next, we’re going to take that activation gradient, and use it to calculate the weighted input gradient:


    weighted_input_gradient = Mul(activation_gradient, 
                                SigmoidDerivative(weighted_inputs[layer_idx]));

This is the first taste of how the elegant process of back propagation works. We started out by answering the question ‘how does a given output activation affect the cost function?’. We are now going to use the result to answer another question – ‘how does the corresponding weighted input affect the cost function?’.

The answer that calculus provides is relatively intuitive – the weighted input gradient is equal to the corresponding activation gradient multiplied by the sigmoid function’s own gradient, evaluated at that weighted input. Why is this intuitive? Remember the shape of the sigmoid function:

The sigmoid function converts weighted inputs to activations. For weighted inputs with a large magnitude (far to the left or right of the above graph), the sigmoid function is very flat, so a change in the weighted input will have a minimal effect on the output activation (f(x) in the above graph). However, for weighted inputs near zero, the sigmoid function is very steep, so a change in weighted input will have a significant effect on the activation. So, by calculating how the activation affects the cost, then scaling it according to how flat the sigmoid function is, we get how the corresponding weighted input affects the cost.
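The SigmoidDerivative helper used above isn’t listed in this post. One way to implement it (a sketch in the same element-wise style as Sigmoid – the repo’s version may differ) uses the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)):

public static double SigmoidDerivative(double z)
{
    double s = 1.0 / (1.0 + Math.Exp(-z));
    return s * (1.0 - s);
}

public static double[] SigmoidDerivative(double[] z)
{
    //element-wise version for a whole layer of weighted inputs
    double[] res = new double[z.Length];
    for (int i = 0; i < z.Length; i++)
        res[i] = SigmoidDerivative(z[i]);
    return res;
}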

The bias for the neuron is next, and it’s pretty easy:

    //bias gradient is equal to the weighted input error 
    dbias_by_dcost[layer_idx] = weighted_input_gradient;

This might look odd at first, but consider that the bias is just a constant we add on to the sum of the inputs from previous neurons. i.e. a change of 10 in the bias of the neuron corresponds to a change of 10 in the weighted input. So if we know the weighted input has a given effect on the cost function, we can say the bias has exactly the same effect.

The weights are only slightly more complex:

    //each weight gradient for each neuron is the activation of the corresponding
    //connected neuron in the previous layer times the weighted input error
    dweight_by_dcost[layer_idx] = new double[LayerSize(layer_idx)][];
    for (int neuron = 0; neuron < LayerSize(layer_idx); neuron++)
    {
        dweight_by_dcost[layer_idx][neuron] = 
            Mul(activations[layer_idx - 1], weighted_input_gradient[neuron]);
    }

Again, if you think about it, this isn’t too strange. The effect of a given weight on the weighted input depends on the activation of the neuron it links to. If the neuron it links to had a huge activation, then the weight of the link will greatly affect the weighted input (and hence the activation, and hence the cost…). Conversely, if the neuron it links to had an activation of 0, then changing the weight would have no effect whatsoever!

So where are we so far? We’ve managed to work out:

  • The activation gradient for each output neuron
  • From that, the weighted input gradient for each output neuron
  • From that, the bias gradient for each output neuron
  • And all the weight gradients for each output neuron

Now, having reduced layer_idx by 1, comes the final, beautiful bit of magic:

while (layer_idx >= 1)
{
    //this is the most special bit! here we're taking the weighted input error from the next layer and
    //using it to calculate the weighted input error for this layer. aka back propagation!
    {
        double[] next_weighted_input_error = new double[LayerSize(layer_idx)];

        for (int neuron = 0; neuron < LayerSize(layer_idx); neuron++)
        {
            //calculate the weighted sum of the weighted input gradients we've
            //already calculated for the connected neurons
            double next_activation_gradient = 0;
            for (int recv_neuron = 0; 
                 recv_neuron < LayerSize(layer_idx + 1); 
                 recv_neuron++)
            {
                next_activation_gradient += 
                    m_weights[layer_idx + 1][recv_neuron][neuron] * 
                    weighted_input_gradient[recv_neuron];
            }

            //with the sum, we can now calculate the weighted input
            //error in this layer for this neuron
            next_weighted_input_error[neuron] = 
                next_activation_gradient * 
                SigmoidDerivative(weighted_inputs[layer_idx])[neuron];
        }

        weighted_input_gradient = next_weighted_input_error;
    }

We’re entering a loop now, working back through the layers. At the start of each loop, we do the beautiful maths above. A simple sum uses the weights that link the neuron in this layer to those of the layer after it (i.e. the ones we just did the work for) to calculate the activation gradient for each neuron in this layer.

Once again, when you think about it, this makes sense. If we know the weighted input for a connected neuron in the next layer had a huge effect on the cost, and the weight of the link connecting this neuron to it is large, then it stands to reason the activation for this neuron will have had a huge effect on the cost as well. Conversely, if the weight of the link is 0, then the activation of this neuron is irrelevant. Similarly, if the connected neuron’s weighted input had no effect on the output, then the activation of this neuron must also have had no effect.

Having made that final leap, exactly the same math as we used earlier is applied to convert this new activation gradient to a new weighted input gradient.

And finally, all we have to do is repeat the previous process for the weights and biases in this layer.

    //repeat the process of calculating the bias and weight gradients
    //for this layer just as we did for the first layer
    dbias_by_dcost[layer_idx] = weighted_input_gradient;
    dweight_by_dcost[layer_idx] = new double[LayerSize(layer_idx)][];
    for (int neuron = 0; neuron < LayerSize(layer_idx); neuron++)
    {
        dweight_by_dcost[layer_idx][neuron] = 
            Mul(activations[layer_idx - 1], weighted_input_gradient[neuron]);
    }
    layer_idx--;
}

That’s it! Basic back propagation in a nutshell. No matter how deep the network, this loop can continue to step back, layer by layer, repeating the same calculations to produce values for the weight and bias gradients. The core of training a neural network comes down to this wonderful combination of the following tools:

  • The math to calculate the output layer’s activation gradients
  • The ability to convert any activation gradient into a weighted input gradient
  • The ability to convert any weighted input gradient into corresponding bias and weight gradients
  • And finally, the ability to step back, and calculate the previous layer’s activation gradients from the weighted input gradients we already have

If you do care about the maths, it’s some low-to-medium level calculus – above high school level, but only just – involving a bit of practiced manipulation of equations and a mathematical tool known as the chain rule.
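If you’d like a feel for the chain rule without the symbol soup, here’s a quick numerical check (my own example, not from the book) for a single neuron with the quadratic cost. With C = 0.5*(a-y)², a = sigmoid(z) and z = w*x + b, the chain rule says dC/db = (a-y) * sigmoid'(z) – which is exactly the activation-gradient-times-sigmoid-gradient step used above.

using System;

class ChainRuleCheck
{
    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    static double Cost(double w, double x, double b, double y)
        => 0.5 * Math.Pow(Sigmoid(w * x + b) - y, 2);

    static void Main()
    {
        double w = 0.7, x = 0.3, b = -0.2, y = 1.0, h = 1e-7;

        double a = Sigmoid(w * x + b);

        //chain rule: dC/db = (a - y) * sigmoid'(z), and sigmoid'(z) = a * (1 - a)
        double analytic = (a - y) * a * (1 - a);

        //finite-difference estimate of the same derivative
        double numeric = (Cost(w, x, b + h, y) - Cost(w, x, b, y)) / h;

        //the two values agree to several decimal places
        Console.WriteLine($"analytic: {analytic:F6}, numeric: {numeric:F6}");
    }
}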

Putting it together

Ok! Enough text. Here’s the updated Unity scene in action. I’ve added a little boilerplate code to the project to support making some progress bars etc, but the vast majority is in the ‘Part2’ scene and script files.

Hopefully that’s pretty clear – after loading up the app, I randomize the network and it becomes demonstrably useless, clearly failing to recognise digits and producing an accuracy of 12% (so, luck basically – it scores roughly 1/10 in a game where the odds of getting it right by chance are 1/10!). However, even after a small amount of training, it begins to match characters well, and after a single epoch hits 93%. As the code is worse than unoptimized, it takes a while, but I can confirm that after 30 epochs it sits around 96%.

Well, I hope that’s helped somebody grasp the innards of network training – an aspect of AI that I think is a bit of a black box for many. In my next post, I’m going to try to switch to something a little more fun and do some experiments with it!

Neurons For Coders Part 1 – A Basic Neural Network

I’ve recently decided it’s time to get my understanding of neural networks from theoretical to solid, and have decided the best way to do that is to write some neural stuff from first principles, starting with help from the digital book by Michael Nielsen here: http://neuralnetworksanddeeplearning.com/, and implementing things in a simple Unity app. Plus, whilst I am a long way from an expert in AI, I am pretty good at translating math speak into programmer speak. So, with a little luck, I might be able to give a little clarity to some engineers wanting to take their first steps into AI along the way!

My goal here is to get a neural network running, and then eventually training, initially using the principles from the book (without pure copy-paste-porting), and then perhaps expanding them using some more advanced tools such as the modern Adam optimizer. Alongside, I want to put a bit of game developer in and maybe make some cool visualisations or something.

The GitHub repo for this series is here: https://github.com/chriscummings100/neuralfun

The GitHub repo for the code from Michael Nielsen’s book is here: https://github.com/MichalDanielDobrzanski/DeepLearningPython

Dumping out the book data

Whilst I want to eventually move well past any need for the code from the book’s GitHub repo, it’ll help to get started if we have a simple pretrained network and easy access to the MNIST database. As such, my first step is simply modifying the Python files from the repo above to dump the network and training data out to simple binary files that I can parse in C#.

Adding this to the network.py file lets me dump out the network layer sizes, biases and weights (more on this later).

        #write the num layers, sizes, biases and weights to a binary file
        with open('network.bin', 'wb') as f:
            f.write(self.num_layers.to_bytes(4, byteorder='little'))
            for x in self.sizes:
                f.write(x.to_bytes(4, byteorder='little'))
            for x in self.biases:
                f.write(x.tobytes())
            for x in self.weights:
                f.write(x.tobytes())

The MNIST database, which both the book and this blog will work with, contains a huge set of 28×28 pixel grayscale handwritten digits, each correctly labelled. I found it a little tricky to download in a form that wasn’t very Python-oriented, so I modified mnist_loader.py to dump things out in binary:

    training_data, validation_data, test_data = load_data()
    with open('mnist.bin', 'wb') as f:

        datalen = len(training_data[0])
        f.write(datalen.to_bytes(4, byteorder='little'))
        for idx in range(0,datalen):
            f.write(training_data[0][idx].tobytes())
            f.write(training_data[1][idx].astype(np.uint8).tobytes())

        <same stuff for validation + test data here>

With that done, I zipped up each file so they can be checked into GitHub without using a gazillion bytes, and am ready to start in C#!

Into C# and Unity!

To get you super excited, this is target 1! A simple UI that lets me flick through the different images in the training set, see them along with their label, and also see what the trained neural network outputs.

So the meat of the work is going to be 1 component, with a few internal classes. The first key one, which will store (and eventually run/train) the neural network itself is this:

public class Network
{
    public int[] m_sizes;
    public double[][][] m_weights;
    public double[][] m_biases;

That is:

  • The size of each layer
  • The weights for each layer (not including a set for the first layer, as it’s the input)
  • The biases for each layer (ditto)

More on these later!

Our Dataset is a little more complex. It consists of a core ‘Data’ structure which represents a single known input/output:

public class Data
{
    //data from MNIST dataset
    public double[] m_inputs;
    public byte m_output;
    
    //10-element vector with 1.0 in the position of the output label
    public double[] m_vectorized_output;

Each record in the MNIST database is an array of doubles (a value for each pixel in the 28×28 grid), and a single byte to indicate which digit it should be (0-9). However, we want to set up a network that has 10 outputs, where each represents the probability of a given digit, i.e. if output 3 is the highest, we assume it’s a 3. So if we know the expected character is 3, the vectorized output would be [0,0,0,1,0,0,0,0,0,0].
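As a concrete illustration, building that vectorized output from the label byte might look something like this (a sketch only – the actual loading code lives in the repo):

public static double[] VectorizeLabel(byte label)
{
    //10-element vector: 1.0 at the index of the correct digit, 0 everywhere else
    double[] v = new double[10];
    v[label] = 1.0;
    return v;
}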

public class Dataset
{
    public Data[] m_training_data;
    public Data[] m_validation_data;
    public Data[] m_test_data;

I won’t post the code to actually load the above arrays – it’s just some boring walking through a buffer, but you can read it in the GitHub repo.

Drawing the data

I’m pretty happy with my maths, but nonetheless I like to visualize, plus pretty pictures are nice. So let’s write a simple function in the Data class to populate a grayscale texture so characters can be shown on screen:

public Texture2D CreateTexture()
{
    var tex = new Texture2D(28, 28, TextureFormat.RGB24, false);
    for (int y = 0; y < 28; y++)
    {
        for (int x = 0; x < 28; x++)
        {
            byte v = (byte)(m_inputs[y * 28 + x] * 255);
            tex.SetPixel(x, 27-y, new Color32(v, v, v, 255));
        }
    }
    tex.Apply();
    return tex;
}

Putting all the above together with a couple of buttons and a text box to show the label, we can now build ourselves a little Unity app to step through the different values in the database.

No actual neural network running yet, but we’ve got everything loaded and a simple UI that lets us view and verify the data. As expected, when stepping through the database, the character and label match. That said, you’d be forgiven for thinking there was a bug given some people’s handwriting…

Running the network

Whilst I’m not an expert, I’ll try to give my programmer’s-eye view of getting neural networks running (and eventually, training).

The idea is that for each layer we have a list of neurons that are linked to neurons in the previous layer. The structure of the network dumped out from the python file is as follows:

Network structure, http://neuralnetworksanddeeplearning.com/ chapter 1

After the input layer, each neuron requires 2 pieces of information:

  • A weight value per neuron from the previous layer
  • A single bias value

So in the above diagram, neuron 3 in layer 1 (the hidden layer) will contain 784 weights, plus a single bias value.

A given neuron’s ‘activation’ (the value it outputs) is then calculated using the following pseudo-code:

//add up weight*activation from all links to 
//neurons in previous layer
sum_of_inputs = 0
for(int i = 0; i < num_previous_layer_neurons; i++)
	sum_of_inputs += weight[i] * previous_layer_activation[i]

//neuron activation is the summed up inputs plus bias fed
//into activation function
activation = ActivationFunc(sum_of_inputs + bias)

This can be written much more simply however, if we think of the weights and activation arrays as really big vectors. If so, that first loop simply boils down to a dot product between 2 vectors, so we end up with:

activation = ActivationFunc(Dot(weights, prev_activations) 
                            + bias)

The activations for the first layer are the inputs to the network (our 28×28 grid of gray-scale pixel values). We run this process for each neuron in each layer in sequence, and end up with activations for the final layer, which are its outputs.

The activation function is most commonly the ‘Sigmoid’ function, which when plotted on a graph looks like this:

Ultimately it’s a ‘soft’ version of outputting 1 when the input is positive, and 0 when the input is negative. In code, it’s pretty simple:

public static double Sigmoid(double z)
{
    return 1.0 / (1.0 + Math.Exp(-z));
}

We’re almost ready to start neural networking, but need one extra helper function first. Earlier I mentioned everything is simpler if we can think of the weight and activation arrays as vectors. To do so, we’ll write a dot product function (multiply the arrays element by element, then sum the results):

public static double Dot(double[] a, double[] b)
{
    double res = 0;
    for (int i = 0; i < a.Length; i++)
        res += a[i]*b[i];
    return res;
}

Ok! All the tools ready, let’s write the feed forward function that runs the network:

public double[] FeedForward(Dataset.Data data)
{
    //start with activations from the inputs
    double[] current_activations = data.m_inputs;
    
    //feed forward through the network
    for (int layer = 0; layer < m_sizes.Length - 1; layer++)
    {
        //get biases and weights for this layer
        double[] biases = m_biases[layer];
        double[][] weights = m_weights[layer];
        
        //calculate the new activations
        int layer_size = m_sizes[layer + 1];
        double[] new_activations = new double[layer_size];
        for (int neuron = 0; neuron < layer_size; neuron++)
        {
            new_activations[neuron] = Sigmoid(
               biases[neuron] + 
               Dot(weights[neuron], current_activations)
            );
        }

        //store them
        current_activations = new_activations;
    }

    return current_activations;
}

This function is pretty simple, and is really just running the earlier per-neuron pseudo-code for every neuron in every layer sequentially. It starts by taking the inputs from the data as the activations for the first layer. These are fed into the 2nd, then the 3rd (and this could continue if we had lots more layers).

Note: a point of confusion here is that m_weights[0] contains the weights for layer 1 and m_biases[0] contains the biases for layer 1. This is because the input layer has none.

That’s actually it for the neurons! The feed forward function takes the inputs, runs them through the network and generates the outputs. In the case of our network, it takes 784 inputs, runs them through the network, and generates 10 outputs – one for each possible digit.

Showing some results

The above is great, but not exactly readable. To get it on screen, we just need one final step – to convert those 10 outputs into a single digit, we need to find the biggest one. If output 3 is biggest, the digit is a 3. Simple!

    public static int MaxIndex(double[] a)
    {
        double max_val = -double.MaxValue;
        int max_idx = -1;
        for(int i = 0; i < a.Length; i++)
        {
            if (a[i] > max_val)
            {
                max_val = a[i];
                max_idx = i;
            }
        }
        return max_idx;
    }

Armed with that simple function we can now display which digit the neural network decided on. My full function that updates the image texture, label and result is called ‘SetData’ and looks like this:

    void SetData(Dataset.Data data)
    {
        if(m_current_texture)
            Destroy(m_current_texture);
        m_current_texture = data.CreateTexture();
        m_data_image.texture = m_current_texture;
        m_data_label.text = data.m_output.ToString();

        double[] results = m_network.FeedForward(data);
        m_result_label.text = MaxIndex(results).ToString();
    }

Job done!

Pretty impressive for the network generated in chapter 1!

All the code for this series can be found at https://github.com/chriscummings100/neuralfun, and the majority of the work for this post is in the Part1 folder.

Next time I’ll probably get training in so we’re no longer reliant on a network trained via the Python scripts provided by Michael’s book, though I may get side tracked by making pretty visualizations or something!

Proceed to my next post, Neurons For Coders Part 2 – Back propagation, for an introduction to the training process of neural networks.

Extending Bootstrap In TypeScript

Intro

Hi! Over the past year or 2 whilst building multiverseonline.io (the web site for our app, Multiverse) I’ve had to add React to the list-of-stuff-I-kind-of-know. The site is driven by React, using React-Bootstrap on top, and all written in TypeScript. Unfortunately however, in my quest for knowledge on the subject, I’ve often found Stack Overflow and other sources can be summed up with the phrase:

“there’s a million opinions on your question, but unless you know the right answer to your question, you can’t tell which is the right opinion”

My issue in this post is a classic example of said questions. There are plenty of answers, but identifying the correct one is non-trivial!

My Bootstrap Problem

React-Bootstrap has all sorts of handy stuff for creating nice React components, such as the Button:

<Button variant='primary'>
    Some text
</Button>

Giving us results like this (from the Bootstrap documentation):

However, recently I wanted to create a nice reusable version of the light button that had an info icon inside it:

This sort of thing is exactly what React is designed to do, so with a pretty icon (from react-icons) and a tiny bit of code:

<Button variant="light" className="mv-circular-button">
    <BsInfo/>
</Button>

Along with some simple CSS:

.mv-circular-button {
    position: relative;
    height: 40px;
    width: 40px;
    padding: 0;
    border-radius: 50%;
    border: none;
    box-shadow: none !important;
}

.mv-circular-button svg {
    width: 75%;
    height: 75%
}

We have a simple button consisting of the following:

  • The Bootstrap Button with the light variant creates a nicely styled button element.
  • The BsInfo component comes from react-icons and is an SVG with that pretty ‘i’ inside it.
  • The button CSS makes it a fixed width/height, sets the border-radius to 50% so it becomes circular, and removes the annoying shadow effect Bootstrap adds
  • The SVG CSS just enforces a size for the icon

All good so far, but React is all about encapsulation and reuse of components, and I want to use that same info button in multiple places.

A simple functional component

Well, the obvious approach (and I’m told the 2021 way of doing things) is to create a simple functional component like this one:

export const MyInfoButton = () => (
    <Button variant="light" className="mv-circular-button">
        <BsInfo/>
    </Button>
)

It’s in a separate TSX file, and is called MyInfoButton. Now within my page I simply have to write:

<MyInfoButton/>

And we have a button! Of course a button isn’t much use unless it does something when you click it, so let’s add an onClick event…

<MyInfoButton
    onClick={() => console.log("hello")}
/>

As anybody with a bit of TypeScript/React experience will point out, this immediately fails to compile. We haven’t defined any properties at all for the info button yet, so the onClick property is unrecognised. To make our original component work, we need to define and pass in some properties:

type MyInfoButtonProperties = {
    onClick?: ()=>void
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button onClick={props.onClick} variant="light" className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

The MyInfoButton component now takes a set of properties defined by MyInfoButtonProperties. The only property for now is onClick, which is passed directly through to the Button component.

More properties

Now what if I want to start expanding on this? Maybe I want to be able to add a mouse move event and specify whether the button is disabled. Each of these is a property Button already supports, so I could just add them to MyInfoButtonProperties and pass them through as before:

type MyInfoButtonProperties = {
    onClick?: ()=>void;
    disabled?: boolean;
    onMouseMove?: ()=>void;
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button 
        onClick={props.onClick} 
        onMouseMove={props.onMouseMove} 
        disabled={props.disabled} 
        variant="light" 
        className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

But we’re just duplicating properties of the internal button here, which feels a little messy. To clean things up a little, we can use the ‘...’ spread syntax to help out:

type MyInfoButtonProperties = {
    onClick?: ()=>void;
    disabled?: boolean;
    onMouseMove?: ()=>void;
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button {...props} 
        variant="light" 
        className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

This simply passes anything in props through to the internal Button component. Much cleaner!

What about ALL the properties?

The model we’ve taken thus far is to work out what functionality from the internal Button component we want to expose, and provide properties to do exactly that – no more, and no less. This very constrained model makes perfect sense a lot of the time. Tightly constrained interfaces like this are a great way to keep your code understandable and reduce bugs. However, despite many an angry Stack Overflow comment, there are exceptions, and this is not ‘always the right thing to do’ (I hate that phrase)!

In this case I’m really just trying to add a small bit of functionality to the existing very powerful button. I specifically want a user of MyInfoButton to have immediate access to the 100s of properties any normal button has.

For a normal html button element (as opposed to a Bootstrap one), we can use the types supplied by React:

type MySimpleInfoButtonProperties = {
    //any custom bits
} & ButtonHTMLAttributes<HTMLButtonElement>
export const MySimpleInfoButton:FunctionComponent<MySimpleInfoButtonProperties> = (props) => (
    <button {...props} 
        className="mv-circular-button" >
        <BsInfo/>
    </button>
)

Here ButtonHTMLAttributes contains the list of all properties a standard HTML button can have. The generic HTMLButtonElement type argument is used in the definition of various event properties such as onClick.

Unfortunately for us, Bootstrap is not this simple!

Bootstrap defines its button like this:

import * as React from 'react';

import { BsPrefixComponent } from './helpers';

export interface ButtonProps {
  active?: boolean;
  block?: boolean;
  variant?:
    | 'primary'
    | 'secondary'
    | 'success'
    | 'danger'
    | 'warning'
    | 'info'
    | 'dark'
    | 'light'
    | 'link'
    | 'outline-primary'
    | 'outline-secondary'
    | 'outline-success'
    | 'outline-danger'
    | 'outline-warning'
    | 'outline-info'
    | 'outline-dark'
    | 'outline-light';
  size?: 'sm' | 'lg';
  type?: 'button' | 'reset' | 'submit';
  href?: string;
  disabled?: boolean;
}

declare class Button<
  As extends React.ElementType = 'button'
> extends BsPrefixComponent<As, ButtonProps> {}

export default Button;

Based on the above, the obvious starting point is a new set of properties as follows:

type MyInfoButtonProperties = {

    //any custom bits
} & ButtonProps & ButtonHTMLAttributes<HTMLButtonElement>

export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
)

Here I’ve extended the property type so it can include:

  • Any extra properties I might want to add later as my info button gets fancier
  • The core HTML attributes that any normal button needs, as per the previous pure react example
  • The bootstrap ButtonProps which include things like variant, size, disabled etc

I’ve also tweaked the 2 additional properties I pass to the internal button so that:

  • The default variant is light
  • The class name starts with “mv-circular-button” but can include additional classes passed in

So far we’re all good. My button still works!

The ‘as’ property

All good so far, but we’ve lost something important. Bootstrap supports a property named “as” on all its types, that allows you to override the HTML element used for the button. This functionality varies from pointless to useful to essential with Bootstrap, and when trying to create a generic, reusable info button, it’s not something I want to design out.

The problem is that our properties currently use ButtonHTMLAttributes<HTMLButtonElement>. This isn’t going to work for anything other than a button element!

Starting from the Bootstrap button, we can see it defines a generic type variable called “As”:

declare class Button<
  As extends React.ElementType = 'button'
> extends BsPrefixComponent<As, ButtonProps> {}

What we need to do is build the exact set of properties a Bootstrap Button needs, taking into account the “As” variable, so let’s follow things through. Button simply extends something called BsPrefixComponent<As,ButtonProps>. In the Bootstrap helpers file we find:

export class BsPrefixComponent<
  As extends React.ElementType,
  P = {}
> extends React.Component<ReplaceProps<As, BsPrefixProps<As> & P>> {}

BsPrefixComponent is a React class component (not a functional component like our info button), and it takes the rather confusing set of properties:

ReplaceProps<As, BsPrefixProps<As> & P>

In the context of our button, which defaults “As” to be ‘button’, and passes through ButtonProps as P, this resolves to:

ReplaceProps<'button', BsPrefixProps<'button'> & ButtonProps>

BsPrefixProps isn’t that interesting:

export interface BsPrefixProps<As extends React.ElementType> {
  as?: As;
  bsPrefix?: string;
}

It defines the “as” property and something called bsPrefix, so we’re creating a component that uses these 2 properties plus any in ButtonProps. But where do the 100s of other useful HTML ones come from? Well let’s look at the helpfully named ReplaceProps:

export type Omit<T, U> = Pick<T, Exclude<keyof T, keyof U>>;

export type ReplaceProps<Inner extends React.ElementType, P> = Omit<
  React.ComponentPropsWithRef<Inner>,
  P
> &
  P;

Note: TypeScript’s own standard library (lib.es5.d.ts) defines its own version of Omit, which is different from the Bootstrap version!

The “Inner” type variable is passed in from the “As” type variable which in our case defaults to ‘button’. Omit removes all the fields in one type that exist in another. In this case we’re:

  • Taking all the fields defined by React.ComponentPropsWithRef<'button'>
  • Removing any that are defined in P, which is currently BsPrefixProps<'button'> & ButtonProps
  • Then adding back all those that were removed!

So why do this? Well the important detail is that the removal doesn’t care about the types of fields – it simply deals with their names. Here Bootstrap is stripping out of the React type any fields defined in P, and replacing them with the versions from P.

For a simpler example of this type stripping, take the following code:

type TestType1 = {
    name: string;
    age: string;
}

type TestType2 = {
    age: number;
    address: string;
}

type CombinedType = TestType1 & TestType2;

let x: CombinedType = {
    name: "hello",
    age: 10,
    address: "somewhere"
};

Here we create 2 types, both of which define an ‘age’ field, but the first defines it as a string, and the second defines it as a number. Because they don’t agree, CombinedType will end up with:

type CombinedType = {
    name: string;
    age: never;
    address: string;
}

The ‘age’ field has been assigned the type ‘never’, as it can never be both a number and a string. The result is that the line in which we set age to 10 will produce the following TypeScript error:

Type ‘number’ is not assignable to type ‘never’.

The Bootstrap technique uses Omit as follows:

type TestType1 = {
    name: string;
    age: string;
}

type TestType2 = {
    age: number;
    address: string;
}

/*
type TestType1WithoutAgeField = {
    name: string
}
*/
type TestType1WithoutAgeField = Omit<TestType1,TestType2>

/*
type CombinedType = {
    name: string;
    age: number;
    address: string;
}
*/
type CombinedType = TestType1WithoutAgeField & TestType2;

Long story short, Bootstrap’s type magic is giving us a new type with:

  • All those from P
  • 2 fields called ‘as’ and ‘bsPrefix’ from BsPrefixProps
  • Any remaining fields that the normal HTML element would have

In the case of Button this is:

  • The fields from ButtonProps
  • ‘as’ and ‘bsPrefix’ from BsPrefixProps
  • By default, any additional fields a button HTML element has

You might wonder how the text ‘button’ gets converted to “all the fields a button element would have”. If you follow through the react code for React.ComponentPropsWithRef what you’ll find is that ‘button’ is used as a key of the JSX.IntrinsicElements interface, defined as follows:

        interface IntrinsicElements {
            // ... lots of fields ...
            button: React.DetailedHTMLProps<React.ButtonHTMLAttributes<HTMLButtonElement>, HTMLButtonElement>;
            // ... lots more fields ...

        }
    }

With all this work, we can say that Button is a class component that takes the properties:

ReplaceProps<As, BsPrefixProps<As> & ButtonProps>

Where ‘As’ defaults to ‘button’.

From there, it’s a short step to conclude that the properties for the extended info button should be as follows:

type NewButtonProps = {
    //any new properties we want to add go here
} & ButtonProps

type MyInfoButtonProperties<As extends React.ElementType = 'button'> =
    ReplaceProps<As, BsPrefixProps<As> & NewButtonProps>

Creating the class component

So we’ve set up a new simple type that contains any new bits we want to add, plus the basic properties defined for the Bootstrap button. We’ve then used the ReplaceProps type magic to add in any extras from the core HTML element. As with the rest of Bootstrap, the new MyInfoButtonProperties takes an ‘As’ type argument that can be used to override the element it represents in the DOM.

Before diving back into function world, let’s copy the Bootstrap pattern of a fully-fledged class component:

export class MyInfoButton<As extends React.ElementType = 'button'> 
    extends React.Component<MyInfoButtonProperties<As>> 
{
    render = () => {
        return (
            <Button {...this.props} 
                variant={this.props.variant || "light"}
                className={"mv-circular-button " + (this.props.className || "")}>
                <BsInfo/>
            </Button>
        )
    }
}

Armed with this new component, we can now go back to the original page and do something along the lines of…

<MyInfoButton
    as='a'
    target="bla"
    onClick={() => console.log("hello")}
/>

Note how the ‘as’ parameter is now available to us. Looking at it in the browser, you can see React is inserting an anchor (‘a’) element into the DOM instead of a button element:

Back to a functional component

Wonderful! We could stop there, but all the cool React kids say functional components are where it’s at and I’m not one to ignore peer pressure. The syntax for an arrow function definition is a little ugly but not too complex:

export const MyInfoButton = <As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) => (
    <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
)

Or as a standard javascript function (which I personally find a little easier on the eyes):

export function MyInfoButton<As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) {
    return <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
}

Putting it all together (with a handy little type for wrapping the bootstrap bits together):

//imports
import React from "react";
import { Button, ButtonProps } from "react-bootstrap";
import { BsPrefixProps, ReplaceProps } from "react-bootstrap/helpers"
import { BsInfo } from 'react-icons/bs'
import './circular-button.css'

//handy type definition to wrap up the replace+bsprefix bits of bootstrap
type BootstrapComponentProps<As extends React.ElementType, P> = ReplaceProps<As, BsPrefixProps<As> & P>

//our extended button properties
type NewButtonProps = {
    //any new properties we want to add go here
} & ButtonProps

//boot-strap-ified full button properties with all the bells and whistles
type MyInfoButtonProperties<As extends React.ElementType = 'button'> =
    BootstrapComponentProps<As, NewButtonProps>

//our button!
export function MyInfoButton<As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) {
    return <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
}

Summary

An interesting little problem, and maybe a convincing answer to it, should anybody have the same question as me!

As I mentioned earlier, this kind of expose-it-all technique shouldn’t necessarily be avoided, but it should be approached with some thought. By blindly exposing everything in the new button type we may well have exposed some functionality that doesn’t quite work anymore due to our adjustments. This could clearly cause some confusion, but on the plus side we’ve ended up with something extremely flexible without lots of code to maintain.

Anyhoo, that’s it for my first React/TypeScript post. Stay tuned for more posts on this or other random things!

Understanding Determinism Part 1: Intro and Floating Points

Fairly recently I started researching the use of deterministic networking in a Unity game. With this technique, all users simply send each other the buttons they press on the controllers, and by running exactly the same simulation, experience exactly the same gameplay. This approach to networking is less common than the standard ‘keep the stuff that matters roughly in sync’ approach, but is very useful (and indeed necessary) if you have a game that relies on AI or physics doing exactly the same thing for everybody.

After a bit of research, I was surprised to discover that the internet is littered with inaccurate and often entirely incorrect information around determinism. Whilst I’ve not had to solve these problems with Unity before, I do have a lot of experience in this area. Little Big Planet had deterministic networking, and on No Man’s Sky I helped ensure the procedural generation was both deterministic and consistent between PC, PS4 and even the PS4 GPU.

In this series I’ll try to clear up some of the confusion and misconceptions, starting with a quick review of determinism and how to test it. Then we’ll cover an area that quite unfairly but most often gets the blame – floating point numbers. Later, I intend to cover some of the other big and more common culprits, and finish off with some digging into Unity-specific problems.

What Actually Is Determinism

A deterministic system can be defined by a very simple statement:

If a system is run multiple times with the same set of inputs, it should always generate the same set of outputs.

In computer terms, this means that if we provide a program with some given input data, we should be able to run that code as many times as we like, and always expect the same results. And by the same, I mean exactly the same – the same bytes, with the same bits, in the same order. In technical terms, bitwise identical.
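As a quick aside (my own example, not from the original post), ‘bitwise identical’ is a stricter test than ‘prints the same number’. Two doubles can look identical when logged with limited precision yet differ in their underlying bits, which is exactly why we’ll be comparing raw bytes rather than pretty-printed values later on:

double a = 0.1 + 0.2;
double b = 0.3;

//false - the two doubles differ in their least significant bits
Console.WriteLine(a == b);

//also false - comparing the raw 64-bit patterns makes the difference explicit
Console.WriteLine(BitConverter.DoubleToInt64Bits(a) == BitConverter.DoubleToInt64Bits(b));

//printed with limited precision they look the same: "0.30 vs 0.30"
Console.WriteLine($"{a:F2} vs {b:F2}");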

A couple of examples:

  • If I write a function that adds the numbers 2 and 3, it should always output 5 no matter how many times I call it.
  • If I write an entire physics engine and provide it with 10000 rigid bodies all starting with a given position/orientation, and run it for 2 years, it should always result in the same set of new positions/orientations no matter how many times I run it.

Computers are in fact inherently deterministic – that’s simply how processors work. If the add instruction on an i9 processor could take 2 values and sometimes output slightly different results, that would be a broken processor, and you’d probably be needing to replace it soon!

So where does indeterminism come from, and why does it seem so prevalent when computers are naturally deterministic? Well, the reality is that defining the inputs of a system is not always as simple as it sounds.

Testing Determinism

Before investigating it, we should come up with a good solid way of testing the determinism of our code. Fortunately, when I joined Media Molecule, the very clever team had already solved this problem, so here’s how…

public static class SyncTest
{
    static StreamWriter m_stream;

    //begin just opens a file to write to
    public static void Begin(string filename)
    {
        m_stream = new StreamWriter(filename);
    }

    //simple function that just writes a line with some text to the stream
    public static void Write(string description)
    {
        m_stream.Write(string.Format("{0}\n", description));
    }

    //writes a line with a description, an integer value, and the exact bytes
    //that make up the integer value, all padded nicely with spaces.
    public static void Write(string description, int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        string hex = BitConverter.ToString(bytes).Replace("-", string.Empty);
        string txt = string.Format("{0,-30} {1,-15} 0x{2}\n", 
            description, value, hex);
        m_stream.Write(txt);
    }

    //closes the file
    public static void End()
    {
        m_stream.Close();
    }
}

This very simple class just opens up a file in Begin, then provides a few utilities to write text to that file. Specifically, we’ve added a function that writes a human-readable integer value along with a description and the exact hexadecimal value of its bytes.

Now let’s write a very simple program that uses it…

    private void Start()
    {
        SyncTest.Begin("sync.txt");
        AddNumbers(2, 3);
        SyncTest.End();
    }

    int AddNumbers(int x, int y)
    {
        SyncTest.Write("AddNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        int res = x + y;
        SyncTest.Write("res", res);
        return res;
    }

All this does is add a couple of numbers together. However along the way we’re telling our SyncTest tool to dump out the inputs and outputs. The resulting lines of text are as follows:

AddNumbers
x                              2               0x02000000
y                              3               0x03000000
res                            5               0x05000000

So why do this? Well the beauty is, we can now make some more modifications to our program to run the same algorithm multiple times, outputting to 2 different files:

    private void Start()
    {
        SyncTest.Begin("sync1.txt");
        RunTests();
        SyncTest.End();
        SyncTest.Begin("sync2.txt");
        RunTests();
        SyncTest.End();
    }

    void RunTests()
    {
        AddNumbers(2, 3);
    }

    int AddNumbers(int x, int y)
    {
        SyncTest.Write("AddNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        int res = x + y;
        SyncTest.Write("res", res);
        return res;
    }

Having written the data out, we can use a simple text-comparison program to check for any differences. I use Beyond Compare for this purpose:

No differences = no indeterminism! Yay
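If you’d rather automate the check than eyeball it in a diff tool, a few lines of C# will do the job. This is just a minimal sketch assuming the two file names used above:

using System;
using System.IO;

public static class SyncCompare
{
    //compares the two sync files line by line and reports the first mismatch
    public static void Compare()
    {
        string[] a = File.ReadAllLines("sync1.txt");
        string[] b = File.ReadAllLines("sync2.txt");
        int count = Math.Max(a.Length, b.Length);
        for (int i = 0; i < count; i++)
        {
            string lineA = i < a.Length ? a[i] : "<missing>";
            string lineB = i < b.Length ? b[i] : "<missing>";
            if (lineA != lineB)
            {
                Console.WriteLine("First difference at line {0}:\n  {1}\n  {2}", i + 1, lineA, lineB);
                return;
            }
        }
        Console.WriteLine("No differences = no indeterminism!");
    }
}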

Let’s mess with our test and introduce some indeterminism:

        AddNumbers(2, Random.Range(0,10));

Now each run we’re going to pass a new random number in! Clearly the inputs now differ, and thus, so will our outputs. Viewing this in our logs:

Not only does our logging show the issues, but it tells us where the first difference occurred (in this case the value of y was 1 in the 1st test, and 4 in the 2nd).

For this blog we’ll only be using a very simple sync tester like the one above, however with work they can be expanded to record more detailed info like stack traces, or even run as a replay in a debugger allowing you to analyse determinism problems with the full support of your programming tools!

So, armed with our determinism tester, let’s do some digging…

Floating Point Inaccuracy

If you do a quick search, the first thing that gets the blame for indeterminism is floating point numbers, despite the fact that unless you’re doing cross platform stuff, they’re rarely guilty! That said, floating point precision comes up a lot, so let’s take a look at a couple of examples:

  • Floating point numbers can represent all integer values within a certain range precisely, however they can not represent all fractional numbers precisely.
  • Some floating point operations, such as addition, will lose precision when adding together values of significantly different magnitudes

The first is pretty simple. The values 0.5, 0.25, 0.75, 0.875 can all be represented precisely by floating point numbers. This is because they can be expressed precisely as the sum of fractional powers of 2:

  • 0.5 = 1/2
  • 0.25 = 1/4
  • 0.75 = 1/2+1/4
  • 0.875 = 1/2+1/4+1/8

Other numbers, however, would require an infinite number of these fractional powers of 2 to be represented this way, so they can only be approximated as floating point values.
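You can see this for yourself by printing floats with enough digits to round-trip them ("G9" is enough for a 32 bit float). A quick sketch:

Console.WriteLine(0.875f.ToString("G9"));    //prints 0.875 (exact: 1/2 + 1/4 + 1/8)
Console.WriteLine(0.1f.ToString("G9"));      //prints 0.100000001 (only an approximation)
Console.WriteLine((2f / 3f).ToString("G9")); //prints 0.666666687 (only an approximation)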

The latter point occurs because floating point numbers can only store so many (binary) digits. You could store both 1000000000000000 and 1 using a floating point number. However because they are of such drastically different size, on a 32 bit floating point unit you would find:

1000000000000000+1 = 1000000000000000
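The same trick shows it in action: the 1 is simply swallowed, because it is far below the precision a 32 bit float can hold at that magnitude (again, just a sketch):

float big = 1000000000000000f; //~1e15
float sum = big + 1f;
Console.WriteLine(sum == big); //prints True, the 1 has been lost entirely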

So how can a computer be both imprecise and deterministic? It’s quite simple really – a floating point unit is deterministically imprecise! It will lose precision, but it will always lose the same precision. Let’s test this by introducing some floating point numbers to our program:

    float AddNumbers(float x, float y)
    {
        SyncTest.Write("AddNumbers (float)");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        float res = x + y;
        SyncTest.Write("res", res);
        return res;
    }

And the test…

    void RunTests()
    {
        AddNumbers(2, 5);
        AddNumbers(1f, 1000000000000f);
        AddNumbers(2f/3f, 0.5f);
    }

And our results…

As expected, the first floating point test has added 1 to 1e12 (1000000000000) and just ended up with 1e12 again. Similarly, adding 0.5 to 2/3 (which ends up represented inaccurately as 0.666..67) gives 1.1666..7. However, in both these cases the errors occur in both runs, and the floating point representations of all inputs and outputs match.

The upshot of the above is that if you’re just working with one application on one platform, the odds are floating point won’t be your problem, because any imprecision is deterministic. However, when going for cross platform determinism you can hit some real issues, as described in the following few sections.

Floating Point Hardware Behaviour

Despite the inherent determinism of a floating point unit, it still has the potential to cause problems, specifically when attempting to create an application that performs identically on different bits of hardware. For example, on No Man’s Sky I attempted (and smugly eventually succeeded!) to get PS4 GPU compute shaders to deterministically run the team’s awesome planet making algorithms, and generate exactly the same results as a PC CPU. In determinism terms, you could say that if different hardware behaves differently, then the hardware becomes an input to your algorithm and thus affects the output.

Floating point numbers have to deal with some odd scenarios such as infinities, different types of zeros, what happens when rounding things etc etc. There’s a variety of these areas in which the floating point unit has to make some form of approximation or edge case decision.

These days, IEEE standards exist that say exactly how every floating point unit on every platform should deal with every scenario. However, not all platforms follow these guidelines to the letter, and others provide low level hardware options to disable the perfect behaviour in exchange for slightly better performance.

I’ll confess to not knowing exactly how prevalent this is these days. I think it is less of an issue with modern hardware, and have myself worked with PCs (AMD+Intel), NVidia GPUs, Macs, PS4 (CPU and GPU) and found the actual hardware rarely to be the cause of indeterminism. Really the only way to be certain if you’re going cross platform though is to test it, ideally with some fairly low level C code (with FastMath disabled – see below) so you know there’s no chance of any other factors interfering.

Floating Point FastMath

A much more prevalent cause of indeterminism comes not from hardware but from compilers. Specifically optimising compilers that use a technique often referred to as FastMath. When this compiler option is turned on (which in some cases it is by default!), it allows the compiler to assume a floating point unit obeys algebraic laws perfectly, and thus mess with your code if it thinks it can express your math in a more efficient way.

Take this code for example:

    float SumNumbers(float x, float y, float z, float w)
    {
        float res = x;
        res += y;
        res += z;
        res += w;
        return res;
    }

Simple enough! Add 4 numbers together. However the optimizers amongst you may have noticed this is a particularly inefficient way of doing things as it contains 4 dependent operations. A faster version would be this:

    float SumNumbers(float x, float y, float z, float w)
    {
        float xy = x + y;
        float zw = z + w;
        return xy+zw;
    }

Mathematically this function does the same job, however it executes 2 independent operations then combines the results. Why this is faster doesn’t really matter and is beyond the scope of this blog. The key is, the 2 functions do the same thing, so should produce the same result.

The FastMath option tells the compiler that if it finds your SumNumbers function written the first way, it is allowed to re-write it the 2nd way, on the basis that mathematically they should return the same result.

To simulate this, I’ll extend the program as follows:

    void RunTests(int test_idx)
    {
        float x = 0.3128364f;
        float y = 0.12412124f;
        float z = 0.8456272f;
        float w = 0.82154819f;
        if (test_idx == 0)
        {
            SumNumbers(x, y, z, w);
        }
        else
        {
            SumNumbersFast(x, y, z, w);
        }
    }

    float SumNumbers(float x, float y, float z, float w)
    {
        SyncTest.Write("SumNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        SyncTest.Write("z", z);
        SyncTest.Write("w", w);
        float res = x;
        res += y;
        res += z;
        res += w;
        SyncTest.Write("res", res);
        return res;
    }

    float SumNumbersFast(float x, float y, float z, float w)
    {
        SyncTest.Write("SumNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        SyncTest.Write("z", z);
        SyncTest.Write("w", w);
        float xy = x + y;
        float zw = z + w;
        float res = xy + zw;
        SyncTest.Write("res", res);
        return res;
    }

I’ve supplied my RunTests function with a test index, so I can run my first version on the first test, and the second version on the 2nd. Let’s see what the sync tests result in…

Shock horror! We have differences. A difference so tiny in fact that it’s not visible if you just print the value of the result to 6 decimal places. However because floating point units aren’t precise, doing the maths 2 different ways produced subtly different results. FastMath allows the compiler to choose which way to do it, and different compilers on different platforms may make different decisions. This is a very common reason for cross platform determinism issues.

For fun, let’s take the above example and add some rather contrived extra bits (note: numbers meaningless – I made em up by fiddlin!).

    void RunTests(int test_idx)
    {
        float x = 0.3128364f;
        float y = 0.12412124f;
        float z = 0.8456272f;
        float w = 0.82154819f;
        float res;
        if (test_idx == 0)
        {
            res = SumNumbers(x, y, z, w);
        }
        else
        {
            res = SumNumbersFast(x, y, z, w);
        }
        res *= 3212779;
        SyncTest.Write("bigres", res);

        if ( res >= 6760114.5f)
        {
            SyncTest.Write("Do 1 thing");
        }
        else
        {
            SyncTest.Write("Do another thing");
        }
    }

Looking at the sync results:

We can see how when plumbed through some big calculations and used in actual logical operations, these simple differences can result in dramatic changes in program behaviour. Whilst the above may seem a little contrived, applications like procedural content generators, physics engines or even full games execute large amounts of complex mathematics and branching code. All it takes is 1 single operation in that whole mess to exhibit indeterminism, and the end results could be utterly, completely different. Indeed, my first attempt on No Man’s Sky suffered similar issues and resulted in entire forests being different on PS4 (happily fixed pre-launch)! Just as with hardware, when FastMath is turned on, the compiler behaviour becomes an input because it affects the algorithm and thus the output.

As a quick side note, because FastMath is only used in optimized builds, it is often also responsible for determinism issues between debug and release versions of an app.

Trigonometry and other fancy functions

Many mathematical functions such as sin, cos, tan, exp and pow do not exist as single instructions on hardware (or only exist on some hardware), and are instead implemented as part of the maths libraries that come with whatever language you’re using. In C++ for example, this is the C standard library, or in Unity it’s good old ‘Mathf.bla’.

Sadly, there seems to be absolutely no standardisation over the techniques or implementations that creators of these libraries use. As a result, different compilers on a given platform, different platforms, different languages, or different versions of the same language may or may not give the same results! Thus the math library implementation becomes an input to your algorithm!

As an example, I knocked together my own sin function using a minimax polynomial (no need to have any idea whatsoever what one is or how it works – that’s why we have math libraries and mathematicians!).

    void SinTest(int test_idx)
    {
        for (int angle_degrees = 0; angle_degrees <= 90; angle_degrees++)
        {
            float angle_radians = angle_degrees * Mathf.Deg2Rad;
            float res;
            if (test_idx == 0)
            {
                res = Mathf.Sin(angle_radians);
            }
            else
            {
                res = MiniMaxSin(angle_radians);
            }
            SyncTest.Write("Sin angle", angle_degrees);
            SyncTest.Write("Sin result", res);
            SyncTest.Write("");
        }
    }

    float MiniMaxSin(float x)
    {
        float x1 = x;
        float x2 = x * x;
        return 
            x1 * (0.99999999997884898600402426033768998f + 
            x2 * (-0.166666666088260696413164261885310067f + 
            x2 * (0.00833333072055773645376566203656709979f + 
            x2 * (-0.000198408328232619552901560108010257242f + 
            x2 * (2.75239710746326498401791551303359689e-6f - 
            x2 * 2.3868346521031027639830001794722295e-8f)))));
    }

This example compares the Unity ‘Mathf.Sin’ to my own version and prints out the result for every whole-degree angle from 0 to 90. If we look at the sync test though, at 43 degrees we get a slightly different result…

There’s nothing wrong with either version – they both approximate the mathematical sin function to a given level of precision. But just as with our FastMath problems, because they’re implemented differently, they give slightly different results that could turn into drastic changes in behaviour over time.

If you run into this, the only 100% robust approach is to download or write your own platform agnostic math libraries, and compile them yourself with things like FastMath turned off.

The last floating point resort

All of the issues we’ve discussed so far relate to floating point numbers. If developing for one platform, it’s relatively unlikely you’ll actually hit any of them, as they’re all dependent on different hardware, compiler or software library implementations.

If you do hit them however, they can be a real headache for all sorts of reasons. Even if you were prepared to ensure hardware settings were correct, fiddle with low level compiler options and implement custom math libraries, you may well be using fundamentally different hardware (such as CPU/GPU) or libraries / engines (such as Unity) in which you don’t have control over these features.

One fairly brutal option available as a last resort is simply to limit precision by chopping off some digits! Despite various stack overflow based opinions, in certain scenarios this approach can work perfectly well:

    float LoseSomePrecision(float x)
    {
        x = x * 65536f;
        x = Mathf.Round(x);
        x = x / 65536f;
        return x;
    }

Here we simply:

  • Multiply the floating point by a power of 2 constant
  • Round it to an integer
  • Divide the result by the same power of 2 constant

After plugging it in to our sin tests:

We can see how chopping off a few bits of precision has made both tests consistent (though less accurate).

Being a decimal society, it might have been tempting to just multiply by 1000000 and then divide by 1000000. However, floating point numbers work with binary, so if we want to actually reduce the number of bits required to represent our value, we must use a power of 2.
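A quick way to convince yourself of that (a sketch, using 0.1 as an arbitrary example value): casting the quantised results to double and printing them shows what each float is actually storing.

float x = 0.1f;
float pow2 = Mathf.Round(x * 65536f) / 65536f;     //6554 / 65536
float dec  = Mathf.Round(x * 1000000f) / 1000000f; //100000 / 1000000
Debug.Log((double)pow2); //0.100006103515625... terminates, so it is stored exactly
Debug.Log((double)dec);  //0.100000001490116... the usual inexact approximation of 0.1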

Of course, this technique won’t help you if your precompiled physics engine isn’t deterministic cross platform. In that situation you’ve really only got one option – don’t use the precompiled physics engine. But if after digging you’re finding it’s just a few math ops that throw the whole system off, limiting precision can be the solution.

Summary

In this post we’ve mainly looked at the low level issues that can crop up when dealing with floating point numbers, and how when dealing with multiple platforms or compilers, inconsistencies in low level behaviour can cause some real headaches.

In reality, variation in floating point behaviour is only likely to crop up when comparing multiple compilers / platforms. More often than not it’s issues with the application implementation (or that of libraries/engines it depends on) that result in determinism problems. In the next post we’ll look at the basics of Application level determinism and one of the big culprits – global state.

Why Transparency Is Hard

I’ve worked with lots of brilliant content creators in my career, and one thing I’ve always hated is having to tell somebody that something they want is ‘impossible’. Worse than that though, is when I have to just declare it is impossible ‘for complicated programmer reasons’ rather than explain myself. This is something I’ve tried hard to avoid over the years, as it is always better when everybody on a team understands the tools and limitations they are working with.

Unfortunately, transparency comes up over and over again as an ‘impossible thing to do well for technical reasons I find it hard to explain’. It is a great effect that artists would always love to have, but breaks in all sorts of really annoying ways that are genuinely not possible to solve on current hardware.

So, this post is my attempt at clearly and concisely explaining, once and for all, why transparency is so hard without resorting to ‘for technical reasons’…

Rasterizing

To understand transparency, a little background in how scenes are actually rendered is needed. There are 2 general approaches to rendering:

  • Ray tracing: Draws a scene by firing rays through the scene to simulate how light bounces around.
  • Rasterizing: Draws a scene by taking a large number of primitives (normally triangles) and filling in the pixels on the screen that each primitive covers.

Methods of ray tracing range from the ‘too expensive to do in real time’ up to ‘way way way too expensive to do in real time’, and despite recent advances in GPUs (especially the new RTX cards), we aren’t going to be rendering entire game scenes this way for a few years yet.

So we’re left with rasterizing. A technique that in many ways is a beautiful hack. It is well suited to hardware, as it involves doing a few very simple tasks many many times over. Work out what pixels a primitive covers, then fill in each one.

Unfortunately, the massive gains in speed and simplicity we get with rasterizing come at a cost. As we are no longer truly simulating the behaviour of light, physical effects like shadows don’t just ‘fall out’ naturally – instead clever people had to work out how to bodge shadow effects into rasterized scenes, and they did it very well!

Depth

The first problem you hit with rasterizing is how to handle the situation in which 2 primitives cover the same pixel. Clearly you want the one in front to be visible, however unlike ray tracing, a rasterizer has to do extra work to make this happen.

Let’s start with this lovely simple spinning cube…

Looks great, but if we turn off the rasterizer features that handle depth, weird shit instantly happens…

Here you can see that sometimes the back faces of the cube seem to be drawing over the front faces. This is because the graphics card doesn’t attempt to sort the triangles of the cube – all it knows is you’ve given it a bunch of triangles you want on screen as soon as possible. So, if it happens to draw the front faces first, and the back faces last, the back faces get drawn on top.

Fortunately, with the simple cube there is a simple solution – only draw the triangles that are facing the camera, and discard the others. This is known as back face culling.

But what if we have 2 cubes?

Here culling won’t necessarily help us. We have multiple front facing triangles (such as the blue on the rear cube and the pink on the front cube) that overlap on screen. In the above image we happen to be drawing stuff in the right order, but if we flip the order around…

Because the rear cube was drawn last it simply stomped on top of the front cube.

Again however, this one can be solved pretty easily. The cubes themselves aren’t actually overlapping, so by simply sorting the objects to render, and then ensuring we always draw starting from the back and working forwards, everything just works. This is known as the painters algorithm.

Unfortunately, there is another case that neither culling nor sorting can solve:

With only culling and sorting, we can either end up with this…

Or this…

And problems like this one don’t need some contrived situation with overlapping cubes. Any concave mesh (which turns out to be most models) is going to suffer this sort of problem:

It is solved with the use of a depth buffer and depth testing. These techniques underpin modern rasterizing. The basic idea is that in addition to rendering your pixel colours you also render how far each pixel is from the camera, aka its depth.

Here we visualise the depth buffer for the torus. Red is close to the camera, blue is further away.

And again for the 2 cubes…

Using this additional information the rasterizer is able to check, before filling in a pixel, if it has already been filled in by another, closer triangle.
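In code terms the rule is tiny. This is just a sketch of the idea in C# (borrowing Unity’s Color type for convenience), not anything the hardware literally runs:

//the depth test a rasterizer performs for every candidate pixel. depthBuffer holds
//the depth of the closest thing drawn so far at each pixel.
void WritePixel(int x, int y, Color colour, float depth, Color[,] colourBuffer, float[,] depthBuffer)
{
    if (depth < depthBuffer[x, y]) //closer than anything drawn here so far?
    {
        colourBuffer[x, y] = colour; //yes, so take over the pixel
        depthBuffer[x, y] = depth;   //and record the new closest depth
    }
    //otherwise an earlier, closer triangle already owns this pixel, so do nothing
}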

The depth buffer actually removes the need for sorting entirely. Indeed, in many renderers it is common to flip the sorting around and render front objects first, so they fill in the depth buffer as quickly as possible. That way the GPU doesn’t end up running expensive shaders for background objects only to have the results trampled later by foreground objects.

Transparency

Finally on to actual transparency! The big problem with transparency comes down to 2 points:

  • Transparent objects must be drawn back-to-front for the effect to look correct (as we will see imminently)
  • We have just proven above that it is not always possible to reliably draw objects back-to-front.

Let’s start with a simple quad, which in a game might easily be a nice flat glass window:

Lovely! But what if we want a green pane of glass as well?

Here you see the same scene, however on the left the green glass is in front, and on the right the blue glass is in front. The thing to note is that the central overlapping section is different in each image. Just to prove it, here they are sliced together:

This is because, as mentioned earlier, to colour a transparent pixel correctly, you must know the colour of whatever is behind it. This means that if you have multiple transparent objects, the draw order is very important.
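To be concrete, the standard blend is the source colour times its alpha, plus whatever was already in the frame buffer times one minus alpha, and that operation is not symmetric. A little sketch with made up colours shows that swapping the draw order really does change the final pixel:

//standard alpha blend: 'src' is the transparent thing being drawn, 'dst' is
//whatever is already in the frame buffer behind it
Color Blend(Color src, float alpha, Color dst)
{
    return src * alpha + dst * (1f - alpha);
}

//blue pane then green pane on top, versus green pane then blue pane on top
//(ignoring the alpha channel of the results)
Color background = Color.black;
Color blueThenGreen = Blend(Color.green, 0.5f, Blend(Color.blue, 0.5f, background)); //rgb (0, 0.5, 0.25)
Color greenThenBlue = Blend(Color.blue, 0.5f, Blend(Color.green, 0.5f, background)); //rgb (0, 0.25, 0.5)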

Fortunately, the above situation is actually pretty easy to deal with by sorting, and indeed this is why many games comfortably have semi-transparent glass windows.

But what if we make our cube slightly transparent? How should it look? A first blast with all the depth testing and culling turned on would look like this:

This, however is pretty clearly not a cube made of transparent material, as we can’t see the back faces at all. It is simply a faded version of the original opaque cube. This is because all the work we did to get the opaque cube rendering correctly with culling and depth tests has hidden the back faces entirely. To see them, the culling and depth tests must be turned off:

Yay! So what’s the problem? It’s right back to where we started, as shown in this video:

If we’re lucky enough to draw the faces in the correct order, the cube looks right. However if they’re in the wrong order, the back faces trample the front, and the effect breaks.

Just as before, in the case of the cube there are some clever tricks we could do to get it working. In practice, you could first render all the back faces, then render all the front faces. However just like before, this technique falls apart when more complex or overlapping objects come into play:

Bleh!

So how do we solve this?

Sadly, right now, the truth is we really don’t, or at least can’t do it anywhere near fast enough 😦 Rasterizer based hardware simply doesn’t function in a transparency friendly way. For opaque objects, provided there are no bugs in shaders, we can guarantee they will always look correct, regardless of the scene. For transparent objects, however, we must carefully construct scenes to hide the transparency artefacts.

A common bug might be a semi-transparent object held by the player which looks fine until they pass their hand through some other semi transparent object such as a window or smoke. In this situation, at some point, the hand is going to trample the smoke, or the smoke is going to trample the hand. Something is going to break!

The classic glitch, familiar to anyone who’s been in games long enough, looks like this:

Note how the opaque cube (right) interacts correctly with the transparent green pane of glass. However when the transparent cube interacts with the transparent glass, it almost seems to ‘pop’ behind it.

Once we finally get to truly ray traced games, this issue will disappear along with a host of others. A ray tracer ultimately simulates the physics of light, so stuff like transparency just works. However for now, we’re stuck with rasterizers and grumpy programmers!

So there you have it! My long winded but hopefully helpful to somebody explanation of exactly why transparency is a total PITA and why programmers typically grumble about technical reasons etc etc when anybody mentions something like transparent curtains…

Shaderoids 2

Right, I left off last time with the beginnings of a game, now it’s time to add some sounds and other shizzle. It’s up on git hub now here

Sounds

 

Unfortunately one thing that just isn’t possible right now is playing sounds from the GPU (booo). However, I can implement a system where our GPU side code can ‘request’ a given sound to be played. Then the CPU side will just act on those requests.

We’ll start with the simplest possible 1-shot ‘audio engine’ you can build in unity….

    //array of clips + list of allocated audio sources
    AudioClip[] _clips;
    List<AudioSource> _audioSources;

    //load list of clips
    void LoadClips(params string[] names) {
        _audioSources = new List<AudioSource>();
        _clips = names.Select(a => Resources.Load<AudioClip>(a)).ToArray();
    }

    //find a free audio source, allocating if necessary, and play clip
    void PlayClip(int idx) {
        AudioSource src = null;
        for(int i = 0; i < _audioSources.Count; i++) {
            if(!_audioSources[i].isPlaying) {
                src = _audioSources[i];
                break;
            }
        }
        if (!src) {
            src = gameObject.AddComponent<AudioSource>();
            _audioSources.Add(src);
        }
        src.clip = _clips[idx];
        src.Play();
    }

I’ve used that pattern a few times for very quick prototypes. It basically lets you load a set of audio files from resources, then ask to play them once.

Let’s load 3 audio files I ripped out of the earlier YouTube vid of the actual game…

LoadClips("fire", "explode", "blop");

Next, a simple structure that will be mirrored CPU and GPU side. This represents a request to play a sound, and for now just contains the integer id of the sound to be played:

    public struct SoundRequest {
        public int id;
    }

I allocate a compute buffer to contain an array of those requests, alongside a CPU buffer to copy the requests into. On top of that, the counters structure now has a ‘numSoundRequests’ variable. At the end of the game update we can now read out and process the requests:


    _soundRequestBuffer.GetData(_cpuSoundRequests);
    _countersBuffer.GetData(_cpuCounters);
    for (int i = 0; i < _cpuCounters[0].numSoundRequests; i++)
        PlayClip(_cpuSoundRequests[i].id);
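One small gotcha: as you’ll see in the GPU function below, the counter gets bumped even when the request buffer is already full, so the value read back here can exceed the size of the array. Clamping the loop keeps the readback safe (a sketch):

    //clamp to the buffer size, since numSoundRequests can exceed it
    int requestCount = Mathf.Min(_cpuCounters[0].numSoundRequests, _cpuSoundRequests.Length);
    for (int i = 0; i < requestCount; i++)
        PlayClip(_cpuSoundRequests[i].id);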

And for the last bit of plumbing, a GPU side function in the compute side for making a request:

//matching ids of sounds loaded in game.cs
#define SND_FIRE 0
#define SND_EXPLODE 1
#define SND_BLOP 2

void PlaySound(int id) {
    int idx;
    InterlockedAdd(_counters[0].numSoundRequests, 1, idx);
    if (idx < _maxSoundRequests) {
        _soundRequestsRW[idx].id = id;
    }
}

Now I can play sounds from GPU code! For example, when the player hits the space bar to fire a bullet:

if (_keyStates[' '].pressed) {
    //...stuff for firing the bullet...
    PlaySound(SND_FIRE);
}

I’ve now hooked this up to firing and to asteroid explosions, and it’s already starting to feel gamey!

Text

Hmmmm.. text rendering on GPU. I want my GPU code to at least be able to ask to draw a given character – e.g. DrawCharacter(‘a’). I don’t really want to cheat and start using bitmap fonts, so… I guess I’m programming a vector font. I create Font.cginc, which is full of stuff like this….

    switch (character) {
    case 'a': 
    case 'A':
    {
        AddFontLine(idx++, float2(0, 0), float2(0, 0.5f));
        AddFontLine(idx++, float2(1, 0), float2(1, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(0, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(1, 0.5f));
        AddFontLine(idx++, float2(0, 0.5f), float2(1, 0.5f));

        break;
    }

Basically, I’m filling up a big buffer, which has 16 line slots for each character. i.e. for the character code 65 (‘A’), there are up to 16 lines stored in a font buffer (_font) at offset 16*65.
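So, for example, the 5 lines added for ‘A’ above land in slots 1040 to 1044 (16*65 onwards), and the remaining 11 slots for that character are left as empty, degenerate lines.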

I’ve created a simple compute shader called ‘BuildFont’, which gets dispatched with 256 threads – 1 per ascii character, though I’ve obviously only filled in ‘A’-‘Z’ and a few other bits and pieces!

Now a simple test shader:

[numthreads(256, 1, 1)]
void DrawFont(uint3 id : SV_DispatchThreadID)
{
    int lineIdx = AllocLines(LINES_PER_CHARACTER);
    int charOffset = id.x * LINES_PER_CHARACTER;

    float2 pos = float2(id.x / 16, id.x % 16) * float2(20,25) + 100;
    float2 scl = float2(15,20);

    for (int i = 0; i < LINES_PER_CHARACTER; i++) {
        float2 a = _font[charOffset + i].a * scl + pos;
        float2 b = _font[charOffset + i].b * scl + pos;
        AddLine(lineIdx++, a, b);
    }
}

This renders a 16×16 grid of characters on screen. For each one, we draw all 16 lines (note: empty lines are degenerate so don’t do anything). The result:

vecfont

Woohoo! 1 hardcoded vector font. No better way to spend the first chunk of a 10 hour plane flight. Unfortunately I discovered the plane has wifi, so was able to check out the actual asteroids font, and discovered it is way less curvy than mine. Oh well.

Teeny bit of UI

To get started on the UI and game loop, I’ll add a single ‘UpdateGame’ kernel.

[numthreads(1,1,1)]
void UpdateGame(uint3 id : SV_DispatchThreadID)
{
    int gameMode = _counters[0].gameMode;
    if (gameMode == 0) {
        //init
        gameMode = 1;

    }
    else if (gameMode == 1)
    { 
        //main menu
        DrawText_OneCoinOnePlay();
    }

}

This is probably the closest bit to a classic game, as it’s a single function that gets called once per frame to update general game state. At this stage, we only have the game mode, which starts off as 0 (init) then proceeds to 1 (main menu). I’ve also added our first bit of UI:

void DrawText_OneCoinOnePlay() {
    float2 pos = float2(350,150);
    float2 scl = float2(18, 20);
    float2 spacing = float2(scl.x*1.25, 0);

    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('C', pos, scl); pos += spacing;
    DrawCharacter('O', pos, scl); pos += spacing;
    DrawCharacter('I', pos, scl); pos += spacing;
    DrawCharacter('N', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('P', pos, scl); pos += spacing;
    DrawCharacter('L', pos, scl); pos += spacing;
    DrawCharacter('A', pos, scl); pos += spacing;
    DrawCharacter('Y', pos, scl); pos += spacing;
}

At this point though I want to get some proper startup code in. Ideally when the player presses ‘start’ we would init the game state and fire them into the first level. This creates a need for compute shaders to trigger other compute shaders, which leads onto…

More indirect dispatching

Enter the indirect dispatch pattern. The basic concept of indirect dispatch is that sometimes you need a compute shader to control how a following compute shader will be dispatched. For example, our current ‘spawn asteroids’ shader is dispatched 10 times to spawn 10 asteroids, but what if we want to control how many asteroids are spawned from a compute shader instead?

For this purpose I create a new compute buffer called _dispatch, with the type ComputeBufferType.IndirectArguments. It’s just a buffer of uints, each of which will represent one of the arguments we’d normally provide to the Dispatch function.

For this simple purpose I’ll have a load of parameters that can be passed to shaders via what used to be the counters buffer, and is now the globals buffer. In addition, I’ll have a simple setup kernel that checks which kernel is to be dispatched, and configures the indirect buffer accordingly. This approach is not particularly scalable but simple and effective.

So our globals structure now has 2 new ‘request’ integers:

        public int requestClearAsteroids;
        public int requestSpawnAsteroids;

And the game update can make requests like this:

        //when 'c' is pressed, ask to clear and then spawn 4 new asteroids
        if (_keyStates['c'].pressed) {
            _globals[0].requestClearAsteroids = 1;
            _globals[0].requestSpawnAsteroids = 4;
            gameMode = 1;
        }

A special ‘SetupDispatch’ shader now takes a predefined kernel id constant to run, and then uses the request values to fill out the dispatch settings:

//dispatch setup function that takes the kernel id being setup for and runs whatever logic
//is necessary for it
[numthreads(1,1,1)]
void SetupDispatch(uint3 id: SV_DispatchThreadID)
{
    switch (_kernelIdRequested) {
    case KID_CLEAR_ASTEROIDS: {
        //the CLEAR ASTEROIDS dispatch will dispatch across nothing if not requested, or
        //all asteroids if requested
        _dispatch[0] =  _globals[0].requestClearAsteroids ? (_maxAsteroids+255)/256 : 0;
        break;
    }
    case KID_SPAWN_ASTEROIDS: {
        //SPAWN_ASTEROIDS dispatches the number of requested asteroids
        _dispatch[0] = (_globals[0].requestSpawnAsteroids + 255) / 256;
        break;
    }
    default: {
        _dispatch[0] = 0; 
        break;
    }
    }
    _dispatch[1] = 1;
    _dispatch[2] = 1;
}

And functions like our SpawnAsteroids kernel look almost identical, with the small caveat that they may need to read the request counter to avoid overrunning:

//spawns 'n' requested asteroids
[numthreads(256,1,1)] 
void SpawnAsteroids(uint3 id : SV_DispatchThreadID) {
    //read count (either from uniform or indirect request)
    int maxCount = _threadCount < 0 ? _globals[0].requestSpawnAsteroids : _threadCount;
    _globals[0].requestSpawnAsteroids = 0;
    if (id.x < maxCount)
    { 
        //usual stuff
    }
}


The last bit of plumbing is a new DispatchIndirect helper CPU side, which first dispatches the setup kernel before dispatching the main one:

    //dispatches the 'setup dispatch' shader to configure the dispatch args for a given
    //dispatch type, then does an actual dispatch indirect on the requested kernel
    void DispatchIndirect(int kernel, int dispatchIndex) {
        _asteroidsShader.SetInt("_kernelIdRequested", dispatchIndex);
        DispatchOne(kernelSetupDispatch);

        BindEverything(kernel);
        _asteroidsShader.SetInt("_threadCount", -1);
        _asteroidsShader.DispatchIndirect(kernel, _dispatchBuffer);
    }

And finally the CPU game update can first dispatch UpdateGame, then dispatch both ClearAsteroids and SpawnAsteroids on the off chance they have work to do.

        DispatchOne(kernelUpdateGame);
        _playerState.GetData(players);
        DispatchIndirect(kernelClearAsteroids, KID_CLEAR_ASTEROIDS);
        DispatchIndirect(kernelSpawnAsteroids, KID_SPAWN_ASTEROIDS);

Phew! Lots of setup, but that’s a nice simple example of indirect dispatch being used in anger.

Spawning the player

One final pain going into this lot is spawning the player from compute. The game has to wait until there are no asteroids in the way before spawning, which is actually non-trivial on the GPU. To do so, all asteroids must be checked against the player. Thus I had to write a couple of new kernels. The first checks players that want to spawn, and marks them as ‘can spawn’:

//prepares all players that want to spawn for spawning, and defaults their 'canSpawn'
//to true
[numthreads(256, 1, 1)]
void PreparePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn) {
            player.position = float2(1024, 768) * 0.5;
            player.velocity = 0;
            player.rotation = 0;
            player.canSpawn = true;
        }
        _playersRW[id.x] = player;
    }
}

The 2nd is very similar to the original CollidePlayerAsteroid, which compares the players to all asteroids, and if they overlap, sets ‘can spawn’ back to false:

//checks all players that want to spawn against asteroids, and kills the 'canSpawn' flag
//for any that are too close to asteroids
[numthreads(256, 1, 1)]
void UpdatePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.wantsToSpawn && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius + 10 + 50)) {
                _playersRW[playerIdx].canSpawn = false;
            }
        }
    }
}
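Because this kernel decomposes its thread index into a player index and an asteroid index, the CPU side needs to dispatch one thread per player/asteroid pair. Something along these lines (a sketch, assuming MAX_PLAYERS and MAX_ASTEROIDS constants and kernel ids named in the existing style):

        //one thread per player for the prepare step, then one per (player, asteroid) pair
        DispatchItems(kernelPreparePlayerSpawning, MAX_PLAYERS);
        DispatchItems(kernelUpdatePlayerSpawning, MAX_PLAYERS * MAX_ASTEROIDS);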

Then with this cheeky code in UpdateAndDrawPlayer:

        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn && player.canSpawn) {
            player.wantsToSpawn = false;
            player.canSpawn = false;
            player.alive = true;
        }

We can detect players that both want to and can spawn, and spawn em!

With some tweaks to the game loop (partly to track current level), and the ability to spawn players, it’s all starting to look pretty cool:

More text and scores and things

Somewhere along the way I felt dirty requesting loads of lines for every render, so added a 2nd buffer and rendering pipeline for requesting characters (just like you can request lines). The basic concept is that, just like with lines, I can submit a list of requests for characters by code, size and position. I then have another procedural pass in DrawLines.shader that uses this function:

        //reads character id, then looks it up in lines buffer which should contain font
        v2f charvert(uint id : SV_VertexID, uint inst : SV_InstanceID)
        {
            int vertsPerCharacter = (LINES_PER_CHARACTER * 2);
            int charIdx = id / vertsPerCharacter;

            int charVertex = id % vertsPerCharacter;
            int charLine = charVertex / 2;
            int charLineVertex = charVertex % 2;

            Character c = characters[charIdx];

            Line l = lines[c.id*LINES_PER_CHARACTER + charLine];
            float2 p = charLineVertex==0 ? l.a : l.b;

            p = p * c.scl + c.pos;

            v2f o;
            o.vertex = toScreen(p);
            return o;
        }

Just like the old lines one, it takes a vertex id, breaks it down into character index, character vertex, character line and line vertex, then fiddles with it and outputs it!
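As a worked example, with 16 lines per character there are 32 vertices per character, so vertex id 70 becomes character index 2, character vertex 6, character line 3 and line vertex 0, i.e. the start point of the fourth line of the third submitted character.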

Now for some numbers. All numbers in asteroids have a 0 on the end, cos it looks cooler. So the new DrawNumberTimes10 function will do just that…

void DrawNumberTimes10(int number, float2 pos, float2 scl, bool pad) {
    float2 spacing = float2(scl.x*1.25, 0);

    int idx = AllocCharacters(8);

    bool onFirstChar = true;
    for (int base = 1000000; base > 0; base /= 10) {
        int val = clamp(number / base,0,9);
        number -= val * base;
        if (val != 0 || !onFirstChar || base==1) {
            //output character
            DrawCharacter(idx++, '0'+val, pos, scl); pos += spacing;
            onFirstChar = false;
        }
        else {
            //output space and increment pos if padding requested
            DrawCharacter(idx++, ' ', pos, scl);
            if(pad)
                pos += spacing;
        }
    }

    //final 0
    DrawCharacter(idx++, '0', pos, scl); pos += spacing;
}

Now by adding a score value, I can draw the nice numbers on screen:

    DrawNumberTimes10(_playersRW[0].score, float2(100, 700), float2(18, 20), true);
    DrawNumberTimes10(0, float2(800, 700), float2(18, 20), false);
    DrawNumberTimes10(0, float2(500, 680), float2(18, 20)*0.75, false);

Not sure what the other 2 numbers are supposed to be yet. Presumably credits and player 2 score. Oh well who cares – it looks cool:

Summary

Well, after that brain dump of crazy I have quite a nice thing goin on. It’s not asteroids yet, but all game logic is completely separate from the CPU. Compute shaders are responsible for init, the game loop and all general physics and spawning etc. Next time I’ll probably get some effects and flying saucers in or something…

Also – code:
https://github.com/chriscummings100/shaderoids

Shaderoids 1

Ok, time for something really productive. I have decided to put my pretty solid knowledge of GPU coding to the best possible use, and produce a perfect (as possible) clone of the 1979 game, Asteroids… entirely written on GPU.

Why you ask? Frankly you should be ashamed for asking that. Also, did you not notice the title of this web site?

First up, some rules:

  • All game code, graphics, physics and rendering entirely driven by GPU
  • CPU may only:
    • Read hardware (such as inputs), and pass to GPU
    • Fire off a fixed set of dispatches
    • Read requests from GPU to write to hardware (such as audio)
  • Pure line rendering

fyi, since writing, I’ve plonked all the code up here

Basic architecture / flow

The idea is to make the CPU side of things as minimal as possible. With that in mind, my basic flow is going to be a single MonoBehaviour in C# which:

  • Reads a set of input states (probably A-Z, 0-9 and a few others), and writes them into a compute buffer
  • Executes a series of dispatches to fire off various GPU compute kernels
  • Executes DrawProceduralIndirect to draw a load of lines generated by compute (high bar: line renderer in compute!)
  • Reads a set of ‘audio requests’ written to a buffer by the GPU, and uses them to trigger some predefined sounds

So lets start with 3 files:

  • Game.cs for our monobehaviour
  • Asteroids.compute to contain all the compute jobs
  • DrawLines, a simple vertex/fragment shader that draws a set of lines

using UnityEngine;

public class Game : MonoBehaviour 
{
    public struct Line {
        public Vector3 a;
        public Vector3 b;
    }

    public const int LINE_BUFFER_SIZE = 10000;

    ComputeShader _asteroidsShader;
    Shader _drawLinesShader;
    Material _drawLinesMaterial;

    ComputeBuffer _linesBuffer;


    public void Awake() {
        _asteroidsShader = Resources.Load<ComputeShader>("asteroids");
        _drawLinesShader = Shader.Find("DrawLines");
        _drawLinesMaterial = new Material(_drawLinesShader);
        _drawLinesMaterial.hideFlags = HideFlags.HideAndDontSave;

        _linesBuffer = ComputeBufferUtils.Alloc<Line>(LINE_BUFFER_SIZE);
    }

    public void OnPostRender() {

        Line[] testLines = new Line[3];
        testLines[0] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 0, 0) };
        testLines[1] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 1, 0) };
        testLines[2] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(0, 1, 0) };
        _linesBuffer.SetData(testLines);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProcedural(MeshTopology.Lines, testLines.Length*2);
    }
}    

This code starts by loading up some shaders and building a material, then sets up a compute buffer that’ll contain a set of Line structures to be rendered. ComputeBufferUtils is a handy class for allocating structured buffers that I’ll upload at some point.

The OnPostRender function simply plonks 3 lines into the lines buffer, passes it to the draw line material and finally calls DrawProcedural, requesting line topology, and specifying 2 vertices per line.

DrawProcedural is a tasty beast. It allows us to write a vertex shader that takes just an integer index. It is then up to the shader to take this index and convert it into useful vertex positions, which is done using data in the provided compute buffer:

Shader "DrawLines"
{
    Properties
    {
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            
            struct v2f
            {
                float4 vertex : SV_POSITION;
            };

            struct Line {
                float3 a;
                float3 b;
            };

            StructuredBuffer<Line> lines;
            
            
            v2f vert (uint id : SV_VertexID, uint inst : SV_InstanceID)
            {
                Line l = lines[id / 2];
                float3 p = (id & 1) ? l.a : l.b;
                
                v2f o;
                o.vertex = float4(p, 1);
                return o;
            }
            
            fixed4 frag (v2f i) : SV_Target
            {
                return 1;
            }
            ENDCG
        }
    }
}

This cheeky little shader uses the vertex id to access the lines buffer and work out whether the vertex represents the start or the end of the line.

Result…

firstlines

Woohoo! Nearly there….?

Compute and indirect drawing

Right, time to write that compute shader I loaded earlier!

Let’s start by adding a new structure to the game code, and allocating a couple of extra buffers:

    public struct Counters {
        public int numLines;
    }
    _countersBuffer = ComputeBufferUtils.Alloc<Counters>(1);
    _dispatchBuffer = ComputeBufferUtils.Alloc<uint>(8, ComputeBufferType.IndirectArguments);

The compute shader (Asteroids.compute) mimics the CPU side structure and buffers:

//structures
struct Line {
    float3 a;
    float3 b;
};
struct Counters {
    int numLines;
};

//buffers
RWStructuredBuffer<Line> _linesRW;
RWStructuredBuffer<Counters> _counters;
RWStructuredBuffer<uint> _dispatch;

//general use uniform to limit dispatch thread counts
int _threadCount;


Now for some compute kernels. The first, I’ll dispatch once at the start of each frame to clear the line counter:

[numthreads(1,1,1)]
void ClearLines(uint3 id : SV_DispatchThreadID)
{
    _counters[0].numLines = 0;
}

This is designed to be executed as a single thread on the GPU, and just does one tiny bit of ‘setup’ work. This pattern is very common in GPU code – tiny compute jobs to pass around / clear / tweak some data, then large parallel dispatches to do actual work.

The next kernel is a more classic compute job, and designed to run in a highly parallel manner.

//creates lines based on dispatch thread
[numthreads(256, 1, 1)]
void GenerateTestLines(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        //allocate space
        int lineIdx;
        InterlockedAdd(_counters[0].numLines, 1, lineIdx);

        //build line
        Line l;
        float ang = radians(id.x);
        float3 dir = float3(sin(ang), cos(ang), 0);
        l.a = dir * 0.1f;
        l.b = dir * 0.75f;
        _linesRW[lineIdx] = l;
    }
}

This uses InterlockedAdd to ‘allocate’ a slot in the lines buffer by atomically incrementing the numLines counter. A line is then generated based on the thread index, and written out to the correct slot. In future I’ll use this pattern for outputting all lines that need rendering.

Next, another simple setup kernel that takes the numLines counter and converts it into a set of arguments that can be passed to DrawProceduralIndirect:


//fills out indirect dispatch args 
[numthreads(1, 1, 1)]
void LineDispatchArgs(uint3 id : SV_DispatchThreadID)
{
    _dispatch[0] = _counters[0].numLines*2; //v count per inst (2 verts per line)
    _dispatch[1] = 1; //1 instance
    _dispatch[2] = 0; //verts start at 0
    _dispatch[3] = 0; //instances start at 0
}

This indirect technique is used with both drawing and dispatching, and allows the use of a compute job to setup the work for a following compute job or draw call. In this case I’m setting up the arguments required for DrawProcedural.

That little ‘*2’ in LineDispatchArgs took a while to spot! Interesting fun – if you’re on NVidia and force GenerateTestLines to only build 64 lines, and remove the ‘*2’, you see it alternate between 2 groups of 32 lines. That’s cos you’re only rendering half the lines you generate. As an NVidia GPU works in warps of 32 threads, you randomly get the 1st half or the 2nd half. Hardware showing its true form!

The Game.cs OnPostRender now looks roughly like this:

void OnPostRender()
{
        _asteroidsShader.SetBuffer(kernelClearLines, "_counters", _countersBuffer);
        DispatchOne(kernelClearLines);

        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_linesRW", _linesBuffer);
          DispatchItems(kernelGenerateTestLines, 360);

        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_dispatch", _dispatchBuffer);
        DispatchOne(kernelLineDispatchArgs);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
}

Where DispatchOne and DispatchItems are a couple of handy helpers for dispatching compute kernels:

    void DispatchOne(int kernel) {
        _asteroidsShader.Dispatch(kernel, 1, 1, 1);
    }
    void DispatchItems(int kernel, int items) {
        uint x,y,z;
        _asteroidsShader.GetKernelThreadGroupSizes(kernel, out x, out y, out z);
        _asteroidsShader.SetInt("_threadCount", items);
        _asteroidsShader.Dispatch(kernel, (items + (int)x - 1) / (int)x, 1, 1);
    }

Anyhoo, making it 2 degrees per line + rendering at 1024*768:

linecircleincompute

(oh yeah – I changed the clear colour to black too!).

Next, I think I need to correct for aspect ratio and stuff. Gonna operate at fixed res of 1024/768 then upscale / downscale points. Hmmm… spent a while trying to make this work with any resolution but got bored. Instead I’ll just convert from (1024,768) to screen space:

p = 2 * (p - float2(1024,768)*0.5) / float2(1024,768);

Also changed lines to be float2 whilst there!

A bit of game

It’s getting a bit big to post all the code up here, but I’ll eventually share it all on git hub!

Let’s get inputs in. To stay true to the goal, I want to avoid any understanding of ‘gameplay’ on CPU, so I’ll just push a big list of keys into game. CPU side, I’ll setup a KeyState structure and read a load of buttons into it each frame:


    public struct KeyState {
        public bool down;
        public bool pressed;
        public bool released;
    }
      for(int i = 0; i < 26; i++) {
            _cpuKeyStates['a' + i] = new KeyState {
                down = Input.GetKey(KeyCode.A + i),
                pressed = Input.GetKeyDown(KeyCode.A + i),
                released = Input.GetKeyUp(KeyCode.A + i),
            };
        }

On top of ‘A’-‘Z’, I also added some handy keys like ‘0’-‘9’, ‘ ‘ and a few other bits.

Now to define a new PlayerState (which also comes with a new _playerState buffer).

    public struct PlayerState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public bool alive;
    }

For the moment, I’ll init this CPU side to get going

        PlayerState[] initPlayer = new PlayerState[1];
        initPlayer[0].position = new Vector2(1024f, 768f) * 0.5f;
        initPlayer[0].alive = true;
        _playerState.SetData(initPlayer);

The long term goal is to get a proper game loop going, so I don’t feel too dirty about a bit of CPU setup for now!

So compute side I’ve setup corresponding structures and buffers. Also added these few helper functions for drawing lines and manipulating vectors:

int AllocLines(int count) {
    int lineIdx;
    InterlockedAdd(_counters[0].numLines, count, lineIdx);
    return lineIdx;
}
void AddLine(int idx, float2 a, float2 b) {
    _linesRW[idx].a = a;
    _linesRW[idx].b = b;
}

float2 mulpoint(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 1)).xy;
}
float2 mulvec(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 0)).xy;
}

For now it’s 1 player, but I want to plan for a million, so I’ll work as though there’s more than 1. The basic compute kernel for player update looks like this:

[numthreads(256, 1, 1)]
void UpdateAndDrawPlayer(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        PlayerState player = _playersRW[id.x];
        if(player.alive) {
            //do stuff
        }
        _playersRW[id.x] = player;

    }
}

The core update code splits into 2 sections. First, standard asteroid style inputs:

            float rot = 0;
            float thrust = 0;
            float rotPerSecond = 1;
            float thrustPerSecond = 100;

            if (_keyStates['A'].down) {
                rot += rotPerSecond * _timeStep;
            }
            if (_keyStates['D'].down) {
                rot -= rotPerSecond * _timeStep;
            }
            if (_keyStates['W'].down) {
                thrust += thrustPerSecond * _timeStep;
            }
            player.rotation += rot;

            float2 worldy = float2(sin(player.rotation), cos(player.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);

            player.velocity += worldy * thrust;
            player.position += player.velocity * _timeStep;

Then the second half calculates a transform matrix and generates some hard coded lines:

            worldx *= 50;
            worldy *= 50;
            float3x3 trans = {
                worldx.x, worldy.x, player.position.x,
                worldx.y, worldy.y, player.position.y,
                0, 0, 1
            };

            int lineIdx = AllocLines(5);
            
            float2 leftcorner = mulpoint(trans, float2(-0.7, -1));
            float2 rightcorner = mulpoint(trans, float2(0.7, -1));
            float2 tip = mulpoint(trans, float2(0, 1));
            float2 leftback = mulpoint(trans, float2(-0.2, -0.7f));
            float2 rightback = mulpoint(trans, float2(0.2, -0.7f));

            AddLine(lineIdx++, leftcorner, tip);
            AddLine(lineIdx++, rightcorner, tip);
            AddLine(lineIdx++, leftcorner, leftback);
            AddLine(lineIdx++, rightcorner, rightback);
            AddLine(lineIdx++, leftback, rightback);


Those vectors look roughly right I think. Side by side with the video:

shipsidebyside

After a bit of debugging, I found that the bool types in KeyState appeared to be causing problems. Seems odd, but I can fix it with ints and don’t fancy digging too deep right now. Once that’s sorted, we have a game, sort of…
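For reference, the int based version of the struct presumably ends up looking something like this (bool isn’t a blittable type in C#, so structs containing it don’t upload cleanly via SetData, which is my best guess at the cause):

    //sketch: same layout as before, but with 4-byte ints so the CPU struct
    //matches the compute-side struct exactly
    public struct KeyState {
        public int down;
        public int pressed;
        public int released;
    }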

Quickly add screen wrapping…

            //wrap player (note: better version should handle overshoot amount)
            player.position = player.position >= 0 ? player.position : float2(1024, 768);
            player.position = player.position <= float2(1024,768) ? player.position  : 0;

This quick and dirty version could be better by handling the fact that if they overshoot by k pixels, they should come back k pixels from the other side. Pain in the ass though!

The thrust in the video looks like it’s just a flashing triangle, maybe with a bit of randomness thrown in. To achieve this, I’ll pass through the frame number, which is fed into a wang hash based random number generator. Then some cheeky code to add the extra triangle when thrusting:


            int thrustframe = (_frame / 4);
            if (thrust > 0 && (thrustframe &1)) {
                lineIdx = AllocLines(2);
                float2 thrustback = mulpoint(trans, float2(0.0f, -1.5f-wang_rand(thrustframe)*0.15f));
                AddLine(lineIdx++, leftback, thrustback);
                AddLine(lineIdx++, rightback, thrustback);
            }

Asteroids

Just like with players, I’ll define a structure for the asteroid state, with a bit of extra data in:

    public struct AsteroidState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public int alive;
        public float radius;
        public int level;
    }

As before, I create a buffer for this, and init a randomly placed, sensible set of asteroids:

        AsteroidState[] initAsteroids = new AsteroidState[MAX_ASTEROIDS];
        for(int i = 0; i < START_ASTEROIDS; i++) {
            while(true) {
                initAsteroids[i].position = new Vector2(Random.Range(0f, 1024f), Random.Range(0f, 768f));
                if ((initAsteroids[i].position - initPlayer[0].position).magnitude > 200f)
                    break;
            }
            initAsteroids[i].alive = 1;
            initAsteroids[i].radius = 30;
            initAsteroids[i].rotation = Random.Range(-Mathf.PI, Mathf.PI);
            initAsteroids[i].velocity = Random.insideUnitCircle * 50f;
            initAsteroids[i].level = 0;
        }
        _asteroidState.SetData(initAsteroids);

A note at this point: after a few bugs, I’ve swizzled a few bits around:

  • I’ve split update and render out into the Unity Update and OnPostRender functions, as reading input seemed to break a bit when called from render
  • I’ve added a helpful BindEverything function that just takes a compute kernel and binds all my buffers and variables to it. This would be terrible behaviour in production, but this isn’t production.
  • I also changed ClearLines to BeginFrame, cos it’s starting to look like it’ll do more than clear out lines

The BindEverything code:

    void BindEverything(int kernel) {
        _asteroidsShader.SetInt("_maxPlayers", MAX_PLAYERS);
        _asteroidsShader.SetInt("_maxAsteroids", MAX_ASTEROIDS);
        _asteroidsShader.SetFloat("_time", Time.time);
        _asteroidsShader.SetFloat("_timeStep", Time.deltaTime);
        _asteroidsShader.SetInt("_frame", Time.frameCount);

        _asteroidsShader.SetBuffer(kernel, "_dispatch", _dispatchBuffer);
        _asteroidsShader.SetBuffer(kernel, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernel, "_linesRW", _linesBuffer);
        _asteroidsShader.SetBuffer(kernel, "_keyStates", _keyStates);
        _asteroidsShader.SetBuffer(kernel, "_playersRW", _playerState);
        _asteroidsShader.SetBuffer(kernel, "_asteroidsRW", _asteroidState);
    }

With the asteroids added, the dispatch now looks like this:

    private void Update() {
        //all my inputs are read into _cpuKeyStates here
        _keyStates.SetData(_cpuKeyStates);

        DispatchOne(kernelBeginFrame);
        DispatchItems(kernelUpdateAndDrawPlayer, MAX_PLAYERS);
        DispatchItems(kernelUpdateAndDrawAsteroid, MAX_ASTEROIDS);
        DispatchOne(kernelLineDispatchArgs);
    }

And render is still very simple:

    public void OnPostRender() {

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
    }

So, the asteroids update kernel. It’s pretty similar to player update in functionality. The first half updates its position, and the second renders it.

//updates asteroid movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        AsteroidState asteroid = _asteroidsRW[id.x];
        if (asteroid.alive) {
            asteroid.position += asteroid.velocity * _timeStep;
            asteroid.position = asteroid.position >= 0 ? asteroid.position : float2(1024, 768);
            asteroid.position = asteroid.position <= float2(1024, 768) ? asteroid.position : 0;

            float scl = asteroid.radius;

            float2 worldy = float2(sin(asteroid.rotation), cos(asteroid.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);
            worldx *= scl;
            worldy *= scl;
            float3x3 trans = {
                worldx.x, worldy.x, asteroid.position.x,
                worldx.y, worldy.y, asteroid.position.y,
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 9;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float randscl = 0.75f;
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                //offset the point using random numbers
                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));

                //add new line
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr); 
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _asteroidsRW[id.x] = asteroid;
    }
}

I spent quite a while before I was happy with the look of the asteroids, and I’m still not fully satisfied. The above code basically generates a 9 edge circle, then randomly adjusts the position of each vertex. It also takes account of the asteroid’s radius, in anticipation of varying sizes.

firstasteroids

Bullets

Bullets are the first slightly cheeky one, as I need to be able to spawn them, which means allocating/freeing of some form. This kind of model is tricky on a GPU, so I’m going to go for the simple option of a large circular buffer of bullets. If the buffer overflows it’ll start recycling bullets, but I’ll just make it big enough not to!

Aside from a lifetime, the bullet state is very simple:

    public struct BulletState {
        public Vector2 position;
        public Vector2 velocity;
        public float lifetime;
    }

And the update equally so:

//updates bullet movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawBullet(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        BulletState bullet = _bulletsRW[id.x];
        if (bullet.lifetime > 0) {

            bullet.position += bullet.velocity * _timeStep;
            if (any(bullet.position < 0) || any(bullet.position > float2(1024, 768))) {
                //kill the bullet, remembering to write it back before the early out
                bullet.lifetime = -1;
                _bulletsRW[id.x] = bullet;
                return;
            }
            bullet.lifetime -= _timeStep;

            float scl = 2;
            float3x3 trans = {
                scl, 0, bullet.position.x,
                0, scl, bullet.position.y, 
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 6;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr);
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _bulletsRW[id.x] = bullet;
    }
}

The main difference is that, unlike the ship and asteroids, bullets have a lifetime and automatically die when it runs out. I couldn’t figure out whether bullets wrap in the original game, but it felt better when they didn’t, so for the moment they die when going off screen.

Now to implement the spawning. I’ll start with adding a nextBullet counter:

    public struct Counters {
        public int numLines;
        public int nextBullet;
    }

Now for the code in the player update to fire one:

            if (_keyStates[' '].pressed) {
                int nextBullet;
                InterlockedAdd(_counters[0].nextBullet, 1, nextBullet);
                BulletState b;
                b.position = player.position;
                b.velocity = worldy * 1000;
                b.lifetime = 3;
                _bulletsRW[nextBullet%_maxBullets] = b;
            }

When space is pressed, I increment the bullet counter, then a new bullet is created and written into the new slot. Note the ‘mod’ allows the nextBullet counter to get ever higher but still be used as an index into the _bulletsRW buffer.
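For completeness, the bullet pool on the CPU side is presumably just another structured buffer plus a size constant, along the lines of this sketch (MAX_BULLETS and _bulletsBuffer are illustrative names):

        //sketch: fixed size circular pool of bullets; sized generously so recycling is rarely visible
        const int MAX_BULLETS = 4096;
        _bulletsBuffer = new ComputeBuffer(MAX_BULLETS,
            System.Runtime.InteropServices.Marshal.SizeOf(typeof(BulletState)));
        _asteroidsShader.SetInt("_maxBullets", MAX_BULLETS);
        //bound alongside the other buffers, e.g. in BindEverything:
        //_asteroidsShader.SetBuffer(kernel, "_bulletsRW", _bulletsBuffer);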

Collision

Ok! We have stuff. Now to smash it. If I hit millions of players and asteroids, collision will have to start using optimisation structures, but GPUs are fast and like nothing more than doing 1000s of identical calculations. So collision will be a classic brute force ‘test everything against everything else’ situation.

Let’s start with player vs asteroid collision:

[numthreads(256, 1, 1)]  
void CollidePlayerAsteroid(uint3 id : SV_DispatchThreadID) 
{
    if (id.x < _threadCount)
    {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.alive && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius+10)) {
                _playersRW[playerIdx].alive = 0;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

The actual collision code here is a basic circle test: if the player’s centre comes within 10 pixels of the asteroid’s edge (effectively treating the player as a circle of radius 10), both the player and the asteroid are killed. The sneaky bit is at the top, where the player and asteroid indices are calculated. This makes more sense when you look at the dispatch CPU side:

        DispatchItems(kernelCollidePlayerAsteroid, MAX_PLAYERS * MAX_ASTEROIDS);

In effect, I dispatch 1 thread for every combination of player and asteroid. The funky calculations in CollidePlayerAsteroid simply decompose the thread id into player index and asteroid index, just like you might convert a pixel index to a pixel coordinate in an image.
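DispatchItems and DispatchOne aren’t shown in this post, but conceptually DispatchItems just binds everything, uploads the item count as _threadCount, and rounds up to whole 256-thread groups. A hedged sketch (the repo’s versions probably differ in detail):

    void DispatchItems(int kernel, int itemCount) {
        BindEverything(kernel);
        //kernels early-out on id.x >= _threadCount, so over-dispatching a partial group is fine
        _asteroidsShader.SetInt("_threadCount", itemCount);
        int groups = (itemCount + 255) / 256; //matches [numthreads(256, 1, 1)]
        _asteroidsShader.Dispatch(kernel, groups, 1, 1);
    }

    void DispatchOne(int kernel) {
        DispatchItems(kernel, 1);
    }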

Shockingly enough, for now the bullet vs asteroid is pretty similar:

[numthreads(256, 1, 1)]
void CollideBulletAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        int bulletIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (bulletIdx*_maxAsteroids);

        BulletState bullet = _bulletsRW[bulletIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

There’s one ingredient before we can declare real progress though – asteroids smash! To achieve smashing asteroids I’ll take a similar approach to the bullets. The asteroids buffer will be large enough to contain all the asteroids a level will ever need (big, medium and small), and I’ll just use a counter to allocate into it. Once that’s added, I can create a splitting function:

void SplitAsteroid(int idx) {
    int nextIndex;
    InterlockedAdd(_counters[0].nextAsteroid, 2, nextIndex);

    AsteroidState asteroid = _asteroidsRW[idx];
    if (asteroid.level < 2) {

        float childSpeed = 50; 

        AsteroidState child;
        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;

        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;
    }


    _asteroidsRW[idx].alive = 0;
}

This code allocates 2 new slots by incrementing nextAsteroid by 2. Assuming the asteroid’s level is less than 2 (i.e. it isn’t too small), I then proceed to generate 2 new smaller asteroids. These start at the position of the larger one, but take on a randomly adjusted velocity. Finally, the original asteroid is killed off. I also just noticed I probably shouldn’t allocate child asteroids unless I intend to use them, but it doesn’t matter.
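Note the InterlockedAdd above relies on a nextAsteroid counter existing alongside the others. The CPU-side Counters struct (and its compute-side mirror) presumably just gains a matching field, roughly:

    public struct Counters {
        public int numLines;
        public int nextBullet;
        public int nextAsteroid; //allocation cursor for newly split asteroids
    }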

A quick tweak to the bullet code:

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                SplitAsteroid(asteroidIdx);
            }
        }

And it’s asteroid time:

Yay! Ok – I have to get on a plane. Hopefully by the time I’ve landed there’ll be sound effects or something…

For UI, sounds and game loops, head to the next post here: Shaderoids 2

Signed Distance Fields Part 8: Gradients, bevels and noise

Over the course of this series, I’ve covered the principles, generation and some fun uses of the distances stored in SDFs. However they have another very useful property – the gradient. In this post I’ll cover the basics of a gradient, and demonstrate a few simple uses of it.

The Gradient

Hopefully after the past 7 posts it’s become clear that a signed distance field gives you the distance to the closest edge from any point. The gradient (sometimes referred to as the differential) points either directly towards or away from the closest edge:

gradients

This diagram shows the standard ‘distance visualisation’ for 2 fields used earlier in this blog. Overlaid are arrows that show the gradient. Inside the field, where the distance is negative, the gradient points directly towards the edge. Outside it, the gradient points directly away from the edge. These directions can be expressed as a 2D vector, which is referred to as the gradient vector. In a signed distance field, that vector is always of unit length (more on this later).

The gradients for our cat can be visualised in a couple of ways:

gradientcolours

On the left, gradient.x (-1 to 1) is shown in the green channel and gradient.y (-1 to 1) is shown in the blue channel. My favoured visualisation on the right converts the gradient to an angle between 0 and 360 degrees, then uses the result to calculate a hue.

This next bit is a little mathematical, so if you don’t care about it, just know that:

  • The gradient of a point represents the direction directly away from the closest edge (if outside the geometry) or towards it (if inside the geometry)
  • The gradient of a signed distance field is always a unit vector
  • We’re updating SignedDistanceFieldGenerator.End() to calculate and store the gradient vector in the green and blue channels of the field texture
  • To improve the field quality, I will also briefly introduce (but not go over in full) an improvement to our existing sweeping algorithm called an Eikonal Sweep

If you like, skip to the next section about bevels, or read on for some theory and codez!

Some theory

Calculation of the gradient requires a new concept – the partial derivatives of the field in x and y. This complex sounding term actually just means the rate of change in the x direction, and the rate of change in the y direction. To begin, assume the following definitions:

  • The distance stored in the pixel at coordinate [i,j] is referred to as U[i,j]
  • The left neighbour at [i-1,j] is referred to as U[i-1,j]
  • The right neighbour at [i+1,j] is referred to as U[i+1,j]
  • The bottom neighbour at [i,j-1] is referred to as U[i,j-1]
  • The top neighbour at [i,j+1] is referred to as U[i,j+1]

With these in mind, now consider the rate of change in X. To calculate it, we need to choose which neighbour (left or right) is closer to the edge (i.e. which is smaller).

  • The smallest horizontal neighbour will be referred to as UH
  • If the left neighbour (U[i-1,j]) is smallest, UH = U[i-1,j] and delta X (or dX) is -1
  • If the right neighbour (U[i+1,j]) is smallest, UH = U[i+1,j] and delta X (or dX) is +1
  • The change in distance, UH - U[i,j], is called delta U, or dU
  • Finally, the partial derivative in X is dU divided by dX, written dU/dX

In pseudo code:

GetPartialDerivativeInX(X,Y)
    //read U[i,j], U[i-1,j] and U[i+1,j]
    distance = GetDistance(X,Y) //U[i,j]
    leftNeighbour = GetDistance(X-1,Y) //U[i-1,j]
    rightNeighbour = GetDistance(X+1,Y) //U[i+1,j]
    
    //choose either left or right neighbour to calculate
    //deltaU and deltaX
    if leftNeighbour < rightNeighbour:
        dU = leftNeighbour - distance 
        dX = -1
    else
        dU = rightNeighbour - distance 
        dX = 1
    end 

    //return the result 
    return dU/dX
end

We can do exactly the same for the partial derivative in Y (dU/dY), by calculating the smallest vertical neighbour (UV) and executing the same logic. Pseudo code for the partial derivative in Y is thus:

GetPartialDerivativeInY(X,Y)
    //read U[i,j], U[i,j-1] and U[i,j+1]
    distance = GetDistance(X,Y) //U[i,j]
    bottomNeighbour = GetDistance(X,Y-1) //U[i,j-1]
    topNeighbour = GetDistance(X,Y+1) //U[i,j+1]
    
    //choose either top or bottom neighbour to calculate
    //deltaU and deltaY
    if bottomNeighbour < topNeighbour:
        dU = bottomNeighbour - distance 
        dY = -1
    else
        dU = topNeighbour - distance 
        dY = 1
    end 

    //return the result 
    return dU/dY
end

The gradient (or derivative) is simply a vector that contains [dU/dX, dU/dY]. For an SDF this should naturally be of unit length and not need to be normalized. Thus, in pseudo code:

GetGradient(X,Y)
    return [GetPartialDerivativeInX(X,Y),GetPartialDerivativeInY(X,Y)]
end

Hopefully that wasn’t too tricky, but if it was, don’t worry – as with most things in game dev, using it is more important than instantly getting the theory.

Calculating the gradient in code

Next, we’ll update SignedDistanceFieldGenerator.End() to iterate over every pixel, calculate the gradient vectors, and store them using the green and blue channels of the field texture:

for (int y = 0; y < m_y_dims; y++) {
    for (int x = 0; x < m_x_dims; x++) {
        //read this pixel's distance and work out its sign
        float d = GetPixel(x, y).distance;
        float sign = d >= 0 ? 1.0f : -1.0f;
        float maxval = float.MaxValue * sign;

        //read neighbour distances, ignoring border pixels
        float x0 = x > 0 ? GetPixel(x - 1, y).distance : maxval;
        float x1 = x < (m_x_dims - 1) ? GetPixel(x + 1, y).distance : maxval;
        float y0 = y > 0 ? GetPixel(x, y - 1).distance : maxval;
        float y1 = y < (m_y_dims - 1) ? GetPixel(x, y + 1).distance : maxval;

        //use the smallest neighbour in each direction to calculate the partial derivatives
        float xgrad = sign*x0 < sign*x1 ? -(x0-d) : (x1-d);
        float ygrad = sign*y0 < sign*y1 ? -(y0-d) : (y1-d);

        //combine partial derivatives to get gradient
        Vector2 grad = new Vector2(xgrad, ygrad);

        //store distance in red channel, and gradient in green/blue channels
        Color col = new Color();
        col.r = d;
        col.g = grad.x;
        col.b = grad.y;
        col.a = d < 999999f ? 1 : 0;
        cols[y * m_x_dims + x] = col;
    }
}

The code above is mostly a condensed version of the pseudo code to calculate the partial derivatives in X and Y, then combine them into a Vector2 gradient for every pixel. The only additions are:

  • We use the sign of the source pixel to ensure all comparisons work correctly for inner (-ve) pixels
  • Borders are handled by just assigning a ‘really big number’ to pixel coordinates that are out of bounds
  • As a shortcut for dividing dU by 1 or -1, we just do/don’t negate it respectively
  • And of course, the result, along with the distance is written into a colour for storage in a texture

Reminder: Find the full function in SignedDistanceFieldGenerator.cs

Improved field sweeping

One subtle issue I’ve mostly avoided up until now is that our current technique for sweeping isn’t perfect. When first introducing the 8PSSEDT sweeping algorithm, I showed this diagram:

SweepDist

It demonstrates how the approximation of A’s distance based on its neighbour (B) isn’t perfect. Mathematically, we end up with a gradient that points in slightly the wrong direction, and a distance that is slightly larger than it needs to be.

The sweep is most accurate close to the edge of the geometry (where the data is perfect) and gets worse as it moves outwards. Thus most of our distance effects haven’t really suffered. However if we render the contours of the field in blue, the issue is visible:

sweepcontourartifacts

Note how further from the edge of the geometry, the sweep introduces straight contours, rather than nice curvy ones that match the geometry.

Some gradient effects are very sensitive to field accuracy, so I’ve added a new sweeping algorithm for this post called an Eikonal Sweep. This yields accurate distances and gradients, resulting in much nicer contours:

eikonalcontours

Eikonal sweeping is a much more mathematical approach and requires a whole post in itself, so I won’t cover it here. For now, know that a new EikonalSweep() function has been added to SignedDistanceFieldGenerator, to replace the existing Sweep() function.
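To put that in context, the generator workflow now looks roughly like the sketch below. The exact signatures may differ in the repo, and I’m assuming here that End() hands back the finished texture; sourceTexture is just an illustrative name:

    //rough usage sketch of the generator pipeline
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();
    generator.LoadFromTexture(sourceTexture); //black/white source image
    generator.EikonalSweep();                 //more accurate replacement for the old Sweep()
    Texture2D field = generator.End();        //r = distance, g/b = gradient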

Bevels

The bevel effect gives an image a fake 3D effect, by adding borders that are shaded in such a way as to look like they are edges:

bevelcat

This rather fancy looking border can do wonders for UI, and if used cleverly can even fake interaction with the lighting from the game world. The basic idea, as shown in the code below, is to use the gradient of the field with a pretend ‘light direction’ to shade the border:


//sample distance as normal
float d = sdf.r + _Offset;

//choose a light direction (just [0.5,0.5]), then dot product
//with the gradient to get a brightness
float2 lightdir = normalize(float2(0.5,0.5));
float diffuse = saturate(dot(sdf.gb,-lightdir));

//by default diffuse is linear (flat edge). combine with border distance
//to fake curvy one if desired
float curvature = pow(saturate(d/_BorderWidth),_BevelCurvature);
diffuse = lerp(1,diffuse,curvature);

//calculate the border colour (diffuse contributes 75% of 'light')
float4 border_col = _Fill * (diffuse*0.75+0.25);
border_col.a = 1;

//choose output
if(d < 0)
{
    //inside the geometry, just use fill
    res = _Fill;
}
else if(d < _BorderWidth)
{
    //inside border, use border_col with tiny lerp across 1 pixel 
    //to avoid aliasing
    res = lerp(_Fill,border_col,saturate(d));
}
else
{
    //outside border, use background col, with tiny lerp across 1 pixel 
    //to avoid aliasing 
    res = lerp(border_col,_Background,saturate(d-_BorderWidth));
}

Starting with the first bit:

//sample distance as normal
float d = sdf.r + _Offset;

//choose a light direction (just [0.5,0.5]), then dot product
//with the gradient to get a brightness
float2 lightdir = normalize(float2(0.5,0.5));
float diffuse = saturate(dot(sdf.gb,-lightdir));

This little section is the key to the whole effect. We pick a pretend ‘light direction’ (which could have been passed in as a shader parameter if desired), and dot product it with the gradient of the field. If the gradient directly opposes the light direction, it is assumed to be facing towards the light. If the gradient is perfectly aligned to the light direction, it is facing away from the light. On completion, diffuse contains a value between 0 and 1 indicating how lit the border is at this point.

Skipping over the curvature section for the moment, we then calculate the actual colour of the border.

//calculate the border colour (diffuse contributes 75% of 'light')
float4 border_col = _Fill * (diffuse*0.75+0.25);
border_col.a = 1;

This simple code selects the border colour by applying our lighting to the fill colour. The lighting is effectively taking 25% ambient light, and 75% of the diffuse light.

Finally, probably familiar by now, we select whether to use background, border or fill colour:

//choose output
if(d < 0)
{
    //inside the geometry, just use fill
    res = _Fill;
}
else if(d < _BorderWidth)
{
    //inside border, use border_col with tiny lerp across 1 pixel 
    //to avoid aliasing
    res = lerp(_Fill,border_col,saturate(d));
}
else
{
    //outside border, use background col, with tiny lerp across 1 pixel 
    //to avoid aliasing 
    res = lerp(border_col,_Background,saturate(d-_BorderWidth));
}

This is simply choosing _Fill if inside the shape, border_col if within the border, or _Background otherwise. The lerps serve no purpose for the effect other than to avoid aliasing by transitioning cleanly between colours over 1 pixel.

If we were to just use this code, ignoring the curvature section, the result is already visibly ‘3D’:

bevelcatstraight

The extra ‘curvature’ section just tweaks the diffuse lighting value based on distance from the edge of the geometry:


//by default diffuse is linear (flat edge). combine with border distance
//to fake curvy one if desired
float curvature = pow(saturate(d/_BorderWidth),_BevelCurvature);
diffuse = lerp(1,diffuse,curvature);

If _BevelCurvature is 0, curvature will always equal 1. As a result, the lerp will never change the value of diffuse. However, as _BevelCurvature increases, the distance, d, the pixel is from the edge will have an increasingly large effect. This causes diffuse to be increasingly biased towards a value of 1 close to the edge of the geometry. Visually, this produces a curvy look, rather than an angular look:

bevelcatcurved

The one remaining problem you may have spotted is the unpleasant discontinuities in the shading:

discontinuity

These unfortunate artefacts are a result of the fact that our source image simply wasn’t of a high enough quality to get good gradients out. Despite our efforts, the conversion from image to field isn’t perfect and the issue shows up when using gradient based effects. A high curvature tends to hide the issue a little, but the artefact is always there.

The most obvious solution is to use a very high resolution source image and then downsample. However, if not practical, one option is to blur the field slightly. This will make it less accurate in terms of distances, but soften out ‘crinkles’ in the field. I have added to SignedDistanceFieldGenerator a very simple Soften() function that applies a dumb blur to every pixel to demo the effect:

soften

As you can see, the subtle blur in the middle has little effect on the shape of the geometry, but does get rid of the artefacts. On the right, the blur is turned right up, which results in a blobby field (a fun effect in itself!). Note: please ignore the fact that the lighting is inverted on the left image – shader bug when taking screen shots!

Whether softening is the right solution for you will depend on your scenario. Ideally for high quality data you start from high quality input assets, but if that’s not practical, sacrificing field accuracy for smoothness may be a good way to go.

Note: if you do decide to rely on softening, I recommend looking up better algorithms than the supplied Soften function, as I knocked it together in 10 minutes and it isn’t very smart!
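For reference, a naive soften pass over the generator’s pixel array could look something like the sketch below. This is not the repo’s exact Soften() implementation, just an illustrative box-blur style average of each pixel with its 4 neighbours, meant to live inside SignedDistanceFieldGenerator:

    public void Soften(int iterations) {
        for (int it = 0; it < iterations; it++) {
            float[] blurred = new float[m_pixels.Length];
            for (int y = 0; y < m_y_dims; y++) {
                for (int x = 0; x < m_x_dims; x++) {
                    //average this pixel's distance with its in-bounds neighbours
                    float sum = GetPixel(x, y).distance;
                    int count = 1;
                    if (x > 0)            { sum += GetPixel(x - 1, y).distance; count++; }
                    if (x < m_x_dims - 1) { sum += GetPixel(x + 1, y).distance; count++; }
                    if (y > 0)            { sum += GetPixel(x, y - 1).distance; count++; }
                    if (y < m_y_dims - 1) { sum += GetPixel(x, y + 1).distance; count++; }
                    blurred[y * m_x_dims + x] = sum / count;
                }
            }
            for (int i = 0; i < m_pixels.Length; i++)
                m_pixels[i].distance = blurred[i];
        }
    }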

Finding The Edge

The addition of gradient info to the field yields another handy property – given a point, along with the distance and direction to/from its closest edge, it is easy to work out the actual location of that closest edge. Whilst this isn’t so much an effect, it’s a useful feature and can generate some fun visualisations.

edgedist (1)

In this diagram we start off with sampling a distance field at a location, p = [1.5,1.5]. At this location, the distance field tells us:

  • the closest edge is at a distance, d = 3.53 away
  • the gradient, which points directly away from the closest edge, is g = [-0.71, -0.71]

Armed with this information, it is easy to see that to get the location of the closest edge point we multiply the gradient by distance and subtract from the input point. Mathematically:

edge point, ep = p - d * g

Or in pseudo code:

GetClosestEdgePoint(Vector2 SrcPoint)
    //read distance,d and gradient,g from field
    Float Distance,Gradient;
    SampleField(SrcPoint, out Distance, out Gradient)

    //calculate and return point
    return SrcPoint - Distance * Gradient
End

We can visualise the results of edge finding by:

  • Sampling the SDF at a given UV as normal
  • Working out the corresponding edge UV
  • Sampling the SDF at the edge UV

If our logic is valid the edge sample should return a distance very close to 0 (as it is the edge!). To begin, we’ll create a function to find and sample the edge:

void sampleedge(float4 sdf, float2 uv, out float4 edgesdf, out float2 edgeuv)
{
    edgeuv = uv - sdf.gb*sdf.r*_MainTex_TexelSize.xy;
    edgesdf = samplesdf(edgeuv);
}

Next, we’ll add a new edge finder visualization to the shader:

else if(_Mode == 12) //EdgeFind
{
    //use sampleedge to get the edge field values and uv
    float2 edgeuv;
    float4 edgesdf;
    sampleedge(sdf,uv,edgesdf,edgeuv);

    //visualize error threshold of 1 pixel in r, and highlight geometry edge with b
    float edged = edgesdf.x;
    res.r = abs(edged) > 1 ? 1 : 0;
    res.g = 0;
    res.b = 1-saturate(abs(sdf.r));
}

This simple shader samples the edge distance field, and outputs a red pixel if the error is greater than 1. As an extra tweak, it highlights the actual edge blue so we can still see the geometry:

edgefind

The result, as you can see, is OK but far from perfect. Black pixels represent areas that successfully found the edge. However we can see distinct lines, worst on the cat, where the test failed. Comparing the edge finder to the gradient view it is possible to see where these errors occurred:

edgeerror

Any SDF contains discontinuities – areas where 2 neighbouring pixels have different closest edges. On the right hand image where the gradient is visualised, these discontinuities show up as sharp changes in colour. In the left image these correspond precisely to areas where edge finding failed.

The simplest way to solve this is to simply keep stepping toward the edge, getting a little bit closer each time:

bool GetClosestEdgePoint(Vector2 Point, int MaxSteps, out Result)
    For(i = 1 to MaxSteps)
        Float Distance,Gradient;
        SampleField(Point, out Distance, out Gradient)
        if(abs(Distance) < 0.5)
            break
        Point -= Distance * Gradient
    End
    Result = Point
    return abs(Distance) < 0.5
End

This pseudo code will iterate until a point has been found within 0.5 pixels or reaches a predefined maximum number of steps.

To implement it in the shader, we’ll introduce a new function called steptowardsedge, that takes as an input a sample and UV, then updates them both with a sample and UV closer to the edge:

void steptowardsedge(inout float4 edgesdf, inout float2 edgeuv)
{
    edgeuv -= edgesdf.gb*edgesdf.r*_MainTex_TexelSize.xy;
    edgesdf = samplesdf(edgeuv);
}

The sampleedge function is then updated to keep sampling until the edge is reached:

bool sampleedge(float4 sdf, float2 uv, out float4 edgesdf, out float2 edgeuv, out int steps)
{
    edgesdf = sdf;
    edgeuv = uv;
    steps = 0;

    [unroll(8)]
    for(int i = 0; i < _EdgeFindSteps; i++)
    {
        if(abs(edgesdf.r) < 0.5)
            break;
        steptowardsedge(edgesdf,edgeuv);
        steps++;
    }

    return abs(edgesdf.r) < 0.5;
}

This new version also returns true/false to indicate whether the edge was found, and outputs the number of steps taken. Using this new data, we can update the edge find visualisation to render a hue that shows how many steps were taken, or simply shows black if the edge was never found:

else if(_Mode == 12) //EdgeFind
{
    //use sampleedge to get the edge field values and uv
    float2 edgeuv;
    float4 edgesdf;
    int edgesteps;
    bool success = sampleedge(sdf,uv,edgesdf,edgeuv,edgesteps);

    //visualize number of steps to reach edge (or black if we didn't get there)
    res.rgb = success ? HUEtoRGB(0.75f*edgesteps/8.0f) : 0;
}

Running this visualisation on the cat field gives us this image:

edgefinditerative

Here we see:

  • Red pixels took 0 steps – i.e. they are already on the edge
  • Orange pixels (the majority) took 1 step
  • Yellow pixels took 2 steps
  • Green pixels took 3 steps
  • No black pixels are visible, as the edge never failed to be found

This technique can also be handy if you’re dealing with a less accurate field, having softened it to reduce artefacts as mentioned in the bevels section:

edgefinditerativesoft

Here you can see many more pixels had to take 2 steps (yellow) due to the less accurate field, however they still all got there eventually.

Noise

Noise based effects aren’t specific to SDFs, but they certainly work well together. In case you’re not familiar with the concept of noise, it typically refers to a way of generating a grid of values that whilst random, are still continuous (think clouds instead of static):

noise

Above is a noise texture, generated by layering Perlin Noise at different frequencies, named after the father of this technique – Ken Perlin. I don’t want to go into too much depth on how noise works for this blog though – the key is, we’ve got a texture like the one above! In fact, we have a 4 channel texture, with a different ‘cloud pattern’ stored in each channel.

For reference, the function that generates it works by sampling Unity’s built in Perlin noise in multiple octaves:

Color[] GenerateNoiseGrid(int w, int h, int octaves, float frequency, float lacunarity, float persistance)
{
    //calculate scalars for x/y dims
    float xscl = 1f / (w - 1);
    float yscl = 1f / (h - 1);

    //allocate colour buffer then iterate over x and y
    Color[] cols = new Color[w * h];
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            //classic multi-octave perlin noise sampler
            //ends up with 4 octave noise samples
            Vector4 tot = Vector4.zero;
            float scl = 1;
            float sum = 0;
            float f = frequency;
            for (int i = 0; i < octaves; i++)
            {
                for (int c = 0; c < 4; c++)
                    tot[c] += Mathf.PerlinNoise(c * 64 + f * x * xscl, f * y * yscl) * scl;
                sum += scl;
                f *= lacunarity;
                scl *= persistance;
            }
            tot /= sum;

            //store noise value in colour
            cols[y * w + x] = new Color(tot.x, tot.y, tot.z, tot.w);
        }
    }
    return cols;
}

This function is used to generate the pixel colours for a texture, which is then stored in the SDF class as m_noise_texture, and passed into the shader as _NoiseTex.
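For completeness, baking those colours into a texture and handing it to the shader might look roughly like this. The sizes, format and material wiring are illustrative assumptions rather than the repo’s exact code:

    //sketch: bake the generated noise into a texture and bind it as _NoiseTex
    Color[] noiseCols = GenerateNoiseGrid(256, 256, 4, 4f, 2f, 0.5f); //illustrative parameters
    m_noise_texture = new Texture2D(256, 256, TextureFormat.RGBAFloat, false);
    m_noise_texture.SetPixels(noiseCols);
    m_noise_texture.Apply();
    material.SetTexture("_NoiseTex", m_noise_texture); //'material' = whatever renders the SDF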

We then have a tiny function in the shader that samples the 4 channel texture, and combines each channel into a single animated value.

//samples the animated noise texture
float samplenoise(float2 uv)
{
    float t = frac(_NoiseAnimTime)*2.0f*3.1415927f;
    float k = 0.5f*3.1415927f;
    float4 sc = float4(0,k,2*k,3*k)+t;
    float4 sn = (sin(sc)+1)*0.4;       
    return dot(sn,tex2D(_NoiseTex,uv));
}

Although it looks a little funky, all this function is really doing is taking a parameter called _NoiseAnimTime and using it to generate a vector, sn that contains 4 different values between 0 and 1. This is then combined with the sampled _NoiseTex to get an output noise value between 0 and 1 that animates over time:

A cleverer but more expensive approach might have been to fully generate noise within the shader. This is entirely achievable on modern GPUs, but impractical on lower end mobile devices.

Now we’re going to use the output of this samplenoise function within the samplesdf function to modify the sampled distance. Our sample function now looks like this:

//helper to perform the sdf sample 
float4 samplesdf(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }

    //re-normalize gradient in gb components as bilinear filtering
    //and the morph can mess it up 
    sdf.gb = normalize(sdf.gb);
                
    //if edge based noise is on, adjust distance by sampling noise texture
    if(_EnableEdgeNoise)
    {
        sdf.r += lerp(_EdgeNoiseA,_EdgeNoiseB,samplenoise(uv));
    }
                
    return sdf;
}
    

Assuming _EnableEdgeNoise is true, the updated function now:

  • Reads a noise value from 0 to 1
  • Uses it to lerp between 2 input parameters: _EdgeNoiseA and _EdgeNoiseB
  • Adds the result to the sampled distance

As all our effects work by calling samplesdf, they will now all support reading the ‘noisy’ distances. Playing with different effects and noise values, the results can be very varied and quite pleasing:

noiseycats

And of course, because it’s animated…

A note on broken fields

It’s worth mentioning at this point that both the earlier morph effect, and our updated noise effect break the field a little. By this I mean that after fiddling with the sampled values they are no longer guaranteed to represent the distance to the closest edge. This is surprisingly well hidden with most of the effects so far, as they only really care about values very close to the edge of the geometry where the errors are smallest. However, we can spot the result in the bevel effect:

bevelnogradient

One result of the broken field is that our calculated gradient values no longer match those of the field with noisy edges applied. As a result, the wibbly edge is still shaded as though it were flat.

There are many approaches to solving this issue depending on your use case. At the extreme end, you might perform your morphing or noise on the CPU, then do a full re-sweep of the field every frame. On the other hand, if the problem isn’t particularly visible, you might do nothing at all!

For this particular situation, we’re going to ignore the distance errors, but attempt to fix the gradient on the fly so the bevel works nicely. To do so, we’ll add a new function to the shader designed to only sample the distance, but ignore gradient:

//cut down version of samplesdf that ignores gradient
float samplesdfnograd(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }
                
    //if edge based noise is on, adjust distance by sampling noise texture
    if(_EnableEdgeNoise)
    {
        sdf.r += lerp(_EdgeNoiseA,_EdgeNoiseB,samplenoise(uv));
    }

    return sdf.r;
}

That should look pretty familiar! Now, we’ll add the following lines to the main sdf sampling function:

    //if requested, overwrite sampled gradient with one calculated live in
    //shader that takes into account morphing and noise
    if(_FixGradient)
    {
        float d = sdf.r;
        float sign = d > 0 ? 1 : -1;
        float x0 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(-1,0));
        float x1 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(1,0));
        float y0 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(0,-1));
        float y1 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(0,1));
               
        float xgrad = sign*x0 < sign*x1 ? -(x0-d) : (x1-d);
        float ygrad = sign*y0 < sign*y1 ? -(y0-d) : (y1-d);
                
        sdf.gb = float2(xgrad,ygrad);
    }

This rather expensive function performs the same calculations within the shader that our earlier CPU code used to calculate the gradient by sampling neighbours. However, by being run in the shader and accounting for morphing / noise, it provides much more accurate gradients:

bevelfixedgradient

Note the shading on the wobbly edges now contains matching shadows.

Summary

That concludes what I think is probably the main section of my intro to SDFs. Across these posts you should have learnt the core principles of signed distance fields and seen how they can be used in 2D to generate some useful effects. Some future things to look at that I may cover, or you may wish to investigate are:

  • 3D fields. These work exactly like 2D fields in every way! Visualising them can be tricky though. A good place to start is to search for the marching cubes or marching tetrahedrons algorithms online.
  • Physics. Signed distance fields are ideally suited to collision detection and ray casting, as they make it very easy to find out the distance to an edge from any given point. If you’re looking at making a game with very bumpy or modifiable terrain, SDFs are ideal.
  • CSG (constructive solid geometry) is the technique of combining shapes in different ways (primarily ‘add’ and ‘subtract’) to get more complex geometry.
  • Lighting. Both in 3D and 2D, using distance fields to represent lighting volumes can be a very effective way of generating complex dynamic lighting in a scene.
  • Sparse fields. In 3D especially, fields can get very memory hungry. A common technique is to store the field in a tree, with high resolution data only maintained close to the surface of the geometry.
  • Compression. Throughout this blog we’ve stored data in fairly expensive floating point textures. A simple modification is to store field distances in classic 1-byte-per-channel textures, by mapping distances to values between 0 and 1 and rescaling them back to a larger signed range when sampling (see the sketch below).
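As a rough illustration of that last point, the packing could look something like the sketch below; the helper names and maximum range are purely hypothetical:

    //hypothetical helpers: pack a signed distance in [-maxRange, +maxRange] into [0,1]
    //for storage in an 8-bit channel, and unpack it again when sampling
    static float EncodeDistance(float distance, float maxRange) {
        return Mathf.Clamp01(distance / (2f * maxRange) + 0.5f);
    }
    static float DecodeDistance(float encoded, float maxRange) {
        return (encoded - 0.5f) * 2f * maxRange;
    }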

Maybe if I get some requests I’ll cover some of those at some point. Until then, enjoy the code here:

https://github.com/chriscummings100/signeddistancefields

Enjoy!

Signed Distance Fields Part 7: Some Simple Effects

Well we’re on number 7 and it’s time to start using the fields for some more interesting rendering. This post will focus exclusively on adding to the signed distance field shader some exciting new modes with which to render the field! All the code can be found on github here.

Soft Borders

The first and arguably one of the most useful tools in the SDF toolkit is the ability to render soft borders to your geometry. Thus far the most advanced aspect of our SDF shader is the ability to render a solid shape with coloured borders and a coloured background. However, zooming in on our cat:

nosoftborders

You can see that even though this is rendered using a 512×512 pixel field, we still get jaggy edges. This is no longer down to a quality issue with the field – we’ve simply hit the resolution of the screen, and, unable to blend, have had to choose for each pixel 1 of 3 colours (background, border or fill). Naturally we’ll see ‘stepping’ occurring along the edges, otherwise known as aliasing.

To address it, we’ll add a new mode to the sdffunc function in SignedDistanceField.shader:


else if (_Mode == 7) //SoftBorder
{
    float d = sdf.r + _Offset;

    if (d < -_BorderWidth)
    {
        //if inside shape by more than _BorderWidth, use pure fill
        res = _Fill;
    }
    else if (d < 0)
    {
        //if inside shape but within range of border, lerp from border to fill colour
        float t = -d / _BorderWidth;
        t = t * t;
        res = lerp(_Border, _Fill, t);
    }
    else if (d < _BorderWidth)
    {
        //if outside shape but within range of border, lerp from border to background colour
        float t = d / _BorderWidth;
        t = t * t;
        res = lerp(_Border, _Background, t);
    }
}

Here the distance is calculated as normal. Just like the ‘solid with border’ mode, we take the fill colour if d < -_BorderWidth, and stick with the background colour if d > _BorderWidth. However, within the border region we:

  • Calculate a value (t) between 0 and 1 based on signed distance (d) from the geometry
  • Multiply t by itself to get a curvy blend instead of a linear blend
  • Use t to lerp from the border colour to either the fill or background depending on whether we’re inside or outside the geometry

The result is a gentle blend from Background to Border to Fill at the edge of the field:

softborders

And of course, as usual, we can play with the border width and offset variables to make the border, or the whole cat, thinner or fatter:

softborderfun

Neon (aka bloom)

Our human brains assume something is emitting or reflecting bright light if it appears ‘saturated’, turning from coloured to white at the brightest point. A classic example of this is a neon sign, which appears coloured around the edges but almost-white at the centre:

neon

This fun effect was produced with some very simple code:


else if (_Mode == 8) //Neon
{
    float d = sdf.r + _Offset;

    //only do something if within range of border
    //only do something if within range of border
    if (d > -_BorderWidth && d < _BorderWidth)
    {
        //calculate a value of 't' that goes from 0->1->0
        //around the edge of the geometry
        float t = d / _BorderWidth; //[-1:0:1]
        t = 1 - abs(t);             //[0:1:0]

        //lerp between background and border using t
        res = lerp(_Background, _Border, t);

        //raise t to a high power and add in as white
        //to give bloom effect
        res.rgb += pow(t, _NeonPower)*_NeonBrightness;
    }
}

As with the soft border effect earlier, we utilise the distance value and the border width to get a value (t) called the interpolator. In this case we manipulate it mathematically to be a value that goes from 0 to 1 to 0 around the edge of the geometry. Again, similar to the soft border, we then use ‘t’ to interpolate from background colour to border colour.

The extra ingredient, using 2 new parameters, raises ‘t’ to a high power (making it a ‘sharp curve’), then simply adds it to the output colour. This is in effect adding some whiteness into the output which sharply curves up in brightness close to the edge of the geometry.

Technically speaking this is a simulation of the common ‘bloom’ effect, usually applied as a post effect in games to highlight bright areas of the screen. Whilst good quality bloom can be tricky to achieve on low end devices, if you just want some neon looking text or objects, the SDF approach can be extremely effective with no real GPU overhead.

Edge Textures

Up until now we’ve used some simple maths to blend between various colours at the edge of the field – first to provide simple softened edges, then to get a neon style effect that simulates bloom. However, to get a more flexible (and art driven) effect, we can use ‘edge textures’. Note: these are often referred to as ‘gradient textures’, but ‘gradient’ is an important term with different meanings in SDFs, so I’m reserving it for later!

gradientcat1

Here a similar algorithm to the earlier ones has been used, however the ‘t’ value calculated from distance-to-edge has been used to sample the following texture, created in Paint.Net:

gradient

The code is relatively simple, and very similar to the earlier effects


else if (_Mode == 9) //Edge Texture
{
    float d = sdf.r + _Offset;

    if (d < -_BorderWidth)
    {
        //if inside shape by more than _BorderWidth, use pure fill
        res = _Fill;
    }
    else if (d < _BorderWidth)
    {
        //if inside shape but within range of border, calculate a 
        //'t' from 0 to 1
        float t = d / _BorderWidth; //[-1:0:1]
        t = (t+1)*0.5;             //[0:1]

        //now use 't' as the 'u' coordinate in sampling _EdgeTex
        res = tex2D(_EdgeTex, float2(t,0.5));
    }
}

Just as with the soft border we’re taking the _Fill colour if inside the geometry by more than _BorderWidth, or sticking with the background colour if outside the geometry by more than _BorderWidth. Within the border, we calculate an interpolator (t), and use it to sample the new _EdgeTex texture parameter.

Note that this arguably only needs a ‘1D’ edge texture, but as they don’t really exist, the shader simply samples horizontally along the centre of a 2D texture (i.e. at v = 0.5).

One of the benefits of this effect is that with little effort we can get stylistic colouring that’d be a pain to reproduce mathematically. For example, using this edge texture:

gradient2

We get this rather stylish kitty:

gradientcat2

Drop shadows

This classic effect is incredibly useful for HUDs in games, as it helps deal with the fact that you can’t rely on your in game UI elements always being drawn on top of the same coloured background. Score text in the top left will be over a dark background in a night scene, and a bright background in a day scene. The solution is generally to add a ‘shadow’ to your text or HUD elements, so their outline can be made out whatever the background:

dropshadowcats

The basic effect is often achieved without SDFs by simply rendering the geometry twice, first the shadow at a slight offset, then the fill over the top. Our SDF version takes pretty much the same approach:


else if (_Mode == 10) //Drop shadow (Blog post 7)
{
    //sample distance as normal
    float d = sdf.r + _Offset;

    //take another sample, _ShadowDist texels up/right from the first
    float d2 = tex2D(_MainTex, uv+_ShadowDist*_MainTex_TexelSize.xy).r + _Offset;

    //calculate interpolators (go from 0 to 1 across border)
    float fill_t = 1-saturate((d-_BorderWidth)/_BorderWidth);
    float shadow_t = 1-saturate((d2-_ShadowBorderWidth)/_ShadowBorderWidth);

    //apply the shadow colour, then over the top apply fill colour
    res = lerp(res,_Border,shadow_t);
    res = lerp(res,_Fill,fill_t);                 
}

The sdffunc function is now being passed an extra argument – the uv from which the distance was read. We use this, combined with the new _ShadowDist parameter and the texel size of the signed distance field, to take a 2nd sample from the field, offset diagonally from the 1st. Note that Unity sets things up such that _MainTex_TexelSize.xy = [1/_MainTex.width, 1/_MainTex.height].

Next, some simple maths is applied to each of the distance samples to give an interpolator (t) that starts at 1 ‘inside’ the geometry, and transitions to 0 over the border. These are calculated for the fill as normal, and then for the shadow using the new _ShadowBorderWidth parameter. The purpose of these interpolators is to allow us to create soft transitions, as with the earlier soft border effect.

Finally, the shadow is applied using our _Border parameter for the colour, then the fill on top of it. The basic result with a small border width and identical shadow border width looks like this:

simpledropshadow

Nice! However this basic setup has 2 key issues that non-SDF-based approaches also suffer from. Firstly, a shadow only shows in one direction, meaning the full geometry won’t always be outlined (sometimes but not always desirable). Secondly, too large a shadow distance shows ugly areas that highlight we’re just rendering 1 image on top of another:

dropshadowproblem

Fortunately, with the SDF approach, our ability to adjust the size of the shadow’s border means we can address both these issues at once, fattening the shadow to hide concave areas and give some subtle outline all round the shape:

dropshadownice

Morphing

The final basic effect for this section is the ability to morph between different fields very easily. Here, for example, is a circle morphing into a cat:

morphcats

First up, we’ll add a simple function into the shader that evaluates the field for a centered circle:


float4 centeredcirclesdf(float2 uv, float radius)
{
    //calculate offset from [0.5,0.5] in uv space, then multiply
    //by _MainTex_TexelSize.zw to get offset in texels
    float2 offset_from_centre = (uv - 0.5) * _MainTex_TexelSize.zw;

    //signed distance for a circle is |offset| - radius 
    float signeddist = length(offset_from_centre) - radius;

    //build result: [signeddist,0,0,1]
    float4 res = 0;
    res.x = signeddist;
    res.yz = 0;
    res.w = 1;
    return res;
}

Any field could be used, but for the sake of the demo, this saves us generating the field texture for a circle and passing it through as a parameter (call that an exercise for the reader!).

Next, we’ll create a new function called samplesdf. This will be a wrapper that replaces our existing tex2D calls:


float4 samplesdf(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }
                
    return sdf;
}

The first line simply samples the input distance field at the given uv as normal. If _CircleMorphAmount > 0 we then go on to sample the circle distance field. The 2 distance samples are then combined with a lerp to get a result, which is returned.

Finally, existing calls to tex2D, such as that in the core fragment shader are replaced with calls to samplesdf:

//sample distance field
float4 sdf = samplesdf(i.uv);

All we are doing here is sampling the distance for 2 fields (in this case a cat and a circle) and lerping between them. The result can be fed into any of the existing effects and it just works:

Whilst you may not find many opportunities in games to morph between cats and circles, this simple effect can be a very efficient way of creating elegant UI transitions such as dialog boxes appearing / disappearing. In the past I’ve also made more creative use of it to animate lightning and particle based effects (shameless No Stick Shooter plug)!

Summary

This section has covered a variety of ways to use the distance sampled from a field to achieve simple effects, generally falling into a few categories:

  • Avoiding aliasing with soft borders
  • Creating border effects
  • Creating drop shadows
  • Transitions (aka morphing)

By combining and tweaking these in different ways, simple low resolution distance fields can create a huge range of effects, all of which can of course be animated and used either in game (if the style is right) or for very flexible UI elements.

However, an issue thus far is that all the effects above have been pretty ‘1 dimensional’. All they’ve really used is the distance, and as such the range of effects is fairly limited. In the next blog I’ll get on to how field gradients and noise can make things more interesting.


Signed Distance Fields Part 6: Images

If you haven’t found this series incredibly exciting then shame on you, but you might find this a little more intriguing! Up until now we’ve dealt with some pretty simple utilities to draw circles, lines and rectangles. These were great for showing the core concepts of fields, but aren’t particularly useful when actually making cool stuff. So, by the time we reach post number 6, it’s finally time to take ‘cool stuff step 1’, and generate fields from textures you can make in any standard art package. Excited now? I thought so. Code here if you need it!

Basic texture sweeping

With everything we’ve built so far, creating a field from a texture is incredibly simple. We start with a monochrome image – each pixel is either black or white. White represents a ‘solid’ bit, black an ‘empty’ bit:

rectangles

I’ve added a default constructor, so we can create an empty SignedDistanceFieldGenerator, then set it up with an extra function call. Our new function will be called LoadFromTexture:


public void LoadFromTexture(Texture2D texture)
{
    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        if (texpixels[i].r > 0.5f)
            m_pixels[i].distance = -99999f;
        else
            m_pixels[i].distance = 99999f;
    }
}

Here we read a Unity texture that is assumed to have been loaded elsewhere. Note that to work, the texture must be marked as ‘read/write’ when importing, and for best results, uncompressed. The pixels are read, and the dimensions/pixel buffers are set up to match the texture:


    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];

Next, we iterate over every pixel. If the input colour (we just check the red channel) is greater than 0.5 (i.e. close to white), we treat it as solid. If it is less than 0.5, we treat it as empty. A solid pixel is interpreted as ‘internal’ geometry, and so is given a very large negative number. An empty pixel is interpreted as ‘external’ geometry, and so is given a very large positive number:


    for (int i = 0; i < m_pixels.Length; i++)
    {
        if (texpixels[i].r > 0.5f)
            m_pixels[i].distance = -99999f;
        else
            m_pixels[i].distance = 99999f;
    }

A simple extra button in SignedDistanceField.cs to load the ‘rectangles’ texture in the sample project completes this first step:


if (GUILayout.Button("Load texture"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();

    //note: typed Resources.Load so we get a Texture2D back rather than a plain Object
    generator.LoadFromTexture(Resources.Load<Texture2D>("rectangles"));
    field.m_texture = generator.End();
}

Visualising the distances for the field, we now see bright red (+ve) external pixels and bright green (-ve) internal pixels:

rectangleunswept

Crazily, that’s the hard part! By writing one of these two ‘extreme’ values into every pixel, we’ve generated an extremely imprecise field. However, the ‘0 boundary’ that denotes the edge of a solid bit is still correct. When rendering, the shader will blend between pixels to calculate a distance at any given point, so when it blends between a solid pixel (-99999) and an empty pixel (99999), the interpolated value crosses 0 exactly halfway between the two pixel centres – there is a tiny point right on the boundary where a distance of 0 is read. By doing some tinkering with the numbers in our signed distance shader (technically rendering in ‘border’ mode with a border size of 99998!), we can actually visualise this:

rectangleedges

The great thing is, half of the previous post was devoted to a specific task – taking a signed distance field for which only the edge points are valid, and sweeping it to get a completely valid one. All we have to do is run the sweep, with no changes whatsoever, and we convert the image to a field:

Once it’s a field, any standard signed distance effects such as our ‘solid-with-borders’ shader can be used:

rectangleborders

And there you have it. The truth is, we’d already done most of the work for textures, so this first step was really just loading them up.
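
For completeness, here’s roughly what that load-and-sweep flow looks like as a button handler. This is only a sketch – I’m assuming the sweep function from the previous post is called Sweep(), so swap in whatever name your version of the generator actually uses.


if (GUILayout.Button("Load and sweep texture"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();
    generator.LoadFromTexture(Resources.Load<Texture2D>("rectangles"));

    //assumed name for the sweep function covered in the previous post
    generator.Sweep();

    field.m_texture = generator.End();
}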

Aliasing

Unfortunately, whenever images are involved, eventually the issue of aliasing pops up. I chose rectangles quite specifically for the above demo – they’re all nice vertical or horizontal lines that fit perfectly into a grid of pixels. If, however, we take the following image of a line:

aliaslines

You can see how MS Paint has generated zig-zaggy shapes along the edges. The effect in our signed distance field is not pleasant:

aliaslinesfield

Note: if you spotted it, apologies for images inverted horizontally – shader bug whilst taking screen shots!

One way of solving this would be to simply use giant textures (let’s say 4k by 4k), do the whole sweeping process, then scale down (aka downsample) the result. SDFs actually scale down very well, so this isn’t a crazy idea. However, hi-res input data isn’t always available, and even when it is, burning CPU time processing it may not be desirable.

Many more advanced art packages will hide this problem very effectively using anti aliasing. Pixels that are only partially on the line will be only partially coloured in, making them look a lot smoother. This very similar image from Paint.Net looks much nicer:

aliaslines2

Sadly though, it seems to make no difference to our distance field:

aliaslines2field

This is because our current image loading algorithm completely ignores the ‘solidity’ of the pixel. As far as it’s concerned, a pixel is either solid or empty. As a result, the anti-aliasing performed by more advanced drawing packages to make nicer images hasn’t helped.

Using a low resolution version of this cat (thanks to Tricia Moore) we can see just how much we lose from ignoring this anti aliasing:

aliascat2

To show off the problem, the original 512×512 cat was shrunk down to 128×128 pixels in Paint.Net, which cleverly used anti aliasing to get a slightly blurry but otherwise nice image. On the left some detail has been lost, but a soft edge maintains the illusion of curved edges. Once again though, ignoring this anti aliasing has resulted in a pretty much useless SDF on the right.

To solve the problem, let’s first look at an image pixel by pixel and think about what the field values really should be:

antialiasedrectangle
Left: 3×2 rectangle shape user has attempted to draw, Right: the result when written into the 5×5 grid of pixels

In this diagram we imagine a user has opened up a package such as Paint.Net and managed to draw a rectangle exactly 3 pixels wide and 2 pixels high into a 5×5 texture. However, whilst they aligned it perfectly horizontally, the rectangle overlaps pixel borders vertically. The result, shown on the right, is Paint.Net’s best guess at representing the rectangle in pixel format. The pixels that were fully covered are fully filled, but the pixels that were half covered have been blended in with the background.

Now we ask the question, given the texture that Paint.Net generated, what should the corresponding signed distance field texture look like?

antialiasedrectanglefield
Left: the same 3×2 rectangle with signed distances overlaid, Right: the texture with the same signed distances overlaid

Here you can see the box (left) and the texture (right) with the desired signed distance values written in. The first clear thing that stands out is that a ‘solidity’ of 0.5 in the input image suggests a signed distance value of 0. As expected, we can also see the more ‘solid’ pixels are assigned a negative distance (aka inside the shape), and the less solid pixels are assigned a positive distance (aka outside the shape).

Unfortunately, in addition to this handy info, we can also see there is no ‘clear’ answer to what solidity corresponds to what distance. If we were to examine the vertical edges on the left/right sides, we’d assume that a solidity of 0 meant a distance of 0.5. However, if we were to examine the horizontal edges at the top/bottom, we’d assume it meant a distance of 1.

Techniques for addressing this have certainly been developed (check out this for example), though they are not entirely trivial and are beyond the scope of this post. For now we’ll take the relatively good results that can be obtained simply by compromising and assuming edge distances ranging from -0.75 to 0.75. This leads us to:


public void LoadFromTextureAntiAliased(Texture2D texture)
{
    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        //r==1 means solid pixel, and r==0 means empty pixel and r==0.5 means half way between the 2
        //interpolate between 'a bit outside' and 'a bit inside' to get approximate distance
        float d = texpixels[i].r;
        m_pixels[i].distance = Mathf.Lerp(0.75f, -0.75f, d);
    }
}

This extremely simple version of the texture loader just reads a pixel as in our previous example, then uses it to lerp between distance values of 0.75 (outside) and -0.75 (inside). Testing it out:

antialiascat
Left: cat using anti aliased approach, Right: cat using the old approach

Not only do we now have a softer border, but it is also slightly thicker due to more accurate approximation of the edge pixels. Similarly, looking at the earlier aliased lines:

antialiaslines

Yummy! Whilst the result is still clearly not perfect, it is substantially better – especially given the source texture is only 128×128 pixels.

For reference, here’s 3 more cats (though I’m actually more of a dog person myself) showing the cat generated from different source data with the border width adjusted for comparison.

threecats
Left 128×128 source, Middle 256×256, Right 512×512 (orig data)

Downsampling

This is turning into a long post, but I want to cover some fun effects soon, and they won’t look cool unless we nail the quality of our fields first! Interpreting anti aliased textures from packages such as Paint.Net / Photoshop has improved the conversion from image to field, but we still get some artefacts along the edges that it’d be nice to clean up:

lumpyfield

These come from the fact that our simple approach to extracting the field from images still struggles to get something perfect. Even a cleverer one would have trouble attaining really high quality, as the simple fact is that data was lost when the paint package had to convert some nice clean geometry into a blurry low res image.

To fix these artefacts, we’ll utilise down-sampling, in which a higher resolution image is loaded/swept, then scaled down to the desired field size.

This function builds a new field by scaling down the existing one by 50%:


public void Downsample()
{
    //to keep life simple, only downsample images that can be halved in size!
    if ((m_x_dims % 2) != 0 || (m_y_dims % 2) != 0)
        throw new Exception("Dumb downsample only divides by 2 right now!");

    //calculate new field size, and allocate new buffer
    int new_x_dims = m_x_dims / 2;
    int new_y_dims = m_y_dims / 2;
    Pixel[] new_pixels = new Pixel[new_x_dims * new_y_dims];

    //iterate over all NEW pixels
    for (int y = 0; y < new_y_dims; y++) 
    {
        int srcy = y * 2;
        for (int x = 0; x < new_x_dims; x++) 
        {
            int srcx = x * 2;

            //combine the 4 pixels in the existing field that this one corresponds to
            float new_dist = 0;
            new_dist += GetPixel(srcx,srcy).distance * 0.25f;
            new_dist += GetPixel(srcx+1, srcy).distance * 0.25f;
            new_dist += GetPixel(srcx, srcy+1).distance * 0.25f;
            new_dist += GetPixel(srcx+1, srcy+1).distance * 0.25f;

            //also divide distance by 2, as we're shrinking the image by 2, and distances
            //are measured in pixels!
            new_dist /= 2;

            //store new pixel
            new_pixels[y * new_x_dims + x].distance = new_dist;
        }
    }

    //once done, overwrite existing pixel buffer with new one and store new dimensions
    m_pixels = new_pixels;
    m_x_dims = new_x_dims;
    m_y_dims = new_y_dims;
}

Here we allocate a new buffer, and calculate the new dimensions for a field that is exactly half the size of the existing one:


//calculate new field size, and allocate new buffer
int new_x_dims = m_x_dims / 2;
int new_y_dims = m_y_dims / 2;
Pixel[] new_pixels = new Pixel[new_x_dims * new_y_dims];

Next, we loop over all the new pixels, and for each one calculate the location of the corresponding pixel in the existing image. With this we end up with:

  • x,y: The coordinate of the new pixel in the new field
  • srcx,srcy: The coordinate of the existing pixel in the existing field

Now the key code in the loop:


//combine the 4 pixels in the existing field that this one corresponds to
float new_dist = 0;
new_dist += GetPixel(srcx,srcy).distance * 0.25f;
new_dist += GetPixel(srcx+1, srcy).distance * 0.25f;
new_dist += GetPixel(srcx, srcy+1).distance * 0.25f;
new_dist += GetPixel(srcx+1, srcy+1).distance * 0.25f;

//also divide distance by 2, as we're shrinking the image by 2, and distances
//are measured in pixels!
new_dist /= 2;

//store new pixel
new_pixels[y * new_x_dims + x].distance = new_dist;

This reads 4 distances from a 2×2 square in the existing field and averages them (each weighted by 0.25) to create 1 new distance to be stored in the new field. The final step divides the new distance by 2, as distances are measured in pixels and we are halving the size of the field.

By loading a higher resolution image than necessary, sweeping it as normal and then down-sampling it to the desired field resolution, we get a much nicer result:

smoothfield

This field was still built from a relatively low res 256×256 image. However, after down-sampling to a 128×128 field the result is much more pleasing.
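
Wired into the editor, the whole pipeline is just the earlier button handler extended with the anti aliased loader and a Downsample() call. As before this is a sketch – Sweep() stands in for the actual sweep function from the previous post, and ‘cat256’ for whatever your 256×256 source texture happens to be called.


if (GUILayout.Button("Load, sweep and downsample"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();
    generator.LoadFromTextureAntiAliased(Resources.Load<Texture2D>("cat256"));

    //sweep as normal (assumed function name), then halve the resolution
    generator.Sweep();
    generator.Downsample();

    field.m_texture = generator.End();
}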

Summary

This post focused on building high quality fields from images, as I want to get onto some fun effects soon, but fun effects need good fields! The images in this blog are typically around 512×512 pixels, so here’s our cat image loaded and swept at 1024×1024, then downsampled to 512×512:

hirescat

Pretty tasty! The one final step we could go into is the use of eikonal equations to normalize the field, but I’ll leave that for another post.

There’s lots more boring stuff to learn about compression, normalizing, more sweeping, CSG operations etc etc, but now that the foundation exists, it’s time for some cool s**t. Hence, next post, we’ll look at some funky 2D effects!

And for the 6th time, the code for this blog can be found on git hub here!
