Extending Bootstrap in TypeScript

Intro

Hi! Over the past year or two, whilst building multiverseonline.io (the website for our app, Multiverse), I’ve had to add React to the list-of-stuff-I-kind-of-know. The site is driven by React, using React-Bootstrap on top, and all written in TypeScript. Unfortunately, in my quest for knowledge on the subject, I’ve often found Stack Overflow and other sources can be summed up with the phrase:

“there’s a million opinions on your question, but unless you know the right answer to your question, you can’t tell which is the right opinion”

My issue in this post is a classic example of said questions. There are plenty of answers, but identifying the correct one is non-trivial!

My Bootstrap Problem

React-Bootstrap has all sorts of handy stuff for creating nice React components, such as the Button:

<Button variant='primary'>
    Some text
</Button>

Giving us results like this (from the Bootstrap documentation):

However, recently I wanted to create a nice reusable version of the light button that had an info icon inside it:

This sort of thing is exactly what React is designed to do, so with a pretty icon (from react-icons) and a tiny bit of code:

<Button variant="light" className="mv-circular-button">
    <BsInfo/>
</Button>

Along with some simple CSS:

.mv-circular-button {
    position: relative;
    height: 40px;
    width: 40px;
    padding: 0;
    border-radius: 50%;
    border: none;
    box-shadow: none !important;
}

.mv-circular-button svg {
    width: 75%;
    height: 75%
}

We have a simple button consisting of the following:

  • The Bootstrap Button with the light variant creates a nicely styled button element.
  • The BsInfo component comes from react-icons and is an SVG with that pretty ‘i’ inside it.
  • The button CSS makes it a fixed width/height, sets the border-radius to 50% so it becomes circular, and removes the annoying shadow effect Bootstrap adds.
  • The SVG CSS just enforces a size for the icon

All good so far, but React is all about encapsulation and reuse of components, and I want to use that same info button in multiple places.

A simple functional component

Well, the obvious approach (and I’m told the 2021 way of doing things) is to create a simple functional component like this one:

export const MyInfoButton = () => (
    <Button variant="light" className="mv-circular-button">
        <BsInfo/>
    </Button>
)

It’s in a separate TSX file, and is called MyInfoButton. Now within my page I simply have to write:

<MyInfoButton/>

And we have a button! Of course a button isn’t much use unless it does something when you click it, so let’s add an onClick event…

<MyInfoButton
    onClick={() => console.log("hello")}
/>

As anybody with a bit of TypeScript/React experience will point out, this immediately fails to compile. We haven’t defined any properties at all for the info button yet, so the onClick property is unrecognised. To make our original component work, we need to define and pass in some properties:

type MyInfoButtonProperties = {
    onClick?: ()=>void
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button onClick={props.onClick} variant="light" className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

The MyInfoButton component now takes a set of properties defined by MyInfoButtonProperties. The only property for now is onClick, which is passed directly through to the Button component.

More properties

Now what if I want to start expanding on this? Maybe I want to be able to add a mouse move event and specify whether the button is disabled. Each of these is a property Button already supports, so I could just add them to MyInfoButtonProperties and pass them through as before:

type MyInfoButtonProperties = {
    onClick?: ()=>void;
    disabled?: boolean;
    onMouseMove?: ()=>void;
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button 
        onClick={props.onClick} 
        onMouseMove={props.onMouseMove} 
        disabled={props.disabled} 
        variant="light" 
        className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

But we’re just duplicating properties of the internal button here, which feels a little messy. To clean things up a little, we can use the spread (‘…’) syntax to help out:

type MyInfoButtonProperties = {
    onClick?: ()=>void;
    disabled?: boolean;
    onMouseMove?: ()=>void;
}
export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button {...props} 
        variant="light" 
        className="mv-circular-button" >
        <BsInfo/>
    </Button>
)

This simply passes anything in props through to the internal Button component. Much cleaner!
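
For example, a quick (hypothetical) usage passing a couple of the new properties through:

<MyInfoButton
    disabled
    onMouseMove={() => console.log("moving")}
    onClick={() => console.log("hello")}
/>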

What about ALL the properties?

The model we’ve taken thus far is to work out what functionality from the internal Button component we want to expose, and provide properties to do exactly that – no more, and no less. This very constrained model makes perfect sense a lot of the time. Tightly constrained interfaces like this are a great way to keep your code understandable and reduce bugs. However, despite many an angry Stack Overflow comment, there are exceptions, and this is not ‘always the right thing to do’ (I hate that phrase)!

In this case I’m really just trying to add a small bit of functionality to the existing, very powerful button. I specifically want a user of MyInfoButton to have immediate access to the hundreds of properties any normal button has.

For a normal HTML button element (as opposed to a Bootstrap one), we can use the types supplied by React:

type MySimpleInfoButtonProperties = {
    //any custom bits
} & ButtonHTMLAttributes<HTMLButtonElement>
export const MySimpleInfoButton:FunctionComponent<MySimpleInfoButtonProperties> = (props) => (
    <button {...props} 
        className="mv-circular-button" >
        <BsInfo/>
    </button>
)

Here ButtonHTMLAttributes contains the list of all properties a standard HTML button can have. The generic HTMLButtonElement type argument is used in the definition of various event properties such as onClick.

Unfortunately for us, Bootstrap is not this simple!

Bootstrap defines its button like this:

import * as React from 'react';

import { BsPrefixComponent } from './helpers';

export interface ButtonProps {
  active?: boolean;
  block?: boolean;
  variant?:
    | 'primary'
    | 'secondary'
    | 'success'
    | 'danger'
    | 'warning'
    | 'info'
    | 'dark'
    | 'light'
    | 'link'
    | 'outline-primary'
    | 'outline-secondary'
    | 'outline-success'
    | 'outline-danger'
    | 'outline-warning'
    | 'outline-info'
    | 'outline-dark'
    | 'outline-light';
  size?: 'sm' | 'lg';
  type?: 'button' | 'reset' | 'submit';
  href?: string;
  disabled?: boolean;
}

declare class Button<
  As extends React.ElementType = 'button'
> extends BsPrefixComponent<As, ButtonProps> {}

export default Button;

Based on the above, the obvious starting point is a new set of properties as follows:

type MyInfoButtonProperties = {

    //any custom bits
} & ButtonProps & ButtonHTMLAttributes<HTMLButtonElement>

export const MyInfoButton = (props: MyInfoButtonProperties) => (
    <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
)

Here I’ve extended the property type so it can include:

  • Any extra properties I might want to add later as my info button gets fancier
  • The core HTML attributes that any normal button needs, as per the previous pure react example
  • The bootstrap ButtonProps which include things like variant, size, disabled etc

I’ve also tweaked the 2 additional properties I pass to the internal button so that:

  • The default variant is light
  • The class name starts with “mv-circular-button” but can include additional classes passed in

So far we’re all good. My button still works!

The ‘as’ property

All good so far, but we’ve lost something important. Bootstrap supports a property named “as” on all its components, which allows you to override the HTML element used to render them. This functionality varies from pointless to useful to essential with Bootstrap, and when trying to create a generic, reusable info button, it’s not something I want to design out.

The problem is that our properties currently use ButtonHTMLAttributes<HTMLButtonElement>. This isn’t going to work for anything other than a button element!

Starting from the Bootstrap button, we can see it defines a generic type variable called “As”:

declare class Button<
  As extends React.ElementType = 'button'
> extends BsPrefixComponent<As, ButtonProps> {}

What we need to do is build the exact set of properties a Bootstrap Button needs, taking into account the “As” variable, so let’s follow things through. Button simply extends something called BsPrefixComponent<As,ButtonProps>. In the Bootstrap helpers file we find:

export class BsPrefixComponent<
  As extends React.ElementType,
  P = {}
> extends React.Component<ReplaceProps<As, BsPrefixProps<As> & P>> {}

BsPrefixComponent is a React class component (not a functional component like our info button), and it takes the rather confusing set of properties:

ReplaceProps<As, BsPrefixProps<As> & P>

In the context of our button, which defaults “As” to be ‘button’, and passes through ButtonProps as P, this resolves to:

ReplaceProps<'button', BsPrefixProps<'button'> & ButtonProps>

BsPrefixProps isn’t that interesting:

export interface BsPrefixProps<As extends React.ElementType> {
  as?: As;
  bsPrefix?: string;
}

It defines the “as” property and something called bsPrefix, so we’re creating a component that uses these 2 properties plus any in ButtonProps. But where do the hundreds of other useful HTML ones come from? Well, let’s look at the helpfully named ReplaceProps:

export type Omit<T, U> = Pick<T, Exclude<keyof T, keyof U>>;

export type ReplaceProps<Inner extends React.ElementType, P> = Omit<
  React.ComponentPropsWithRef<Inner>,
  P
> &
  P;

Note: TypeScript’s standard library (in lib.es5.d.ts) defines its own version of Omit, which is different from the Bootstrap version!
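
For reference, the built-in version looks like this – it takes a union of key names as its second argument, whereas the Bootstrap helper above takes a whole type and uses keyof to get at its keys:

type Omit<T, K extends keyof any> = Pick<T, Exclude<keyof T, K>>;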

The “Inner” type variable is passed in from the “As” type variable, which in our case defaults to ‘button’. Omit removes all the fields in one type that exist in another. In this case we’re:

  • Taking all the fields defined by React.ComponentPropsWithRef<'button'>
  • Removing any that are also defined in P, which is currently BsPrefixProps<'button'> & ButtonProps
  • Then adding back all those that were removed!

So why do this? Well the important detail is that the removal doesn’t care about the types of fields – it simply deals with their names. Here Bootstrap is stripping out of the React type any fields defined in P, and replacing them with the versions from P.

For a simpler example of this type stripping, take the following code:

type TestType1 = {
    name: string;
    age: string;
}

type TestType2 = {
    age: number;
    address: string;
}

type CombinedType = TestType1 & TestType2;

let x: CombinedType = {
    name: "hello",
    age: 10,
    address: "somewhere"
};

Here we create 2 types, both of which define an ‘age’ field, but the first defines it as a string and the second as a number. Because they don’t agree, CombinedType will end up as:

type CombinedType = {
    name: string;
    age: never;
    address: string;
}

The ‘age’ field has been assigned the type ‘never’, as it can never be both a number and a string. The result is that the final line in which we set age to 10 will result in the following TypeScript error:

Type ‘number’ is not assignable to type ‘never’.

The Bootstrap technique uses Omit as follows:

type TestType1 = {
    name: string;
    age: string;
}

type TestType2 = {
    age: number;
    address: string;
}

/*
type TestType1WithoutAgeField = {
    name: string
}
*/
type TestType1WithoutAgeField = Omit<TestType1,TestType2>

/*
type CombinedType = {
    name: string;
    age: number;
    address: string;
}
*/
type CombinedType = TestType1WithoutAgeField & TestType2;

Long story short, Bootstrap’s type magic is giving us a new type with:

  • All the fields from P
  • 2 fields called ‘as’ and ‘bsPrefix’ from BsPrefixProps
  • Any remaining fields that the normal HTML element would have

In the case of Button this is:

  • The fields from ButtonProps
  • ‘as’ and ‘bsPrefix’ from BsPrefixProps
  • By default, any additional fields a button HTML element has

You might wonder how the text ‘button’ gets converted to “all the fields a button element would have”. If you follow through the React code for React.ComponentPropsWithRef, what you’ll find is that ‘button’ is used as a key of the JSX.IntrinsicElements interface, defined as follows:

interface IntrinsicElements {
    // ... lots of fields ...
    button: React.DetailedHTMLProps<React.ButtonHTMLAttributes<HTMLButtonElement>, HTMLButtonElement>;
    // ... lots more fields ...
}

With all this work, we can say that Button is a class component that takes the properties:

ReplaceProps<As, BsPrefixProps<As> & ButtonProps>

Where ‘As’ defaults to ‘button’.

From there, it’s a short step to conclude that the properties for the extended info button should be as follows:

type NewButtonProps = {
    //any new properties we want to add go here
} & ButtonProps

type MyInfoButtonProperties<As extends React.ElementType = 'button'> =
    ReplaceProps<As, BsPrefixProps<As> & NewButtonProps>

Creating the class component

So we’ve set up a new simple type that contains any new bits we want to add, plus the basic properties defined for the Bootstrap button. We’ve then used the ReplaceProps type magic to add in any extras from the core HTML element. As with the rest of Bootstrap, the new MyInfoButtonProperties takes an ‘As’ type argument that can be used to override the element it represents in the DOM.

Before diving back into function world, let’s copy the Bootstrap pattern of a fully fledged class component:

export class MyInfoButton<As extends React.ElementType = 'button'> 
    extends React.Component<MyInfoButtonProperties<As>> 
{
    render = () => {
        return (
            <Button {...this.props} 
                variant={this.props.variant || "light"}
                className={"mv-circular-button " + (this.props.className || "")}>
                <BsInfo/>
            </Button>
        )
    }
}

Armed with this new component, we can now go back to the original page and do something along the lines of…

<MyInfoButton
    as='a'
    target="bla"
    onClick={() => console.log("hello")}
/>

Note how the ‘as’ parameter is now available to us. Looking at it in the browser, you can see React is inserting an anchor (‘a’) element into the DOM instead of a button element:

Back to a functional component

Wonderful! We could stop there, but all the cool React kids say functional components are where it’s at and I’m not one to ignore peer pressure. The syntax for an arrow function definition is a little ugly but not too complex:

export const MyInfoButton = <As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) => (
    <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
)

Or as a standard function declaration (which I personally find a little easier on the eyes):

export function MyInfoButton<As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) {
    return <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
}

Putting it all together (with a handy little type for wrapping the Bootstrap bits together):

//imports
import React from "react";
import { Button, ButtonProps } from "react-bootstrap";
import { BsPrefixProps, ReplaceProps } from "react-bootstrap/helpers"
import { BsInfo } from 'react-icons/bs'
import './circular-button.css'

//handy type definition to wrap up the replace+bsprefix bits of bootstrap
type BootstrapComponentProps<As extends React.ElementType, P> = ReplaceProps<As, BsPrefixProps<As> & P>

//our extended button properties
type NewButtonProps = {
    //any new properties we want to add go here
} & ButtonProps

//boot-strap-ified full button properties with all the bells and whistles
type MyInfoButtonProperties<As extends React.ElementType = 'button'> =
    BootstrapComponentProps<As, NewButtonProps>

//our button!
export function MyInfoButton<As extends React.ElementType = 'button'>(props: MyInfoButtonProperties<As>) {
    return <Button {...props} 
        variant={props.variant || "light"}
        className={"mv-circular-button " + (props.className || "")}>
        <BsInfo/>
    </Button>
}
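
As a final, hypothetical usage example, Bootstrap props, plain HTML props and the ‘as’ override are all now available in one place:

<MyInfoButton
    as='a'
    href='/help'
    size='sm'
    onClick={() => console.log("hello")}
/>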

Summary

An interesting little problem, and maybe a convincing answer to it, should anybody have the same question as me!

As I mentioned earlier, this kind of expose-it-all technique shouldn’t necessarily be avoided, but it should be approached with some thought. By blindly exposing everything in the new button type, we may well have exposed some functionality that doesn’t quite work any more due to our adjustments. This could clearly cause some confusion, but on the plus side we’ve ended up with something extremely flexible without lots of code to maintain.

Anyhoo, that’s it for my first React/TypeScript post. Stay tuned for more posts on this or other random things!

Understanding Determinism Part 1: Intro and Floating Points

Fairly recently I started researching the use of deterministic networking in a Unity game. With this technique, all users simply send each other the buttons they press on the controllers, and by running exactly the same simulation, experience exactly the same gameplay. This approach to networking is less common than the standard ‘keep the stuff that matters roughly in sync’ approach, but is very useful (and indeed necessary) if you have a game that relies on AI or physics doing exactly the same thing for everybody.

After a bit of research, I was surprised to discover that the internet is littered with inaccurate and often entirely incorrect information around determinism. Whilst I’ve not had to solve these problems with Unity before, I do have a lot of experience in this area. Little Big Planet had deterministic networking, and on No Man’s Sky I helped ensure the procedural generation was both deterministic and consistent between PC, PS4 and even the PS4 GPU.

In this series I’ll try to clear up some of the confusion and misconceptions, starting with a quick review of determinism and how to test it. Then we’ll cover an area that quite unfairly, but most often, gets the blame – floating point numbers. Later, I intend to cover some of the other big and more common culprits, and finish off with some digging into Unity-specific problems.

What Actually Is Determinism

A deterministic system can be defined by a very simple statement:

If a system is run multiple times with the same set of inputs, it should always generate the same set of outputs.

In computer terms, this means that if we provide a program with some given input data, we should be able to run that code as many times as we like, and always expect the same results. And by the same, I mean exactly the same – the same bytes, with the same bits, in the same order. In technical terms, bitwise identical.

A couple of examples:

  • If I write a function that adds the numbers 2 and 3, it should always output 5 no matter how many times I call it.
  • If I write an entire physics engine and provide it with 10000 rigid bodies all starting with a given position/orientation, and run it for 2 years, it should always result in the same set of new positions/orientations no matter how many times I run it.

Computers are in fact inherently deterministic – that’s simply how processors work. If the add instruction on an i9 processor could take 2 values and sometimes output slightly different results, that would be a broken processor, and you’d probably be needing to replace it soon!

So where does indeterminism come from, and why does it seem so prevalent when computers are naturally deterministic? Well, the reality is that defining the inputs of a system is not always as simple as it sounds.

Testing Determinism

Before investigating it, we should come up with a good solid way of testing the determinism of our code. Fortunately, when I joined Media Molecule, the very clever team had already solved this problem, so here’s how…

public static class SyncTest
{
    static StreamWriter m_stream;

    //begin just opens a file to write to
    public static void Begin(string filename)
    {
        m_stream = new StreamWriter(filename);
    }

    //simple function that just writes a line with some text to the stream
    public static void Write(string description)
    {
        m_stream.Write(string.Format("{0}\n", description));
    }

    //writes a line with a description, an integer value, and the exact bytes
    //that make up the integer value, all padded nicely with spaces.
    public static void Write(string description, int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        string hex = BitConverter.ToString(bytes).Replace("-", string.Empty);
        string txt = string.Format("{0,-30} {1,-15} 0x{2}\n", 
            description, value, hex);
        m_stream.Write(txt);
    }

    //closes the file
    public static void End()
    {
        m_stream.Close();
    }
}

This very simple class just opens up a file in Begin, then provides a few utilities to write text to that file. Specifically, we’ve added a function that writes a human-readable integer value along with a description and the exact hexadecimal value of its bytes.

Now let’s write a very simple program that uses it…

    private void Start()
    {
        SyncTest.Begin("sync.txt");
        AddNumbers(2, 3);
        SyncTest.End();
    }

    int AddNumbers(int x, int y)
    {
        SyncTest.Write("AddNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        int res = x + y;
        SyncTest.Write("res", res);
        return res;
    }

All this does is add a couple of numbers together. However along the way we’re telling our SyncTest tool to dump out the inputs and outputs. The resulting lines of text are as follows:

AddNumbers
x                              2               0x02000000
y                              3               0x03000000
res                            5               0x05000000

So why do this? Well the beauty is, we can now make some more modifications to our program to run the same algorithm multiple times, outputting to 2 different files:

    private void Start()
    {
        SyncTest.Begin("sync1.txt");
        RunTests();
        SyncTest.End();
        SyncTest.Begin("sync2.txt");
        RunTests();
        SyncTest.End();
    }

    void RunTests()
    {
        AddNumbers(2, 3);
    }

    int AddNumbers(int x, int y)
    {
        SyncTest.Write("AddNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        int res = x + y;
        SyncTest.Write("res", res);
        return res;
    }

Having written the data out, we can use a simple text-comparison program to check for any differences. I use Beyond Compare for this purpose:

No differences = no indeterminism! Yay

Let’s mess with our test and introduce some indeterminism:

        AddNumbers(2, Random.Range(0,10));

Now each run we’re going to pass a new random number in! Clearly the inputs now differ, and thus, so will our outputs. Viewing this in our logs:

Not only does our logging show the issue, it also tells us where the first difference occurred (in this case the value of y was 1 in the 1st test, and 4 in the 2nd).

For this blog we’ll only be using a very simple sync tester like the one above, however with work they can be expanded to record more detailed info like stack traces, or even run as a replay in a debugger allowing you to analyse determinism problems with the full support of your programming tools!

So, armed with our determinism tester, let’s do some digging…

Floating Point Inaccuracy

If you do a quick search, the first thing that gets the blame for indeterminism is floating point numbers, despite the fact that unless you’re doing cross platform stuff, they’re rarely guilty! That said, floating point precision comes up a lot, so let’s take a look at a couple of examples:

  • Floating point numbers can represent all integer values within a certain range precisely; however, they cannot represent all fractional numbers precisely.
  • Some floating point operations, such as addition, will lose precision when adding together values of significantly different magnitudes

The first is pretty simple. The values 0.5, 0.25, 0.75, 0.875 can all be represented precisely by floating point numbers. This is because they can be expressed precisely as the sum of fractional powers of 2:

  • 0.5 = 1/2
  • 0.25 = 1/4
  • 0.75 = 1/2+1/4
  • 0.875 = 1/2+1/4+1/8

Other numbers, however, would require an infinite number of these fractions to be represented this way, so can only be approximated as floating point values.

The latter point occurs because floating point numbers can only store so many (binary) digits. You could store both 1000000000000000 and 1 using a floating point number. However because they are of such drastically different size, on a 32 bit floating point unit you would find:

1000000000000000+1 = 1000000000000000
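
Both of these behaviours are easy to see directly in code. A quick sketch (just my illustration – it isn’t part of the sync tester):

    float big = 1000000000000000f;      //1e15 - way beyond 32 bit float precision
    float sum = big + 1f;
    Debug.Log(sum == big);              //prints True: the +1 is lost entirely

    float tenth = 0.1f;
    Debug.Log(tenth.ToString("G9"));    //prints 0.100000001: the closest float to 0.1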

So how can a computer be both imprecise and deterministic? It’s quite simple really – a floating point unit is deterministically imprecise! It will lose precision, but it will always lose the same precision. Let’s test this by introducing some floating point numbers to our program:

    float AddNumbers(float x, float y)
    {
        SyncTest.Write("AddNumbers (float)");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        float res = x + y;
        SyncTest.Write("res", res);
        return res;
    }
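
One small note: the SyncTest class shown earlier only had Write overloads for plain text and ints, so to log floats like this it needs a matching float version – something along these lines (my sketch, mirroring the int one):

    //writes a line with a description, a float value, and the exact bytes
    //that make up the float value
    public static void Write(string description, float value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        string hex = BitConverter.ToString(bytes).Replace("-", string.Empty);
        string txt = string.Format("{0,-30} {1,-15} 0x{2}\n",
            description, value, hex);
        m_stream.Write(txt);
    }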

And the test…

    void RunTests()
    {
        AddNumbers(2, 5);
        AddNumbers(1f, 1000000000000f);
        AddNumbers(2f/3f, 0.5f);
    }

And our results…

As expected, the first floating point test has added 1 to 1e12 (1000000000000), and just ended up with 1e12 again. Similarly, adding 0.5 to 2/3 (which ends up represented inaccurately as 0.666..67) gives 1.1666..7. However, in both these cases the errors occur in both runs, and the floating point representations of all inputs and outputs match.

The upshot of the above is that if you’re just working with one application on one platform, the odds are floating point won’t be your problem, because any imprecision is deterministic. However, when going for cross platform determinism you can hit some real issues, as described in the following few sections.

Floating Point Hardware Behaviour

Despite the inherent determinism of a floating point unit, it still has the potential to cause problems, specifically when attempting to create an application that performs identically on different bits of hardware. For example, on No Man’s Sky I attempted (and smugly eventually succeeded!) to get PS4 GPU compute shaders to deterministically run the team’s awesome planet making algorithms, and generate exactly the same results as a PC CPU. In determinism terms, you could say that if different hardware behaves differently, then the hardware becomes an input to your algorithm and thus affects the output.

Floating point numbers have to deal with some odd scenarios, such as infinities, different types of zero, and what happens when rounding. There are a variety of areas in which the floating point unit has to make some form of approximation or edge-case decision.

These days, IEEE standards exist that say exactly how every floating point unit on every platform should deal with every scenario. However, not all platforms follow these guidelines to the letter, and others provide low level hardware options to disable the perfect behaviour in exchange for slightly better performance.

I’ll confess to not knowing exactly how prevalent this is these days. I think it is less of an issue with modern hardware, and have myself worked with PCs (AMD+Intel), NVidia GPUs, Macs, PS4 (CPU and GPU) and found the actual hardware rarely to be the cause of indeterminism. Really the only way to be certain if you’re going cross platform though is to test it, ideally with some fairly low level C code (with FastMath disabled – see below) so you know there’s no chance of any other factors interfering.

Floating Point FastMath

A much more prevalent cause of indeterminism comes not from hardware but from compilers. Specifically optimising compilers that use a technique often referred to as FastMath. When this compiler option is turned on (which in some cases it is by default!), it allows the compiler to assume a floating point unit obeys algebraic laws perfectly, and thus mess with your code if it thinks it can express your math in a more efficient way.

Take this code for example:

    float SumNumbers(float x, float y, float z, float w)
    {
        float res = x;
        res += y;
        res += z;
        res += w;
        return res;
    }

Simple enough! Add 4 numbers together. However, the optimizers amongst you may have noticed this is a particularly inefficient way of doing things, as it is a chain of dependent operations – each addition has to wait for the result of the previous one. A faster version would be this:

    float SumNumbers(float x, float y, float z, float w)
    {
        float xy = x + y;
        float zw = z + w;
        return xy+zw;
    }

Mathematically this function does the same job, however it executes 2 independent operations then combines the results. Why this is faster doesn’t really matter and is beyond the scope of this blog. The key is, the 2 functions do the same thing, so should produce the same result.

The FastMath option tells the compiler that if it finds your SumNumbers function written the first way, it is allowed to re-write it the 2nd way, on the basis that mathematically they should return the same result.

To simulate this, I’ll extend the program as follows:

    void RunTests(int test_idx)
    {
        float x = 0.3128364f;
        float y = 0.12412124f;
        float z = 0.8456272f;
        float w = 0.82154819f;
        if (test_idx == 0)
        {
            SumNumbers(x, y, z, w);
        }
        else
        {
            SumNumbersFast(x, y, z, w);
        }
    }

    float SumNumbers(float x, float y, float z, float w)
    {
        SyncTest.Write("SumNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        SyncTest.Write("z", z);
        SyncTest.Write("w", w);
        float res = x;
        res += y;
        res += z;
        res += w;
        SyncTest.Write("res", res);
        return res;
    }

    float SumNumbersFast(float x, float y, float z, float w)
    {
        SyncTest.Write("SumNumbers");
        SyncTest.Write("x", x);
        SyncTest.Write("y", y);
        SyncTest.Write("z", z);
        SyncTest.Write("w", w);
        float xy = x + y;
        float zw = z + w;
        float res = xy + zw;
        SyncTest.Write("res", res);
        return res;
    }

I’ve supplied my RunTests function with a test index, so I can run the first version in the first test, and the second version in the second. Let’s see what the sync tests result in…

Shock horror! We have differences. A difference so tiny in fact that it’s not visible if you just print the value of the result to 6 decimal places. However because floating point units aren’t precise, doing the maths 2 different ways produced subtly different results. FastMath allows the compiler to choose which way to do it, and different compilers on different platforms may make different decisions. This is a very common reason for cross platform determinism issues.

For fun, let’s take the above example and add some rather contrived extra bits (note: numbers meaningless – I made em up by fiddlin!).

    void RunTests(int test_idx)
    {
        float x = 0.3128364f;
        float y = 0.12412124f;
        float z = 0.8456272f;
        float w = 0.82154819f;
        float res;
        if (test_idx == 0)
        {
            res = SumNumbers(x, y, z, w);
        }
        else
        {
            res = SumNumbersFast(x, y, z, w);
        }
        res *= 3212779;
        SyncTest.Write("bigres", res);

        if ( res >= 6760114.5f)
        {
            SyncTest.Write("Do 1 thing");
        }
        else
        {
            SyncTest.Write("Do another thing");
        }
    }

Looking at the sync results:

We can see how when plumbed through some big calculations and used in actual logical operations, these simple differences can result in dramatic changes in program behaviour. Whilst the above may seem a little contrived, applications like procedural content generators, physics engines or even full games execute large amounts of complex mathematics and branching code. All it takes is 1 single operation in that whole mess to exhibit indeterminism, and the end results could be utterly, completely different. Indeed, my first attempt on No Man’s Sky suffered similar issues and resulted in entire forests being different on PS4 (happily fixed pre-launch)! Just as with hardware, when FastMath is turned on, the compiler behaviour becomes an input because it affects the algorithm and thus the output.

As a quick side note, because FastMath is only used in optimized builds, it is often also responsible for determinism issues between debug and release versions of an app.

Trigonometry and other fancy functions

Many mathematical functions such as sin, cos, tan, exp and pow do not exist as single instructions on hardware (or only exist on some hardware), and are instead implemented as part of the maths libraries that come with whatever language you’re using. In C++ for example, this is the C standard lib, or in Unity it’s good old ‘Mathf.bla’.

Sadly, there seems to be absolutely no standardisation over the techniques or implementations that the creators of these libraries use. As a result, different compilers on a given platform, different platforms, different languages, or different versions of languages may or may not give the same results! Thus the math library implementation becomes an input to your algorithm!

As an example, I knocked together my own sin function using a minimax polynomial (no need to have any idea whatsoever what one is or how it works – that’s why we have math libraries and mathematicians!).

    void SinTest(int test_idx)
    {
        for (int angle_degrees = 0; angle_degrees <= 90; angle_degrees++)
        {
            float angle_radians = angle_degrees * Mathf.Deg2Rad;
            float res;
            if (test_idx == 0)
            {
                res = Mathf.Sin(angle_radians);
            }
            else
            {
                res = MiniMaxSin(angle_radians);
            }
            SyncTest.Write("Sin angle", angle_degrees);
            SyncTest.Write("Sin result", res);
            SyncTest.Write("");
        }
    }

    float MiniMaxSin(float x)
    {
        float x1 = x;
        float x2 = x * x;
        return 
            x1 * (0.99999999997884898600402426033768998f + 
            x2 * (-0.166666666088260696413164261885310067f + 
            x2 * (0.00833333072055773645376566203656709979f + 
            x2 * (-0.000198408328232619552901560108010257242f + 
            x2 * (2.75239710746326498401791551303359689e-6f - 
            x2 * 2.3868346521031027639830001794722295e-8f)))));
    }

This example compares the Unity ‘Mathf.Sin’ to my own version and prints out the result for every angle from 0 to 90 degrees. If we look at the sync test though, at 43 degrees we get a slightly different result…

There’s nothing wrong with either version – they both approximate the mathematical sin function to a given level of precision. But just as with our FastMath problems, because they’re implemented differently, they give slightly different results that could turn into drastic changes in behaviour over time.

If you run into this, the only 100% robust approach is to download or write your own platform agnostic math libraries, and compile them yourself with things like FastMath turned off.

The last floating point resort

All of the issues we’ve discussed so far relate to floating point numbers. If developing for one platform, it’s relatively unlikely you’ll actually hit any of them, as they’re all dependent on different hardware, compiler or software library implementations.

If you do hit them however, they can be a real headache for all sorts of reasons. Even if you were prepared to ensure hardware settings were correct, fiddle with low level compiler options and implement custom math libraries, you may well be using fundamentally different hardware (such as CPU/GPU) or libraries / engines (such as Unity) in which you don’t have control over these features.

One fairly brutal option available as a last resort is simply to limit precision by chopping off some digits! Despite various stack overflow based opinions, in certain scenarios this approach can work perfectly well:

    float LoseSomePrecision(float x)
    {
        x = x * 65536f;
        x = Mathf.Round(x);
        x = x / 65536f;
        return x;
    }

Here we simply:

  • Multiply the floating point by a power of 2 constant
  • Round it to an integer
  • Divide the result by the same power of 2 constant

After plugging it in to our sin tests:

We can see how chopping off a few bits of precision has made both tests consistent (though less accurate).

Being a decimal society, it might have been tempting to just multiply by 1000000 and then divide by 1000000. However, floating point numbers work with binary, so if we want to actually reduce the number of bits required to represent our value, we must use a power of 2.
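
To make that concrete, here’s the same trick applied to 0.1 with both constants (numbers worked out by hand, so treat them as illustrative):

    float a = LoseSomePrecision(0.1f);
    //a = 6554 / 65536 = 0.100006103515625, which IS exactly representable in binary
    //a decimal version would instead give 100000 / 1000000 = 0.1, which still is NOT
    //exactly representable, so no bits of precision would actually have been removed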

Of course, this technique won’t help you if your precompiled physics engine isn’t deterministic cross platform. In that situation you’ve really only got one option – don’t use the precompiled physics engine. But if after digging you’re finding it’s just a few math ops that throw the whole system off, limiting precision can be the solution.

Summary

In this post we’ve mainly looked at the low level issues that can crop up when dealing with floating point numbers, and how when dealing with multiple platforms or compilers, inconsistencies in low level behaviour can cause some real headaches.

In reality, variation in floating point behaviour is only likely to crop up when comparing multiple compilers / platforms. More often than not it’s issues with the application implementation (or that of libraries/engines it depends on) that result in determinism problems. In the next post we’ll look at the basics of application-level determinism and one of the big culprits – global state.

Why Transparency Is Hard

I’ve worked with lots of brilliant content creators in my career, and one thing I’ve always hated is having to tell somebody that something they want is ‘impossible’. Worse than that though, is when I have to just declare it is impossible ‘for complicated programmer reasons’ rather than explain myself. This is something I’ve tried hard to avoid over the years, as it is always better when everybody on a team understands the tools and limitations they are working with.

Unfortunately, transparency comes up over and over again as an ‘impossible thing to do well for technical reasons I find it hard to explain’. It is a great effect that artists would always love to have, but breaks in all sorts of really annoying ways that are genuinely not possible to solve on current hardware.

So, this post is my attempt at clearly and concisely explaining, once and for all, why transparency is so hard without resorting to ‘for technical reasons’…

Rasterizing

To understand transparency, a little background in how scenes are actually rendered is needed. There are 2 general approaches to rendering:

  • Ray tracing: Draws a scene by firing rays through the scene to simulate how light bounces around.
  • Rasterizing: Draws a scene by taking a large number of primitives (normally triangles) and filling in the pixels on the screen that each primitive covers.

Methods of ray tracing range from the ‘too expensive to do in real time’ up to ‘way way way too expensive to do in real time’, and despite recent advances in GPUs (especially the new RTX cards), we aren’t going to be rendering entire game scenes this way for a few years yet.

So we’re left with rasterizing. A technique that in many ways is a beautiful hack. It is well suited to hardware, as it involves doing a few very simple tasks many many times over. Work out what pixels a primitive covers, then fill in each one.

Unfortunately, the massive gains in speed and simplicity we get with rasterizing come at a cost. As we are no longer truly simulating the behaviour of light, physical effects like shadows don’t just ‘fall out’ naturally – instead clever people had to work out how to bodge shadow effects into rasterized scenes, and they did it very well!

Depth

The first problem you hit with rasterizing is how to handle the situation in which 2 primitives cover the same pixel. Clearly you want the one in front to be visible, however unlike ray tracing, a rasterizer has to do extra work to make this happen.

Let’s start with this lovely simple spinning cube…

Looks great, but if we turn off the rasterizer features that handle depth, weird shit instantly happens…

Here you can see that sometimes the back faces of the cube seem to draw over the front faces. This is because the graphics card doesn’t attempt to sort the triangles of the cube – all it knows is you’ve given it a bunch of triangles you want on screen as soon as possible. So, if it happens to draw the front faces first, and the back faces last, the back faces get drawn on top.

Fortunately, with the simple cube there is a simple solution – only draw the triangles that are facing the camera, and discard the others. This is known as back face culling.

But what if we have 2 cubes?

Here culling won’t necessarily help us. We have multiple front-facing triangles (such as the blue on the rear cube and the pink on the front cube) that overlap on screen. In the above image we happen to be drawing stuff in the right order, but if we flip the order around…

Because the rear cube was drawn last it simply stomped on top of the front cube.

Again however, this one can be solved pretty easily. The cubes themselves aren’t actually overlapping, so by simply sorting the objects to render, and then ensuring we always draw starting from the back and working forwards, everything just works. This is known as the painter’s algorithm.

Unfortunately, there is another case that neither culling nor sorting can solve:

With only culling and sorting, we can either end up with this…

Or this…

And problems like this one don’t need some contrived situation with overlapping cubes – any concave mesh (which turns out to be most models) is going to suffer this sort of problem:

It is solved with the use of a depth buffer and depth testing. These techniques underpin modern rasterizing. The basic idea is that in addition to rendering your pixel colours you also render how far each pixel is from the camera, aka its depth.

Here we visualise the depth buffer for the torus. Red is close to the camera, blue is further away.

And again for the 2 cubes…

Using this additional information the rasterizer is able to check, before filling in a pixel, if it has already been filled in by another, closer triangle.
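
In code terms, the check boils down to something like this per-pixel test (a deliberately simplified sketch – real hardware does this for you, and the names here are just illustrative):

    //sketch of the depth test performed for every pixel a triangle covers
    float[,] depthBuffer = new float[1024, 768];    //reset to 'as far away as possible' each frame
    uint[,] colourBuffer = new uint[1024, 768];     //packed RGBA colours

    void WritePixel(int x, int y, uint colour, float depth)
    {
        if (depth < depthBuffer[x, y])      //closer than anything already drawn here?
        {
            depthBuffer[x, y] = depth;      //record the new closest depth
            colourBuffer[x, y] = colour;    //and overwrite the colour
        }
        //otherwise the pixel is hidden behind something closer, so it is discarded
    }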

The depth buffer actually removes the need for sorting entirely. Indeed, in many renderers it is common to flip the sorting around and render front objects first, so they fill in the depth buffer as quickly as possible. That way the GPU doesn’t end up running expensive shaders for background objects only to have the results trampled later by foreground objects.

Transparency

Finally on to actual transparency! The big problem with transparency comes down to 2 points:

  • Transparent objects must be drawn back-to-front for the effect to look correct (as we will see imminently)
  • We have just proven above that it is not always possible to reliably draw objects back-to-front.

Let’s start with a simple quad, which in a game might easily be a nice flat glass window:

Lovely! But what if we want a green pane of glass as well?

Here you see the same scene, however on the left the green glass is in front, and on the right the blue glass is in front. The thing to note is that the central overlapping section is different in each image. Just to prove it, here they are sliced together:

This is because, as mentioned earlier, to colour a transparent pixel correctly, you must know the colour of whatever is behind it. This means that if you have multiple transparent objects, the draw order is very important.
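
To see why, it helps to know that the blend applied for each covered pixel is (roughly) the standard ‘over’ operation, mixing the new colour with whatever is already in the framebuffer. As a sketch, per colour channel:

    //standard alpha 'over' blend for one colour channel: the result depends on
    //what was already drawn underneath (dst), which is why draw order matters
    float BlendChannel(float src, float srcAlpha, float dst)
    {
        return src * srcAlpha + dst * (1f - srcAlpha);
    }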

Fortunately, the above situation is actually pretty easy to deal with by sorting, and indeed this is why many games comfortably have semi-transparent glass windows.

But what if we make our cube slightly transparent? How should it look? A first blast with all the depth testing and culling turned on would look like this:

This, however is pretty clearly not a cube made of transparent material, as we can’t see the back faces at all. It is simply a faded version of the original opaque cube. This is because all the work we did to get the opaque cube rendering correctly with culling and depth tests has hidden the back faces entirely. To see them, the culling and depth tests must be turned off:

Yay! So what’s the problem? It’s right back to where we started, as shown in this video:

If we’re lucky enough to draw the faces in the correct order, the cube looks right. However if they’re in the wrong order, the back faces trample the front, and the effect breaks.

Just as before, in the case of the cube there are some clever tricks we could do to get it working. In practice, you could first render all the back faces, then render all the front faces. However just like before, this technique falls apart when more complex or overlapping objects come into play:

Bleh!

So how do we solve this?

Sadly, right now, the truth is we really don’t, or at least can’t do it anywhere near fast enough 😦 Rasterizer-based hardware simply doesn’t function in a transparency-friendly way. For opaque objects, provided there are no bugs in shaders, we can guarantee they will always look correct, regardless of the scene. However, for transparent objects, we must carefully construct scenes to hide transparency artefacts.

A common bug might be a semi-transparent object held by the player which looks fine until they pass their hand through some other semi transparent object such as a window or smoke. In this situation, at some point, the hand is going to trample the smoke, or the smoke is going to trample the hand. Something is going to break!

The classic glitch, familiar to anyone who’s been in games long enough, looks like this:

Note how the opaque cube (right) interacts correctly with the transparent green pane of glass. However when the transparent cube interacts with the transparent glass, it almost seems to ‘pop’ behind it.

Once we finally get to truly ray traced games, this issue will disappear along with a host of others. A ray tracer ultimately simulates the physics of light, so stuff like transparency just works. However, for now, we’re stuck with rasterizers and grumpy programmers!

So there you have it! My long winded but hopefully helpful to somebody explanation of exactly why transparency is a total PITA and why programmers typically grumble about technical reasons etc etc when anybody mentions something like transparent curtains…

Shaderoids 2

Right, I left off last time with the beginnings of a game; now it’s time to add some sounds and other shizzle. It’s up on GitHub now, here.

Sounds

 

Unfortunately one thing that just isn’t possible right now is playing sounds from the GPU (booo). However, I can implement a system where our GPU side code can ‘request’ a given sound to be played. Then the CPU side will just act on those requests.

We’ll start with the simplest possible 1-shot ‘audio engine’ you can build in Unity…

    //array of clips + list of allocated audio sources
    AudioClip[] _clips;
    List<AudioSource> _audioSources;

    //load list of clips
    void LoadClips(params string[] names) {
        _audioSources = new List<AudioSource>();
        _clips = names.Select(a => Resources.Load<AudioClip>(a)).ToArray();
    }

    //find a free audio source, allocating if necessary, and play clip
    void PlayClip(int idx) {
        AudioSource src = null;
        for(int i = 0; i < _audioSources.Count; i++) {
            if(!_audioSources[i].isPlaying) {
                src = _audioSources[i];
                break;
            }
        }
        if (!src) {
            src = gameObject.AddComponent<AudioSource>();
            _audioSources.Add(src);
        }
        src.clip = _clips[idx];
        src.Play();
    }

I’ve used that pattern a few times for very quick prototypes. It basically lets you load a set of audio files from resources, then ask to play them once.

Let’s load 3 audio files I ripped out of the earlier YouTube vid of the actual game…

LoadClips("fire", "explode", "blop");

Next, a simple structure that will be mirrored CPU and GPU side. This represents a request to play a sound, and for now just contains the integer id of the sound to be played:

    public struct SoundRequest {
        public int id;
    }

I allocate a compute buffer to contain an array of those requests, alongside a CPU buffer to copy the requests into. On top, the counters structure now has a ‘numSoundRequests’ variable. At the end of the game update we can now read out and process the requests:


    _soundRequestBuffer.GetData(_cpuSoundRequests);
    _countersBuffer.GetData(_cpuCounters);
    for (int i = 0; i < _cpuCounters[0].numSoundRequests; i++)
        PlayClip(_cpuSoundRequests[i].id);
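
On the GPU side, I’m assuming the matching declarations look roughly like this (a sketch – the names are taken from the PlaySound function below):

struct SoundRequest {
    int id;
};
RWStructuredBuffer<SoundRequest> _soundRequestsRW;
int _maxSoundRequests;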

And for the last bit of plumbing, a GPU-side function in the compute shader for making a request:

//matching ids of sounds loaded in game.cs
#define SND_FIRE 0
#define SND_EXPLODE 1
#define SND_BLOP 2

void PlaySound(int id) {
    int idx;
    InterlockedAdd(_counters[0].numSoundRequests, 1, idx);
    if (idx < _maxSoundRequests) {
        _soundRequestsRW[idx].id = id;
    }
}

Now I can play sounds from GPU code! For example, when the player hits the space bar to fire a bullet:

if (_keyStates[' '].pressed) {
    //...stuff for firing the bullet...
    PlaySound(SND_FIRE);
}

I’ve now hooked this up to firing and to exploding asteroids, and it’s already starting to feel gamey!

Text

Hmmmm.. text rendering on GPU. I want my GPU code to at least be able to ask to draw a given character – e.g. DrawCharacter(‘a’). I don’t really want to cheat and start using bitmap fonts, so… I guess I’m programming a vector font. I create Font.cginc, which is full of stuff like this….

    switch (character) {
    case 'a': 
    case 'A':
    {
        AddFontLine(idx++, float2(0, 0), float2(0, 0.5f));
        AddFontLine(idx++, float2(1, 0), float2(1, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(0, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(1, 0.5f));
        AddFontLine(idx++, float2(0, 0.5f), float2(1, 0.5f));

        break;
    }

Basically, I’m filling up a big buffer, which has 16 line slots for each character. i.e. for the character code 65 (‘A’), there are up to 16 lines stored in a font buffer (_font) at offset 16*65.

I’ve created a simple compute shader called ‘BuildFont’, which gets dispatched with 256 threads – 1 per ASCII character – though I’ve obviously only filled in ‘A’-‘Z’ and a few other bits and pieces!
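
The buffer and helper behind AddFontLine aren’t shown above, but they amount to something like this (a sketch, with names assumed from the kernels on either side):

#define LINES_PER_CHARACTER 16

//one line of a character, in 0-1 'character space'
struct FontLine {
    float2 a;
    float2 b;
};
RWStructuredBuffer<FontLine> _fontRW;   //written by BuildFont
StructuredBuffer<FontLine> _font;       //read when drawing text

void AddFontLine(int idx, float2 a, float2 b) {
    _fontRW[idx].a = a;
    _fontRW[idx].b = b;
}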

Now a simple test shader:

[numthreads(256, 1, 1)]
void DrawFont(uint3 id : SV_DispatchThreadID)
{
    int lineIdx = AllocLines(LINES_PER_CHARACTER);
    int charOffset = id.x * LINES_PER_CHARACTER;

    float2 pos = float2(id.x / 16, id.x % 16) * float2(20,25) + 100;
    float2 scl = float2(15,20);

    for (int i = 0; i < LINES_PER_CHARACTER; i++) {
        float2 a = _font[charOffset + i].a * scl + pos;
        float2 b = _font[charOffset + i].b * scl + pos;
        AddLine(lineIdx++, a, b);
    }
}

This renders a 16×16 grid of characters on screen. For each one, we draw all 16 lines (note: empty lines are degenerate so don’t do anything). The result:


Woohoo! 1 hardcoded vector font. No better way to spend the first chunk of a 10 hour plane flight. Unfortunately I discovered the plane has wifi, so was able to check out the actual asteroids font, and discovered it is way less curvy than mine. Oh well.

Teeny bit of UI

To get started on the UI and game loop, I’ll add a single ‘UpdateGame’ kernel.

[numthreads(1,1,1)]
void UpdateGame(uint3 id : SV_DispatchThreadID)
{
    int gameMode = _counters[0].gameMode;
    if (gameMode == 0) {
        //init
        gameMode = 1;

    }
    else if (gameMode == 1)
    { 
        //main menu
        DrawText_OneCoinOnePlay();
    }

}

This is probably the closest bit to a classic game, as it’s a single function that gets called once per frame to update general game state. At this stage, we only have the game mode, which starts off as 0 (init) then proceeds to 1 (main menu). I’ve also added our first bit of UI:

void DrawText_OneCoinOnePlay() {
    float2 pos = float2(350,150);
    float2 scl = float2(18, 20);
    float2 spacing = float2(scl.x*1.25, 0);

    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('C', pos, scl); pos += spacing;
    DrawCharacter('O', pos, scl); pos += spacing;
    DrawCharacter('I', pos, scl); pos += spacing;
    DrawCharacter('N', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('P', pos, scl); pos += spacing;
    DrawCharacter('L', pos, scl); pos += spacing;
    DrawCharacter('A', pos, scl); pos += spacing;
    DrawCharacter('Y', pos, scl); pos += spacing;
}

At this point though I want to get some proper startup code in. Ideally when the player presses ‘start’ we would init the game state and fire them into the first level. This creates a need for compute shaders to trigger other compute shaders, which leads onto…

More indirect dispatching

Enter the indirect dispatch pattern. The basic concept of indirect dispatch is that sometimes you need a compute shader to control how a following compute shader will be dispatched. For example, our current ‘spawn asteroids’ shader is dispatched 10 times to spawn 10 asteroids, but what if we want to control how many asteroids are spawned from a compute shader instead?

For this purpose I create a new compute buffer called _dispatch, with the type ComputeBufferType.IndirectArguments. It’s just a buffer of uints, each of which will represent one of the arguments we’d normally provide to the Dispatch function.
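
Creating it CPU side is a one-liner (my sketch – the field name matches the DispatchIndirect helper further down):

    //3 uints: thread group counts in x, y and z, filled in by the SetupDispatch kernel
    _dispatchBuffer = new ComputeBuffer(3, sizeof(uint), ComputeBufferType.IndirectArguments);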

For this simple purpose I’ll have a load of parameters that can be passed to shaders via what used to be the counters buffer, and is now the globals buffer. In addition, I’ll have a simple setup kernel that checks which kernel is to be dispatched, and configures the indirect buffer accordingly. This approach is not particularly scalable but simple and effective.

So our globals structure now has 2 new ‘request’ integers:

        public int requestClearAsteroids;
        public int requestSpawnAsteroids;

And the game update can make requests like this:

        //when 'c' is pressed, ask to clear and then spawn 4 new asteroids
        if (_keyStates['c'].pressed) {
            _globals[0].requestClearAsteroids = true;
            _globals[0].requestSpawnAsteroids = 4;
            gameMode = 1;
        }

A special ‘SetupDispatch’ shader now takes a predefined kernel id constant to run, and then uses the request values to fill out the dispatch settings:

//dispatch setup function that takes the kernel id being setup for and runs whatever logic
//is necessary for it
[numthreads(1,1,1)]
void SetupDispatch(uint3 id: SV_DispatchThreadID)
{
    switch (_kernelIdRequested) {
    case KID_CLEAR_ASTEROIDS: {
        //the CLEAR ASTEROIDS dispatch will dispatch across nothing if not requested, or
        //all asteroids if requested
        _dispatch[0] =  _globals[0].requestClearAsteroids ? (_maxAsteroids+255)/256 : 0;
        break;
    }
    case KID_SPAWN_ASTEROIDS: {
        //SPAWN_ASTEROIDS dispatches the number of requested asteroids
        _dispatch[0] = (_globals[0].requestSpawnAsteroids + 255) / 256;
        break;
    }
    default: {
        _dispatch[0] = 0; 
        break;
    }
    }
    _dispatch[1] = 1;
    _dispatch[2] = 1;
}

And functions like our SpawnAsteroids kernel look almost identical, with the small caveat that they may need to read the request counter to avoid overrunning:

//spawns 'n' requested asteroids
[numthreads(256,1,1)] 
void SpawnAsteroids(uint3 id : SV_DispatchThreadID) {
    //read count (either from uniform or indirect request)
    int maxCount = _threadCount < 0 ? _globals[0].requestSpawnAsteroids : _threadCount;
    _globals[0].requestSpawnAsteroids = 0;
    if (id.x < maxCount)
    { 
        //usual stuff
    }
}


The last bit of plumbing is a new DispatchIndirect helper CPU side, which first dispatches the setup kernel before dispatching the main one:

    //dispatches the 'setup dispatch' shader to configure the dispatch args for a given
    //dispatch type, then does an actual dispatch indirect on the requested kernel
    void DispatchIndirect(int kernel, int dispatchIndex) {
        _asteroidsShader.SetInt("_kernelIdRequested", dispatchIndex);
        DispatchOne(kernelSetupDispatch);

        BindEverything(kernel);
        _asteroidsShader.SetInt("_threadCount", -1);
        _asteroidsShader.DispatchIndirect(kernel, _dispatchBuffer);
    }

And finally the CPU game update can first dispatch UpdateGame, then dispatch both ClearAsteroids and SpawnAsteroids on the off chance they have work to do.

        DispatchOne(kernelUpdateGame);
        _playerState.GetData(players);
        DispatchIndirect(kernelClearAsteroids, KID_CLEAR_ASTEROIDS);
        DispatchIndirect(kernelSpawnAsteroids, KID_SPAWN_ASTEROIDS);

Phew! Lots of setup, but that’s a nice simple example of indirect dispatch being used in anger.

Spawning the player

One final pain going into this lot is spawning the player from compute. The game has to wait until there are no asteroids in the way before spawning, which is actually non-trivial on GPU. To do so, all asteroids must be checked against the player. Thus I had to write a couple of new kernels. The first checks players that want to spawn, and marks them as ‘can spawn’:

//prepares all players that want to spawn for spawning, and defaults their 'canSpawn'
//to true
[numthreads(256, 1, 1)]
void PreparePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn) {
            player.position = float2(1024, 768) * 0.5;
            player.velocity = 0;
            player.rotation = 0;
            player.canSpawn = true;
        }
        _playersRW[id.x] = player;
    }
}

The 2nd is very similar to the original CollidePlayerAsteroid, which compares the players to all asteroids, and if they overlap, sets ‘can spawn’ back to false:

//checks all players that want to spawn against asteroids, and kills the 'canSpawn' flag
//for any that are too close to asteroids
[numthreads(256, 1, 1)]
void UpdatePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.wantsToSpawn && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius + 10 + 50)) {
                _playersRW[playerIdx].canSpawn = false;
            }
        }
    }
}

Then with this cheeky code in UpdateAndDrawPlayer:

        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn && player.canSpawn) {
            player.wantsToSpawn = false;
            player.canSpawn = false;
            player.alive = true;
        }

We can detect players that both want to and can spawn, and spawn em!

With some tweaks to the game loop (partly to track current level), and the ability to spawn players, it’s all starting to look pretty cool:

More text and scores and things

Somewhere along the way I felt dirty requesting loads of lines for every render, so I added a 2nd buffer and rendering pipeline for requesting characters (just like you can request lines). The basic concept is that, just like with lines, I can submit a list of requests for characters by code, size and position. I then have another procedural pass in DrawLines.shader that uses this function:

        //reads character id, then looks it up in lines buffer which should contain font
        v2f charvert(uint id : SV_VertexID, uint inst : SV_InstanceID)
        {
            int vertsPerCharacter = (LINES_PER_CHARACTER * 2);
            int charIdx = id / vertsPerCharacter;

            int charVertex = id % vertsPerCharacter;
            int charLine = charVertex / 2;
            int charLineVertex = charVertex % 2;

            Character c = characters[charIdx];

            Line l = lines[c.id*LINES_PER_CHARACTER + charLine];
            float2 p = charLineVertex==0 ? l.a : l.b;

            p = p * c.scl + c.pos;

            v2f o;
            o.vertex = toScreen(p);
            return o;
        }

Just like the old lines one, it takes a vertex id, breaks it down into character index, character vertex, character line and line vertex, then fiddles with it and outputs it!

Now for some numbers. All numbers in asteroids have a 0 on the end, cos it looks cooler. So the new DrawNumberTimes10 function will do just that…

void DrawNumberTimes10(int number, float2 pos, float2 scl, bool pad) {
    float2 spacing = float2(scl.x*1.25, 0);

    int idx = AllocCharacters(8);

    bool onFirstChar = true;
    for (int base = 1000000; base > 0; base /= 10) {
        int val = clamp(number / base,0,9);
        number -= val * base;
        if (val != 0 || !onFirstChar || base==1) {
            //output character
            DrawCharacter(idx++, '0'+val, pos, scl); pos += spacing;
            onFirstChar = false;
        }
        else {
            //output space and increment pos if padding requested
            DrawCharacter(idx++, ' ', pos, scl);
            if(pad)
                pos += spacing;
        }
    }

    //final 0
    DrawCharacter(idx++, '0', pos, scl); pos += spacing;
}

Now by adding a score value, I can draw the nice numbers on screen:

    DrawNumberTimes10(_playersRW[0].score, float2(100, 700), float2(18, 20), true);
    DrawNumberTimes10(0, float2(800, 700), float2(18, 20), false);
    DrawNumberTimes10(0, float2(500, 680), float2(18, 20)*0.75, false);

Not sure what the other 2 numbers are supposed to be yet. Presumably credits and player 2 score. Oh well who cares – it looks cool:

Summary

Well, after that brain dump of crazy I have quite a nice thing going on. It’s not asteroids yet, but all game logic is completely separate from the CPU. Compute shaders are responsible for init, game loop and all general physics and spawning etc. Next time I’ll probably get some effects and flying saucers in or something…

Also – code:
https://github.com/chriscummings100/shaderoids

Shaderoids 1

Ok, time for something really productive. I have decided to put my pretty solid knowledge of GPU coding to the best possible use, and produce as perfect a clone as possible of the 1979 game, Asteroids… entirely written on the GPU.

Why you ask? Frankly you should be ashamed for asking that. Also, did you not notice the title of this web site?

First up, some rules:

  • All game code, graphics, physics and rendering entirely driven by GPU
  • CPU may only:
    • Read hardware (such as inputs), and pass to GPU
    • Fire off a fixed set of dispatches
    • Read requests from GPU to write to hardware (such as audio)
  • Pure line rendering

fyi, since writing, I’ve plonked all the code up here

Basic architecture / flow

The idea is to make the CPU side of things as minimal as possible. With that in mind, my basic flow is going to be a single MonoBehaviour in C# which:

  • Reads a set of input states (probably A-Z, 0-9 and a few others), and writes them into a compute buffer
  • Executes a series of dispatches to fire off various GPU compute kernels
  • Executes DrawProceduralIndirect to draw a load of lines generated by compute (high bar: line renderer in compute!)
  • Reads a set of ‘audio requests’ written to a buffer by the GPU, and uses them to trigger some predefined sounds

So lets start with 3 files:

  • Game.cs for our monobehaviour
  • Asteroids.compute to contain all the compute jobs
  • DrawLines.shader for a simple shader that draws a set of lines

using UnityEngine;

public class Game : MonoBehaviour 
{
    public struct Line {
        public Vector3 a;
        public Vector3 b;
    }

    public const int LINE_BUFFER_SIZE = 10000;

    ComputeShader _asteroidsShader;
    Shader _drawLinesShader;
    Material _drawLinesMaterial;

    ComputeBuffer _linesBuffer;


    public void Awake() {
        _asteroidsShader = Resources.Load<ComputeShader>("asteroids");
        _drawLinesShader = Shader.Find("DrawLines");
        _drawLinesMaterial = new Material(_drawLinesShader);
        _drawLinesMaterial.hideFlags = HideFlags.HideAndDontSave;

        _linesBuffer = ComputeBufferUtils.Alloc<Line>(LINE_BUFFER_SIZE);
    }

    public void OnPostRender() {

        Line[] testLines = new Line[3];
        testLines[0] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 0, 0) };
        testLines[1] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 1, 0) };
        testLines[2] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(0, 1, 0) };
        _linesBuffer.SetData(testLines);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProcedural(MeshTopology.Lines, testLines.Length*2);
    }
}    

This code starts by loading up some shaders and building a material, then sets up a compute buffer that’ll contain a set of Line structures to be rendered. ComputeBufferUtils is a handy class for allocating structured buffers that I’ll upload at some point.

The OnPostRender function simply plonks 3 lines into the lines buffer, passes it to the draw line material and finally calls DrawProcedural, requesting line topology, and specifying 2 vertices per line.

DrawProcedural is a tasty beast. It allows us to write a vertex shader that takes just an integer index. It is then up to the shader to take this index and convert it into useful vertex positions, which is done using data in the provided compute buffer:

Shader "DrawLines"
{
    Properties
    {
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            
            struct v2f
            {
                float4 vertex : SV_POSITION;
            };

            struct Line {
                float3 a;
                float3 b;
            };

            StructuredBuffer<Line> lines;
            
            
            v2f vert (uint id : SV_VertexID, uint inst : SV_InstanceID)
            {
                Line l = lines[id / 2];
                float3 p = (id & 1) ? l.a : l.b;
                
                v2f o;
                o.vertex = float4(p, 1);
                return o;
            }
            
            fixed4 frag (v2f i) : SV_Target
            {
                return 1;
            }
            ENDCG
        }
    }
}

This cheeky little shader uses the vertex id to access the lines buffer and work out whether the vertex represents the start or the end of the line.

Result…

firstlines

Woohoo! Nearly there….?

Compute and indirect drawing

Right, time to write that compute shader I loaded earlier!

Let’s start by adding a new structure to the game code, and allocating a couple of extra buffers:

    public struct Counters {
        public int numLines;
    }
    _countersBuffer = ComputeBufferUtils.Alloc<Counters>(1);
    _dispatchBuffer = ComputeBufferUtils.Alloc<uint>(8, ComputeBufferType.IndirectArguments);

The compute shader (Asteroids.compute) mimics the CPU side structure and buffers:

//structures
struct Line {
    float3 a;
    float3 b;
};
struct Counters {
    int numLines;
};

//buffers
RWStructuredBuffer<Line> _linesRW;
RWStructuredBuffer<Counters> _counters;
RWStructuredBuffer<uint> _dispatch;

//general use uniform to limit dispatch thread counts
int _threadCount;


Now for some compute kernels. The first, I’ll dispatch once at the start of each frame to clear the line counter:

[numthreads(1,1,1)]
void ClearLines(uint3 id : SV_DispatchThreadID)
{
    _counters[0].numLines = 0;
}

This is designed to be executed as a single thread on the GPU, and just does one tiny bit of ‘setup’ work. This pattern is very common in GPU code – tiny compute jobs to pass around / clear / tweak some data, then large parallel dispatches to do actual work.

The next kernel is a more classic compute job, and designed to run in a highly parallel manner.

//creates lines based on dispatch thread
[numthreads(256, 1, 1)]
void GenerateTestLines(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        //allocate space
        int lineIdx;
        InterlockedAdd(_counters[0].numLines, 1, lineIdx);

        //build line
        Line l;
        float ang = radians(id.x);
        float3 dir = float3(sin(ang), cos(ang), 0);
        l.a = dir * 0.1f;
        l.b = dir * 0.75f;
        _linesRW[lineIdx] = l;
    }
}

This uses InterlockedAdd to ‘allocate’ a slot in the lines buffer by atomically incrementing the numLines counter. A line is then generated based on the thread index, and written out to the correct slot. In future I’ll use this pattern for outputting all lines that need rendering.

Next, another simple setup kernel that takes the numLines counter and converts it into a set of arguments that can be passed to DrawProceduralIndirect:


//fills out indirect dispatch args 
[numthreads(1, 1, 1)]
void LineDispatchArgs(uint3 id : SV_DispatchThreadID)
{
    _dispatch[0] = _counters[0].numLines*2; //v count per inst (2 verts per line)
    _dispatch[1] = 1; //1 instance
    _dispatch[2] = 0; //verts start at 0
    _dispatch[3] = 0; //instances start at 0
}

This indirect technique is used with both drawing and dispatching, and allows the use of a compute job to setup the work for a following compute job or draw call. In this case I’m setting up the arguments required for DrawProcedural.

That little ‘*2’ in LineDispatchArgs took a while to spot! Interesting fun – if you’re on NVidia and force GenerateTestLines to only build 64 lines, and remove the ‘*2’, you see it alternate between 2 groups of 32 lines. That’s cos you’re only rendering half the lines you generate. As an NVidia GPU works in warps of 32 threads you randomly get the 1st half or the 2nd half. Hardware showing its true form!

The Game.cs OnPostRender now looks roughly like this:

void OnPostRender()
{
        _asteroidsShader.SetBuffer(kernelClearLines, "_counters", _countersBuffer);
        DispatchOne(kernelClearLines);

        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_linesRW", _linesBuffer);
        DispatchItems(kernelGenerateTestLines, 360);

        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_dispatch", _dispatchBuffer);
        DispatchOne(kernelLineDispatchArgs);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
}

Where DispatchOne and DispatchItems are a couple of handy helpers for dispatching compute kernels:

    void DispatchOne(int kernel) {
        _asteroidsShader.Dispatch(kernel, 1, 1, 1);
    }
    void DispatchItems(int kernel, int items) {
        uint x,y,z;
        _asteroidsShader.GetKernelThreadGroupSizes(kernel, out x, out y, out z);
        _asteroidsShader.SetInt("_threadCount", items);
        _asteroidsShader.Dispatch(kernel, (items + (int)x - 1) / (int)x, 1, 1);
    }

Anyhoo, making it 2 degrees per line + rendering at 1024*768:

linecircleincompute

(oh yeah – I changed the clear colour to black too!).

Next, I think I need to correct for aspect ratio and stuff. Gonna operate at fixed res of 1024/768 then upscale / downscale points. Hmmm… spent a while trying to make this work with any resolution but got bored. Instead I’ll just convert from (1024,768) to screen space:

p = 2 * (p - float2(1024,768)*0.5) / float2(1024,768);

Also changed lines to be float2 whilst there!
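
Wrapped up as a shader helper, that conversion looks something like this (a sketch of the idea – the exact toScreen helper used by the vertex shaders isn’t shown in the post and may differ slightly):

//sketch of a 'toScreen' style helper based on the conversion above
float4 toScreen(float2 p)
{
    //remap from the fixed 1024x768 play area to -1..1 clip space
    p = 2 * (p - float2(1024, 768) * 0.5) / float2(1024, 768);
    return float4(p, 0, 1);
}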

A bit of game

It’s getting a bit big to post all the code up here, but I’ll eventually share it all on git hub!

Let’s get inputs in. To stay true to the goal, I want to avoid any understanding of ‘gameplay’ on CPU, so I’ll just push a big list of keys into game. CPU side, I’ll setup a KeyState structure and read a load of buttons into it each frame:


    public struct KeyState {
        public bool down;
        public bool pressed;
        public bool released;
    }

        for(int i = 0; i < 26; i++) {
            _cpuKeyStates['a' + i] = new KeyState {
                down = Input.GetKey(KeyCode.A + i),
                pressed = Input.GetKeyDown(KeyCode.A + i),
                released = Input.GetKeyUp(KeyCode.A + i),
            };
        }

On top of ‘A’-‘Z’, I also added some handy keys like ‘0’-‘9’, ‘ ‘ and a few other bits.

Now to define a new PlayerState (which also comes with a new _playerState buffer).

    public struct PlayerState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public bool alive;
    }

For the moment, I’ll init this CPU side to get going

        PlayerState[] initPlayer = new PlayerState[1];
        initPlayer[0].position = new Vector2(1024f, 768f) * 0.5f;
        initPlayer[0].alive = true;
        _playerState.SetData(initPlayer);

The long term goal is to get a proper game loop going, so I don’t feel too dirty about a bit of CPU setup for now!

So compute side I’ve setup corresponding structures and buffers. Also added these few helper functions for drawing lines and manipulating vectors:

int AllocLines(int count) {
    int lineIdx;
    InterlockedAdd(_counters[0].numLines, count, lineIdx);
    return lineIdx;
}
void AddLine(int idx, float2 a, float2 b) {
    _linesRW[idx].a = a;
    _linesRW[idx].b = b;
}

float2 mulpoint(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 1)).xy;
}
float2 mulvec(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 0)).xy;
}

For now it’s 1 player, but I want to plan for a million, so I’ll work as though there’s more than 1. The basic compute kernel for player update looks like this:

[numthreads(256, 1, 1)]
void UpdateAndDrawPlayer(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        PlayerState player = _playersRW[id.x];
        if(player.alive) {
            //do stuff
        }
        _playersRW[id.x] = player;

    }
}

The core update code splits into 2 sections. First, standard asteroid style inputs:

            float rot = 0;
            float thrust = 0;
            float rotPerSecond = 1;
            float thrustPerSecond = 100;

            if (_keyStates['A'].down) {
                rot += rotPerSecond * _timeStep;
            }
            if (_keyStates['D'].down) {
                rot -= rotPerSecond * _timeStep;
            }
            if (_keyStates['W'].down) {
                thrust += thrustPerSecond * _timeStep;
            }
            player.rotation += rot;

            float2 worldy = float2(sin(player.rotation), cos(player.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);

            player.velocity += worldy * thrust;
            player.position += player.velocity * _timeStep;

Then the second half calculates a transform matrix and generates some hard coded lines:

            worldx *= 50;
            worldy *= 50;
            float3x3 trans = {
                worldx.x, worldy.x, player.position.x,
                worldx.y, worldy.y, player.position.y,
                0, 0, 1
            };

            int lineIdx = AllocLines(5);
            
            float2 leftcorner = mulpoint(trans, float2(-0.7, -1));
            float2 rightcorner = mulpoint(trans, float2(0.7, -1));
            float2 tip = mulpoint(trans, float2(0, 1));
            float2 leftback = mulpoint(trans, float2(-0.2, -0.7f));
            float2 rightback = mulpoint(trans, float2(0.2, -0.7f));

            AddLine(lineIdx++, leftcorner, tip);
            AddLine(lineIdx++, rightcorner, tip);
            AddLine(lineIdx++, leftcorner, leftback);
            AddLine(lineIdx++, rightcorner, rightback);
            AddLine(lineIdx++, leftback, rightback);


Those vectors look roughly right I think. Side by side with the video:

shipsidebyside

After a bit of debugging, I found the bool types in KeyState appeared to be causing problems. Seems odd, but I can fix it with ints and don’t fancy digging too deep right now. Once that’s sorted, we have a game, sort of…

Quickly add screen wrapping…

            //wrap player (note: better version should handle overshoot amount)
            player.position = player.position >= 0 ? player.position : float2(1024, 768);
            player.position = player.position <= float2(1024,768) ? player.position  : 0;

This quick and dirty version could be better: if the player overshoots by k pixels, they should really come back k pixels in from the other side. Pain in the ass though!
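
For reference, a version that preserves the overshoot could look something like this (just a sketch – the game sticks with the simple clamp-style wrap above):

            //hypothetical wrap that keeps the overshoot: a ship k pixels past the
            //edge reappears k pixels in from the opposite edge
            float2 screen = float2(1024, 768);
            player.position = player.position - floor(player.position / screen) * screen;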

The thrust in the video looks like it’s just a flashing triangle, maybe with a bit of randomness thrown in. To achieve this, I’ll pass through the frame number, which is fed into a wang hash based random number generator. Then some cheeky code to add the extra triangle when thrusting:


            int thrustframe = (_frame / 4);
            if (thrust > 0 && (thrustframe &1)) {
                lineIdx = AllocLines(2);
                float2 thrustback = mulpoint(trans, float2(0.0f, -1.5f-wang_rand(thrustframe)*0.15f));
                AddLine(lineIdx++, leftback, thrustback);
                AddLine(lineIdx++, rightback, thrustback);
            }
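
Incidentally, the wang_rand function used here (and later for the asteroid shapes) isn’t listed in the post. A typical wang-hash based generator returning a value in 0-1 looks like this – treat it as an assumption rather than the project’s exact code:

//typical wang hash based random number in [0,1) - assumed implementation
float wang_rand(uint seed)
{
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed = seed ^ (seed >> 4);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15);
    return seed / 4294967296.0;
}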

Asteroids

Just like with players, I’ll define a structure for the asteroid state, with a bit of extra data in:

    public struct AsteroidState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public int alive;
        public float radius;
        public int level;
    }

As before, I create a buffer for this, and init a randomly placed, sensible set of asteroids:

        AsteroidState[] initAsteroids = new AsteroidState[MAX_ASTEROIDS];
        for(int i = 0; i < START_ASTEROIDS; i++) {
            while(true) {
                initAsteroids[i].position = new Vector2(Random.Range(0f, 1024f), Random.Range(0f, 768f));
                if ((initAsteroids[i].position - initPlayer[0].position).magnitude > 200f)
                    break;
            }
            initAsteroids[i].alive = 1;
            initAsteroids[i].radius = 30;
            initAsteroids[i].rotation = Random.Range(-Mathf.PI, Mathf.PI);
            initAsteroids[i].velocity = Random.insideUnitCircle * 50f;
            initAsteroids[i].level = 0;
        }
        _asteroidState.SetData(initAsteroids);

A note at this point, after a few bugs I’ve swizzled a few bits around:

  • I’ve split update and render out into the unity Update and OnPostRender functions, as reading input seemed to break a bit when called from render
  • I’ve added a helpful BindEverything function that just takes a compute kernel and binds all my buffers and variables to it. This would be terrible behaviour in production, but this isn’t production.
  • I also changed ClearLines to BeginFrame, cos it’s starting to look like it’ll do more than clear out lines

The BindEverything code:

    void BindEverything(int kernel) {
        _asteroidsShader.SetInt("_maxPlayers", MAX_PLAYERS);
        _asteroidsShader.SetInt("_maxAsteroids", MAX_ASTEROIDS);
        _asteroidsShader.SetFloat("_time", Time.time);
        _asteroidsShader.SetFloat("_timeStep", Time.deltaTime);
        _asteroidsShader.SetInt("_frame", Time.frameCount);

        _asteroidsShader.SetBuffer(kernel, "_dispatch", _dispatchBuffer);
        _asteroidsShader.SetBuffer(kernel, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernel, "_linesRW", _linesBuffer);
        _asteroidsShader.SetBuffer(kernel, "_keyStates", _keyStates);
        _asteroidsShader.SetBuffer(kernel, "_playersRW", _playerState);
        _asteroidsShader.SetBuffer(kernel, "_asteroidsRW", _asteroidState);
    }

With the asteroids added, the dispatch now looks like this:

    private void Update() {
        //all my inputs are read into _cpuKeyStates here
        _keyStates.SetData(_cpuKeyStates);

        DispatchOne(kernelBeginFrame);
        DispatchItems(kernelUpdateAndDrawPlayer, MAX_PLAYERS);
        DispatchItems(kernelUpdateAndDrawAsteroid, MAX_ASTEROIDS);
        DispatchOne(kernelLineDispatchArgs);
    }

And render is still very simple:

    public void OnPostRender() {

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
    }

So, the asteroids update kernel. It’s pretty similar to player update in functionality. The first half updates its position, and the second renders it.

//updates asteroid movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        AsteroidState asteroid = _asteroidsRW[id.x];
        if (asteroid.alive) {
            asteroid.position += asteroid.velocity * _timeStep;
            asteroid.position = asteroid.position >= 0 ? asteroid.position : float2(1024, 768);
            asteroid.position = asteroid.position <= float2(1024, 768) ? asteroid.position : 0;

            float scl = asteroid.radius;

            float2 worldy = float2(sin(asteroid.rotation), cos(asteroid.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);
            worldx *= scl;
            worldy *= scl;
            float3x3 trans = {
                worldx.x, worldy.x, asteroid.position.x,
                worldx.y, worldy.y, asteroid.position.y,
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 9;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float randscl = 0.75f;
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                //offset every other point using random number
                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));

                //add new line
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr); 
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _asteroidsRW[id.x] = asteroid;
    }
}

I spent quite a while before I was happy with the look of the asteroids, and I’m still not fully satisfied. The above code basically generates a 9 edge circle, then randomly adjusts the position of each vertex. It also takes account of the asteroid’s radius, in anticipation of varying sizes.

firstasteroids

Bullets

Bullets are the first slightly cheeky one, as I need to be able to spawn them, which means some form of allocating/freeing. This kind of model is tricky on GPU, so I’m going to go for the simple option of a large circular buffer of bullets. If the buffer overflows it’ll start recycling bullets, but I’ll just make it big enough not to!

Aside from a life time, the bullet state is very simple:

    public struct BulletState {
        public Vector2 position;
        public Vector2 velocity;
        public float lifetime;
    }

And the update equally so:

//updates bullet movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawBullet(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        BulletState bullet = _bulletsRW[id.x];
        if (bullet.lifetime > 0) {

            bullet.position += bullet.velocity * _timeStep;
            if (any(bullet.position < 0) || any(bullet.position > float2(1024, 768))) {
                //write the dead bullet straight back, as returning here skips
                //the write at the end of the kernel
                _bulletsRW[id.x].lifetime = -1;
                return;
            }
            bullet.lifetime -= _timeStep;

            float scl = 2;
            float3x3 trans = {
                scl, 0, bullet.position.x,
                0, scl, bullet.position.y, 
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 6;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr);
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _bulletsRW[id.x] = bullet;
    }
}

The main difference is that unlike the ship and asteroid, bullets have a life time and automatically die when it runs out. I couldn’t figure out whether bullets wrap in the original game, but it felt better when they didn’t, so they die when going off screen for the moment.

Now to implement the spawning. I’ll start with adding a nextBullet counter:

    public struct Counters {
        public int numLines;
        public int nextBullet;
    }

Now for the code in the player update to fire one:

            if (_keyStates[' '].pressed) {
                int nextBullet;
                InterlockedAdd(_counters[0].nextBullet, 1, nextBullet);
                BulletState b;
                b.position = player.position;
                b.velocity = worldy * 1000;
                b.lifetime = 3;
                _bulletsRW[nextBullet%_maxBullets] = b;
            }

When space is pressed, I increment the bullet counter, then a new bullet is created and written into the new slot. Note the ‘mod’ allows the nextBullet counter to get ever higher but still be used as an index into the _bulletsRW buffer.
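
If bullets ever end up being fired from more than one place, that allocate-and-wrap pattern could be pulled out into a tiny helper (hypothetical – the post just inlines it):

//hypothetical helper: atomically grab the next slot in the circular bullet buffer
int AllocBullet()
{
    int next;
    InterlockedAdd(_counters[0].nextBullet, 1, next);
    return next % _maxBullets;
}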

Collision

Ok! We have stuff. Now to smash it. If I hit millions of players and asteroids, collision will have to start using optimisation structures, but GPUs are fast and like nothing more than doing 1000s of identical calculations. So collision will be a classic brute force ‘test everything against everything else’ situation.

Lets start with player vs asteroid collision:

[numthreads(256, 1, 1)]  
void CollidePlayerAsteroid(uint3 id : SV_DispatchThreadID) 
{
    if (id.x < _threadCount)
    {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.alive && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius+10)) {
                _playersRW[playerIdx].alive = 0;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

The actual collision code here is a basic circle test. If the player is within 10 pixels of the asteroid, they die. The sneaky bit is at the top, where the player and asteroid indices are calculated. This makes more sense when you look at the dispatch CPU side:

        DispatchItems(kernelCollidePlayerAsteroid, MAX_PLAYERS * MAX_ASTEROIDS);

In effect, I dispatch 1 thread for every combination of player and asteroid. The funky calculations in CollidePlayerAsteroid simply decompose the thread id into player index and asteroid index, just like you might convert a pixel index to a pixel coordinate in an image.

Shockingly enough, for now the bullet vs asteroid is pretty similar:

[numthreads(256, 1, 1)]
void CollideBulletAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        int bulletIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (bulletIdx*_maxAsteroids);

        BulletState bullet = _bulletsRW[bulletIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

There’s one ingredient before we can declare real progress though – asteroids smash! To achieve smashing asteroids I’ll take a similar approach to the bullets. The asteroids buffer will be large enough to contain all the asteroids a level will ever need (big, medium and small), and I’ll just use a counter to allocate into it. Once that’s added, I can create a splitting function:

void SplitAsteroid(int idx) {
    int nextIndex;
    InterlockedAdd(_counters[0].nextAsteroid, 2, nextIndex);

    AsteroidState asteroid = _asteroidsRW[idx];
    if (asteroid.level < 2) {

        float childSpeed = 50; 

        AsteroidState child;
        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;

        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;
    }


    _asteroidsRW[idx].alive = 0;
}

This code allocates 2 new slots by incrementing nextAsteroid by 2. Assuming the asteroid level < 2 (i.e. it isn’t too small), I then proceed to generate 2 new smaller asteroids. These start at the position of the larger one, but take on a randomly adjusted velocity. Finally, the asteroid is killed off. I also just noticed I probably shouldn’t allocate child asteroids unless I intend to use them, but it doesn’t matter.

A quick tweak to the bullet code:

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                SplitAsteroid(asteroidIdx);
            }
        }

And it’s asteroid time:

Yay! Ok – I have to get on a plane. Hopefully by the time I’ve landed there’ll be sound effects or something…

For UI, sounds and game loops, head to the next post here: Shaderoids 2

Signed Distance Fields Part 8: Gradients, bevels and noise

Over the course of this series, I’ve covered the principles, generation and some fun uses of the distances stored in SDFs. However they have another very useful property – the gradient. In this post I’ll cover the basics of a gradient, and demonstrate a few simple uses of it.

The Gradient

Hopefully after the past 7 posts it’s become clear that a signed distance field gives you the distance to the closest edge from any point. The gradient (sometimes referred to as the differential) points either directly towards or away from the closest edge:

gradients

This diagram shows the standard ‘distance visualisation’ for 2 fields used earlier in this blog. Overlaid are arrows that show the gradient. Inside the field, where the distance is negative, the gradient points directly towards the edge. Outside it, the gradient points directly away from the edge. These directions can be expressed as a 2D vector, which is referred to as the gradient vector. In a signed distance field, that vector is always of unit length (more on this later).

The gradients for our cat can be visualised in a couple of ways:

gradientcolours

On the left, gradient.x (-1 to 1) is shown in the green channel and gradient.y (-1 to 1) is shown in the blue channel. My favoured visualisation on the right converts the gradient to an angle between 0 and 360 degrees, then uses the result to calculate a hue.
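
In shader terms, that hue visualisation boils down to something like the following sketch (it reuses the HUEtoRGB helper that appears later, and isn’t the exact code from the project):

//sketch: convert the stored gradient to an angle, then to a hue
float2 g = sdf.gb;                                  //gradient lives in green/blue
float ang = atan2(g.y, g.x);                        //-PI..PI
float hue = (ang + 3.1415927) / (2.0 * 3.1415927);  //0..1
res.rgb = HUEtoRGB(hue);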

This next bit is a little mathematical, so if you don’t care about it, just know that:

  • The gradient of a point represents the direction towards (if outside the geometry) or away from (if inside the geometry) the closest edge
  • The gradient of a signed distance field is always a unit vector
  • We’re updating SignedDistanceFieldGenerator.End() to calculate and store the gradient vector in the green and blue channels of the field texture
  • To improve the field quality, I will also briefly introduce (but not go over in full) an improvement to our existing sweeping algorithm called an Eikonal Sweep

If you like, skip to the next section about bevels, or read on for some theory and codez!

Some theory

Calculation of the gradient requires a new concept – the partial derivatives of the field in x and y. This complex sounding term actually just means the rate of change in the x direction, and the rate of change in the y direction. To begin, assume the following definitions:

  • The distance stored in the pixel at coordinate [i,j] is referred to as Ui,j
  • The left neighbour at [i-1,j] is referred to as Ui-1,j
  • The right neighbour at [i+1,j] is referred to as Ui+1,j
  • The bottom neighbour at [i,j-1] is referred to as Ui,j-1
  • The top neighbour at [i,j+1] is referred to as Ui,j+1

With these in mind, now consider the rate of change in X. To calculate it, we need to choose which neighbour (left or right) is closer to the edge (i.e. which is smaller).

  • The smallest horizontal neighbour will be referred to as UH
  • If the left neighbour (Ui-1,j) is smallest, UH = Ui-1,j and delta X (or dX) is -1
  • If the right neighbour (Ui+1,j) is smallest, UH = Ui+1,j and delta X (or dX) is +1
  • The change in distance, UH-Ui,j is called delta U, or dU
  • Finally, the partial derivative in X is dU divided by dX, written dU/dX

In pseudo code:

GetPartialDerivativeInX(X,Y)
    //read Uij, Ui-1,j and Ui+1,j
    distance = GetDistance(X,Y) //Uij
    leftNeighbour = GetDistance(X-1,Y) //Ui-1,j
    rightNeighbour = GetDistance(X+1,Y) //Ui+1,j
    
    //choose either left or right neighbour to calculate
    //deltaU and deltaX
    if leftNeighbour < rightNeighbour:
        dU = leftNeighbour - distance 
        dX = -1
    else
        dU = rightNeighbour - distance 
        dX = 1
    end 

    //return the result 
    return dU/dX
end

We can do exactly the same for the partial derivative in Y (dU/dY), by calculating the smallest vertical neighbour (UV) and executing the same logic. Pseudo code for the partial derivative in Y is thus:

GetPartialDerivativeInY(X,Y)
    //read Uij, Ui,j-1 and Ui,j+1
    distance = GetDistance(X,Y) //Uij
    bottomNeighbour = GetDistance(X,Y-1) //Ui,j-1
    topNeighbour = GetDistance(X,Y+1) //Ui,j+1
    
    //choose either top or bottom neighbour to calculate
    //deltaU and deltaX
    if bottomNeighbour < topNeighbour:
        dU = bottomNeighbour - distance 
        dY = -1
    else
        dU = topNeighbour - distance 
        dY = 1
    end 

    //return the result 
    return dU/dY
end

The gradient (or derivative) is simply a vector that contains [dU/dX, dU/dY]. For an SDF this should naturally be of unit length and not need to be normalized. Thus, in pseudo code:

GetGradient(X,Y)
    return [GetPartialDerivativeInX(X,Y),GetPartialDerivativeInY(X,Y)]
end

Hopefully that wasn’t too tricky, but if it was, don’t worry – as with most things in game dev, using it is more important than instantly getting the theory.

Calculating the gradient in code

Next, we’ll update SignedDistanceFieldGenerator.End() to iterate over every pixel, calculate the gradient vectors, and store them using the green and blue channels of the field texture:

for (int y = 0; y < m_y_dims; y++) {
    for (int x = 0; x < m_x_dims; x++) {
        //read this pixel's distance and work out its sign (-ve inside, +ve outside)
        float d = GetPixel(x, y).distance;
        float sign = d >= 0 ? 1.0f : -1.0f;
        float maxval = float.MaxValue * sign;

        //read neighbour distances, ignoring border pixels
        float x0 = x > 0 ? GetPixel(x - 1, y).distance : maxval;
        float x1 = x < (m_x_dims - 1) ? GetPixel(x + 1, y).distance : maxval;
        float y0 = y > 0 ? GetPixel(x, y - 1).distance : maxval;
        float y1 = y < (m_y_dims - 1) ? GetPixel(x, y + 1).distance : maxval;

        //use the smallest neighbour in each direction to calculate the partial derivatives
        float xgrad = sign*x0 < sign*x1 ? -(x0-d) : (x1-d);
        float ygrad = sign*y0 < sign*y1 ? -(y0-d) : (y1-d);

        //combine partial derivatives to get gradient
        Vector2 grad = new Vector2(xgrad, ygrad);

        //store distance in red channel, and gradient in green/blue channels
        Color col = new Color();
        col.r = d;
        col.g = grad.x;
        col.b = grad.y;
        col.a = d < 999999f ? 1 : 0;
        cols[y * m_x_dims + x] = col;
    }
}

The code above is mostly a condensed version of the pseudo code to calculate the partial derivatives in X and Y, then combine them into a Vector2 gradient for every pixel. The only additions are:

  • We use the sign of the source pixel to ensure all comparisons work correctly for inner (-ve) pixels
  • Borders are handled by just assigning a ‘really big number’ to pixel coordinates that are out of bounds
  • As a shortcut for dividing dU by 1 or -1, we just do/don’t negate it respectively
  • And of course, the result, along with the distance is written into a colour for storage in a texture

Reminder: Find the full function in SignedDistanceFieldGenerator.cs

Improved field sweeping

One subtle issue I’ve mostly avoided up until now is that our current technique for sweeping isn’t perfect. When first introducing the 8PSSEDT sweeping algorithm, I showed this diagram:

SweepDist

It demonstrates how the approximation of A’s distance based on its neighbour (B) isn’t perfect. Mathematically, we end up with a gradient that points in slightly the wrong direction, and a distance that is slightly larger than it needs to be.

The sweep is most accurate close to the edge of the geometry (where the data is perfect) and gets worse as it moves outwards. Thus most of our distance effects haven’t really suffered. However if we render the contours of the field in blue, the issue is visible:

sweepcontourartifacts

Note how further from the edge of the geometry, the sweep introduces straight contours, rather than nice curvy ones that match the geometry.

Some gradient effects are very sensitive to field accuracy, so I’ve added a new sweeping algorithm for this post called an Eikonal Sweep. This yields accurate distances and gradients, resulting in much nicer contours:

eikonalcontours

Eikonal sweeping is a much more mathematical approach and requires a whole post in itself, so I won’t cover it here. For now, know that a new EikonalSweep() function has been added to SignedDistanceFieldGenerator, to replace the existing Sweep() function.

Bevels

The bevel effect gives an image a fake 3D look, by adding borders that are shaded in such a way that they look like 3D edges:

bevelcat

This rather fancy looking border can do wonders for UI, and if used cleverly can even fake interaction with the lighting from the game world. The basic idea, as shown in the code below, is to use the gradient of the field with a pretend ‘light direction’ to shade the border:


//sample distance as normal
float d = sdf.r + _Offset;

//choose a light direction (just [0.5,0.5]), then dot product
//with the gradient to get a brightness
float2 lightdir = normalize(float2(0.5,0.5));
float diffuse = saturate(dot(sdf.gb,-lightdir));

//by default diffuse is linear (flat edge). combine with border distance
//to fake curvy one if desired
float curvature = pow(saturate(d/_BorderWidth),_BevelCurvature);
diffuse = lerp(1,diffuse,curvature);

//calculate the border colour (diffuse contributes 75% of 'light')
float4 border_col = _Fill * (diffuse*0.75+0.25);
border_col.a = 1;

//choose output
if(d < 0)
{
    //inside the geometry, just use fill
    res = _Fill;
}
else if(d < _BorderWidth)
{
    //inside border, use border_col with tiny lerp across 1 pixel 
    //to avoid aliasing
    res = lerp(_Fill,border_col,saturate(d));
}
else
{
    //outside border, use fill col, with tiny lerp across 1 pixel 
    //to avoid aliasing 
    res = lerp(border_col,_Background,saturate(d-_BorderWidth));
}

Starting with the first bit:

//sample distance as normal
float d = sdf.r + _Offset;

//choose a light direction (just [0.5,0.5]), then dot product
//with the gradient to get a brightness
float2 lightdir = normalize(float2(0.5,0.5));
float diffuse = saturate(dot(sdf.gb,-lightdir));

This little section is the key to the whole effect. We pick a pretend ‘light direction’ (which could have been passed in as a shader parameter if desired), and dot product it with the gradient of the field. If the gradient directly opposes the light direction, it is assumed to be facing towards the light. If the gradient is perfectly aligned to the light direction, it is facing away from the light. On completion, diffuse contains a value between 0 and 1 indicating how lit the border is at this point.

Skipping over the curvature section for the moment, we then calculate the actual colour of the border.

//calculate the border colour (diffuse contributes 75% of 'light')
float4 border_col = _Fill * (diffuse*0.75+0.25);
border_col.a = 1;

This simple code selects the border colour by applying our lighting to the fill colour. The lighting is effectively taking 25% ambient light, and 75% of the diffuse light.

Finally, probably familiar by now, we select whether to use background, border or fill colour:

//choose output
if(d < 0)
{
    //inside the geometry, just use fill
    res = _Fill;
}
else if(d < _BorderWidth)
{
    //inside border, use border_col with tiny lerp across 1 pixel 
    //to avoid aliasing
    res = lerp(_Fill,border_col,saturate(d));
}
else
{
    //outside border, use fill col, with tiny lerp across 1 pixel 
    //to avoid aliasing 
    res = lerp(border_col,_Background,saturate(d-_BorderWidth));
}

This is simply choosing _Fill if inside the shape, border_col if within the border, or _Background otherwise. The lerps serve no purpose for the effect other than to avoid aliasing by transitioning cleanly between colours over 1 pixel.

If we were to just use this code, ignoring the curvature section, the result is already visibly ‘3D’:

bevelcatstraight

The extra ‘curvature’ section just tweaks the diffuse lighting value based on distance from the edge of the geometry:


//by default diffuse is linear (flat edge). combine with border distance
//to fake curvy one if desired
float curvature = pow(saturate(d/_BorderWidth),_BevelCurvature);
diffuse = lerp(1,diffuse,curvature);

If _BevelCurvature is 0, curvature will always equal 1. As a result, the lerp will never change the value of diffuse. However, as _BevelCurvature increases, the distance, d, the pixel is from the edge will have an increasingly large effect. This causes diffuse to be increasingly biased towards a value of 1 close to the edge of the geometry. Visually, this produces a curvy look, rather than an angular look:

bevelcatcurved

The one remaining problem you may have spotted is the unpleasant discontinuities in the shading:

discontinuity

These unfortunate artefacts are a result of the fact that our source image simply wasn’t of a high enough quality to get good gradients out. Despite our efforts, the conversion from image to field isn’t perfect and the issue shows up when using gradient based effects. A high curvature tends to hide the issue a little, but the artefact is always there.

The most obvious solution is to use a very high resolution source image and then downsample. However, if not practical, one option is to blur the field slightly. This will make it less accurate in terms of distances, but soften out ‘crinkles’ in the field. I have added to SignedDistanceFieldGenerator a very simple Soften() function that applies a dumb blur to every pixel to demo the effect:

soften

As you can see, the subtle blur in the middle has little effect on the shape of the geometry, but does get rid of the artefacts. On the right the blur is turned right up which results in a blobby field (a fun effect in itself!). Note: please ignore the fact that the lighting is inverted on the left image – shader bug when taking screen shots!

Whether softening is the right solution for you will depend on your scenario. Ideally for high quality data you start from high quality input assets, but if that’s not practical, sacrificing field accuracy for smoothness may be a good way to go.

Note: if you do decide to rely on softening, I recommend looking up better algorithms than the supplied Soften function, as I knocked it together in 10 minutes and it isn’t very smart!

Finding The Edge

The addition of gradient info to the field yields another handy property – given a point’s distance and direction to/from the closest edge, it is easy to find the actual location of that closest edge. Whilst this isn’t so much an effect, it’s a useful feature and can generate some fun visualisations.

edgedist

In this diagram we start off with sampling a distance field at a location, p = [1.5,1.5]. At this location, the distance field tells us:

  • the closest edge is at a distance, d = 3.53 away
  • the gradient, which points directly away from the closest edge, is g = [-0.71, -0.71]

Armed with this information, it is easy to see that to get the location of the closest edge point we multiply the gradient by distance and subtract from the input point. Mathematically:

edge point, ep = p - d * g

Or in pseudo code:

GetClosestEdgePoint(Vector2 SrcPoint)
    //read distance,d and gradient,g from field
    Float Distance,Gradient;
    SampleField(SrcPoint, out Distance, out Gradient)

    //calculate and return point
    return SrcPoint - Distance * Gradient
End

We can visualise the results of edge finding by:

  • Sampling the SDF at a given UV as normal
  • Working out the corresponding edge UV
  • Sampling the SDF at the edge UV

If our logic is valid the edge sample should return a distance very close to 0 (as it is the edge!). To begin, we’ll create a function to find and sample the edge:

void sampleedge(float4 sdf, float2 uv, out float4 edgesdf, out float2 edgeuv)
{
    edgeuv = uv - sdf.gb*sdf.r*_MainTex_TexelSize.xy;
    edgesdf = samplesdf(edgeuv);
}

Next, we’ll add a new edge finder visualization to the shader:

else if(_Mode == 12) //EdgeFind
{
    //use sampleedge to get the edge field values and uv
    float2 edgeuv;
    float4 edgesdf;
    sampleedge(sdf,uv,edgesdf,edgeuv);

    //visualize error threshold of 1 pixel in r, and highlight geometry edge with b
    float edged = edgesdf.x;
    res.r = abs(edged) > 1 ? 1 : 0;
    res.g = 0;
    res.b = 1-saturate(abs(sdf.r));
}

This simple shader samples the edge distance field, and outputs a red pixel if the error is greater than 1. As an extra tweak, it highlights the actual edge in blue so we can still see the geometry:

edgefind

The result, as you can see, is OK but far from perfect. Black pixels represent areas that successfully found the edge. However we can see distinct lines, worst on the cat, where the test failed. Comparing the edge finder to the gradient view it is possible to see where these errors occurred:

edgeerror

Any SDF contains discontinuities – areas where 2 neighbouring pixels have different closest edges. On the right hand image where the gradient is visualised, these discontinuities show up as sharp changes in colour. In the left image these correspond precisely to areas where edge finding failed.

The simplest way to solve this is to simply keep stepping toward the edge, getting a little bit closer each time:

bool GetClosestEdgePoint(Vector2 Point, int MaxSteps, out Result)
    For(i = 1 to MaxSteps)
        Float Distance,Gradient;
        SampleField(Point, out Distance, out Gradient)
        if(abs(Distance) < 0.5)
            break
        Point -= Distance * Gradient
    End
    Result = Point
    return abs(Distance) < 0.5
End

This pseudo code will iterate until a point has been found within 0.5 pixels or reaches a predefined maximum number of steps.

To implement it in the shader, we’ll introduce a new function called steptowardsedge, that takes as an input a sample and UV, then updates them both with a sample and UV closer to the edge:

void steptowardsedge(inout float4 edgesdf, inout float2 edgeuv)
{
    edgeuv -= edgesdf.gb*edgesdf.r*_MainTex_TexelSize.xy;
    edgesdf = samplesdf(edgeuv);
}

The sampleedge function is then updated to keep sampling until the edge is reached:

bool sampleedge(float4 sdf, float2 uv, out float4 edgesdf, out float2 edgeuv, out int steps)
{
    edgesdf = sdf;
    edgeuv = uv;
    steps = 0;

    [unroll(8)]
    for(int i = 0; i < _EdgeFindSteps; i++)
    {
        if(abs(edgesdf.r) < 0.5)
            break;
        steptowardsedge(edgesdf,edgeuv);
        steps++;
    }

    return abs(edgesdf.r) < 0.5;
}

This new version also returns true/false to indicate whether the edge was found, and outputs the number of steps taken. Using this new data, we can update the edge find visualisation to render a hue that shows how many steps were taken, or simply shows black if the edge was never found:

else if(_Mode == 12) //EdgeFind
{
    //use sampleedge to get the edge field values and uv
    float2 edgeuv;
    float4 edgesdf;
    int edgesteps;
    bool success = sampleedge(sdf,uv,edgesdf,edgeuv,edgesteps);

    //visualize number of steps to reach edge (or black if it didn't get there)
    res.rgb = success ? HUEtoRGB(0.75f*edgesteps/8.0f) : 0;
}

Running this visualisation on the cat field gives us this image:

edgefinditerative

Here we see:

  • Red pixels took 0 steps – i.e. they are already on the edge
  • Orange pixels (the majority) took 1 step
  • Yellow pixels took 2 steps
  • Green pixels took 3 steps
  • No black pixels are visible, as the edge never failed to be found

This technique can also be handy if you’re dealing with a less accurate field, having softened it to reduce artefacts as mentioned in the bevels section:

edgefinditerativesoft

Here you can see many more pixels had to take 2 steps (yellow) due to the less accurate field, however they still all got there eventually.

Noise

Noise based effects aren’t specific to SDFs, but they certainly work well together. In case you’re not familiar with the concept of noise, it typically refers to a way of generating a grid of values that whilst random, are still continuous (think clouds instead of static):

noise

Above is a noise texture, generated by layering Perlin Noise at different frequencies, named after the father of this technique – Ken Perlin. I don’t want to go into too much depth on how noise works for this blog though – the key is, we’ve got a texture like the one above! In fact, we have a 4 channel texture, with a different ‘cloud pattern’ stored in each channel.

For reference, the function that generates it works by sampling Unity’s built in Perlin noise in multiple octaves:

Color[] GenerateNoiseGrid(int w, int h, int octaves, float frequency, float lacunarity, float persistance)
{
    //calculate scalars for x/y dims
    float xscl = 1f / (w - 1);
    float yscl = 1f / (h - 1);

    //allocate colour buffer then iterate over x and y
    Color[] cols = new Color[w * h];
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            //classic multi-octave perlin noise sampler
            //ends up with 4 octave noise samples
            Vector4 tot = Vector4.zero;
            float scl = 1;
            float sum = 0;
            float f = frequency;
            for (int i = 0; i < octaves; i++)
            {
                for (int c = 0; c < 4; c++)
                    tot[c] += Mathf.PerlinNoise(c * 64 + f * x * xscl, f * y * yscl) * scl;
                sum += scl;
                f *= lacunarity;
                scl *= persistance;
            }
            tot /= sum;

            //store noise value in colour
            cols[y * w + x] = new Color(tot.x, tot.y, tot.z, tot.w);
        }
    }
    return cols;
}

This function is used to generate the pixel colours for a texture, which is then stored in the SDF class as m_noise_texture, and passed into the shader as _NoiseTex.
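
For completeness, here’s a minimal sketch of how those colours might be written into a texture and handed to the shader. This is not code from the repo – the helper name and noise settings are made up – but it uses the GenerateNoiseGrid function above and the _NoiseTex parameter described here:

//hypothetical helper: builds the noise texture from GenerateNoiseGrid and
//passes it to the SDF material as _NoiseTex
Texture2D BuildNoiseTexture(Material material)
{
    //example settings: 256x256, 4 octaves
    Color[] cols = GenerateNoiseGrid(256, 256, 4, 4f, 2f, 0.5f);

    //use an RGBAFloat texture to match the rest of the series
    Texture2D tex = new Texture2D(256, 256, TextureFormat.RGBAFloat, false);
    tex.SetPixels(cols);
    tex.Apply();

    //expose it to the shader
    material.SetTexture("_NoiseTex", tex);
    return tex;
}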

We then have a tiny function in the shader that samples the 4 channel texture, and combines each channel into a single animated value.

//samples the animated noise texture
float samplenoise(float2 uv)
{
    float t = frac(_NoiseAnimTime)*2.0f*3.1415927f;
    float k = 0.5f*3.1415927f;
    float4 sc = float4(0,k,2*k,3*k)+t;
    float4 sn = (sin(sc)+1)*0.4;       
    return dot(sn,tex2D(_NoiseTex,uv));
}

Although it looks a little funky, all this function is really doing is taking a parameter called _NoiseAnimTime and using it to generate a vector, sn, that contains 4 different values between 0 and 1. This is then combined with the sampled _NoiseTex to get an output noise value between 0 and 1 that animates over time:

A cleverer but more expensive approach might have been to fully generate noise within the shader. This is entirely achievable on modern GPUs, but impractical on lower end mobile devices.
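
On the CPU side, _NoiseAnimTime just needs to tick upwards each frame. A hypothetical sketch of driving it from a script (the class, field names and speed are assumptions, not code from the repo – only the _NoiseAnimTime parameter name comes from the shader above):

using UnityEngine;

//hypothetical driver for the shader's _NoiseAnimTime parameter
public class NoiseAnimator : MonoBehaviour
{
    public Material m_material;   //material using the SDF shader
    public float m_speed = 0.25f; //cycles per second (example value)

    void Update()
    {
        //the shader wraps the value with frac(), so a forever-increasing time is fine
        m_material.SetFloat("_NoiseAnimTime", Time.time * m_speed);
    }
}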

Now we’re going to use the output of this samplenoise function within the samplesdf function to modify the sampled distance. Our sample function now looks like this:

//helper to perform the sdf sample 
float4 samplesdf(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }

    //re-normalize gradient in gb components as bilinear filtering
    //and the morph can mess it up 
    sdf.gb = normalize(sdf.gb);
                
    //if edge based noise is on, adjust distance by sampling noise texture
    if(_EnableEdgeNoise)
    {
        sdf.r += lerp(_EdgeNoiseA,_EdgeNoiseB,samplenoise(uv));
    }
                
    return sdf;
}
    

Assuming _EnableEdgeNoise is true, the updated function now:

  • Reads a noise value from 0 to 1
  • Uses it to lerp between 2 input parameters: _EdgeNoiseA and _EdgeNoiseB
  • Adds the result to the sampled distance

As all our effects work by calling samplesdf, they will now all support reading the ‘noisy’ distances. Playing with different effects and noise values, the results can be very varied and quite pleasing:

noiseycats

And of course, because it’s animated…

A note on broken fields

It’s worth mentioning at this point that both the earlier morph effect, and our updated noise effect break the field a little. By this I mean that after fiddling with the sampled values they are no longer guaranteed to represent the distance to the closest edge. This is surprisingly well hidden with most of the effects so far, as they only really care about values very close to the edge of the geometry where the errors are smallest. However, we can spot the result in the bevel effect:

bevelnogradient

One result of the broken field is that our calculated gradient values no longer match those of the field with noisy edges applied. As a result, the wibbly edge is still shaded as though it were flat.

There are many approaches to solving this issue depending on your use case. At the extreme end, you might perform your morphing or noise on the CPU, then do a full re-sweep of the field every frame. On the other hand, if the problem isn’t particularly visible, you might do nothing at all!

For this particular situation, we’re going to ignore the distance errors, but attempt to fix the gradient on the fly so the bevel works nicely. To do so, we’ll add a new function to the shader designed to only sample the distance, but ignore gradient:

//cut down version of samplesdf that ignores gradient
float samplesdfnograd(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }
                
    //if edge based noise is on, adjust distance by sampling noise texture
    if(_EnableEdgeNoise)
    {
        sdf.r += lerp(_EdgeNoiseA,_EdgeNoiseB,samplenoise(uv));
    }

    return sdf.r;
}

That should look pretty familiar! Now, we’ll add the following lines to the main sdf sampling function:

    //if requested, overwrite sampled gradient with one calculated live in
    //shader that takes into account morphing and noise
    if(_FixGradient)
    {
        float d = sdf.r;
        float sign = d > 0 ? 1 : -1;
        float x0 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(-1,0));
        float x1 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(1,0));
        float y0 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(0,-1));
        float y1 = samplesdfnograd(uv+_MainTex_TexelSize.xy*float2(0,1));
               
        float xgrad = sign*x0 < sign*x1 ? -(x0-d) : (x1-d);
        float ygrad = sign*y0 < sign*y1 ? -(y0-d) : (y1-d);
                
        sdf.gb = float2(xgrad,ygrad);
    }

This rather expensive function performs the same calculations within the shader that our earlier CPU code used to calculate the gradient by sampling neighbours. However, by being run in the shader and accounting for morphing / noise, it provides much more accurate gradients:

bevelfixedgradient

Note the shading on the wobbly edges now contains matching shadows.

Summary

That concludes what I think is probably the main section of my intro to SDFs. Across the 7 posts you should have learnt the core principles of signed distance fields and seen how they can be used in 2D to generate some useful effects. Some future things to look at that I may cover, or you may wish to investigate, are:

  • 3D fields. These work exactly like 2D fields in every way! Visualising them can be tricky though. A good place to start is to search for the marching cubes or marching tetrahedrons algorithms online.
  • Physics. Signed distance fields are ideally suited to collision detection and ray casting, as they make it very easy to find out the distance to an edge from any given point. If you’re looking at making a game with very bumpy or modifiable terrain, SDFs are ideal.
  • CSG (constructive solid geometry) is the technique of combining shapes in different ways (primarily ‘add’ and ‘subtract’) to get more complex geometry.
  • Lighting. Both in 3D and 2D, using distance fields to represent lighting volumes can be a very effective way of generating complex dynamic lighting in a scene.
  • Sparse fields. In 3D especially, fields can get very memory hungry. However, many implementations use a technique in which fields are stored in a tree, with high resolution data only maintained close to the surface of the geometry.
  • Compression. Throughout this blog we’ve stored data in fairly expensive floating point textures. A simple modification is to store field distances in classic 1 byte per channel textures. This is achieved by storing distances as values between 0 and 1, then rescaling them to cover large signed ranges (see the sketch after this list).
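
As a taster of that last point, here’s a hedged sketch of the rescaling idea (my own illustration, not code from the repo): a signed distance in [-range, +range] is mapped to [0,1] for storage in an 8 bit channel, then mapped back on load.

//encode a signed pixel distance into the 0-1 range an 8 bit channel can store
float EncodeDistance(float distance, float range)
{
    //0.5 represents the edge (distance 0); 0 maps to -range, 1 maps to +range
    return Mathf.Clamp01(distance / (2f * range) + 0.5f);
}

//decode a stored 0-1 value back into a signed distance
float DecodeDistance(float encoded, float range)
{
    return (encoded - 0.5f) * 2f * range;
}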

Maybe if I get some requests I’ll cover some of those at some point. Until then, enjoy the code here:

https://github.com/chriscummings100/signeddistancefields

Enjoy!

Signed Distance Fields Part 7: Some Simple Effects

Well we’re on number 7 and it’s time to start using the fields for some more interesting rendering. This post will focus exclusively on adding to the signed distance field shader some exciting new modes with which to render the field! All the code can be found on github here.

Soft Borders

The first and arguably one of the most useful tools in the SDF toolkit is the ability to render soft borders to your geometry. Thus far the most advanced aspect of our SDF shader is the ability to render a solid shape with coloured borders and a coloured background. However, zooming in on our cat:

nosoftborders

You can see that even though this is rendered using a 512×512 pixel field, we still get jaggy edges. This is no longer down to a quality issue with the field – we’ve simply hit the resolution of the screen, and, unable to blend, have had to choose for each pixel 1 of 3 colours (background, border or fill). Naturally we’ll see ‘stepping’ occurring along the edges, otherwise known as aliasing.

To address it, we’ll add a new mode to the sdffunc function in SignedDistanceField.shader:


else if (_Mode == 7) //SoftBorder
{
    float d = sdf.r + _Offset;

    if (d < -_BorderWidth)
    {
        //if inside shape by more than _BorderWidth, use pure fill
        res = _Fill;
    }
    else if (d < 0)
    {
        //if inside shape but within range of border, lerp from border to fill colour
        float t = -d / _BorderWidth;
        t = t * t;
        res = lerp(_Border, _Fill, t);
    }
    else if (d < _BorderWidth)
    {
        //if outside shape but within range of border, lerp from border to background colour
        float t = d / _BorderWidth;
        t = t * t;
        res = lerp(_Border, _Background, t);
    }
}

Here the distance is calculated as normal. Just like the ‘solid with border’ mode, we take the fill colour if d < -_BorderWidth, and stick with the background colour if d > _BorderWidth. However, within the border region we:

  • Calculate a value (t) between 0 and 1 based on signed distance (d) from the geometry
  • Multiply t by itself to get a curvy blend instead of a linear blend
  • Use t to lerp from the border colour to either the fill or background depending on whether we’re inside or outside the geometry

The result is a gentle blend from Background to Border to Fill at the edge of the field:

softborders

And of course, as usual, we can play with the border width and offset variables to make the border, or the whole cat, thinner or fatter:

softborderfun

Neon (aka bloom)

Our human brains assume something is emitting or reflecting bright light if it appears ‘saturated’, turning from coloured to white at the brightest point. A classic example of this is a neon sign, which appears coloured around the edges but almost-white at the centre:

neon

This fun effect was produced with some very simple code:


else if (_Mode == 8) //Neon
{
    float d = sdf.r + _Offset;

    //only do something if within range of border
    if (d > -_BorderWidth && d < _BorderWidth)
    {
        //calculate a value of 't' that goes from 0->1->0
        //around the edge of the geometry
        float t = d / _BorderWidth; //[-1:0:1]
        t = 1 - abs(t);             //[0:1:0]

        //lerp between background and border using t
        res = lerp(_Background, _Border, t);

        //raise t to a high power and add in as white
        //to give bloom effect
        res.rgb += pow(t, _NeonPower)*_NeonBrightness;
    }
}

As with the soft border effect earlier, we utilise the distance value and the border width to get a value (t) called the interpolator. In this case we manipulate it mathematically to be a value that goes from 0 to 1 to 0 around the edge of the geometry. Again, similar to the soft border, we then use ‘t’ to interpolate from background colour to border colour.

The extra ingredient, using 2 new parameters, raises ‘t’ to a high power (making it a ‘sharp curve’), then simply adds it to the output colour. This in effect adds some whiteness into the output which sharply curves up in brightness close to the edge of the geometry.

Technically speaking this is a simulation of the common ‘bloom’ effect, usually applied as a post effect in games to highlight bright areas of the screen. Whilst good quality bloom can be tricky to achieve on low end devices, if you just want some neon looking text or objects, the SDF approach can be extremely effective with no real GPU overhead.

Edge Textures

Up until now we’ve used some simple maths to blend between various colours at the edge of the field – first to provide simple softened edges, then to get a neon style effect that simulates bloom. However, to get a more flexible (and art driven) effect, we can use ‘edge textures’. Note: these are often referred to as ‘gradient textures’, but ‘gradient’ is an important term with different meanings in SDFs, so I’m reserving it for later!

gradientcat1

Here a similar algorithm to the earlier ones has been used, however the ‘t’ value calculated from distance-to-edge has been used to sample the following texture, created in Paint.Net:

gradient

The code is relatively simple, and very similar to the earlier effects:


else if (_Mode == 9) //Edge Texture
{
    float d = sdf.r + _Offset;

    if (d < -_BorderWidth)
    {
        //if inside shape by more than _BorderWidth, use pure fill
        res = _Fill;
    }
    else if (d < _BorderWidth)
    {
        //if inside shape but within range of border, calculate a 
        //'t' from 0 to 1
        float t = d / _BorderWidth; //[-1:0:1]
        t = (t+1)*0.5;             //[0:1]

        //now use 't' as the 'u' coordinate in sampling _EdgeTex
        res = tex2D(_EdgeTex, float2(t,0.5));
    }
}

Just as with the soft border we’re taking the _Fill colour if inside the geometry by more than _BorderWidth, or sticking with the background colour if outside the geometry by more than _BorderWidth. Within the border, we calculate an interpolator (t), and use it to sample the new _EdgeTex texture parameter.

Note that this arguably only needs a ‘1D’ edge texture, but as they don’t really exist, the shader simply samples horizontally along the centre of a 2D texture (i.e. at v = 0.5).
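
If you’d rather not author the ramp in a paint package, a texture like this can also be built in a few lines of C#. This is purely a hypothetical sketch (the helper and colour choices are mine, not from the repo), writing a 256×1 ramp and assigning it to the _EdgeTex parameter used above:

//hypothetical helper: builds a 256x1 ramp texture and assigns it to _EdgeTex.
//u=0 is deepest inside the border region, u=0.5 is the geometry edge, u=1 is
//fully outside, matching the 't' calculated in mode 9 above
Texture2D BuildEdgeRamp(Material material, Color inside, Color edge, Color outside)
{
    Texture2D tex = new Texture2D(256, 1, TextureFormat.RGBA32, false);
    tex.wrapMode = TextureWrapMode.Clamp;
    Color[] cols = new Color[256];
    for (int i = 0; i < 256; i++)
    {
        float u = i / 255f;
        cols[i] = u < 0.5f ? Color.Lerp(inside, edge, u * 2f)
                           : Color.Lerp(edge, outside, (u - 0.5f) * 2f);
    }
    tex.SetPixels(cols);
    tex.Apply();
    material.SetTexture("_EdgeTex", tex);
    return tex;
}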

One of the benefits of this effect is that with little effort we can get stylistic colouring that’d be a pain to reproduce mathematically. For example, using this edge texture:

gradient2

We get this rather stylish kitty:

gradientcat2

Drop shadows

This classic effect is incredibly useful for HUDs in games, as it helps deal with the fact that you can’t rely on your in game UI elements always being drawn on top of the same coloured background. Score text in the top left will be over a dark background in a night scene, and a bright background in a day scene. The solution is generally to add a ‘shadow’ to your text or HUD elements, so their outline can be made out whatever the background:

dropshadowcats

The basic effect is often achieved without SDFs by simply rendering the geometry twice, first the shadow at a slight offset, then the fill over the top. Our SDF version takes pretty much the same approach:


else if (_Mode == 10) //Drop shadow (Blog post 7)
{
    //sample distance as normal
    float d = sdf.r + _Offset;

    //take another sample, _ShadowDist texels up/right from the first
    float d2 = tex2D(_MainTex, uv+_ShadowDist*_MainTex_TexelSize.xy).r + _Offset;

    //calculate interpolators (go from 0 to 1 across border)
    float fill_t = 1-saturate((d-_BorderWidth)/_BorderWidth);
    float shadow_t = 1-saturate((d2-_ShadowBorderWidth)/_ShadowBorderWidth);

    //apply the shadow colour, then over the top apply fill colour
    res = lerp(res,_Border,shadow_t);
    res = lerp(res,_Fill,fill_t);                 
}

The sdffunc function is now being passed an extra argument: the uv from which it was read. We use this, combined with the new _ShadowDist parameter and the texel size of the signed distance field, to take a 2nd sample from the field, offset diagonally from the 1st. Note that Unity sets things up such that _MainTex_TexelSize.xy = [1/_MainTex.width, 1/_MainTex.height].

Next, some simple maths is applied to each of the distance samples to give an interpolator (t) that starts at 1 ‘inside’ the geometry, and transitions to 0 over the border. These are calculated for the fill as normal, and then for the shadow using the new _ShadowBorderWidth parameter. The purpose of these interpolators is to allow us to create soft transitions, as with the earlier soft border effect.

Finally, the shadow is applied using our _Border parameter for the colour, then the fill on top of it. The basic result with a small border width and identical shadow border width looks like this:

simpledropshadow

Nice! However this basic setup has 2 key issues that non-SDF based approaches also suffer from. Firstly, a shadow only shows in one direction, meaning the full geometry won’t always be outlined (sometimes but not always desirable). Secondly, too large a shadow distance shows ugly areas that highlight we’re just rendering 1 image on top of another:

dropshadowproblem

Fortunately, with the SDF approach, our ability to adjust the size of the shadow’s border means we can address both these issues at once, fattening the shadow to hide concave areas and give some subtle outline all round the shape:

dropshadownice

Morphing

The final basic effect for this section is the ability to morph between different fields very easily. Here, for example, is a circle morphing into a cat:

morphcats

First up, we’ll add a simple function into the shader that evaluates the field for a centered circle:


float4 centeredcirclesdf(float2 uv, float radius)
{
    //calculate offset from [0.5,0.5] in uv space, then multiply
    //by _MainTex_TexelSize.zw to get offset in texels
    float2 offset_from_centre = (uv - 0.5) * _MainTex_TexelSize.zw;

    //signed distance for a circle is |offset| - radius 
    float signeddist = length(offset_from_centre) - radius;

    //build result: [signeddist,0,0,1]
    float4 res = 0;
    res.x = signeddist;
    res.yz = 0;
    res.w = 1;
    return res;
}

Any field could be used, but for the sake of the demo, this saves us generating the field texture for a circle and passing it through as a parameter (call that an exercise for the reader!).
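
If you did fancy that exercise, one possible route is sketched below. It is only an illustration built on assumptions: the exact PCircle signature is inferred from part 5’s description rather than taken from the repo, and _MorphTex is a made-up shader property that samplesdf would have to sample in place of centeredcirclesdf.

//hypothetical: build the circle field on the CPU with the generator from the
//earlier generation posts, then hand it to the material as a second field texture
SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator(512, 512);
generator.PCircle(new Vector2(256, 256), 128f, 8f); //centre, radius, padding (assumed signature)
generator.Sweep();                                  //fill in the pixels the padded write skipped
Texture2D circletex = generator.End();
material.SetTexture("_MorphTex", circletex);        //hypothetical property name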

Next, we’ll create a new function called samplesdf. This will be a wrapper that replaces our existing tex2D calls:


float4 samplesdf(float2 uv)
{
      //sample distance field
    float4 sdf = tex2D(_MainTex, uv);

    //if we want to do the 'circle morph' effect, lerp from the sampled 
    //value to a circle value here
    if(_CircleMorphAmount > 0)
    {
        float4 circlesdf = centeredcirclesdf(uv,_CircleMorphRadius);
        sdf = lerp(sdf, circlesdf, _CircleMorphAmount);
    }
                
    return sdf;
}

The first line simply samples the input distance field at the given uv as normal. If _CircleMorphAmount > 0 we then go on to sample the circle distance field. The 2 distance samples are then combined with a lerp to get a result, which is returned.

Finally, existing calls to tex2D, such as that in the core fragment shader are replaced with calls to samplesdf:

//sample distance field
float4 sdf = samplesdf(i.uv);

All we are doing here is sampling the distance for 2 fields (in this case a cat and a circle) and lerping between them. The result can be fed into any of the existing effects and it just works:

Whilst you may not find many opportunities in games to morph between cats and circles, this simple effect can be a very efficient way of creating elegant UI transitions such as dialog boxes appearing / disappearing. In the past I’ve also made more creative use of it to animate lightning and particle based effects (shameless No Stick Shooter plug)!

Summary

This section has covered a variety of ways to use the distance sampled from a field to achieve simple effects, generally falling into a few categories:

  • Avoiding aliasing with soft borders
  • Creating border effects
  • Creating drop shadows
  • Transitions (aka morphing)

By combining and tweaking these in different ways, simple low resolution distance fields can create a huge range of effects, all of which can of course be animated and used either in game (if the style is right) or for very flexible UI elements.

However, an issue thus far is that all the effects above have been pretty ‘1 dimensional’. All they’ve really used is the distance, and as such the type of effects is fairly limited. In the next blog I’ll get on to how field gradients and noise can make things more interesting.

Signed Distance Fields Part 8: Gradients, bevels and noise

Signed Distance Fields Part 6: Images

If you haven’t found this series incredibly exciting then shame on you, but you might find this a little more intriguing! Up until now we’ve dealt with some pretty simple utilities to draw circles, lines and rectangles. These were great for showing the core concepts of fields, but aren’t particularly useful when actually making cool stuff. So, by the time we reach post number 6, it’s finally time to take ‘cool stuff step 1’, and generate fields from textures you can make in any standard art package. Excited now? I thought so. Code here if you need it!

Basic texture sweeping

With everything we’ve built so far, creating a field from a texture is incredibly simple. We start with a monochrome image – each pixel is either black or white. White represents a ‘solid’ bit, black an ’empty’ bit:

rectangles

I’ve added a default constructor, so we can create an empty SignedDistanceFieldGenerator, then set it up with an extra function call. Our new function will be called LoadFromTexture:


public void LoadFromTexture(Texture2D texture)
{
    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        if (texpixels[i].r > 0.5f)
            m_pixels[i].distance = -99999f;
        else
            m_pixels[i].distance = 99999f;
    }
}

Here we read a unity texture that is assumed to have been loaded elsewhere. Note that to work, the texture must be marked as ‘read/write’ when importing, and for best results, uncompressed. The pixels are read, and the dimensions/pixel buffers are setup to match the texture:


    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];

Next, we iterate over every pixel. If the input colour’s red channel is greater than 0.5 (i.e. close to white), we treat the pixel as solid. If it is less than 0.5, we treat it as empty. The solid pixel is interpreted as ‘internal’ geometry, and so is given a very large negative number. The empty pixel is interpreted as ‘external’ geometry, and so is given a very large positive number:


    for (int i = 0; i < m_pixels.Length; i++)
    {
        if (texpixels[i].r > 0.5f)
            m_pixels[i].distance = -99999f;
        else
            m_pixels[i].distance = 99999f;
    }

A simple extra button in SignedDistanceField.cs to load the ‘rectangle’ texture in the sample project completes this first step:


if (GUILayout.Button("Load texture"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();
    generator.LoadFromTexture(Resources.Load<Texture2D>("rectangles"));
    field.m_texture = generator.End();
}

Visualising the distances for the field, we now see bright red (+ve) external pixels, or bright green (-ve) internal pixels:

rectangleunswept

Crazily, that’s the hard part! By writing in 1 of these 2 ‘extreme’ values into our field, we’ve generated an extremely imprecise field. However, the ‘0 boundary’ that denotes the edge of a solid bit is still correct. When rendering, the shader will blend between pixels to calculate a distance at any given point. Thus when it blends between a solid pixel (-99999) and an empty pixel (99999), there will be a tiny point right on the boundary where a distance of 0 is read. By doing some tinkering with the numbers in our signed distance shader (technically rendering in ‘border’ mode with a border size of 99998!), we can actually visualise this:

rectangleedges

The great thing is, half of the previous post was devoted to a specific task – taking a signed distance field for which only the edge points are valid, and sweeping it to get a completely valid one. All we have to do is run the sweep, with no changes whatsoever, and we convert the image to a field:

Once it’s a field, any standard signed distance effect, such as our ‘solid-with-borders’ shader, can be used:

rectangleborders

And there you have it. The truth is, we’d already done most of the work for textures, so this first step was really just loading them up.

Aliasing

Unfortunately, whenever images are involved, eventually the issue of aliasing pops up. I chose rectangles quite specifically for the above demo – they’re all nice vertical or horizontal lines that fit perfectly into a grid of pixels. If, however we take the following image of a line:

aliaslines

You can see how MS Paint has generated zig-zaggy shapes along the edges. The effect in our signed distance field is not pleasant:

aliaslinesfield

Note: if you spotted it, apologies for the images being inverted horizontally – shader bug whilst taking screenshots!

One way of solving this would be to simply use giant textures (let’s say 4k by 4k), do the whole sweeping process, then scale down (aka downsample) the result. SDFs actually scale down very well, so this isn’t a crazy idea. However hi res input data isn’t always available, and even when it is, burning CPU processing it may not be desirable.

Many more advanced art packages will hide this problem very effectively using anti aliasing. Pixels that are only partially on the line will be only partially coloured in, making them look a lot more smooth. This very similar image from Paint.Net looks much nicer:

aliaslines2

Sadly though, it seems to make no difference to our distance field:

aliaslines2field

This is because our current image loading algorithm completely ignores the ‘solidity’ of the pixel. As far as it’s concerned, a pixel is either solid or empty. As a result, the anti-aliasing performed by more advanced drawing packages to make nicer images hasn’t helped.

Using a low resolution version of this cat (thanks to Tricia Moore) we can see just how much we lose from ignoring this anti aliasing:

aliascat2

To show off the problem, the original 512×512 cat was shrunk down to 128×128 pixels in Paint.Net, which cleverly used anti aliasing to get a slightly blurry but otherwise nice image. On the left some detail has been lost, but a soft edge maintains the illusion of curved edges. Once again though, ignoring this anti aliasing has resulted in a pretty much useless SDF on the right.

To solve the problem, let’s first look at an image pixel by pixel and think about what the field values really should be:

antialiasedrectangle
Left: 3×2 rectangle shape user has attempted to draw, Right: the result when written into the 5×5 grid of pixels

In this diagram we imagine a user has opened up a package such as Paint.Net and managed to draw a rectangle exactly 3 pixels wide and 2 pixels high into a 5×5 texture. However, whilst they aligned it perfectly horizontally, the rectangle overlaps pixel borders vertically. The result, shown on the right, is Paint.Net’s best guess at representing the rectangle in pixel format. The pixels that were fully covered are fully filled, but the pixels that were half covered have been blended in with the background.

Now we ask the question, given the texture that Paint.Net generated, what should the corresponding signed distance field texture look like?

antialiasedrectanglefield
Left: the same 3×2 rectangle with signed distances overlaid, Right: the texture with the same signed distances overlaid

Here you can see the box (left) and the texture (right) with the desired signed distance values written in. The first clear thing that stands out is that a ‘solidity’ of 0.5 in the input image suggests a signed distance value of 0. As expected, we can also see the more ‘solid’ pixels are assigned a negative distance (aka inside the shape), and the less solid pixels are assigned a positive distance (aka outside the shape).

Unfortunately, in addition to this handy info, we can also see there is no ‘clear’ answer to what solidity corresponds to what distance. If we were to examine the vertical edges on the left/right sides, we’d assume that a solidity of 0 meant a distance of 0.5. However, if we were to examine the horizontal edges at the top/bottom, we’d assume it meant a distance of 1.

Techniques for addressing this have certainly been developed (check out this for example), though they are not entirely trivial and beyond the scope of this post. For now we’ll take the relatively good results that can be obtained simply by compromising and assuming edge distances ranging from -0.75 to 0.75. This leads us to:


public void LoadFromTextureAntiAliased(Texture2D texture)
{
    Color[] texpixels = texture.GetPixels();
    m_x_dims = texture.width;
    m_y_dims = texture.height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        //r==1 means solid pixel, and r==0 means empty pixel and r==0.5 means half way between the 2
        //interpolate between 'a bit outside' and 'a bit inside' to get approximate distance
        float d = texpixels[i].r;
        m_pixels[i].distance = Mathf.Lerp(0.75f, -0.75f, d);
    }
}

This extremely simple version of the texture loader just reads a pixel as in our previous example, then uses it to lerp between distance values of 0.75 (outside) and -0.75 (inside). Testing it out:

antialiascat
Left: cat using anti aliased approach, Right: cat using the old approach

Not only do we now have a softer border, but it is also slightly thicker due to more accurate approximation of the edge pixels. Similarly, looking at the earlier aliased lines:

antialiaslines

Yummy! Whilst the result is still clearly not perfect, it is substantially better – especially given the source texture is only 128×128 pixels.

For reference, here’s 3 more cats (though I’m actually more of a dog person myself) showing the cat generated from different source data with the border width adjusted for comparison.

threecats
Left 128×128 source, Middle 256×256, Right 512×512 (orig data)

Downsampling

This is turning into a long post, but I want to cover some fun effects soon, and they won’t look cool unless we nail the quality of our fields first! Interpreting anti aliased textures from packages such as Paint.Net / Photoshop has improved the conversion from image to field, but we still get some artefacts along the edges that it’d be nice to clean up:

lumpyfield

These come from the fact that our simple approach to extracting the field from images still struggles to get something perfect. Even a cleverer one would have trouble attaining really high quality, as the simple fact is data was lost when the paint package had to convert some nice clean geometry into a blurry low res image.

To fix these artefacts, we’ll utilise down-sampling, in which a higher resolution image is loaded/swept, then scaled down to the desired field size.

This function builds a new field by scaling down the existing one by 50%:


public void Downsample()
{
    //to keep life simple, only downsample images that can be halved in size!
    if ((m_x_dims % 2) != 0 || (m_y_dims % 2) != 0)
        throw new Exception("Dumb downsample only divides by 2 right now!");

    //calculate new field size, and allocate new buffer
    int new_x_dims = m_x_dims / 2;
    int new_y_dims = m_y_dims / 2;
    Pixel[] new_pixels = new Pixel[new_x_dims * new_y_dims];

    //iterate over all NEW pixels
    for (int y = 0; y < new_y_dims; y++) 
    {
        int srcy = y * 2;
        for (int x = 0; x < new_x_dims; x++) 
        {
            int srcx = x * 2;

            //combine the 4 pixels in the existing field that this one corresponds to
            float new_dist = 0;
            new_dist += GetPixel(srcx,srcy).distance * 0.25f;
            new_dist += GetPixel(srcx+1, srcy).distance * 0.25f;
            new_dist += GetPixel(srcx, srcy+1).distance * 0.25f;
            new_dist += GetPixel(srcx+1, srcy+1).distance * 0.25f;

            //also divide distance by 2, as we're shrinking the image by 2, and distances
            //are measured in pixels!
            new_dist /= 2;

            //store new pixel
            new_pixels[y * new_x_dims + x].distance = new_dist;
        }
    }

    //once done, overwrite existing pixel buffer with new one and store new dimensions
    m_pixels = new_pixels;
    m_x_dims = new_x_dims;
    m_y_dims = new_y_dims;
}

Here we allocate a new buffer, and calculate the new dimensions for a field that is exactly half the size of the existing one:


//calculate new field size, and allocate new buffer
int new_x_dims = m_x_dims / 2;
int new_y_dims = m_y_dims / 2;
Pixel[] new_pixels = new Pixel[new_x_dims * new_y_dims];

Next, we loop over all the new pixels, and for each one calculate the location of the corresponding pixel in the existing image. With this we end up with:

  • x,y: The coordinate of the new pixel in the new field
  • srcx,srcy: The coordinate of the existing pixel in the existing field

Now the key code in the loop:


//combine the 4 pixels in the existing field that this one corresponds to
float new_dist = 0;
new_dist += GetPixel(srcx,srcy).distance * 0.25f;
new_dist += GetPixel(srcx+1, srcy).distance * 0.25f;
new_dist += GetPixel(srcx, srcy+1).distance * 0.25f;
new_dist += GetPixel(srcx+1, srcy+1).distance * 0.25f;

//also divide distance by 2, as we're shrinking the image by 2, and distances
//are measured in pixels!
new_dist /= 2;

//store new pixel
new_pixels[y * new_x_dims + x].distance = new_dist;

This reads 4 distances from in the existing field in a square and combines them to create 1 new distance to be stored in the new field. The final bit divides the new distance by 2, as we are dividing the size of the field by 2.

By loading a higher resolution image than necessary, sweeping it as normal and then down-sampling it to the desired field resolution we get a much nicer result:

smoothfield

This field was still built from a relatively low res 256×256 image. However, after down-sampling to a 128×128 field the result is much more pleasing.
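
To tie the pieces together, a load-sweep-downsample button might look something like this. It’s just a sketch mirroring the existing buttons in SignedDistanceField.cs – the resource name and resolutions are examples only, but the functions are the ones described in this series:

if (GUILayout.Button("Load texture downsampled"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator();
    generator.LoadFromTextureAntiAliased(Resources.Load<Texture2D>("cat256"));
    generator.Sweep();      //turn edge-only data into a full field
    generator.Downsample(); //e.g. 256x256 -> 128x128
    field.m_texture = generator.End();
}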

Summary

This post focused on building high quality fields from images, as I want to get onto some fun effects soon but fun effects need good fields! The images in this blog are typically around 512×512 pixels, so here’s our cat image loaded and swept at 1024×1024, then downsampled to 512×512:

hirescat

Pretty tasty! The one final step we could go into is the use of eikonal equations to normalize the field, but I’ll leave that for another post.

There’s lots more boring stuff to learn about compression, normalizing, more sweeping, csg operations etc etc, but now that the foundation exists, it’s time for some cool s**t. Hence, next post, we’ll look at some funky 2D effects!

And for the 6th time, the code for this blog can be found on git hub here!

Signed Distance Fields Part 7: Some Simple Effects

Signed Distance Fields Part 5: Faster generation and sweeping

In the previous post we looked at the basics of signed distance field generation. Having setup a framework for generating a field, we wrote a series of functions for adding different shapes, all of which:

  • Iterated over every pixel in the field
  • Calculated the distance to the shape at the centre of the pixel
  • Updated the pixel in the field by comparing the new distance to the current one

And of course, here is the git hub repo with it all in!

The process worked well, but also highlighted some inconsistencies when combining shapes, demonstrated when drawing 2 rectangles very close together:

BigOverlapping3
Artifacts overlapping 2 rectangles. Left: the combined rectangles with a small border. Middle: the resulting distance field showing an error. Right: the resulting artifacts when using a big border.

The algorithm has no way of knowing that the central pixels of the large rectangle become very ‘deep’ when the 2 smaller rectangles are drawn overlapping one another. This inconsistency breaks the border effect as shown in the right hand image.

In addition to errors, the algorithm has to iterate over every pixel for every shape drawn, so can get slow pretty quickly!

Bounded/padded shapes

To begin, we’ll attempt to speed things up by implementing a simple optimisation. Rather than updating the entire field for every shape, we’ll pick a subset of pixels to update, and leave the rest alone. This will be substantially faster, as a shape that only affects a small area of the field will only have to iterate over a small number of pixels.

To start, lets create a couple of helpers:


//simple function to clamp an integer xy coordinate to a valid pixel
//coordinate within our field
void ClampCoord(ref int x, ref int y)
{
    if (x < 0) x = 0;
    if (x >= m_x_dims) x = m_x_dims - 1;
    if (y < 0) y = 0;
    if (y >= m_y_dims) y = m_y_dims - 1;
}

This very basic function (which we’ll use later) just clamps a coordinate to within the dimensions of our field.

Building on that, a slightly more complex one:


//takes an axis aligned bounding box min/max, along with a padding value, and 
//outputs the integer pixel ranges to iterate over when using it fill out a field
void CalcPaddedRange(Vector2 aabbmin, Vector2 aabbmax, float padding, 
                     out int xmin, out int ymin, out int xmax, out int ymax)
{
    //subtract the padding, and floor the min extents to an integer value
    xmin = Mathf.FloorToInt(aabbmin.x-padding);
    ymin = Mathf.FloorToInt(aabbmin.y-padding);

    //add the padding and ceil the max extents to an integer value
    xmax = Mathf.CeilToInt(aabbmax.x+padding);
    ymax = Mathf.CeilToInt(aabbmax.y+padding);

    //clamp both coordinates to within valid range
    ClampCoord(ref xmin, ref xmax);
    ClampCoord(ref ymin, ref ymax);
}

This function takes an axis aligned bounding box – aka a none rotated rectangle. It is described by the position of its top left corner (aabbmin), and the position of its bottom right corner (aabbmax). In addition, the function takes a padding value, which we’ll use to fatten the box. It then proceeds to calculate the integer range of pixels we’d need to loop over in order to touch every single pixel within the fattened box.

Now to get moving. Let’s add a new function in SignedDistanceFieldGenerator.cs to draw a line which, for want of a better naming convention, we’ll call PLine, as it is a padded line.


//generates a line, writing only to the pixels within a certain distance
//of the line
public void PLine(Vector2 a, Vector2 b, float pad)
{
    //calculate axis aligned bounding box of line, then use to get integer
    //range of pixels to fill in
    Vector2 aabbmin = Vector2.Min(a, b);
    Vector2 aabbmax = Vector2.Max(a, b);
    int xmin, ymin, xmax, ymax;
    CalcPaddedRange(aabbmin, aabbmax, pad, 
                    out xmin, out ymin, out xmax, out ymax);

    Vector2 line_dir = (b - a).normalized;
    float line_len = (b - a).magnitude;

    for (int y = ymin; y <= ymax; y++)
    {
        for (int x = xmin; x <= xmax; x++)
        {
            //mathematical function to get distance from a line
            Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);
            Vector2 offset = pixel_centre - a;
            float t = Mathf.Clamp(Vector3.Dot(offset, line_dir), 0f, line_len);
            Vector2 online = a + t * line_dir;
            float dist_from_edge = (pixel_centre - online).magnitude;

            //update the field with the new distance
            Pixel p = GetPixel(x, y);
            if (dist_from_edge < p.distance)
            {
                p.distance = dist_from_edge;
                SetPixel(x, y, p);
            }
        }
    }
}

This is only a slight modification of the earlier brute force implementation. At the beginning of the function the axis aligned bounding box of the line is calculated using its start/end points (a and b):


    Vector2 aabbmin = Vector2.Min(a, b);
    Vector2 aabbmax = Vector2.Max(a, b);

This is then passed into our CalcPaddedRange function, along with a new pad parameter, which outputs the integer range of pixels covered by the fattened bounding box:


    int xmin, ymin, xmax, ymax;
    CalcPaddedRange(aabbmin, aabbmax, pad, 
                    out xmin, out ymin, out xmax, out ymax);

The final change is the loops. Where previously they iterated over every pixel (starting from 0, going up to the width/height of the field), they now only iterate over the calculated range of pixels:


for (int y = ymin; y <= ymax; y++)
{
    for (int x = xmin; x <= xmax; x++)
    {

That’s the lot for this first section – everything else is identical to our earlier brute force version. We’ve not changed the algorithm for drawing a line in the slightest. All we’ve done is work out a reduced number of pixels to fill in.

As with our earlier tests, we can now add a button in SignedDistanceField.cs to generate a padded line:


        if (GUILayout.Button("Padded line"))
        {
            SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator(32, 32);
            generator.PLine(new Vector2(8, 15), new Vector2(23, 20), 5);
            field.m_texture = generator.End();
        }

This is a roughly diagonal line in a 32×32 field, with a padding of 5 pixels. At first glance, the resulting render (with a border width of 1 pixel) looks perfectly good:

PaddedLine
Diagonal line with 1 pixel border

However, lets look at the distance visualisation:

PaddedLineDist
Distance visualisation with low scale to show unwritten outer pixels

Our distance visualiser renders a shade of red for positive distances to the nearest pixel, but by scaling the shade down, we can see that outside of the bounding box there’s lots of pixels with a huge distance value where we haven’t yet written anything.

The issue with this approach rears its ugly head when we try to make the line fatter. Let’s instead go back to rendering it as normal, but with a border width of about 6 pixels:

PaddedLineChopped
Chopping artefacts due to a lack of valid pixels

Our line has now been chopped off at the edges where we hit pixels that think they’re miles away from the edge of anything!

The reader might point out at this stage that rather than make things better, I’ve actually made things worse. On top of the earlier issues with field inconsistencies, we now have to deal with only partial field data. However, even without further refinement this technique is frequently put to work, as many uses of signed distance fields don’t care about every pixel – often the important ones are very close to the edge or inside of the geometry.

Assume, for example, I knew in my game I was never going to try and create borders fatter than 4 pixels. In that situation, generating all my field shapes with a padding of 5 pixels will give me all the data I need, and may save a lot of time.

Whilst I won’t touch upon it yet, many uses of signed distance fields avoid trying to even store data for empty pixels. When 3D voxels come into play, memory runs out very quickly – a 1024x1024x1024 voxel grid of 4 byte floating point distances would take 4GB of memory! Instead, techniques such as sparse octrees are used, which can store high resolution voxel data close to the surface of the geometry, but little to no data for voxels far from the surface.

Before proceeding, PCircle and PRect have also been added. As with the PLine function, these are both almost identical to their brute force counterparts, with the added use of the bounding box.

For a circle, the bounding box is calculated by simply building a square centred at the centre of the circle, with a width and height of 2 * the radius:


//circle bounding box calculation - a square that surrounds the circle
Vector2 aabbmin = centre - Vector2.one * rad;
Vector2 aabbmax = centre + Vector2.one * rad;
int xmin, ymin, xmax, ymax;
CalcPaddedRange(aabbmin, aabbmax, pad, 
                out xmin, out ymin, out xmax, out ymax);


And for a rectangle (aka a box) the bounding box is, you guessed it, the box!


//rectangle bound box 'calculation' - just use the min/max of the rectangle
Vector2 aabbmin = min;
Vector2 aabbmax = max;
int xmin, ymin, xmax, ymax;
CalcPaddedRange(aabbmin, aabbmax, pad, 
                out xmin, out ymin, out xmax, out ymax);

Sweeping

If you really want a clean signed distance field, you’re going to have to sweep it (hahaha yeah I went there). With this technique we assume that the only accurate pixels in the field are those that lie on the edge of the geometry, and that any others may have suffered inconsistencies, or not even been written to at all.

The most common sweeping algorithm has such a great name it always has to be written in full… yes, it’s the 8-points Signed Sequential Euclidean Distance Transform. Catchy! I haven’t actually managed to track down the original paper on it, however Richard Mitton on Coders Notes presents a nice and clean implementation in C++. The one we’ll use for this blog is largely a C# port of his work.

First up, a quick function to identify if a pixel is ‘outside’ or ‘inside’ the surface:


bool IsOuterPixel(int pix_x, int pix_y)
{
    if (pix_x < 0 || pix_y < 0 || pix_x >= m_x_dims || pix_y >= m_y_dims)
        return true;
    else
        return GetPixel(pix_x, pix_y).distance >= 0;
}

This really just returns true if the distance value is greater than or equal to 0, with an extra catch to treat pixels outside the bounds of the field as outside the geometry.

The algorithm starts by iterating over the existing pixels and clearing out anything that doesn’t contain reliable data. The metric we’ll use is whether a pixel is an edge pixel, defined by this little helper:


bool IsEdgePixel(int pix_x, int pix_y)
{
    bool is_outer = IsOuterPixel(pix_x, pix_y);
    if (is_outer != IsOuterPixel(pix_x - 1, pix_y - 1)) return true; //[-1,-1]
    if (is_outer != IsOuterPixel(pix_x, pix_y - 1)) return true;     //[ 0,-1]
    if (is_outer != IsOuterPixel(pix_x + 1, pix_y - 1)) return true; //[+1,-1]
    if (is_outer != IsOuterPixel(pix_x - 1, pix_y)) return true;     //[-1, 0]
    if (is_outer != IsOuterPixel(pix_x + 1, pix_y)) return true;     //[+1, 0]
    if (is_outer != IsOuterPixel(pix_x - 1, pix_y + 1)) return true; //[-1,+1]
    if (is_outer != IsOuterPixel(pix_x, pix_y + 1)) return true;     //[ 0,+1]
    if (is_outer != IsOuterPixel(pix_x + 1, pix_y + 1)) return true; //[+1,+1]
    return false;
}

Here the pixel is first identified as an outer or inner one. It is then compared to all 8 of its neighbours. If an outer pixel has any inner neighbours, its on an edge, and visa-versa.

With these 2 tools, the ‘clear’ function is as follows:


public void ClearNoneEdgePixels()
{
    for (int y = 0; y < m_y_dims; y++)
    {
        for (int x = 0; x < m_x_dims; x++)
        {
            Pixel pix = GetPixel(x, y);
            if (!IsEdgePixel(x, y))
                pix.distance = pix.distance > 0 ? 99999f : -99999f;
            SetPixel(x, y, pix);
        }
    }
}

This iterates over every pixel, and identifies any that are not on an edge. In these cases we basically want to update the pixel to say “we still know if you’re inside or outside, but we have no idea how far you are from the surface”. To achieve this the pixel is assigned either a negative or positive very big number. Here’s a couple of examples of the distance visualiser before and after a clear:

ClearNoneEdge

In both examples, on the left is a field generated using one of the earlier functions, while the right shows what happens when the ClearNoneEdgePixels function is applied. Be aware for the remainder of the post I’ve switched to point sampled rendering to clearly show individual pixels, hence the ‘blocky’ look!

The reader might note here that most of the soft edges of the ‘padded circles’ have been obliterated, indicating they could have been generated with very tight padding with no negative consequences.

The sweeping algorithm is much simpler if it doesn’t have to worry about outsides/insides or positive/negative distances. As a result, the common approach is to run the algorithm once for inner pixels, and once for outer pixels, then combine the result at the end. To achieve this, we build 2 separate grids:


//reads current field into 2 grids - 1 for inner pixels and 1 for outer pixels
void BuildSweepGrids(out float[] outside_grid, out float[] inside_grid)
{
    outside_grid = new float[m_pixels.Length];
    inside_grid = new float[m_pixels.Length];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        if (m_pixels[i].distance < 0)
        {
            //inside pixel. outer distance is set to 0, inner distance
            //is preserved (albeit negated to make it positive)
            outside_grid[i] = 0f;
            inside_grid[i] = -m_pixels[i].distance;
        }
        else
        {
            //outside pixel. inner distance is set to 0,
            //outer distance is preserved
            inside_grid[i] = 0f;
            outside_grid[i] = m_pixels[i].distance;
        }
    }
}

When hitting a +ve pixel, the distance is written to the outer grid, but 0 is written to the inner grid, causing the inner sweep to ignore it.

Similarly, when hitting a -ve pixel, the distance is written (negated) to the inner grid, but 0 is written to the outer grid, causing the outer sweep to ignore it.

The core of this, and indeed many sweep algorithms can be seen in the following diagram:

SweepDist

In knowing the distance from B to its nearest edge point, we can take a guess at the distance from A to that same edge point, by taking the distance from A to B, and adding it to the distance from B to the edge.

It’s worth noting that the result isn’t perfect – the calculation gives us an upper bound for the distance from A to the edge. In reality, the distance from A to the edge is a little shorter, as shown here:

SweepDist2

With some cleverer algorithms it is possible to refine these errors a little, but, for many use cases they can be ignored with minimal issues.

Armed with this observation, we can introduce the Compare function:


//compares a pixel for the sweep, and updates it with a new distance if necessary
public void Compare(float[] grid, int x, int y, int xoffset, int yoffset)
{
    //calculate the location of the other pixel, and bail if invalid
    int otherx = x + xoffset;
    int othery = y + yoffset;
    if (otherx < 0 || othery < 0 || otherx >= m_x_dims || othery >= m_y_dims)
        return;

    //read the distance values stored in both this and the other pixel
    float curr_dist = grid[y * m_x_dims + x];
    float other_dist = grid[othery * m_x_dims + otherx];

    //calculate a potential new distance, using the one stored in the other pixel,
    //PLUS the distance to the other pixel
    float new_dist = other_dist + Mathf.Sqrt(xoffset * xoffset + yoffset * yoffset);

    //if the potential new distance is better than our current one, update!
    if (new_dist < curr_dist)
        grid[y * m_x_dims + x] = new_dist;
}

This function takes a pixel (A) at coordinate [x,y], and another pixel (B) at coordinate [x+xoffset,y+yoffset]. Both pixels may already have valid distances associated with them, or they may still contain really-big-number. A new candidate distance is calculated by adding the distance, A->B, to the distance, B->edge. If the new distance is better than the current one, it is updated.

Now for the backbone of the sweep – the SweepGrid function:


public void SweepGrid(float[] grid)
{
    // Pass 0
    //loop over rows from top to bottom
    for (int y = 0; y < m_y_dims; y++)
    {
        //loop over pixels from left to right
        for (int x = 0; x < m_x_dims; x++)
        {
            Compare(grid, x, y, -1, 0);
            Compare(grid, x, y, 0, -1);
            Compare(grid, x, y, -1, -1);
            Compare(grid, x, y, 1, -1);
        }

        //loop over pixels from right to left
        for (int x = m_x_dims - 1; x >= 0; x--)
        {
            Compare(grid, x, y, 1, 0);
        }
    }

    // Pass 1
    //loop over rows from bottom to top
    for (int y = m_y_dims - 1; y >= 0; y--)
    {
        //loop over pixels from right to left
        for (int x = m_x_dims - 1; x >= 0; x--)
        {
            Compare(grid, x, y, 1, 0);
            Compare(grid, x, y, 0, 1);
            Compare(grid, x, y, -1, 1);
            Compare(grid, x, y, 1, 1);
        }

        //loop over pixels from left to right
        for (int x = 0; x < m_x_dims; x++)
        {
            Compare(grid, x, y, -1, 0);
        }
    }
}

Utilising the Compare function, this sweeps over all the pixel rows, first from top to bottom, then from bottom to top. For each row, it sweeps from left to right, and right to left. Within these loops, pixels are compared to their neighbours, and distances updated if better ones are found. I don’t want to go into the exact proof that the above set of loops guarantee the correct result, as that’s a scientific paper in itself! However, these videos hopefully show the process in action very convincingly:

The final function that glues it all together and writes the grids back into the field pixels is Sweep:


public void Sweep()
{
    //clean the field so any none edge pixels simply contain 99999 for outer
    //pixels, or -99999 for inner pixels
    ClearNoneEdgePixels();

    //seperate the field into 2 grids - 1 for inner pixels and 1 for outer pixels
    float[] outside_grid,inside_grid;
    BuildSweepGrids(out outside_grid, out inside_grid);

    //run the 8PSSEDT sweep on each grid
    SweepGrid(outside_grid);
    SweepGrid(inside_grid);

    //write results back
    for (int i = 0; i < m_pixels.Length; i++)
        m_pixels[i].distance = outside_grid[i] - inside_grid[i];
}

The only new functionality here is the last write-back, which subtracts the inner grid from the outer one. This in effect just combines the 2 grids, and makes the inner pixels negative again.

If you’re interested, I made the videos using a slightly body new class called AnimatedSweep. It’s not meant to be a shining example of nice code, but it’s there if you want a peak!

Summary

Phew! Turns out sweeping takes more to explain than planned. By now you’ve got a good understanding of writing to distance fields, a way of minimizing performance cost of doing so and a robust technique for cleaning up errors.

We’ve definitely not covered everything on sweeping here. The 8SSD method described so far is technically an optimal version of an ‘8 tap’ algorithm, meaning for any given pixel, all 8 neighbours are checked and the best candidate is selected. In addition, we have the less accurate ‘4 tap’ algorithms which ignore the diagonals, and the more mathematical but highly accurate eikonal equation that I may go over later in the series.

Next episode will add a final powerful tool to your 2D SDF generation kit – generating from images created in anything from MS Paint to Adobe Photoshop.

Check out the code here for the latest stuff.

Signed Distance Fields Part 6: Images

Signed Distance Fields Part 4: Starting Generation

Up until now we’ve focused on the concepts of signed distance fields and how to render them using a special shader. However, a place to stumble once you understand these tools is how to get/find/create the fields in the first place. As it happens, there are loads of different ways that vary depending on source data, and performance/quality requirements. At the end of this post you should have a basic (if inefficient) approach to generating fields that works out the box, and is easy to extend.

As always, the full git repo is on github here.

The Field Generator

Having cleaned up the signed distance field renderer in the previous post, we’ll switch to another class called SignedDistanceFieldGenerator. For a full listing, see SignedDistanceFieldGenerator.cs. This class provides 2 facilities:

  • Starting/finishing generation by allocating a temporary buffer for
    the field, then at the end converting it to a unity texture
  • A set of utility functions we’ll gradually add to for generating fields in
    a variety of ways

Our generator starts with the definition of a field pixel:


public struct Pixel
{
    public float distance;
}

This simple struct is used to store the properties of a pixel during generation. It contains just 1 value for now:

  • distance: as you might expect, this is the calculated signed
    distance for the pixel

We have a constructor that does nothing more than allocate a buffer and initialise all the pixels to a very large distance, meaning they will always be interpreted as ‘outside’ the geometry:


public SignedDistanceFieldGenerator(int width, int height)
{
    m_x_dims = width;
    m_y_dims = height;
    m_pixels = new Pixel[m_x_dims * m_y_dims];
    for (int i = 0; i < m_pixels.Length; i++)
        m_pixels[i].distance = 999999f;
}

A couple of accessors to help with reading/writing the field pixels:


Pixel GetPixel(int x, int y)
{
    return m_pixels[y * m_x_dims + x];
}
void SetPixel(int x, int y, Pixel p)
{
    m_pixels[y * m_x_dims + x] = p;
}
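If the indexing looks unfamiliar: the field is stored as a flat, row-major array, so a 2D coordinate maps to a single index as shown below (the worked numbers are just an example).

//row-major layout: pixel (x, y) lives at index y * m_x_dims + x
//e.g. in a 16x16 field, pixel (3, 8) sits at index 8 * 16 + 3 = 131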

And finally, the slightly more complex ‘End’ function, which converts our array of pixels into a texture:


public Texture2D End()
{
    //allocate an 'RGBAFloat' texture of the correct dimensions
    Texture2D tex = new Texture2D(m_x_dims, m_y_dims, TextureFormat.RGBAFloat, false);

    //build our array of colours
    Color[] cols = new Color[m_pixels.Length];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        cols[i].r = m_pixels[i].distance;
        cols[i].g = 0;
        cols[i].b = 0;
        cols[i].a = m_pixels[i].distance < 999999f ? 1 : 0;
    }

    //write into the texture
    tex.SetPixels(cols);
    tex.Apply();
    m_pixels = null;
    m_x_dims = m_y_dims = 0;
    return tex;
}

Within this ‘End’ function, the first step is to allocate a new texture for our field:


    Texture2D tex = new Texture2D(m_x_dims, m_y_dims, TextureFormat.RGBAFloat, false);

The texture format used here is 'RGBAFloat'. Unlike most formats, this uses a full 32-bit floating point value for each colour channel. Whilst expensive (128 bits per pixel!), it is the most convenient format to use for these tutorials, as it allows us to store arbitrary values in our pixels without worrying about ranges or precision. That said, later in this series we’ll look at techniques for storing fields in much more compact formats.
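As an optional aside (not something the code in this post does), full-float textures aren’t guaranteed on every platform, so if you’re targeting anything exotic a quick capability check doesn’t hurt:

//optional guard - RGBAFloat is widely supported but not universal
if (!SystemInfo.SupportsTextureFormat(TextureFormat.RGBAFloat))
    Debug.LogWarning("RGBAFloat textures are not supported on this platform");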

Next, we generate the colours:


    //build our array of colours
    Color[] cols = new Color[m_pixels.Length];
    for (int i = 0; i < m_pixels.Length; i++)
    {
        cols[i].r = m_pixels[i].distance;
        cols[i].g = 0;
        cols[i].b = 0;
        cols[i].a = m_pixels[i].distance < 999999f ? 1 : 0;
    }

Here, the distance is written into the red channel and if the pixel has a sensible distance (indicating it has been written to), it is assigned an alpha value of 1, so the shader can identify it as containing valid data. Strictly speaking this last bit is unnecessary, but it’ll help with our debugging visualisations later. Eventually we will use the green and blue channels to visualise field gradients, but they’re not necessary right now.

The final step simply writes the colours into the texture, clears out any temporary data and returns:


    //write into the texture
    tex.SetPixels(cols);
    tex.Apply();
    m_pixels = null;
    m_x_dims = m_y_dims = 0;
    return tex;

With this lot written, we can start developing different approaches to fill out that grid of pixels, then convert the result to a texture and render it using our SignedDistanceField class.
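As a quick sanity check (a hypothetical snippet, not part of the code in this post), once one of the generation functions below has written some geometry into a generator, you can read a value straight back out of the finished texture and confirm the red channel holds the distance you expect:

//'generator' is a SignedDistanceFieldGenerator that has had some geometry written into it
Texture2D tex = generator.End();
Color sample = tex.GetPixel(8, 8);
Debug.Log("distance at (8,8) = " + sample.r + ", written to: " + (sample.a > 0.5f));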

Brute Force Geometry

The simplest approach to generation is to write a function for each type of geometric primitive we’re interested in (e.g. line/circle/rectangle) that iterates over, and potentially modifies, every single pixel in the field (hence ‘brute force’).

A line

To begin, let’s add the function for a line. We’ll call this one “BFLine” to indicate it uses the brute force approach:


//brute force line generator - iterates over every pixel and calculates the distance from the line
public void BFLine(Vector2 a, Vector2 b)
{
    Vector2 line_dir = (b - a).normalized;
    float line_len = (b - a).magnitude;

    for (int y = 0; y < m_y_dims; y++)
    {
        for (int x = 0; x < m_x_dims; x++)
        {
            //mathematical function to get distance from a line
            Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);
            Vector2 offset = pixel_centre - a;
            float t = Mathf.Clamp(Vector3.Dot(offset, line_dir), 0f, line_len);
            Vector2 online = a + t * line_dir;
            float dist_from_edge = (pixel_centre - online).magnitude;

            //update the field with the new distance
            Pixel p = GetPixel(x, y);
            if (dist_from_edge < p.distance)
            {
                p.distance = dist_from_edge;
                SetPixel(x, y, p);
            }
        }
    }
}

This function:

  • Iterates over every pixel in the field, and for each one calculates the distance from the centre of the pixel to the line
  • If the calculated distance is lower than the existing one, update it.

The idea here follows naturally from our definition of a signed distance field – each pixel contains the distance to the closest edge of the geometry. A line is really just a single edge when you think about it, so we check every pixel in the field, and if the distance to the line is lower than the distance we have stored, we update it.

I’m going to try and avoid covering the maths for every shape in this blog, as the info is out there and I don’t want to end up writing a maths blog just yet! As this is our first shape though, here it is broken down a little more:


//calculate the centre of the pixel (the +0.5 is because we want the centre, not the corner)
Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);

//calculate the offset from start of the line (a) to the centre of the pixel
Vector2 offset = pixel_centre - a;

//use a dot product to project the offset onto the line, giving us a 't' value
//then clamp the 't' value to between 0 and line len
float t = Mathf.Clamp(Vector3.Dot(offset, line_dir), 0f, line_len);

//calculate the position of the point on the line using our calculated/clamped t
Vector2 online = a + t * line_dir;

//finally, work out the distance from our pixel to the point on the line
float dist_from_edge = (pixel_centre - online).magnitude;

Before proceeding, let’s go back to SignedDistanceField.cs. Right at the bottom you’ll find a ‘custom inspector’ (see the Unity docs here for more info), which lets us add a few extra tools to the inspector for our field object. Within the OnInspectorGUI function is the following code:


SignedDistanceField field = (SignedDistanceField)target;
if (GUILayout.Button("Generate Centre Line"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator(16,16);
    generator.BFLine(new Vector2(3.5f, 8.5f), new Vector2(12.5f, 8.5f));
    field.m_texture = generator.End();
}

Clicking this button runs some simple code to:

  • Create a temporary signed distance field generator for a 16×16 field
  • Add a line to it using BFLine
  • Generate the texture by calling ‘End’ and point our field at it

The resulting render is the one we generated right at the start of this blog:

[Image HorzLineSDF1Pix: A line generated with BFLine]

A circle

Let’s now add the same function for a circle:


//brute force circle generator - iterates over every pixel and calculates signed distance from edge of circle
public void BFCircle(Vector2 centre, float rad)
{
    for (int y = 0; y < m_y_dims; y++)
    {
        for (int x = 0; x < m_x_dims; x++)
        {
            Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);
            float dist_from_edge = (pixel_centre - centre).magnitude - rad;

            Pixel p = GetPixel(x, y);
            if (dist_from_edge < p.distance)
            {
                p.distance = dist_from_edge;
                SetPixel(x, y, p);
            }
        }
    }
}

Once again, we’re iterating over every pixel, calculating the distance to the edge of the circle, and updating the stored distance if necessary. A new feature in this case however is that the circle is solid, so we generate negative values for pixels inside the circle, and positive values for pixels outside the circle.
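To make the sign convention concrete, here are a couple of worked numbers for a circle of radius 3 (illustrative values only):

//sign convention for BFCircle with rad = 3:
//  pixel centre 2 units from the circle's centre -> 2 - 3 = -1 (inside, negative)
//  pixel centre 5 units from the circle's centre -> 5 - 3 =  2 (outside, positive)
//  pixel centre exactly 3 units away             -> 3 - 3 =  0 (on the edge)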

As before, we can also add a button to the custom inspector to generate a field using this new function. There are a few in there, but this is the first one that’s a little more fun:


if (GUILayout.Button("Generate 2 circles"))
{
    SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator(16, 16);
    generator.BFCircle(new Vector2(5, 7), 3);
    generator.BFCircle(new Vector2(10, 8), 3.5f);
    field.m_texture = generator.End();
}

Remember our generator doesn’t just write into the pixel buffer – it first checks if the new distance is a better candidate than the stored one. As a result, we can use the function to add a combination of 2 circles, simply by calling it twice with different parameters.
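Since that ‘keep the better candidate’ check appears in every brute force function, it could be factored out into a tiny helper. The code in this post doesn’t do this, so treat the following (and its name) as a hypothetical refactoring sketch:

//hypothetical helper - folds a candidate signed distance into the field (a CSG union)
void UnionPixel(int x, int y, float candidate_distance)
{
    Pixel p = GetPixel(x, y);
    if (candidate_distance < p.distance)
    {
        p.distance = candidate_distance;
        SetPixel(x, y, p);
    }
}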

If I remember correctly, the result should be something along these lines:

[Image DualCircles: 3 circles generated and combined with BFCircle]

Note: I may have fiddled with positions/sizes since I took that snapshot. The principle is the same but the numbers might not be!

A rectangle

Without going into too much more detail, here’s the code to generate a rectangle:


//brute force rectangle generator - iterates over every pixel and calculates signed distance from edge of rectangle
public void BFRect(Vector2 min, Vector2 max)
{
    Vector2 centre = (min + max) * 0.5f;
    Vector2 halfsz = (max - min) * 0.5f;

    for (int y = 0; y < m_y_dims; y++)
    {
        for (int x = 0; x < m_x_dims; x++)
        {
            //get centre of pixel
            Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);

            //get offset, and absolute value of the offset, from centre of rectangle
            Vector2 offset = pixel_centre - centre;
            Vector2 absoffset = new Vector2(Mathf.Abs(offset.x), Mathf.Abs(offset.y));

            //calculate closest point on surface of rectangle to pixel
            Vector2 closest = Vector2.zero;
            bool inside;
            if (absoffset.x < halfsz.x && absoffset.y < halfsz.y)
            {
                //inside, so calculate distance to each edge, and choose the smallest one
                inside = true;
                Vector2 disttoedge = halfsz - absoffset;
                if (disttoedge.x < disttoedge.y)
                    closest = new Vector2(offset.x < 0 ? -halfsz.x : halfsz.x, offset.y);
                else
                    closest = new Vector2(offset.x, offset.y < 0 ? -halfsz.y : halfsz.y);
            }
            else
            {
                //outside, so just clamp to within the rectangle
                inside = false;
                closest = new Vector2(Mathf.Clamp(offset.x, -halfsz.x, halfsz.x), Mathf.Clamp(offset.y, -halfsz.y, halfsz.y));
            }
            closest += centre;

            //get offset of pixel from the closest edge point, and use to calculate a signed distance
            Vector3 offset_from_edge = (closest - pixel_centre);
            float dist_from_edge = offset_from_edge.magnitude * (inside ? -1 : 1);

            Pixel p = GetPixel(x, y);
            if (dist_from_edge < p.distance)
            {
                p.distance = dist_from_edge;
                SetPixel(x, y, p);
            }
        }
    }
}

That one may look a little scarier, but only because working out the signed distance to the edge of a rectangle is slightly more complex. If inside, we find the closest point by picking the closest edge and calculating the distance to it, then negate the result (as it’s inside). If outside, we simply clamp our pixel’s centre to within the rectangle, and calculate the distance from the unclamped position to the clamped one.
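If it helps, here are the two branches traced with some made-up numbers for a rectangle with min (4, 4) and max (12, 12), i.e. centre (8, 8) and half size (4, 4):

//outside case: pixel centre (14.5, 8.5)
//  offset = (6.5, 0.5), clamped to (4, 0.5) -> closest point on the surface = (12, 8.5)
//  distance = length((12, 8.5) - (14.5, 8.5)) = 2.5 (positive, outside)

//inside case: pixel centre (10.5, 8.5)
//  offset = (2.5, 0.5), nearest edge is the right one (x = 12) -> closest point = (12, 8.5)
//  distance = length((12, 8.5) - (10.5, 8.5)) * -1 = -1.5 (negative, inside)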

By now you’re hopefully seeing a pattern appearing, and by the end of this blog there may well be a few more ‘brute force’ functions to play with checked into the git repo.

Indeed, as relatively simple algorithms exist to calculate the distance from any pixel to any edge of any polygon (convex or concave), and also to calculate whether that pixel is inside or outside the polygon, we can generate a field for any arbitrary closed polygon. If the reader wants to have a go at this, here are a couple of clues (and a rough sketch after the list):

  • You already have the code to find the closest distance from a point to a line (aka edge!)
  • You know combining multiple shapes simply involves comparing the existing pixel distance to your new candidate distance.
  • To find out if a point is inside or outside a polygon, you simply need to draw a line from anywhere outside your shape to your point, and see how many edges it hits along the way. Check out the algorithm here
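To show how those clues fit together, here’s a rough brute force polygon generator. It reuses the BFLine maths for the edge distance and the classic even-odd ray test for inside/outside. It’s a sketch written from scratch rather than code from the repo, and it assumes a simple closed polygon with no zero-length edges, so treat it as a starting point rather than tested code:

//brute force polygon generator sketch - verts define a closed loop in field-pixel space
public void BFPoly(Vector2[] verts)
{
    for (int y = 0; y < m_y_dims; y++)
    {
        for (int x = 0; x < m_x_dims; x++)
        {
            Vector2 pixel_centre = new Vector2(x + 0.5f, y + 0.5f);

            float closest = float.MaxValue;
            int crossings = 0;
            for (int i = 0; i < verts.Length; i++)
            {
                Vector2 a = verts[i];
                Vector2 b = verts[(i + 1) % verts.Length];

                //distance to this edge (same maths as BFLine)
                Vector2 line_dir = (b - a).normalized;
                float line_len = (b - a).magnitude;
                float t = Mathf.Clamp(Vector2.Dot(pixel_centre - a, line_dir), 0f, line_len);
                closest = Mathf.Min(closest, (pixel_centre - (a + t * line_dir)).magnitude);

                //count crossings of a horizontal ray heading right from the pixel
                if ((a.y > pixel_centre.y) != (b.y > pixel_centre.y))
                {
                    float x_at_y = a.x + (pixel_centre.y - a.y) / (b.y - a.y) * (b.x - a.x);
                    if (x_at_y > pixel_centre.x)
                        crossings++;
                }
            }

            //an odd number of crossings means the pixel is inside the polygon
            float dist_from_edge = closest * ((crossings & 1) == 1 ? -1f : 1f);

            Pixel p = GetPixel(x, y);
            if (dist_from_edge < p.distance)
            {
                p.distance = dist_from_edge;
                SetPixel(x, y, p);
            }
        }
    }
}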

Some problems!

An astute (aka picky!) reader might have noticed a few issues or apparent inconsistencies with the algorithms I’ve described so far. The first point of possible confusion is the slightly misleading ‘is pixel closer’ test:


Pixel p = GetPixel(x, y);
if (dist_from_edge < p.distance)

This is used extensively throughout the code, but at first glance it appears there may be a problem. The test does the right thing for positive distances, but for negative distances it passes whenever the new value is more negative, even though that actually means the pixel is further from the edge of the new shape, not closer (-10 is less than -1, after all)!

Fortunately this is not a bug. Currently we are dealing with what are known as Union operations in Constructive Solid Geometry (CSG). In simple terms, we want to combine the solid bits of the existing shape with the shape we’re adding. As such, for pixels outside the geometry we want to keep the smallest distance, but for pixels inside the geometry we want to keep the deepest (most negative) distance, and taking the minimum achieves both.
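For reference, the same min/max trick generalises to the other standard CSG operations on signed distances. We only need the union here, so treat this as a general aside rather than something the post’s code uses:

//standard CSG combinations of two signed distances a and b
float Union(float a, float b)        { return Mathf.Min(a, b); }  //solid where either shape is solid
float Intersection(float a, float b) { return Mathf.Max(a, b); }  //solid only where both are solid
float Subtraction(float a, float b)  { return Mathf.Max(a, -b); } //a with b carved out of it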

A scarier problem that’s a little harder to see is that our field can actually become genuinely inconsistent! Let’s generate 2 very close rectangles in a relatively high-res (64×64) field:


SignedDistanceFieldGenerator generator = new SignedDistanceFieldGenerator(64, 64);
generator.BFRect(new Vector2(4, 4), new Vector2(60, 35));
generator.BFRect(new Vector2(4, 34), new Vector2(60, 60));
field.m_texture = generator.End();

The result:

[Image BigOverlapping: A beautiful big rectangle by combining 2 smaller ones!]

At first glance, all looks fine, but what if we go back to visualising the field:

[Image BigOverlappingDistance: Distance visualisation of combined rectangles, showing internal inconsistencies]

By fiddling with the distance visualiser a bit (see the distance visualisation scale property in the shader), we can clearly see the makeup of our field, and the centre line is obviously wrong. Indeed, there shouldn’t actually be a centre line at all!

The centre of the combined rectangle should contain the biggest (negative) distance (it is, after all, the ‘deepest’ pixel), yet the distance stored there is almost 0. The reason is clear when looking back at our generator: the centre pixel was very close to an inside edge of both rectangles, so we’ve only ever tried to write a distance close to 0 into it. In general terms, our distances are correct outside the solid geometry, but have become inconsistent inside it.

This inconsistency rears its ugly head in practice when we try to add a border:

[Image BigOverlappingBorder: Border artifact resulting from internal inconsistencies]

Our border shader algorithm from earlier in the blog assigns a different colour to pixels close to the edge, both inside and outside. As a result, we’ve incorrectly rendered an edge inside the shape.

There are 2 real approaches to dealing with this. First up, we could just ignore the problem and work around it! That sounds a bit crazy, but sometimes it’s better to ask how you can achieve your goal with the tools you have, rather than spend lots of time making the tools perfect. We could, for example, create a perfectly reasonable border effect by only colouring pixels outside the geometry, making the dodgy internal pixels irrelevant. Indeed, many applications deal perfectly well with field inconsistencies, so spending valuable CPU time fixing things may be a waste! However, if, like the Borg, perfection is your goal, the next post on sweeping should make you feel much better.

The last problem is our old friend: performance. Our current approach involves iterating over every pixel for every piece of geometry. With the simple stuff so far that’s all well and good, but what if we wanted to render a complex polygon with 5,000 edges into a 512×512 texture? That’s 5,000 × 512 × 512 = 1,310,720,000 distance checks! Fortunately, as we shall see, sweeping comes to the rescue here too.

A quick word on fonts

Right at the start of this series I mentioned that SDFs in games had been popularised by fonts, and this post may have taken you some way towards seeing why. TrueType fonts effectively contain a lot of vector-based geometry, ideally suited to generating signed distance fields, even with the simple approach described above.

Unfortunately, loading .ttf files and interpreting their geometry is probably a series in itself. If the reader is interested, however, a quick Google will reveal a good few C# libraries for loading font geometry. Alternatively, the excellent (and now free) Text Mesh Pro for Unity is SDF driven, and I believe it makes it possible to access the raw field data it generates.

Summary

By the time this series is complete, the generator checked in will probably have more functionality than that covered so far. However, with just the tools in this post you now have the power to generate functional signed distance fields. The same principles can even be applied in 3D to voxels, which I may cover down the line! Next up, we’ll move on to refinement/optimisation of fields with an approach called sweeping and look at ways to use it for generating fields from images.

Signed Distance Fields Part 5: Faster generation and sweeping