Shaderoids 2

Right, I left off last time with the beginnings of a game, now it’s time to add some sounds and other shizzle. It’s up on GitHub now here



Unfortunately one thing that just isn’t possible right now is playing sounds from the GPU (booo). However, I can implement a system where our GPU side code can ‘request’ a given sound to be played. Then the CPU side will just act on those requests.

We’ll start with the simplest possible 1-shot ‘audio engine’ you can build in Unity….

    //array of clips + list of allocated audio sources
    AudioClip[] _clips;
    List<AudioSource> _audioSources;

    //load list of clips
    void LoadClips(params string[] names) {
        _audioSources = new List<AudioSource>();
        _clips = names.Select(a => Resources.Load<AudioClip>(a)).ToArray();
    }

    //find a free audio source, allocating if necessary, and play clip
    void PlayClip(int idx) {
        AudioSource src = null;
        for(int i = 0; i < _audioSources.Count; i++) {
            if(!_audioSources[i].isPlaying) {
                src = _audioSources[i];
                break;
            }
        }
        if (!src) {
            //nothing free, so allocate a new source and remember it for reuse
            src = gameObject.AddComponent<AudioSource>();
            _audioSources.Add(src);
        }
        src.clip = _clips[idx];
        src.Play();
    }

I’ve used that pattern a few times for very quick prototypes. It basically lets you load a set of audio files from resources, then ask to play them once.

Let’s load 3 audio files I ripped out of the earlier YouTube vid of the actual game…

LoadClips("fire", "explode", "blop");

Next, a simple structure that will be mirrored CPU and GPU side. This represents a request to play a sound, and for now just contains the integer id of the sound to be played:

    public struct SoundRequest {
        public int id;
    }

I allocate a compute buffer to contain an array of those requests, alongside a CPU buffer to copy the requests into. On top, the counters structure now has a ‘numSoundRequests’ variable. At the end of the game update we can now read out and process the requests:

    for (int i = 0; i < _cpuCounters[0].numSoundRequests; i++)

And for the last bit of plumbing, a GPU-side function in the compute shader for making a request:

//matching ids of sounds loaded in game.cs
#define SND_FIRE 0
#define SND_EXPLODE 1
#define SND_BLOP 2

void PlaySound(int id) {
    int idx;
    InterlockedAdd(_counters[0].numSoundRequests, 1, idx);
    if (idx < _maxSoundRequests) {
        _soundRequestsRW[idx].id = id;
    }
}

Now I can play sounds from GPU code! For example, when the player hits the space bar to fire a bullet:

if (_keyStates[' '].pressed) {
    //...stuff for firing the bullet...
    PlaySound(SND_FIRE);
}

I’ve now hooked this up to firing and exploding asteroids, and it’s already starting to feel gamey!
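
To make the request pattern a bit more concrete, here’s a rough CPU-only sketch of it in C++. It is not the Unity code: `SoundRequestQueue` and `Drain` are names I’ve made up for illustration, and `std::atomic::fetch_add` stands in for `InterlockedAdd`. One thing it shows is that the counter keeps climbing even when a request is dropped for lack of space, so it’s probably worth clamping the count before reading the requests back.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// CPU-side stand-in for the GPU request buffer. All names are illustrative.
struct SoundRequest { int id; };

struct SoundRequestQueue {
    std::vector<SoundRequest> requests;  // plays the role of _soundRequestsRW
    std::atomic<int> numRequests{0};     // plays the role of numSoundRequests

    explicit SoundRequestQueue(int capacity) : requests(capacity) {
        assert(capacity > 0);
    }

    // "GPU side": atomically reserve a slot, dropping the request if the buffer is full.
    void PlaySound(int id) {
        int idx = numRequests.fetch_add(1);  // returns the value before the add
        if (idx < static_cast<int>(requests.size())) {
            requests[idx].id = id;
        }
    }

    // "CPU side": read out this frame's requests and reset the counter.
    std::vector<int> Drain() {
        // The counter may exceed the capacity if requests were dropped, so clamp it.
        int count = std::min(numRequests.exchange(0),
                             static_cast<int>(requests.size()));
        std::vector<int> ids;
        for (int i = 0; i < count; i++) {
            ids.push_back(requests[i].id);
        }
        return ids;
    }
};
```

In the real thing, each id that comes back from the drain step would be handed to PlayClip.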


Hmmmm.. text rendering on GPU. I want my GPU code to at least be able to ask to draw a given character – e.g. DrawCharacter(‘a’). I don’t really want to cheat and start using bitmap fonts, so… I guess I’m programming a vector font. I create Font.cginc, which is full of stuff like this….

    switch (character) {
    case 'a': 
    case 'A':
        AddFontLine(idx++, float2(0, 0), float2(0, 0.5f));
        AddFontLine(idx++, float2(1, 0), float2(1, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(0, 0.5f));
        AddFontLine(idx++, float2(0.5f, 1), float2(1, 0.5f));
        AddFontLine(idx++, float2(0, 0.5f), float2(1, 0.5f));
        break;
    //...and so on for the other characters...
    }


Basically, I’m filling up a big buffer, which has 16 line slots for each character. i.e. for the character code 65 (‘A’), there are up to 16 lines stored in a font buffer (_font) at offset 16*65.

I’ve created a simple compute shader called ‘BuildFont’, which gets dispatched with 256 threads – 1 per ASCII character, though I’ve obviously only filled in ‘A’-‘Z’ and a few other bits and pieces!
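
The buffer layout is easy to check on the CPU. Below is a small C++ sketch of the same idea, assuming a flat array with 16 slots per character code. `MakeFontBuffer`, `FontLineOffset` and `IsDegenerate` are hypothetical helpers, and this `AddFontLine` takes the character and slot explicitly, which is not the signature used in Font.cginc.

```cpp
#include <cassert>
#include <vector>

struct Float2 { float x, y; };
struct FontLine { Float2 a, b; };

constexpr int kLinesPerCharacter = 16;  // 16 line slots per character, as described above
constexpr int kNumCharacters = 256;     // one entry per 8-bit character code

// Elements are value-initialised to zero, so every unused slot is a
// degenerate line (a == b).
std::vector<FontLine> MakeFontBuffer() {
    return std::vector<FontLine>(kNumCharacters * kLinesPerCharacter);
}

// Index of a given line slot for a given character code.
int FontLineOffset(unsigned char character, int slot) {
    assert(slot >= 0 && slot < kLinesPerCharacter);
    return character * kLinesPerCharacter + slot;
}

// Hypothetical variant of AddFontLine that takes the character and slot explicitly.
void AddFontLine(std::vector<FontLine>& font, unsigned char character, int slot,
                 Float2 a, Float2 b) {
    font[FontLineOffset(character, slot)] = FontLine{a, b};
}

// A line whose end points coincide covers no pixels when drawn.
bool IsDegenerate(const FontLine& l) {
    return l.a.x == l.b.x && l.a.y == l.b.y;
}
```

With this layout the first line of ‘A’ (code 65) lives at index 1040, and any slot that was never written stays degenerate.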

Now a simple test shader:

[numthreads(256, 1, 1)]
void DrawFont(uint3 id : SV_DispatchThreadID)
{
    int lineIdx = AllocLines(LINES_PER_CHARACTER);
    int charOffset = id.x * LINES_PER_CHARACTER;

    float2 pos = float2(id.x / 16, id.x % 16) * float2(20,25) + 100;
    float2 scl = float2(15,20);

    for (int i = 0; i < LINES_PER_CHARACTER; i++) {
        float2 a = _font[charOffset + i].a * scl + pos;
        float2 b = _font[charOffset + i].b * scl + pos;
        AddLine(lineIdx++, a, b);
    }
}

This renders a 16×16 grid of characters on screen. For each one, we draw all 16 lines (note: empty lines are degenerate so don’t do anything). The result:


Woohoo! 1 hardcoded vector font. No better way to spend the first chunk of a 10 hour plane flight. Unfortunately I discovered the plane has wifi, so was able to check out the actual asteroids font, and discovered it is way less curvy than mine. Oh well.

Teeny bit of UI

To get started on the UI and game loop, I’ll add a single ‘UpdateGame’ kernel.

void UpdateGame(uint3 id : SV_DispatchThreadID)
{
    int gameMode = _counters[0].gameMode;
    if (gameMode == 0) {
        gameMode = 1;
    }
    else if (gameMode == 1) {
        //main menu
    }
}


This is probably the closest bit to a classic game, as it’s a single function that gets called once per frame to update general game state. At this stage, we only have the game mode, which starts off as 0 (init) then proceeds to 1 (main menu). I’ve also added our first bit of UI:

void DrawText_OneCoinOnePlay() {
    float2 pos = float2(350,150);
    float2 scl = float2(18, 20);
    float2 spacing = float2(scl.x*1.25, 0);

    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('C', pos, scl); pos += spacing;
    DrawCharacter('O', pos, scl); pos += spacing;
    DrawCharacter('I', pos, scl); pos += spacing;
    DrawCharacter('N', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('1', pos, scl); pos += spacing;
    DrawCharacter(' ', pos, scl); pos += spacing;
    DrawCharacter('P', pos, scl); pos += spacing;
    DrawCharacter('L', pos, scl); pos += spacing;
    DrawCharacter('A', pos, scl); pos += spacing;
    DrawCharacter('Y', pos, scl); pos += spacing;
}

At this point though I want to get some proper startup code in. Ideally when the player presses ‘start’ we would init the game state and fire them into the first level. This creates a need for compute shaders to trigger other compute shaders, which leads onto…

More indirect dispatching

Enter the indirect dispatch pattern. The basic concept of indirect dispatch is that sometimes you need a compute shader to control how a following compute shader will be dispatched. For example, our current ‘spawn asteroids’ shader is dispatched 10 times to spawn 10 asteroids, but what if we want to control how many asteroids are spawned from a compute shader instead?

For this purpose I create a new compute buffer called _dispatch, with the type ComputeBufferType.IndirectArguments. It’s just a buffer of uints, each of which will represent one of the arguments we’d normally provide to the Dispatch function.

For this simple purpose I’ll have a load of parameters that can be passed to shaders via what used to be the counters buffer, and is now the globals buffer. In addition, I’ll have a simple setup kernel that checks which kernel is to be dispatched, and configures the indirect buffer accordingly. This approach is not particularly scalable but simple and effective.

So our globals structure now has 2 new ‘request’ integers:

        public int requestClearAsteroids;
        public int requestSpawnAsteroids;

And the game update can make requests like this:

        //when 'c' is pressed, ask to clear and then spawn 4 new asteroids
        if (_keyStates['c'].pressed) {
            _globals[0].requestClearAsteroids = true;
            _globals[0].requestSpawnAsteroids = 4;
            gameMode = 1;
        }

A special ‘SetupDispatch’ shader now takes a predefined kernel id constant to run, and then uses the request values to fill out the dispatch settings:

//dispatch setup function that takes the kernel id being setup for and runs whatever logic
//is necessary for it
void SetupDispatch(uint3 id: SV_DispatchThreadID)
{
    switch (_kernelIdRequested) {
    //the CLEAR ASTEROIDS dispatch will dispatch across nothing if not requested, or
    //all asteroids if requested
    case KID_CLEAR_ASTEROIDS:
        _dispatch[0] =  _globals[0].requestClearAsteroids ? (_maxAsteroids+255)/256 : 0;
        break;
    //SPAWN_ASTEROIDS dispatches the number of requested asteroids
    case KID_SPAWN_ASTEROIDS:
        _dispatch[0] = (_globals[0].requestSpawnAsteroids + 255) / 256;
        break;
    default: {
        _dispatch[0] = 0; 
        break;
    }
    }
    _dispatch[1] = 1;
    _dispatch[2] = 1;
}

And functions like our SpawnAsteroids kernel look almost identical, with the small caveat that they may need to read the request counter to avoid overrunning:

//spawns 'n' requested asteroids
void SpawnAsteroids(uint3 id : SV_DispatchThreadID) {
    //read count (either from uniform or indirect request)
    int maxCount = _threadCount < 0 ? _globals[0].requestSpawnAsteroids : _threadCount;
    _globals[0].requestSpawnAsteroids = 0;
    if (id.x < maxCount) {
        //usual stuff
    }
}

The last bit of plumbing is a new DispatchIndirect helper CPU side, which first dispatches the setup kernel before dispatching the main one:

    //dispatches the 'setup dispatch' shader to configure the dispatch args for a given
    //dispatch type, then does an actual dispatch indirect on the requested kernel
    void DispatchIndirect(int kernel, int dispatchIndex) {
        _asteroidsShader.SetInt("_kernelIdRequested", dispatchIndex);

        _asteroidsShader.SetInt("_threadCount", -1);
        _asteroidsShader.DispatchIndirect(kernel, _dispatchBuffer);
    }

And finally the CPU game update can first dispatch UpdateGame, then dispatch both ClearAsteroids and SpawnAsteroids on the off chance they have work to do.

        DispatchIndirect(kernelClearAsteroids, KID_CLEAR_ASTEROIDS);
        DispatchIndirect(kernelSpawnAsteroids, KID_SPAWN_ASTEROIDS);

Phew! Lots of setup, but that’s a nice simple example of indirect dispatch being used in anger.
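
The arithmetic behind the dispatch arguments is worth a quick sanity check. Here’s a small CPU-side C++ sketch, assuming 256 threads per group as in the kernels above. `GroupsFor`, `MakeDispatchArgs` and `CountActiveThreads` are hypothetical names. It shows why the kernel still needs its `id.x < maxCount` guard: the group count is rounded up, so a request for 4 asteroids still launches a full group of 256 threads.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kThreadsPerGroup = 256;  // matches [numthreads(256, 1, 1)]

// Number of thread groups needed to cover 'count' items, rounding up.
std::uint32_t GroupsFor(std::uint32_t count) {
    return (count + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// The three uints an indirect dispatch reads: group counts in x, y and z.
std::array<std::uint32_t, 3> MakeDispatchArgs(std::uint32_t requestedCount) {
    std::array<std::uint32_t, 3> args = {GroupsFor(requestedCount), 1u, 1u};
    return args;
}

// Simulates the launched threads and counts how many pass the
// 'id.x < maxCount' guard and would actually do work.
std::uint32_t CountActiveThreads(const std::array<std::uint32_t, 3>& args,
                                 std::uint32_t requestedCount) {
    assert(args[1] == 1 && args[2] == 1);  // this sketch only handles 1D dispatches
    std::uint32_t active = 0;
    std::uint32_t totalThreads = args[0] * kThreadsPerGroup;
    for (std::uint32_t id = 0; id < totalThreads; id++) {
        if (id < requestedCount) {
            active++;
        }
    }
    return active;
}
```

A request of 0 gives 0 groups, so the dispatch becomes a no-op, which is what makes it cheap to dispatch ClearAsteroids and SpawnAsteroids every frame on the off chance.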

Spawning the player

One final pain going into this lot is spawning the player from compute. The game has to wait until there are no asteroids in the way before spawning, which is actually non-trivial on GPU. To do so, all asteroids must be checked against the player. Thus I had to write a couple of new kernels. The first checks players that want to spawn, and marks them as ‘can spawn’:

//prepares all players that want to spawn for spawning, and defaults their 'canSpawn'
//to true
[numthreads(256, 1, 1)]
void PreparePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount) {
        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn) {
            player.position = float2(1024, 768) * 0.5;
            player.velocity = 0;
            player.rotation = 0;
            player.canSpawn = true;
        }
        _playersRW[id.x] = player;
    }
}

The 2nd is very similar to the original CollidePlayerAsteroid: it compares the players to all asteroids and, if they overlap, sets ‘can spawn’ back to false:

//checks all players that want to spawn against asteroids, and kills the 'canSpawn' flag
//for any that are too close to asteroids
[numthreads(256, 1, 1)]
void UpdatePlayerSpawning(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount) {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.wantsToSpawn && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius + 10 + 50)) {
                _playersRW[playerIdx].canSpawn = false;
            }
        }
    }
}

Then with this cheeky code in UpdateAndDrawPlayer:

        PlayerState player = _playersRW[id.x];
        if (player.wantsToSpawn && player.canSpawn) {
            player.wantsToSpawn = false;
            player.canSpawn = false;
            player.alive = true;
        }

We can detect players that both want to and can spawn, and spawn em!
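
For anyone who wants to poke at the logic without a GPU, here’s a rough C++ simulation of the two passes, with a plain loop standing in for the thread dispatch. The struct layouts, function names and the `clearance` parameter are assumptions for illustration (the kernel above hardcodes 10 + 50), and the first pass here skips repositioning the player. The second pass only ever writes false, so the result doesn’t depend on the order in which the player/asteroid pairs are visited.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Float2 { float x, y; };
struct Player { Float2 position; bool wantsToSpawn; bool canSpawn; };
struct Asteroid { Float2 position; float radius; bool alive; };

// Pass 1: every player that wants to spawn is optimistically marked as able to.
void PrepareSpawning(std::vector<Player>& players) {
    for (Player& p : players) {
        if (p.wantsToSpawn) {
            p.canSpawn = true;
        }
    }
}

// Pass 2: one "thread" per (player, asteroid) pair. Any live asteroid that is
// too close clears the flag again.
void UpdateSpawning(std::vector<Player>& players,
                    const std::vector<Asteroid>& asteroids, float clearance) {
    assert(!asteroids.empty());
    const int maxAsteroids = static_cast<int>(asteroids.size());
    const int threadCount = static_cast<int>(players.size()) * maxAsteroids;
    for (int id = 0; id < threadCount; id++) {
        // Unflatten the thread id into a player index and an asteroid index.
        int playerIdx = id / maxAsteroids;
        int asteroidIdx = id - playerIdx * maxAsteroids;

        Player& player = players[playerIdx];
        const Asteroid& asteroid = asteroids[asteroidIdx];
        if (player.wantsToSpawn && asteroid.alive) {
            float dx = player.position.x - asteroid.position.x;
            float dy = player.position.y - asteroid.position.y;
            if (std::sqrt(dx * dx + dy * dy) < asteroid.radius + clearance) {
                player.canSpawn = false;
            }
        }
    }
}
```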

With some tweaks to the game loop (partly to track current level), and the ability to spawn players, it’s all starting to look pretty cool:

More text and scores and things

Somewhere along the way I felt dirty requesting loads of lines for every render, so added a 2nd buffer and rendering pipeline for requesting characters (just like you can request lines). The basic concept is that, just like with lines, I can submit a list of requests for characters by code, size and position. I then have another procedural pass in DrawLines.shader that uses this function:

        //reads character id, then looks it up in lines buffer which should contain font
        v2f charvert(uint id : SV_VertexID, uint inst : SV_InstanceID)
        {
            int vertsPerCharacter = (LINES_PER_CHARACTER * 2);
            int charIdx = id / vertsPerCharacter;

            int charVertex = id % vertsPerCharacter;
            int charLine = charVertex / 2;
            int charLineVertex = charVertex % 2;

            Character c = characters[charIdx];

            Line l = lines[*LINES_PER_CHARACTER + charLine];
            float2 p = charLineVertex==0 ? l.a : l.b;

            p = p * c.scl + c.pos;

            v2f o;
            o.vertex = toScreen(p);
            return o;
        }

Just like the old lines one, it takes a vertex id, breaks it down into character index, character vertex, character line and line vertex, then fiddles with it and outputs it!
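
The index juggling is the same arithmetic on any platform, so here it is as a tiny C++ sketch, assuming 16 lines (so 32 vertices) per character. `CharVertexRef` and `DecomposeVertexId` are made-up names for illustration.

```cpp
#include <cassert>

constexpr int kLinesPerCharacter = 16;  // assumed value of LINES_PER_CHARACTER

// Where a vertex sits: which requested character, which of its lines,
// and which end of that line (0 = a, 1 = b).
struct CharVertexRef {
    int charIdx;
    int charLine;
    int lineVertex;
};

CharVertexRef DecomposeVertexId(int vertexId) {
    assert(vertexId >= 0);
    const int vertsPerCharacter = kLinesPerCharacter * 2;  // 2 vertices per line
    int charVertex = vertexId % vertsPerCharacter;         // vertex within the character
    return CharVertexRef{vertexId / vertsPerCharacter, charVertex / 2, charVertex % 2};
}
```

So vertex 33 is the second end point of the first line of character 1, and vertex 95 is the second end point of the last line of character 2.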

Now for some numbers. All numbers in asteroids have a 0 on the end, cos it looks cooler. So the new DrawNumberTimes10 function will do just that…

void DrawNumberTimes10(int number, float2 pos, float2 scl, bool pad) {
    float2 spacing = float2(scl.x*1.25, 0);

    int idx = AllocCharacters(8);

    bool onFirstChar = true;
    for (int base = 1000000; base > 0; base /= 10) {
        int val = clamp(number / base,0,9);
        number -= val * base;
        if (val != 0 || !onFirstChar || base==1) {
            //output character
            DrawCharacter(idx++, '0'+val, pos, scl); pos += spacing;
            onFirstChar = false;
        }
        else {
            //output space and increment pos if padding requested
            DrawCharacter(idx++, ' ', pos, scl);
            if (pad)
                pos += spacing;
        }
    }

    //final 0
    DrawCharacter(idx++, '0', pos, scl); pos += spacing;
}
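
The digit logic is easy to test on the CPU by building a string instead of drawing characters. This is a C++ sketch under that assumption: `FormatNumberTimes10` is a hypothetical name, and unlike the shader version it emits nothing at all for a leading zero when padding is off, rather than drawing a space without advancing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Formats 'number' as up to 7 digits followed by a trailing '0'.
// Leading zeros become spaces when 'pad' is true and are skipped otherwise.
std::string FormatNumberTimes10(int number, bool pad) {
    assert(number >= 0);
    std::string out;
    bool onFirstChar = true;
    for (int base = 1000000; base > 0; base /= 10) {
        // Clamp to a single digit, so numbers too big for 7 digits show as 9s.
        int val = std::max(0, std::min(number / base, 9));
        number -= val * base;
        if (val != 0 || !onFirstChar || base == 1) {
            // A real digit: always shown once the first non-zero digit has
            // appeared, and always shown in the units position.
            out += static_cast<char>('0' + val);
            onFirstChar = false;
        } else if (pad) {
            out += ' ';
        }
    }
    out += '0';  // the final 0 that makes every score look ten times bigger
    return out;
}
```

A score of 123 with padding comes out as four spaces followed by 1230, and a score of 0 comes out as 00.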

Now by adding a score value, I can draw the nice numbers on screen:

    DrawNumberTimes10(_playersRW[0].score, float2(100, 700), float2(18, 20), true);
    DrawNumberTimes10(0, float2(800, 700), float2(18, 20), false);
    DrawNumberTimes10(0, float2(500, 680), float2(18, 20)*0.75, false);

Not sure what the other 2 numbers are supposed to be yet. Presumably credits and player 2 score. Oh well who cares – it looks cool:


Well, after that brain dump of crazy I have quite a nice thing goin on. It’s not asteroids yet, but any game logic is completely separate from CPU. Compute shaders are responsible for init, game loop and all general physics and spawning etc. Next time I’ll probably get some effects and flying saucers in or something…

Also – code:
