Shaderoids 1

Ok, time for something really productive. I have decided to put my pretty solid knowledge of GPU coding to the best possible use, and produce a perfect (as possible) clone of the 1979 game, Asteroids… entirely written on GPU.

Why, you ask? Frankly you should be ashamed for asking that. Also, did you not notice the title of this web site?

First up, some rules:

  • All game code, graphics, physics and rendering entirely driven by GPU
  • CPU may only:
    • Read hardware (such as inputs), and pass to GPU
    • Fire off a fixed set of dispatches
    • Read requests from GPU to write to hardware (such as audio)
  • Pure line rendering

fyi, since writing, I’ve plonked all the code up here

Basic architecture / flow

The idea is to make the CPU side of things as minimal as possible. With that in mind, my basic flow is going to be a single MonoBehaviour in C# which:

  • Reads a set of input states (probably A-Z, 0-9 and a few others), and writes them into a compute buffer
  • Executes a series of dispatches to fire off various GPU compute kernels
  • Executes DrawProceduralIndirect to draw a load of lines generated by compute. (high bar: line renderer in compute!)
  • Read set of ‘audio requests’ written to buffer by GPU, and use to trigger some predefined sounds

So let’s start with 3 files:

  • Game.cs for our monobehaviour
  • Asteroids.compute to contain all the compute jobs
  • DrawLines is a simple fragment shader that draws a set of lines
using UnityEngine;

public class Game : MonoBehaviour 
{
    public struct Line {
        public Vector3 a;
        public Vector3 b;
    }

    public const int LINE_BUFFER_SIZE = 10000;

    ComputeShader _asteroidsShader;
    Shader _drawLinesShader;
    Material _drawLinesMaterial;

    ComputeBuffer _linesBuffer;


    public void Awake() {
        _asteroidsShader = Resources.Load<ComputeShader>("asteroids");
        _drawLinesShader = Shader.Find("DrawLines");
        _drawLinesMaterial = new Material(_drawLinesShader);
        _drawLinesMaterial.hideFlags = HideFlags.HideAndDontSave;

        _linesBuffer = ComputeBufferUtils.Alloc<Line>(LINE_BUFFER_SIZE);
    }

    public void OnPostRender() {

        Line[] testLines = new Line[3];
        testLines[0] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 0, 0) };
        testLines[1] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(1, 1, 0) };
        testLines[2] = new Line { a = new Vector3(0, 0, 0), b = new Vector3(0, 1, 0) };
        _linesBuffer.SetData(testLines);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProcedural(MeshTopology.Lines, testLines.Length*2);
    }
}    

This code starts by loading up some shaders and building a material, then sets up a compute buffer that’ll contain a set of Line structures to be rendered. ComputeBufferUtils is a handy class for allocating structured buffers that I’ll upload at some point.

The OnPostRender function simply plonks 3 lines into the lines buffer, passes it to the draw line material and finally calls DrawProcedural, requesting line topology, and specifying 2 vertices per line.

DrawProcedural is a tasty beast. It allows us to write a vertex shader that takes just an integer index. It is then up to the shader to take this index and convert it into useful vertex positions, which is done using data in the provided compute buffer:

Shader "DrawLines"
{
    Properties
    {
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            
            struct v2f
            {
                float4 vertex : SV_POSITION;
            };

            struct Line {
                float3 a;
                float3 b;
            };

            StructuredBuffer<Line> lines;
            
            
            v2f vert (uint id : SV_VertexID, uint inst : SV_InstanceID)
            {
                Line l = lines[id / 2];
                float3 p = (id & 1) ? l.a : l.b;
                
                v2f o;
                o.vertex = float4(p, 1);
                return o;
            }
            
            fixed4 frag (v2f i) : SV_Target
            {
                return 1;
            }
            ENDCG
        }
    }
}

This cheeky little shader uses the vertex id to access the lines buffer and work out whether the vertex represents the start or the end of the line.
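To make the index juggling concrete, here is a rough CPU-side sketch in Python (not part of the project; `line_vertex` is a made-up name) of what the vertex shader does with each id: two consecutive ids share a line, and the low bit picks the endpoint.

```python
def line_vertex(lines, vertex_id):
    """Return the endpoint the shader's vert() would pick for this id."""
    a, b = lines[vertex_id // 2]          # two consecutive ids share one line
    return a if (vertex_id & 1) else b    # odd id -> a, even id -> b

# the same three test lines that Game.cs uploads
lines = [((0, 0, 0), (1, 0, 0)),
         ((0, 0, 0), (1, 1, 0)),
         ((0, 0, 0), (0, 1, 0))]

# one entry per vertex that DrawProcedural asks for
vertices = [line_vertex(lines, i) for i in range(len(lines) * 2)]
```

Note that odd ids get `a` and even ids get `b`, so each line is technically drawn end-to-start, which makes no visible difference for plain lines.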

Result…

firstlines

Woohoo! Nearly there….?

Compute and indirect drawing

Right, time to write that compute shader I loaded earlier!

Let’s start by adding a new structure to the game code, and allocating a couple of extra buffers:

    public struct Counters {
        public int numLines;
    }
    _countersBuffer = ComputeBufferUtils.Alloc<Counters>(1);
    _dispatchBuffer = ComputeBufferUtils.Alloc<uint>(8, ComputeBufferType.IndirectArguments);

The compute shader (Asteroids.compute) mimics the CPU side structure and buffers:

//structures
struct Line {
    float3 a;
    float3 b;
};
struct Counters {
    int numLines;
};

//buffers
RWStructuredBuffer<Line> _linesRW;
RWStructuredBuffer<Counters> _counters;
RWStructuredBuffer<uint> _dispatch;

//general use uniform to limit dispatch thread counts
int _threadCount;


Now for some compute kernels. The first, I’ll dispatch once at the start of each frame to clear the line counter:

[numthreads(1,1,1)]
void ClearLines(uint3 id : SV_DispatchThreadID)
{
    _counters[0].numLines = 0;
}

This is designed to be executed as a single thread on the GPU, and just does one tiny bit of ‘setup’ work. This pattern is very common in GPU code – tiny compute jobs to pass around / clear / tweak some data, then large parallel dispatches to do actual work.

The next kernel is a more classic compute job, and designed to run in a highly parallel manner.

//creates lines based on dispatch thread
[numthreads(256, 1, 1)]
void GenerateTestLines(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        //allocate space
        int lineIdx;
        InterlockedAdd(_counters[0].numLines, 1, lineIdx);

        //build line
        Line l;
        float ang = radians(id.x);
        float3 dir = float3(sin(ang), cos(ang), 0);
        l.a = dir * 0.1f;
        l.b = dir * 0.75f;
        _linesRW[lineIdx] = l;
    }
}

This uses InterlockedAdd to ‘allocate’ a slot in the lines buffer by atomically incrementing the numLines counter. A line is then generated based on the thread index, and written out to the correct slot. In future I’ll use this pattern for outputting all lines that need rendering.
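If the allocation trick feels a bit magic, this is roughly what it boils down to, sketched on the CPU in Python with threads standing in for GPU threads. The `Counter` class and its `interlocked_add` method are invented for illustration; the key point is that the add returns the value from before the increment, so every caller gets a unique slot.

```python
import threading

class Counter:
    """Stand-in for one int in _counters, with an InterlockedAdd-like method."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def interlocked_add(self, amount):
        # Atomically add, and hand back the value from *before* the add,
        # which is what HLSL's InterlockedAdd writes to its third argument.
        with self._lock:
            original = self.value
            self.value += amount
            return original

num_lines = Counter()
slots = []

def worker():
    # each "thread" claims exactly one slot in the lines buffer
    slots.append(num_lines.interlocked_add(1))

threads = [threading.Thread(target=worker) for _ in range(64)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The order in which threads receive slots is not predictable, but no two threads ever receive the same one.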

Next, another simple setup kernel that takes the numLines counter and converts it into a set of arguments that can be passed to DrawProceduralIndirect:


//fills out indirect dispatch args 
[numthreads(1, 1, 1)]
void LineDispatchArgs(uint3 id : SV_DispatchThreadID)
{
    _dispatch[0] = _counters[0].numLines*2; //v count per inst (2 verts per line)
    _dispatch[1] = 1; //1 instance
    _dispatch[2] = 0; //verts start at 0
    _dispatch[3] = 0; //instances start at 0
}

This indirect technique is used with both drawing and dispatching, and allows the use of a compute job to set up the work for a following compute job or draw call. In this case I’m setting up the arguments required for DrawProceduralIndirect.

That little ‘*2’ in LineDispatchArgs took a while to spot! Interesting fun – if you’re on NVidia and force GenerateTestLines to only build 64 lines, and remove the ‘*2’, you see it alternate between 2 groups of 32 lines. That’s cos you’re only rendering half the lines you generate. As an NVidia GPU works in warps of 32 threads, whichever warp gets to its InterlockedAdd calls first fills the first 32 slots, so you randomly get the 1st half or the 2nd half. Hardware showing its true form!
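A quick Python sketch of the argument layout makes the ‘*2’ bug easy to see. The function names here are made up; the four values are in the same order LineDispatchArgs writes them.

```python
def line_draw_args(num_lines):
    """The four values, in the order LineDispatchArgs writes them."""
    return [num_lines * 2,  # vertex count per instance (2 vertices per line)
            1,              # instance count
            0,              # start vertex
            0]              # start instance

def lines_drawn(args):
    # With line topology, every pair of vertices becomes one line.
    return (args[0] // 2) * args[1]

args = line_draw_args(64)
```

Passing the raw line count as the vertex count, i.e. `[64, 1, 0, 0]`, only draws 32 of the 64 lines, which is exactly the half-a-circle effect described above.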

The Game.cs OnPostRender now looks roughly like this:

void OnPostRender()
{
        _asteroidsShader.SetBuffer(kernelClearLines, "_counters", _countersBuffer);
        DispatchOne(kernelClearLines);

        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelGenerateTestLines, "_linesRW", _linesBuffer);
        DispatchItems(kernelGenerateTestLines, 360);

        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernelLineDispatchArgs, "_dispatch", _dispatchBuffer);
        DispatchOne(kernelLineDispatchArgs);

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
}

Where DispatchOne and DispatchItems are a couple of handy helpers for dispatching compute kernels:

    void DispatchOne(int kernel) {
        _asteroidsShader.Dispatch(kernel, 1, 1, 1);
    }
    void DispatchItems(int kernel, int items) {
        uint x,y,z;
        _asteroidsShader.GetKernelThreadGroupSizes(kernel, out x, out y, out z);
        _asteroidsShader.SetInt("_threadCount", items);
        _asteroidsShader.Dispatch(kernel, (items + (int)x - 1) / (int)x, 1, 1);
    }
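The group count in DispatchItems is just a ceiling division. As a small Python sketch (hypothetical helper names), this shows why the kernels need the `id.x < _threadCount` guard: the last group is usually only partly used.

```python
def group_count(items, group_size):
    """Ceiling division, as in DispatchItems."""
    return (items + group_size - 1) // group_size

def idle_threads(items, group_size):
    # Threads launched beyond `items`; the kernels skip these via _threadCount.
    return group_count(items, group_size) * group_size - items
```

For the 360 test lines with 256 threads per group, that is 2 groups and 152 threads that do nothing.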

Anyhoo, making it 2 degrees per line + rendering at 1024*768:

linecircleincompute

(oh yeah – I changed the clear colour to black too!).

Next, I think I need to correct for aspect ratio and stuff. Gonna operate at a fixed res of 1024x768 then upscale / downscale points. Hmmm… spent a while trying to make this work with any resolution but got bored. Instead I’ll just convert from (1024,768) pixel coordinates to the -1 to 1 range the vertex shader outputs:

p = 2 * (p - float2(1024,768)*0.5) / float2(1024,768);

Also changed lines to be float2 whilst there!
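Here is the same conversion as a tiny Python sketch (`to_clip` is a made-up name), handy for sanity checking the corners:

```python
SCREEN = (1024.0, 768.0)

def to_clip(p):
    """Map a point in 1024x768 pixel space to the -1..1 range."""
    return tuple(2.0 * (c - s * 0.5) / s for c, s in zip(p, SCREEN))
```

The origin lands on (-1, -1), the screen centre (512, 384) on (0, 0), and (1024, 768) on (1, 1).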

A bit of game

It’s getting a bit big to post all the code up here, but I’ll eventually share it all on GitHub!

Let’s get inputs in. To stay true to the goal, I want to avoid any understanding of ‘gameplay’ on CPU, so I’ll just push a big list of keys into game. CPU side, I’ll set up a KeyState structure and read a load of buttons into it each frame:


    public struct KeyState {
        public bool down;
        public bool pressed;
        public bool released;
    }
        for(int i = 0; i < 26; i++) {
            //index by upper case character code, to match the _keyStates['A'] style lookups compute side
            _cpuKeyStates['A' + i] = new KeyState {
                down = Input.GetKey(KeyCode.A + i),
                pressed = Input.GetKeyDown(KeyCode.A + i),
                released = Input.GetKeyUp(KeyCode.A + i),
            };
        }

On top of ‘A’-‘Z’, I also added some handy keys like ‘0’-‘9’, ‘ ‘ and a few other bits.

Now to define a new PlayerState (which also comes with a new _playerState buffer).

    public struct PlayerState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public bool alive;
    }

For the moment, I’ll init this CPU side to get going

        PlayerState[] initPlayer = new PlayerState[1];
        initPlayer[0].position = new Vector2(1024f, 768f) * 0.5f;
        initPlayer[0].alive = true;
        _playerState.SetData(initPlayer);

The long term goal is to get a proper game loop going, so I don’t feel too dirty about a bit of CPU setup for now!

So compute side I’ve set up corresponding structures and buffers. Also added these few helper functions for drawing lines and manipulating vectors:

int AllocLines(int count) {
    int lineIdx;
    InterlockedAdd(_counters[0].numLines, count, lineIdx);
    return lineIdx;
}
void AddLine(int idx, float2 a, float2 b) {
    _linesRW[idx].a = a;
    _linesRW[idx].b = b;
}

float2 mulpoint(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 1)).xy;
}
float2 mulvec(float3x3 trans, float2 p) {
    return mul(trans, float3(p, 0)).xy;
}
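The only difference between mulpoint and mulvec is the third component: 1 means the translation column gets applied, 0 means it doesn’t. A minimal Python sketch of the same two helpers (with `mul` hand-rolled as a matrix times column vector, matching how the HLSL calls it):

```python
def mul(m, v):
    """3x3 matrix times column vector, like HLSL mul(matrix, vector)."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def mulpoint(m, p):
    # w = 1: the point picks up the translation in the last column
    return mul(m, (p[0], p[1], 1.0))[:2]

def mulvec(m, p):
    # w = 0: directions ignore translation
    return mul(m, (p[0], p[1], 0.0))[:2]

# a pure translation by (100, 50), purely for illustration
translate = ((1.0, 0.0, 100.0),
             (0.0, 1.0, 50.0),
             (0.0, 0.0, 1.0))
```

With that matrix, `mulpoint(translate, (1.0, 2.0))` moves the point to (101, 52), while `mulvec` hands back (1, 2) untouched.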

For now it’s 1 player, but I want to plan for a million, so I’ll work as though there’s more than 1. The basic compute kernel for player update looks like this:

[numthreads(256, 1, 1)]
void UpdateAndDrawPlayer(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        PlayerState player = _playersRW[id.x];
        if(player.alive) {
            //do stuff
        }
        _playersRW[id.x] = player;

    }
}

The core update code splits into 2 sections. First, standard Asteroids-style inputs:

            float rot = 0;
            float thrust = 0;
            float rotPerSecond = 1;
            float thrustPerSecond = 100;

            if (_keyStates['A'].down) {
                rot += rotPerSecond * _timeStep;
            }
            if (_keyStates['D'].down) {
                rot -= rotPerSecond * _timeStep;
            }
            if (_keyStates['W'].down) {
                thrust += thrustPerSecond * _timeStep;
            }
            player.rotation += rot;

            float2 worldy = float2(sin(player.rotation), cos(player.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);

            player.velocity += worldy * thrust;
            player.position += player.velocity * _timeStep;

Then the second half calculates a transform matrix and generates some hard coded lines:

            worldx *= 50;
            worldy *= 50;
            float3x3 trans = {
                worldx.x, worldy.x, player.position.x,
                worldx.y, worldy.y, player.position.y,
                0, 0, 1
            };

            int lineIdx = AllocLines(5);
            
            float2 leftcorner = mulpoint(trans, float2(-0.7, -1));
            float2 rightcorner = mulpoint(trans, float2(0.7, -1));
            float2 tip = mulpoint(trans, float2(0, 1));
            float2 leftback = mulpoint(trans, float2(-0.2, -0.7f));
            float2 rightback = mulpoint(trans, float2(0.2, -0.7f));

            AddLine(lineIdx++, leftcorner, tip);
            AddLine(lineIdx++, rightcorner, tip);
            AddLine(lineIdx++, leftcorner, leftback);
            AddLine(lineIdx++, rightcorner, rightback);
            AddLine(lineIdx++, leftback, rightback);


Those vectors look roughly right I think. Side by side with the video:

shipsidebyside

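After a bit of debugging, I found that the bool types in KeyState appeared to be causing problems. Seems odd, but a likely culprit is that a C# bool and an HLSL bool aren’t the same size, so the CPU and GPU end up disagreeing on the struct layout. Either way, I can fix it with ints and don’t fancy digging too deep right now. Once that’s sorted, we have a game, sort of…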

Quickly add screen wrapping…

            //wrap player (note: better version should handle overshoot amount)
            player.position = player.position >= 0 ? player.position : float2(1024, 768);
            player.position = player.position <= float2(1024,768) ? player.position  : 0;

This quick and dirty version could be better by handling the fact that if they overshoot by k pixels, they should come back k pixels from the other side. Pain in the ass though!
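For what it’s worth, the overshoot-aware version is mostly a floored modulo. A minimal Python sketch of the idea (`wrap` is a made-up name, and this is not what the game code does). In HLSL, fmod keeps the sign of its input, so something along the lines of `p - size * floor(p / size)` would be needed instead.

```python
def wrap(position, size=(1024.0, 768.0)):
    # Python's % returns a value in [0, size) for a positive size,
    # so an overshoot of k comes back k pixels in from the opposite edge.
    return tuple(c % s for c, s in zip(position, size))
```

So a ship at x = 1030 reappears at x = 6, and one at x = -5 reappears at x = 1019.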

The thrust in the video looks like it’s just a flashing triangle, maybe with a bit of randomness thrown in. To achieve this, I’ll pass through the frame number, which is fed into a wang hash based random number generator. Then some cheeky code to add the extra triangle when thrusting:


            int thrustframe = (_frame / 4);
            if (thrust > 0 && (thrustframe &1)) {
                lineIdx = AllocLines(2);
                float2 thrustback = mulpoint(trans, float2(0.0f, -1.5f-wang_rand(thrustframe)*0.15f));
                AddLine(lineIdx++, leftback, thrustback);
                AddLine(lineIdx++, rightback, thrustback);
            }
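The wang_rand function isn’t shown above. As a rough idea of what sits behind it, here is a Python sketch of the widely circulated Wang integer hash, with masking to mimic 32-bit overflow. The mapping to a float in [0, 1) is an assumption on my part, not necessarily what the shader does.

```python
def wang_hash(seed):
    """Thomas Wang's 32-bit integer hash, masked to emulate uint overflow."""
    seed &= 0xFFFFFFFF
    seed = (seed ^ 61) ^ (seed >> 16)
    seed = (seed * 9) & 0xFFFFFFFF
    seed ^= seed >> 4
    seed = (seed * 0x27D4EB2D) & 0xFFFFFFFF
    seed ^= seed >> 15
    return seed

def wang_rand(seed):
    # Assumed mapping to [0, 1); the post does not show this step.
    return wang_hash(seed) / 4294967296.0
```

Because the result depends only on the seed, feeding in the same `thrustframe` for 4 frames in a row gives the same flame length for all 4, which is what keeps the flicker from turning into noise.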

Asteroids

Just like with players, I’ll define a structure for the asteroid state, with a bit of extra data in:

    public struct AsteroidState {
        public Vector2 position;
        public float rotation;
        public Vector2 velocity;
        public int alive;
        public float radius;
        public int level;
    }

As before, I create a buffer for this, and init a randomly placed, sensible set of asteroids:

        AsteroidState[] initAsteroids = new AsteroidState[MAX_ASTEROIDS];
        for(int i = 0; i < START_ASTEROIDS; i++) {
            //keep re-rolling until the asteroid is a safe distance from the player
            while(true) {
                initAsteroids[i].position = new Vector2(Random.Range(0f, 1024f), Random.Range(0f, 768f));
                if ((initAsteroids[i].position - initPlayer[0].position).magnitude > 200f)
                    break;
            }
            initAsteroids[i].alive = 1;
            initAsteroids[i].radius = 30;
            initAsteroids[i].rotation = Random.Range(-Mathf.PI, Mathf.PI);
            initAsteroids[i].velocity = Random.insideUnitCircle * 50f;
            initAsteroids[i].level = 0;
        }
        _asteroidState.SetData(initAsteroids);

A note at this point, after a few bugs I’ve swizzled a few bits around:

  • I’ve split update and render out into the Unity Update and OnPostRender functions, as reading input seemed to break a bit when called from render
  • I’ve added a helpful BindEverything function that just takes a compute kernel and binds all my buffers and variables to it. This would be terrible behaviour in production, but this isn’t production.
  • I also changed ClearLines to BeginFrame, cos it’s starting to look like it’ll do more than clear out lines

The BindEverything code:

    void BindEverything(int kernel) {
        _asteroidsShader.SetInt("_maxPlayers", MAX_PLAYERS);
        _asteroidsShader.SetInt("_maxAsteroids", MAX_ASTEROIDS);
        _asteroidsShader.SetFloat("_time", Time.time);
        _asteroidsShader.SetFloat("_timeStep", Time.deltaTime);
        _asteroidsShader.SetInt("_frame", Time.frameCount);

        _asteroidsShader.SetBuffer(kernel, "_dispatch", _dispatchBuffer);
        _asteroidsShader.SetBuffer(kernel, "_counters", _countersBuffer);
        _asteroidsShader.SetBuffer(kernel, "_linesRW", _linesBuffer);
        _asteroidsShader.SetBuffer(kernel, "_keyStates", _keyStates);
        _asteroidsShader.SetBuffer(kernel, "_playersRW", _playerState);
        _asteroidsShader.SetBuffer(kernel, "_asteroidsRW", _asteroidState);
    }

With the asteroids added, the dispatch now looks like this:

    private void Update() {
        //all my inputs are read into _cpuKeyStates here
        _keyStates.SetData(_cpuKeyStates);

        DispatchOne(kernelBeginFrame);
        DispatchItems(kernelUpdateAndDrawPlayer, MAX_PLAYERS);
        DispatchItems(kernelUpdateAndDrawAsteroid, MAX_ASTEROIDS);
        DispatchOne(kernelLineDispatchArgs);
    }

And render is still very simple:

    public void OnPostRender() {

        _drawLinesMaterial.SetBuffer("lines", _linesBuffer);
        _drawLinesMaterial.SetPass(0);
        Graphics.DrawProceduralIndirect(MeshTopology.Lines, _dispatchBuffer);
    }

So, the asteroids update kernel. It’s pretty similar to player update in functionality. The first half updates its position, and the second renders it.

//updates asteroid movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        AsteroidState asteroid = _asteroidsRW[id.x];
        if (asteroid.alive) {
            asteroid.position += asteroid.velocity * _timeStep;
            asteroid.position = asteroid.position >= 0 ? asteroid.position : float2(1024, 768);
            asteroid.position = asteroid.position <= float2(1024, 768) ? asteroid.position : 0;

            float scl = asteroid.radius;

            float2 worldy = float2(sin(asteroid.rotation), cos(asteroid.rotation));
            float2 worldx = float2(-worldy.y, worldy.x);
            worldx *= scl;
            worldy *= scl;
            float3x3 trans = {
                worldx.x, worldy.x, asteroid.position.x,
                worldx.y, worldy.y, asteroid.position.y,
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 9;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float randscl = 0.75f;
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                //offset each remaining point using random numbers
                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                pos += randscl * float2(wang_rand(id.x*NUM_EDGES + i), wang_rand(id.x*NUM_EDGES * 2 + i));

                //add new line
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr); 
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _asteroidsRW[id.x] = asteroid;
    }
}

I spent quite a while before I was happy with the look of the asteroids, and I’m still not fully satisfied. The above code basically generates a 9 edge circle, then randomly adjusts the position of each vertex. It also takes account of the asteroid’s radius, in anticipation of varying sizes.

firstasteroids
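Stripped of the transform, the shape generation is just a jittered circle with its points joined up in a loop. A rough Python sketch of the idea, using `random.Random` as a stand-in for wang_rand and skipping the rotation and translation (the function name and parameters are made up for illustration):

```python
import math
import random

def asteroid_outline(radius, num_edges=9, randscl=0.75, seed=0):
    rng = random.Random(seed)   # stand-in for wang_rand
    points = []
    for i in range(num_edges):
        # point on the unit circle, nudged by a random offset per axis
        ang = i * 2.0 * math.pi / num_edges
        x = math.sin(ang) + randscl * rng.random()
        y = math.cos(ang) + randscl * rng.random()
        points.append((x * radius, y * radius))
    # join each point to the next, wrapping around to close the loop
    return [(points[i], points[(i + 1) % num_edges]) for i in range(num_edges)]

edges = asteroid_outline(30.0)
```

Seeding from the asteroid index, as the kernel does, means each asteroid regenerates the same outline every frame without storing any vertices.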

Bullets

Bullets are the first slightly cheeky one, as I need to be able to spawn them, which means allocating/freeing of some form. This kind of model is tricky on GPU, so I’m going to go for the simple option of a large circular buffer of bullets. If the buffer overflows it’ll start recycling bullets, but I’ll just make it big enough not to!

Aside from a lifetime, the bullet state is very simple:

    public struct BulletState {
        public Vector2 position;
        public Vector2 velocity;
        public float lifetime;
    }

And the update equally so:

//updates bullet movement and outputs draw request
[numthreads(256, 1, 1)]
void UpdateAndDrawBullet(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        BulletState bullet = _bulletsRW[id.x];
        if (bullet.lifetime > 0) {

            bullet.position += bullet.velocity * _timeStep;
            if (any(bullet.position < 0) || any(bullet.position > float2(1024, 768))) {
                //kill the bullet and store it before bailing out, as the early return skips the write-back at the end
                bullet.lifetime = -1;
                _bulletsRW[id.x] = bullet;
                return;
            }
            bullet.lifetime -= _timeStep;

            float scl = 2;
            float3x3 trans = {
                scl, 0, bullet.position.x,
                0, scl, bullet.position.y, 
                0, 0, 1
            };

            //alloc edges
            const int NUM_EDGES = 6;
            int lineIdx = AllocLines(NUM_EDGES);

            //build first point then start iterating
            float2 first;
            {
                int i = 0;
                float ang = 0;
                float2 pos = float2(sin(ang), cos(ang));
                first = mulpoint(trans, pos);
            }
            float2 prev = first;
            for (int i = 1; i < NUM_EDGES; i++) {

                float ang = (i*3.1415927f*2.0f) / NUM_EDGES;
                float2 pos = float2(sin(ang), cos(ang));
                float2 curr = mulpoint(trans, pos);
                AddLine(lineIdx++, prev, curr);
                prev = curr;
            }

            //add final line to join previous point to first point
            AddLine(lineIdx++, prev, first);

        }
        _bulletsRW[id.x] = bullet;
    }
}

The main difference is that unlike the ship and asteroid, bullets have a lifetime and automatically die when it runs out. I couldn’t figure out whether bullets wrap in the original game, but it felt better when they didn’t, so they die when going off screen for the moment.

Now to implement the spawning. I’ll start with adding a nextBullet counter:

    public struct Counters {
        public int numLines;
        public int nextBullet;
    }

Now for the code in the player update to fire one:

            if (_keyStates[' '].pressed) {
                int nextBullet;
                InterlockedAdd(_counters[0].nextBullet, 1, nextBullet);
                BulletState b;
                b.position = player.position;
                b.velocity = worldy * 1000;
                b.lifetime = 3;
                _bulletsRW[nextBullet%_maxBullets] = b;
            }

When space is pressed, I increment the bullet counter, then a new bullet is created and written into the new slot. Note the ‘mod’ allows the nextBullet counter to get ever higher but still be used as an index into the _bulletsRW buffer.
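The recycling behaviour is easiest to see with a deliberately tiny buffer. A minimal Python sketch (single threaded, so the atomic add is just a plain increment, and the names are made up):

```python
MAX_BULLETS = 4              # tiny on purpose, to make recycling visible
bullets = [None] * MAX_BULLETS
next_bullet = 0              # only ever grows, like _counters[0].nextBullet

def fire(bullet):
    """Claim the next slot, wrapping back to the start when the buffer is full."""
    global next_bullet
    slot = next_bullet % MAX_BULLETS
    next_bullet += 1
    bullets[slot] = bullet
    return slot

slots = [fire("bullet %d" % i) for i in range(6)]
```

The fifth and sixth shots land in slots 0 and 1, overwriting the two oldest bullets whether or not they were still alive.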

Collision

Ok! We have stuff. Now to smash it. If I hit millions of players and asteroids, collision will have to start using optimisation structures, but GPUs are fast and like nothing more than doing 1000s of identical calculations. So collision will be a classic brute force ‘test everything against everything else’ situation.

Let’s start with player vs asteroid collision:

[numthreads(256, 1, 1)]  
void CollidePlayerAsteroid(uint3 id : SV_DispatchThreadID) 
{
    if (id.x < _threadCount)
    {
        int playerIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (playerIdx*_maxAsteroids);

        PlayerState player = _playersRW[playerIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (player.alive && asteroid.alive) {
            if (length(player.position - asteroid.position) < (asteroid.radius+10)) {
                _playersRW[playerIdx].alive = 0;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

The actual collision code here is a basic circle test: the player is treated as a circle of radius 10, so if its centre gets closer to the asteroid’s centre than the asteroid’s radius plus 10 pixels, both are killed. The sneaky bit is at the top, where the player and asteroid indices are calculated. This makes more sense when you look at the dispatch CPU side:

        DispatchItems(kernelCollidePlayerAsteroid, MAX_PLAYERS * MAX_ASTEROIDS);

In effect, I dispatch 1 thread for every combination of player and asteroid. The funky calculations in CollidePlayerAsteroid simply decompose the thread id into player index and asteroid index, just like you might convert a pixel index to a pixel coordinate in an image.
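Here is the whole brute force pass emulated on the CPU as a Python sketch, with a loop over thread ids standing in for the dispatch. The positions and radii are invented test data.

```python
import math

def decompose(thread_id, max_asteroids):
    """Split a flat thread id into (player index, asteroid index)."""
    player_idx = thread_id // max_asteroids
    asteroid_idx = thread_id - player_idx * max_asteroids
    return player_idx, asteroid_idx

def circles_overlap(pos_a, pos_b, combined_radius):
    return math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]) < combined_radius

players = [(100.0, 100.0), (500.0, 500.0)]
asteroids = [((120.0, 100.0), 30.0),   # (centre, radius)
             ((900.0, 700.0), 30.0),
             ((505.0, 500.0), 15.0)]

hits = []
for tid in range(len(players) * len(asteroids)):   # one "thread" per pair
    p, a = decompose(tid, len(asteroids))
    centre, radius = asteroids[a]
    if circles_overlap(players[p], centre, radius + 10.0):
        hits.append((p, a))
```

With 2 players and 3 asteroids that is 6 threads, and thread 4 for example ends up testing player 1 against asteroid 1.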

Shockingly enough, for now the bullet vs asteroid is pretty similar:

[numthreads(256, 1, 1)]
void CollideBulletAsteroid(uint3 id : SV_DispatchThreadID)
{
    if (id.x < _threadCount)
    {
        int bulletIdx = id.x / _maxAsteroids;
        int asteroidIdx = id.x - (bulletIdx*_maxAsteroids);

        BulletState bullet = _bulletsRW[bulletIdx];
        AsteroidState asteroid = _asteroidsRW[asteroidIdx];

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                _asteroidsRW[asteroidIdx].alive = 0;
            }
        }
    }
}

There’s one ingredient before we can declare real progress though – asteroids smash! To achieve smashing asteroids I’ll take a similar approach to the bullets. The asteroids buffer will be large enough to contain all the asteroids a level will ever need (big, medium and small), and I’ll just use a counter to allocate into it. Once that’s added, I can create a splitting function:

void SplitAsteroid(int idx) {
    int nextIndex;
    InterlockedAdd(_counters[0].nextAsteroid, 2, nextIndex);

    AsteroidState asteroid = _asteroidsRW[idx];
    if (asteroid.level < 2) {

        float childSpeed = 50; 

        AsteroidState child;
        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;

        child.position = asteroid.position;
        child.velocity = asteroid.velocity + (float2(wang_rand(nextIndex), wang_rand(nextIndex * 2)) * 2 - 1) * childSpeed;
        child.alive = 1;
        child.radius = asteroid.radius * 0.5;
        child.rotation = (wang_rand(nextIndex * 3) * 2 - 1) * 3.1415927f;
        child.level = asteroid.level + 1;
        _asteroidsRW[nextIndex++] = child;
    }


    _asteroidsRW[idx].alive = 0;
}

This code allocates 2 new slots by incrementing nextAsteroid by 2. Assuming the asteroid level < 2 (i.e. it isn’t too small), I then proceed to generate 2 new smaller asteroids. These start at the position of the larger one, but take on a randomly adjusted velocity. Finally, the asteroid is killed off. I also just noticed I probably shouldn’t allocate child asteroids unless I intend to use them, but it doesn’t matter.
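One place the unconditional allocation might matter is buffer sizing, since slot indices come straight from the counter. A back-of-envelope Python sketch, assuming nextAsteroid starts at the number of starting asteroids (that initialisation isn’t shown above, and the function names are made up):

```python
def total_asteroids(start_count, split_levels=2):
    # Each asteroid below `split_levels` breaks into two, so level n holds
    # start_count * 2**n asteroids: big, medium and small for split_levels=2.
    return sum(start_count * 2 ** level for level in range(split_levels + 1))

def counter_upper_bound(start_count, split_levels=2):
    # SplitAsteroid as posted adds 2 to nextAsteroid for every destroyed
    # asteroid, including the small ones that produce no children.
    return start_count + 2 * total_asteroids(start_count, split_levels)
```

With 4 starting asteroids, only 28 asteroids ever exist, but the counter can climb to 60. If the small ones get shot before the last big one splits, children can be written well past slot 28, so the buffer wants sizing from the larger number unless the allocation moves inside the level check.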

A quick tweak to the bullet code:

        if (bullet.lifetime > 0 && asteroid.alive) {
            if (length(bullet.position - asteroid.position) < (asteroid.radius + 2)) {
                _bulletsRW[bulletIdx].lifetime = -1;
                SplitAsteroid(asteroidIdx);
            }
        }

And it’s asteroid time:

Yay! Ok – I have to get on a plane. Hopefully by the time I’ve landed there’ll be sound effects or something…

For UI, sounds and game loops, head to the next post here: Shaderoids 2
