Optimizing the Particle Life: WebGPU and Millions of 3D Particles (Part 3)

Exceeded expectations

For a modest web developer like me, diving into the world of computer graphics APIs can be more than intimidating. It's not even about the complexity of the libraries, the scarce learning resources, or the large amount of boilerplate code required to achieve the simplest of things. For me, it required a shift in perspective, and I felt like a total beginner at programming again. To be honest, it was a nice feeling, but it's certainly not something you can learn "on the go". You can take, say, a Java developer with several years of experience, and they'll almost immediately be a valuable addition to a team developing in C# or TypeScript. But to be a productive graphics programmer, I'd probably have to spend months, if not more, on intensive learning before being worth anything.

I wanted to finish my project, however! Initially, I thought that learning a complex graphics API just for this project would be overkill, and I wanted to use some higher-level library. But I quickly found out that libraries like Three.js have a learning curve of their own, while simpler, computation-focused ones (think GPU.js) are not enough: often buggy and offering a headache of their own.

For my particle life, rendering the particles on the screen was not the most expensive operation. Computing their positions before rendering was the real bottleneck. So when I came across an article about WebGPU introducing compute shaders, I decided to give it a try. The idea is simple: you load data into a buffer, perform operations on it in parallel, and output the result to a buffer. Contrast this with WebGL, where something similar is only possible in a hacky, unreliable way: you convert data to a texture, upload it to the GPU, process it with a pixel shader, read the pixels back from the canvas, and convert them back to data.

In general, once I realized that, with the scary boilerplate stripped away, all my code amounted to was buffers filled with easy-to-visualize data flowing from the compute pipeline to the render pipeline, things clicked in my head and I could focus on the actual problem I had to solve.

And I have to say that I'm extremely happy with how it turned out. Not only did I go far beyond my initial goal of rendering a million particles at an acceptable frame rate, but thanks to tweaks in the algorithm and turning the whole thing 3D, the visual effect, in my opinion, is quite stunning and bordering on what could be called generative art.

1. Preparing the buffers for the compute pass

First, let's focus on computing the particle position. As you remember from the first part, to describe our particles' movement, we need four components of our system:

  1. Particles' positions
  2. Particles' velocities
  3. Particles' colors
  4. Color-attraction matrix

To calculate position, we need velocity, and to calculate velocity, we need to know the attraction (or repulsion) between particles, and for that, we need to know their colors. All these must be written to buffers so we can access them in the shader module.

//The array needs to have a length of particlesCount*4, because each position is a
//4-dimensional vector (x, y, z coordinates + an additional projection value w we need
//to project 3D coordinates onto the 2D plane)
let positionBufferData = new Float32Array(particleCount * 4);
//as with the Python and JS versions, we start with a random distribution
for (let i = 0; i < positionBufferData.length; i += 4) {
  positionBufferData[i] = Math.random() * 2 - 1;
  positionBufferData[i + 1] = Math.random() * 2 - 1;
  positionBufferData[i + 2] = Math.random() * 2 - 1;
  positionBufferData[i + 3] = 1;
}
//buffer usage must be declared upon creation
//VERTEX - can be used as a vertex buffer in the render pass
//COPY_DST - can be the destination of copy/write operations
//STORAGE - can be used as storage, for example in a bind group entry
const positionBuffer = device.createBuffer({
  size: positionBufferData.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST | GPUBufferUsage.STORAGE,
});
device.queue.writeBuffer(positionBuffer, 0, positionBufferData);

//This will be an array of integers from 0 to m-1 of length = particleCount
//in other words, one random color for every particle
const computeColorsBufferData = new Int32Array(particleCount);
for (let i = 0; i < computeColorsBufferData.length; i++) {
  computeColorsBufferData[i] = Math.floor(Math.random() * m);
}
const computeColorsBuffer = device.createBuffer({
  size: computeColorsBufferData.byteLength,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.STORAGE,
});
device.queue.writeBuffer(computeColorsBuffer, 0, computeColorsBufferData);

//Our color-attraction matrix, flattened.
const attractionBufferData = new Float32Array(matrix.flat());
const attractionBuffer = device.createBuffer({
  size: attractionBufferData.byteLength,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.STORAGE,
});
device.queue.writeBuffer(attractionBuffer, 0, attractionBufferData);

//Velocities. Here, we need 3 values (x,y,z) for every particle.
//A fresh Float32Array is already zero-filled, so no initialization loop is needed.
const velocityBufferData = new Float32Array(particleCount * 3);
const velocityBuffer = device.createBuffer({
  size: velocityBufferData.byteLength,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.STORAGE,
});
device.queue.writeBuffer(velocityBuffer, 0, velocityBufferData);

2. Space partitioning

With WebGPU, I'm able to render the Particle Life on a much larger scale, but to squeeze as much as possible out of my machine, I need to organize my data efficiently, just like in the previous article. On my MacBook, without any space partitioning, I was able to render about 100,000 particles. After optimizations, I could render over 4 million particles at about 5 fps, and around a million particles at roughly 30 fps.

The key to that was the realization that after grouping my particles so that particles from the same sector of space occupy neighbouring positions in the buffers, I don't need to recalculate the groupings again. Particles are bound to groups of mutual interaction at the beginning, and they will interact with each other (given they are within the right radius) regardless of their current position in clip space. This saves one more compute pass we'd otherwise have to execute on every frame. Moreover, the animation is more stable, as we prevent large clumps of particles from aggregating.

So, let's divide our 3D space into sectors based on interaction radius length. Unlike QuadTree, which we used in the previous part, this will always be 1-level deep space partitioning.

const axisDivisionCount = Math.floor(2 / interactionRadius);
const axisDivisionLength = 2 / axisDivisionCount;
const sectorsNumber = Math.pow(axisDivisionCount, 3);

As the code above shows, to calculate the number of sectors, we first need the number of divisions per axis. From that, we can derive the side length of a sector. In clip space, coordinates always range from -1 to 1, so the total length is 2. To get the sector count, we just raise the number of divisions per axis to the power of 3.
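As a quick sanity check, here is the same math with a concrete radius. The value interactionRadius = 0.25 is made up for illustration; it divides 2 cleanly, which keeps the numbers exact:

```javascript
// Sector math from above with a concrete, hypothetical radius.
const interactionRadius = 0.25;

// Clip space spans [-1, 1] on each axis, so the total side length is 2.
const axisDivisionCount = Math.floor(2 / interactionRadius); // 8 divisions per axis
const axisDivisionLength = 2 / axisDivisionCount;            // each sector is 0.25 wide
const sectorsNumber = Math.pow(axisDivisionCount, 3);        // 8^3 = 512 sectors
```

Note how halving the interaction radius multiplies the sector count by 8, so the radius directly controls how fine-grained the partitioning (and the per-particle workload) is.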

We're going to loop over every particle, determine its sector, and push it to the correct sector's array. Then we'll rearrange the positionBufferData array so that particles from sector 1 come first, followed by sector 2 particles, and so on. We'll also need a helper array holding the first index in the buffer of every sector's particles, so we can access them in the shader module.

//let's store particles for every sector separately
let sectors: number[][] = [];
//this array will store a length counter for every sector
let sectorsLengths: number[] = [];

//As discussed above, let's create axisDivisionCount^3 sectors, and set their initial length counters to 0.
for (let i = 0; i < axisDivisionCount; i++) {
  for (let j = 0; j < axisDivisionCount; j++) {
    for (let k = 0; k < axisDivisionCount; k++) {
      sectors.push([])
      sectorsLengths.push(0)
    }
  }
}

for (let i = 0; i < positionBufferData.length; i += 4) {
  determineParticlesSector([positionBufferData[i], positionBufferData[i + 1], positionBufferData[i + 2]])
}

function determineParticlesSector(particle: number[]) {
  for (let i = 0; i < axisDivisionCount; i++) {
    for (let j = 0; j < axisDivisionCount; j++) {
      for (let k = 0; k < axisDivisionCount; k++) {
        //Determining the particle's sector, as visualized on the image below.
        //We're checking if the particle resides within a given area of 3D space,
        //moving along every axis and checking how many division lengths we have traversed
        if ((particle[0] >= (-1) + (axisDivisionLength * i) && particle[0] < (-1) + (axisDivisionLength * (i + 1))) &&
          (particle[1] >= (-1) + (axisDivisionLength * j) && particle[1] < (-1) + (axisDivisionLength * (j + 1))) &&
          (particle[2] >= (-1) + (axisDivisionLength * k) && particle[2] < (-1) + (axisDivisionLength * (k + 1)))) {
          
          //Translate a position in the 3D array to the 1D (flattened) array. We need to store data as a 1D
          //array for buffers. Let's consider sector 6 from the image below. Its axisDivisionCount is 2. We're on the
          //0th division on the x axis, 1st on the y axis and 1st on the z axis, which gives us: 0+2+4=6 (index in the 1D array).
          const sectorsIndex = i + (j * axisDivisionCount) + (k * Math.pow(axisDivisionCount, 2))
          //get current length to determine its position in sectors array
          const index = sectorsLengths[sectorsIndex]

          //push the particle to its correct position
          sectors[sectorsIndex][index] = particle[0]
          sectors[sectorsIndex][index + 1] = particle[1]
          sectors[sectorsIndex][index + 2] = particle[2]
          sectors[sectorsIndex][index + 3] = 1
          
          //increment sector's length
          sectorsLengths[sectorsIndex] += 4
        }
      }
    }
  }
}

Determining the particle's sector was the most difficult part so far. This is when I had to step away from my laptop for a while and just spend some time with pen and paper to wrap my head around it. Thinking about it as moving along the axes from -1 onwards and checking how many division lengths we have traversed helped me come up with the formula above.
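As an aside, once the "walking along each axis from -1" picture clicks, the triple loop can be collapsed into a direct calculation. The sectorIndexOf helper below is a hypothetical alternative sketch, not code from the repo; the clamp keeps particles sitting exactly on the +1 boundary inside the last division:

```javascript
// Hypothetical direct computation of a particle's sector index,
// equivalent to the triple-loop search above (clip space is [-1, 1]).
function sectorIndexOf(particle, axisDivisionLength, axisDivisionCount) {
  // How many division lengths we have traversed along an axis from -1,
  // clamped so boundary coordinates stay inside a valid division.
  const clampIdx = (coord) =>
    Math.min(
      axisDivisionCount - 1,
      Math.max(0, Math.floor((coord + 1) / axisDivisionLength))
    );
  const i = clampIdx(particle[0]);
  const j = clampIdx(particle[1]);
  const k = clampIdx(particle[2]);
  // Flatten the (i, j, k) triple to a 1D index, as in the loop version.
  return i + j * axisDivisionCount + k * axisDivisionCount * axisDivisionCount;
}
```

With axisDivisionCount = 2 (so axisDivisionLength = 1), a particle at (-0.5, 0.5, 0.5) lands in divisions (0, 1, 1), i.e. sector 0 + 2 + 4 = 6, matching the worked example in the comment above.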

Now that we have our sectors in order, let's use the helper arrays to rearrange data in positionBufferData.

let currentIndex = 0
let firstIndexes = new Uint32Array(sectorsNumber)

for (let i = 0; i < sectors.length; i++) {
  //to loop over the right particles in the shader module, we'll need the starting index of every
  //sector's data in the 1D positions buffer (each particle occupies 4 floats)
  if (0 === i) {
    firstIndexes[i] = 0;
  } else {
    firstIndexes[i] = firstIndexes[i - 1] + sectors[i - 1].length / 4;
  }
  for (let j = 0; j < sectors[i].length; j++) {
    positionBufferData[currentIndex] = sectors[i][j];
    currentIndex++;
  }
}

Here we have it. Now the data in positionBufferData is grouped, and particles born in the same sector will always occupy neighboring positions in the buffer. We also have an array with the first index of every sector, which allows us to consider only the relevant particles when calculating forces. Now we can proceed to writing our compute shader.
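To see the bookkeeping in isolation, here is a toy run of the same prefix-sum idea with made-up per-sector particle counts (in the real code the count is sectors[i].length / 4, since each particle occupies four floats):

```javascript
// Toy illustration of the firstIndexes bookkeeping: each sector's first index
// is the running total of the particle counts of all sectors before it.
const sectorParticleCounts = [3, 0, 5, 2]; // made-up sector sizes

const firstIndexes = new Uint32Array(sectorParticleCounts.length);
for (let i = 1; i < sectorParticleCounts.length; i++) {
  // Each sector starts where the previous one ended.
  firstIndexes[i] = firstIndexes[i - 1] + sectorParticleCounts[i - 1];
}

console.log(Array.from(firstIndexes)); // → [0, 3, 3, 8]
```

Note how the empty sector 1 simply shares its start index with sector 2; the range check in the shader then never iterates over it.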

3. The compute shader

WebGPU shaders are written in a dedicated shading language, WGSL. To familiarize yourself with its basics, I can only recommend its documentation: it's clear, concise, and easy to navigate, and pretty much the only resource I used. We'll be executing the main shader function on every particle in parallel. My biggest pain when writing shaders was how difficult they are to debug. You cannot inspect the value of a variable; there is no print() or console.log(). You can only see the initial layout of the buffers with the Chrome extension, which isn't very helpful, and error messages, while useful, are not nearly enough. I had to be smart about debugging and conditionally do one thing or another to see if my code was on the right track.

It's important to note that the algorithm below is not a 1:1 translation of the 2D version. While simplifying calculations for dev purposes, I noticed that if, below the repulsive radius, you apply a symmetric, negative force, the shapes the particles take are much more beautiful. With a little luck in generating the right initial forces, they start to resemble the electron clouds you might remember from physics classes. Take your time to experiment and see for yourself!
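To make that tweak easy to experiment with outside the shader, here is the same piecewise force rule in plain JavaScript. pairForce is a hypothetical helper name of my own; the constants mirror the shader code below:

```javascript
// Sketch of the per-pair force rule used in the compute shader.
// Returns the signed multiplier applied to the (xdiff, ydiff, zdiff) vector.
function pairForce(dist, interactionRadius, forceV, forceFactor) {
  if (dist >= interactionRadius) return 0; // out of range: no interaction
  const ratio = dist / interactionRadius;
  const repF = 0.2; // repulsive-radius threshold, as in the shader
  const force = forceV * (1.0 - ratio) * forceFactor;
  // Below repF the same magnitude is applied with the opposite sign:
  // the "symmetric, negative force" tweak described above.
  return ratio > repF ? force : -force;
}
```

For ratio <= 0.2 the multiplier flips sign, so close neighbors are pushed apart with the same magnitude that would otherwise pull them together, which is what produces the hollow, shell-like shapes.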

`
@group(0) @binding(0) var<storage, read_write> positions: array<vec4f>;
@group(0) @binding(1) var<storage, read_write> velocities: array<vec3f>; 
@group(0) @binding(2) var<storage> colors: array<i32>; 
@group(0) @binding(3) var<storage> attraction: array<f32>; 
@group(0) @binding(4) var<storage> indexes: array<u32>; 

//Global space is divided into workgroups that share the same workgroup memory,
//and workgroups are subdivided into threads executing the entry point function in
//parallel. global_invocation_id equals workgroup_id * workgroup_size + local_invocation_id.
@compute @workgroup_size(${workGroupSize})
fn cs(@builtin(global_invocation_id) global_id: vec3u) {
    let index = global_id.x;
    var position = positions[index].xyz;
    var velocity = velocities[index].xyz;
    var sectorsStartIndex: u32;
    var sectorsEndIndex: u32;

    //First, let's calculate the first and last index, based on the
    //firstIndexes array we computed in the previous paragraphs
    for(var i: i32 = 0; i < ${firstIndexes.length}; i+=1) {
      if (i <= ${firstIndexes.length - 2}) {
        if (index >= indexes[i] && index < indexes[i+1]) {
          sectorsStartIndex = indexes[i];
          sectorsEndIndex = indexes[i+1];
        }
      } else {
        if (index >= indexes[i] && index < ${particleCount}) {
          sectorsStartIndex = indexes[i];
          sectorsEndIndex = ${particleCount};
        }
      }
    }

    //initialize variables for total forces
    var forcex = 0.0;
    var forcey = 0.0;
    var forcez = 0.0;

    let interactionRadius = ${interactionRadius};
    //force factor is an arbitrarily calculated value to balance out
    //forces in the system
    let forceFactor = ${interactionRadius}/${forceFactor};
    
    for (var i = sectorsStartIndex;i < sectorsEndIndex; i++) {
      let xdiff = positions[i].x - position.x;
      let ydiff = positions[i].y - position.y;
      let zdiff = positions[i].z - position.z;
      
      let dist = sqrt((xdiff*xdiff) + (ydiff*ydiff) + (zdiff*zdiff));

      if (dist < interactionRadius) {
        //constants below are direct translations of code explained
        //in the first part
        let forceV = attraction[(${m}*colors[index])+colors[i]];
        let ratio = dist/interactionRadius;
        let repF = 0.2;
        let force =  forceV * (1.0 - ratio) * forceFactor;

        if (ratio > repF) {
          forcex += xdiff*force;
          forcey += ydiff*force;
          forcez += zdiff*force;
        //here lies the difference compared to the 2D version. At short
        //distances, we apply a symmetric, negative force
        } else if (ratio <= repF) {
          forcex -= xdiff*force;
          forcey -= ydiff*force;
          forcez -= zdiff*force;
        }
      }
    }

    //code analogous to that explained in part 1
    velocity.x+=forcex;
    velocity.y+=forcey;
    velocity.z+=forcez;

    velocity*=0.3; // friction

    positions[index] = vec4f(position + velocity, 1);
    velocities[index] = vec3f(velocity);
}

`
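One piece of the omitted boilerplate worth spelling out is the dispatch size: since each invocation handles the particle at global_id.x, we need enough workgroups to cover every particle. A minimal sketch, with hypothetical workGroupSize and particleCount values:

```javascript
// How many workgroups to dispatch so that every particle gets a thread.
// Both values below are hypothetical stand-ins for the surrounding code's parameters.
const workGroupSize = 64;
const particleCount = 1000000;

// Round up so a partial final workgroup still covers the remaining particles.
const workgroupCount = Math.ceil(particleCount / workGroupSize);

// In the (omitted) command-encoding boilerplate, this is used roughly as:
// computePassEncoder.dispatchWorkgroups(workgroupCount);
```

Because of the rounding up, the last workgroup may run surplus threads past particleCount, so an early `if (index >= particleCount) { return; }` at the top of cs() is a common guard.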

4. Rendering shader

Now, after creating the compute pipeline and bind group, it's time to prepare additional buffers for the render pass (which you can look up in the GitHub repo) and write the render shader module itself. I won't be doing anything out of the ordinary there, except for one adjustment. When creating the colors buffer, I assigned every particle an initial opacity, loaded from the system's parameters. Here, to improve the visual effect and give my animation more depth, I'll update each particle's opacity based on its z coordinate in the vertex shader. So, particles at -1 on the z-axis will have 0% of the initial opacity, and particles at 1 will have 100% of it.
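The depth fade itself is just a linear remap of z from [-1, 1] to [0, 1] applied to the alpha channel. In plain JavaScript, the expression used in the vertex shader below would be:

```javascript
// Linear remap of a clip-space z coordinate to an opacity multiplier,
// mirroring the (position.z * 0.5) + 0.5 expression in the vertex shader.
function depthOpacityFactor(z) {
  return z * 0.5 + 0.5;
}
```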

const renderShaderModule = device.createShaderModule({
  code: `
        struct VertexUniforms {
            screenDimensions: vec2f,
            particleSize: f32
        };

        struct VSOut {
            @builtin(position) clipPosition: vec4f,
            @location(0) color: vec4f
        };

        @group(0) @binding(0) var<uniform> vertexUniforms: VertexUniforms;

        @vertex
        fn vs(
            @location(0) vertexPosition: vec2f,
            @location(1) color: vec4f,
            @location(2) position: vec3f
        ) -> VSOut {
            var vsOut: VSOut;
            var adjustedP = position;
            adjustedP.y *= ${screenRatio};
            vsOut.clipPosition = vec4f(vertexPosition * vertexUniforms.particleSize / vertexUniforms.screenDimensions + adjustedP.xy, adjustedP.z, 1.0);
            var normalizedColor = color;
            normalizedColor.w *= ((position.z * 0.5 ) + 0.5);
            vsOut.color = normalizedColor;
            return vsOut;
        }             

        @fragment 
        fn fs(@location(0) color: vec4f) -> @location(0) vec4f {
            return vec4f(color.rgb * color.a, color.a);
        } 
    `
});

I had to omit a lot of boilerplate code; otherwise, this would be a several-hour-long read. To understand those parts, I can recommend WebGPU fundamentals - I studied it myself. To see how all the pieces fit together, please see this component in the project. The paragraphs above are crucial to understanding how my particles work, especially the space partitioning section. Additionally, please read through the current WebGPU implementation status.

Final thoughts

This project made me really appreciate how far the web has come. I remember, just 15 years ago, when I was a little kid spending way too much time on my computer, submitting a form to a PHP script was the pinnacle of web development. Now we have canvas-based UIs thanks to GPU rendering, with applications like Figma or Photoshop working in the browser; we have storage APIs like IndexedDB, the Cache API, and localStorage; service workers allow our web apps to work offline; WebTransport can make low-latency, real-time updates possible for games and other applications; and WebAssembly is natively supported in all major browsers. These are probably just the tip of the iceberg. I still can't wrap my head around being able to render 4 million particles in my Chrome browser in just under 500 lines of code. How amazing is that?