I have set up some FPS-measuring code in WebGL (based on this SO answer) and have discovered some oddities with the performance of my fragment shader. The code just renders a single quad (or rather two triangles) over a 1024x1024 canvas, so all the magic happens in the fragment shader.
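For readers who want to reproduce the measurements, a frame-rate meter along these lines could be used. This is only a sketch, not the code from the linked answer: `createFpsMeter`, `tick`, and the window size are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical helper: average frame rate over a sliding window of timestamps.
function createFpsMeter(windowSize = 60) {
  const stamps = []; // frame timestamps in milliseconds, oldest first
  return {
    // Record one frame at time nowMs and return the current average FPS.
    tick(nowMs) {
      stamps.push(nowMs);
      if (stamps.length > windowSize) stamps.shift();
      if (stamps.length < 2) return 0; // need at least one interval
      const elapsedMs = stamps[stamps.length - 1] - stamps[0];
      return elapsedMs > 0 ? ((stamps.length - 1) * 1000) / elapsedMs : 0;
    },
  };
}

// Assumed browser usage (drawScene is a placeholder for the WebGL draw call):
//   const meter = createFpsMeter();
//   function frame(now) {
//     drawScene();
//     console.log(meter.tick(now).toFixed(1) + " fps");
//     requestAnimationFrame(frame);
//   }
//   requestAnimationFrame(frame);
```

Averaging over a window rather than a single frame interval keeps the reading stable enough to compare shader variants.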

Consider this simple shader (GLSL; the vertex shader is just a pass-through):

```
// some definitions
void main() {
    float seed = uSeed;
    float x = vPos.x;
    float y = vPos.y;
    float value = 1.0;
    // Nothing to see here...
    gl_FragColor = vec4(value, value, value, 1.0);
}
```

So this just renders a white canvas. It averages around 30 fps on my machine.

Now let's ramp up the number crunching and compute each fragment based on a few octaves of position-dependent noise:

```
void main() {
    float seed = uSeed;
    float x = vPos.x;
    float y = vPos.y;
    float value = 1.0;
    float noise;
    for (int j = 0; j < 10; ++j)
    {
        noise = 0.0;
        for (int i = 4; i > 0; i--)
        {
            float oct = pow(2.0, float(i));
            noise += snoise(vec2(mod(seed, 13.0) + x*oct, mod(seed*seed, 11.0) + y*oct)) / oct * 4.0;
        }
    }
    value = noise/2.0 + 0.5;
    gl_FragColor = vec4(value, value, value, 1.0);
}
```

(If you want to run the above code, I've been using this implementation of snoise.)

This brings down the fps to something like 7. That makes sense.

Now the weird part... let's compute only one of every 16 fragments as noise and leave the others white, by wrapping the noise computation in the following conditional:

```
if (int(mod(x*512.0, 4.0)) == 0 && int(mod(y*512.0, 4.0)) == 0) {
    // same noise computation
}
```

You'd expect this to be much faster, but it's still only 7 fps.

For one more test, let's instead filter the pixels with the following conditional:

```
if (x > 0.5 && y > 0.5) {
    // same noise computation
}
```

This gives the exact same number of noise pixels as before, but now we're back up to almost 30 fps.
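The claim about equal pixel counts can be checked on the CPU with a small JavaScript sketch. It is not part of the original setup and rests on two assumptions: that `vPos` spans [-1, 1] across the 1024x1024 canvas, and that fragments are evaluated at pixel centers. The helper names (`glslMod`, `pixelToPos`, `stippleMask`, `quadrantMask`, `countSelected`) are made up for illustration.

```javascript
const SIZE = 1024; // canvas width and height in pixels

// GLSL's mod(a, b) is defined as a - b * floor(a / b), unlike JavaScript's %.
function glslMod(a, b) {
  return a - b * Math.floor(a / b);
}

// Map a pixel index to the assumed vPos coordinate at the pixel center.
function pixelToPos(p) {
  return (p + 0.5) / (SIZE / 2) - 1;
}

// First conditional: every 4th pixel in both directions.
function stippleMask(x, y) {
  return Math.trunc(glslMod(x * 512, 4)) === 0 &&
         Math.trunc(glslMod(y * 512, 4)) === 0;
}

// Second conditional: one contiguous corner of the canvas.
function quadrantMask(x, y) {
  return x > 0.5 && y > 0.5;
}

// Count how many pixels of the canvas a mask selects.
function countSelected(mask) {
  let count = 0;
  for (let py = 0; py < SIZE; py++) {
    for (let px = 0; px < SIZE; px++) {
      if (mask(pixelToPos(px), pixelToPos(py))) count++;
    }
  }
  return count;
}

console.log(countSelected(stippleMask));  // 65536
console.log(countSelected(quadrantMask)); // 65536
```

Under these assumptions both masks select 65536 pixels, which is 1024 * 1024 / 16, so the two tests differ only in how the selected pixels are laid out, not in how many there are.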

**What is going on here?** Shouldn't both ways of restricting the noise to a 16th of the pixels cost the same number of cycles? And why is the slower one as slow as rendering *all* pixels as noise?

Bonus question: **What can I do about this?** Is there any way to work around the horrible performance if I actually *do* want to speckle my canvas with only a few expensive fragments?

(Just to be sure, I have confirmed that the actual modulo computation does not affect the frame rate at all, by rendering every 16th pixel black instead of white.)

Rendering to a smaller texture and upsampling is a good way to do it. But if for some reason you really need to write to every 16th pixel of the large texture, using a compute shader with one invocation for every 16th pixel plus image load/store to scatter the writes into the render target could be a good option. – Nathan Reed – 2015-08-16T00:28:39.233