WebGL performance monitor with GPU loads

You're asking about it really gently, thank you for the feedback. The piece of code below is what gives me the opportunity to estimate GPU load:
```js
// attach gpu profilers
if (gl) {
  // wrap a draw call: run it, force a sync point with readPixels so the
  // elapsed time includes the GPU work, then move that time from the
  // CPU accumulators into the GPU accumulators
  const addProfiler = (fn, self, target) => function() {
    const t = self.now();
    fn.apply(target, arguments);
    if (self.trackGPU) {
      // reading one pixel blocks until the previous commands have executed
      gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array(4));
      const dt = self.now() - t;
      self.activeAccums.forEach((active, i) => {
        if (active) {
          self.gpuAccums[i] += dt;
          self.cpuAccums[i] -= dt;
        }
      });
    }
  };

  // wrap every draw entry point that exists on this context
  ['drawArrays', 'drawElements', 'drawArraysInstanced',
    'drawBuffers', 'drawElementsInstanced', 'drawRangeElements']
    .forEach(name => { if (gl[name]) gl[name] = addProfiler(gl[name], this, gl); });

  // wrap getExtension so draw calls exposed by extensions get profiled too;
  // guard against null when an extension is not supported
  const extProfiler = (fn, self) => function() {
    const ext = fn.apply(gl, arguments);
    if (ext) {
      ['drawElementsInstancedANGLE', 'drawBuffersWEBGL']
        .forEach(name => { if (ext[name]) ext[name] = addProfiler(ext[name], self, ext); });
    }
    return ext;
  };
  gl.getExtension = extProfiler(gl.getExtension, this);
}
```

In fact, I only take into account the gl.RGBA format with an 8-bit framebuffer. If you render to float textures in your own custom material, this may lead to an error or fail to measure the drawing time. Also, this code is not asynchronous and often creates a bottleneck on the CPU before you hit the GPU limit. I need to think about it again; I was just happy that it works on most of the examples.
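As a possible guard, one could ask the context which readback format the currently bound framebuffer actually supports before forcing the sync. This is just a sketch and not part of the monitor; the safeSyncRead name is made up:

```js
// Sketch only: query the readback format supported by the currently bound
// framebuffer before calling readPixels, so float render targets don't throw.
function safeSyncRead(gl) {
  const format = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT);
  const type = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_TYPE);
  if (format === gl.RGBA && type === gl.UNSIGNED_BYTE) {
    gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array(4));
    return true;  // sync point reached
  }
  return false;   // unsupported readback combination, skip the measurement
}
```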

When I measured time with EXT_disjoint_timer_query I often had about 20% of the GPU load free, because the OS also uses it. With CPU timers things become less precise and sometimes show 0% left in the requestAnimationFrame loop. Anyway, it is a good rough estimation and it works even on mobile phones, which I think is amazing.
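For reference, a rough sketch of what a measurement with that extension can look like on a WebGL1 context (WebGL2 exposes it as EXT_disjoint_timer_query_webgl2 with gl.createQuery/gl.beginQuery instead); the helper names are illustrative and not taken from the monitor:

```js
// Sketch: time a block of draw calls with EXT_disjoint_timer_query (WebGL1).
const ext = gl.getExtension('EXT_disjoint_timer_query');
let query = null;

function beginGpuTimer() {
  if (!ext) return;
  query = ext.createQueryEXT();
  ext.beginQueryEXT(ext.TIME_ELAPSED_EXT, query);
}

function endGpuTimer() {
  if (ext && query) ext.endQueryEXT(ext.TIME_ELAPSED_EXT);
}

// poll on a later frame: results arrive asynchronously
function pollGpuTimer() {
  if (!ext || !query) return null;
  const available = ext.getQueryObjectEXT(query, ext.QUERY_RESULT_AVAILABLE_EXT);
  const disjoint = gl.getParameter(ext.GPU_DISJOINT_EXT);
  if (available && !disjoint) {
    return ext.getQueryObjectEXT(query, ext.QUERY_RESULT_EXT) / 1e6; // ns -> ms
  }
  return null;
}
```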

Interesting. I take it this will force the GPU to execute all previous drawing commands, so that the pixel delivered is correct. But how do you avoid impacting the performance you are trying to measure?

I tried this again, it works but absolutely kills my app performance =[

I have a GeForce GTX 1060 which never dips below 60fps in my game. With this monitor enabled I get like 18fps.

Is there any way I could help debug?

  1. You can turn off GPU tracking, just click on it, and then everything should be fine I suppose.
  2. Yes, that would be cool. I want to figure out how to sync the GPU with the CPU with less impact on overall performance. I have tried EXT_disjoint_timer_query, gl.finish(), and gl.readPixels(). For now I have settled on getError() + getParameter(), which works smoothly for me (a rough sketch of that idea follows below).
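Roughly, the idea looks like this. It is only a sketch reusing the wrapper shape from the code above, not the exact code of the monitor, and the addProfilerCheap name is just for illustration:

```js
// Sketch: same wrapper as before, but the sync point is gl.getError(),
// which blocks on a synchronous round trip instead of a pixel readback.
const addProfilerCheap = (fn, self, target) => function() {
  const t = self.now();
  fn.apply(target, arguments);
  if (self.trackGPU) {
    gl.getError(); // cheaper sync point than readPixels
    const dt = self.now() - t;
    self.activeAccums.forEach((active, i) => {
      if (active) {
        self.gpuAccums[i] += dt;
        self.cpuAccums[i] -= dt;
      }
    });
  }
};
```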

Thank you all for the feedback! If you have any additional questions, please go to GitHub. We have started to repeat topics that were already mentioned.

Thanks for the explanation, this makes more sense.

I have figured out how to test and see if it's working correctly.
(Very simple: disable the custom shader material and see if it impacts the performance.)
GPU use hovers around 60% without the custom shader, and 65% with it, at full screen, with other stuff in three.js going on. So, even if it doesn't give absolute accuracy, it's easy to make comparison tests.

Btw, the new version doesn't seem to work. I just updated it; here's a screen comparison of them both running the same:

Sorry, I can't find the discussion on GitHub. Where is it?

If the first version worked for you, then you are able to get super precise results from EXT_disjoint_timer_query! Some guys earlier said that it's not working. Obviously we need an extension for the extension.

Just FYI, in the version you see working, I can't find any reference to EXT_disjoint_timer_query.

This is the old version that works:

This is the new version that doesn't work:

You are forcing me to return to gl.readPixels, but actually there was a bug in the asynchronous code. I used setTimeout(f, 0) without a Promise, so it was a macrotask, and a macrotask can be pushed past rAF in Chrome. Here is the event priority in different browsers (a small sketch follows the list):

  • Chrome: microtask, requestAnimationFrame, macrotask
  • Firefox: microtask, macrotask, requestAnimationFrame
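A minimal sketch of the difference, with pollGpuResults as a hypothetical placeholder for whatever asynchronous check is pending:

```js
// Macrotask: in Chrome this can land after the next rAF callback,
// so the measurement may slip by a frame.
setTimeout(() => pollGpuResults(), 0);

// Microtask: resolved promises run before rAF in all of the browsers
// listed above, so the poll happens within the current frame.
Promise.resolve().then(() => pollGpuResults());

function pollGpuResults() {
  // hypothetical placeholder for the pending asynchronous check
}
```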

Just check the new version please, sorry for the rush.

I tried it in Chrome now, and it seems to give reasonable results. :+1: I don't understand what the Count is, except in the instancing example, other than something that kills performance when increased.

I am not a fan of the icons. I don't find them intuitive. The first looks like a microchip, and the second looks like a passive component like a resistor or something. The letters 'C' and 'G' are obvious choices, but will they perhaps look too similar at that scale? Have any other options been considered? How about a triangle for the GPU, and something else for the CPU?

Weird, it's not working for me, and I noticed you changed the example code to set it up, so I'm not even sure I'm implementing it the right way.
Plus: bench.newFrame() isn't a function, and I don't even know what to pass it as the "now" argument.

I tried only the linked examples, no custom application.

Yeah, those examples are working fine for me, I just can't get the example to work in my own project.

I agree that the chip icons are not intuitive… C and G would be much better :+1:


I drew these SVG icons by hand in Inkscape and was inspired by this pack. Simple C/G letters would be smaller than the SVGs in the bundle.

The examples were copy-pasted from the official Three.js repo for testing purposes; the 'count' variable changes the number of GPU draw calls. I need to keep references to the original authors.

@Cotterzz thank you for your patience, it was a typo in the documentation and some garbage in the minified file included from VS Code.

Ok, I got it.
It's not newFrame, it's nextFrame. (And I think you don't need to pass it a now variable.)

It's working, but the GPU seems to be stuck at 100% most of the time, which I don't think is correct; the bar indicator isn't very accurate, so I don't really know.

I can explain why I chose this explicit API with nextFrame (a usage sketch follows this list):

  1. The time from the rAF callback is more precise than performance.now() (proof here), and with it we can do signal filtering for the chart.
  2. With this explicit function we can make a tool for measuring outside of rAF.
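For illustration, a hedged usage sketch based only on this thread; the bench, renderer, scene, and camera names are assumptions, not copied from the docs:

```js
// Hypothetical usage: pass the rAF timestamp to nextFrame so the monitor
// can use the more precise rAF clock rather than performance.now().
function loop(now) {
  bench.nextFrame(now);           // let the monitor close the previous frame
  renderer.render(scene, camera); // whatever the application draws
  requestAnimationFrame(loop);
}
requestAnimationFrame(loop);
```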

Also, if you find a way to measure with readPixels that works with framebuffers and float textures, I will merge that pull request on GitHub.