What's the difference between linear depth, perspective depth, and orthographic depth?

Studying the code of three.webgpu.js, I've come across the terms "linear depth", "perspective depth", and "orthographic depth".
I know there is a visual difference between perspective and orthographic projections, but why would the depth value computed from viewZ differ between the two projection types?
Also, is "linear depth" just another name for "orthographic depth"?

I was inspired to ask this question after studying the functions viewZToOrthographicDepth and viewZToPerspectiveDepth, and noticing there is no viewZToLinearDepth function.

```js
// Maps viewZ linearly onto [0, 1]: -near -> 0, -far -> 1.
const viewZToOrthographicDepth = (viewZ, near, far) => viewZ.add(near).div(near.sub(far));

// Maps the same range onto [0, 1] non-linearly (matches z/w after a perspective projection).
const viewZToPerspectiveDepth = (viewZ, near, far) => near.add(viewZ).mul(far).div(far.sub(near).mul(viewZ));
```
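
To make the difference concrete, here is a plain-number version of those two node chains (a sketch with made-up near/far values; recall that viewZ is negative in front of the camera, since three.js cameras look down -Z):

```js
// Plain-number equivalents of the node chains quoted above.
const orthoDepth = (viewZ, near, far) => (viewZ + near) / (near - far);
const perspDepth = (viewZ, near, far) =>
  ((near + viewZ) * far) / ((far - near) * viewZ);

const near = 1, far = 100;
for (const viewZ of [-1, -2, -10, -50, -100]) {
  console.log(
    viewZ,
    orthoDepth(viewZ, near, far).toFixed(3),
    perspDepth(viewZ, near, far).toFixed(3)
  );
}
// orthographic: 0.000, 0.010, 0.091, 0.495, 1.000  (linear in distance)
// perspective:  0.000, 0.505, 0.909, 0.990, 1.000  (most precision near the camera)
```

Both map the near plane to 0 and the far plane to 1; the difference is entirely in how the values in between are distributed.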

Note: I'm specifically referring to the depth (z) values produced by each projection type.

Yes, you're correct: I think orthographic depth is the same thing as linear depth.
Perspective depth is the depth divided by the w component, which makes it a non-linear (roughly 1/z-shaped) function of view distance.
Logarithmic depth stores depth on a log scale instead, which spreads precision more evenly across the near-to-far range (plain perspective depth concentrates almost all of its precision close to the camera), at the expense of some code complexity and potentially less hardware support for early z-rejection, since the shader has to write the depth itself.
(I might be wrong about some of these, but that's my general understanding.)
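
For what it's worth, the logarithmic mapping three.js applies when `logarithmicDepthBuffer` is enabled looks roughly like this (a simplified sketch, not the exact shader code; it assumes w ≈ -viewZ for a perspective camera):

```js
// Simplified logarithmic depth: precision is spread evenly in log space.
// The real shader writes gl_FragDepth per fragment, which is what can
// defeat early z-rejection.
const logDepth = (viewZ, far) => Math.log2(1 - viewZ) / Math.log2(1 + far);

console.log(logDepth(-1, 100).toFixed(3));   // 0.150
console.log(logDepth(-10, 100).toFixed(3));  // 0.520
console.log(logDepth(-100, 100).toFixed(3)); // 1.000
```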

Thank you for the info. However, I don't fully understand why using a perspective Z instead of an orthographic Z matters. From the renderer's point of view, the projection should only affect the X and Y coords (since they are rendered to a 2D viewport), with the Z coord simply dictating which 3D objects are rendered in front of which. Am I correct in this assumption? I understand that the X and Y coords should be "squashed" when using perspective, but I don't see why the Z coord (depth) should be too. Ultimately, it's the X and Y coords that create a "vanishing point", right? Why bother "squashing" the Z value if it doesn't change the resulting image?

The z value does change the resulting image, since it's used for perspective-correct texture mapping and for clipping. Dividing the entire coordinate by w brings it into NDC space (normalized device coordinates), -1 to 1 on each axis. The clipping hardware then discards geometry that falls outside that box.
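
For reference, the divide and the test it enables look something like this (a minimal sketch; in practice the hardware clips primitives against the box rather than testing points one at a time):

```js
// Perspective divide: clip space (x, y, z, w) -> NDC (x/w, y/w, z/w).
const clipToNDC = ([x, y, z, w]) => [x / w, y / w, z / w];

// WebGL-style NDC box: -1..1 on every axis (WebGPU uses 0..1 for z).
const insideNDC = (ndc) => ndc.every((c) => c >= -1 && c <= 1);

console.log(insideNDC(clipToNDC([0.5, 0.5, 2, 1]))); // false: z lands beyond the far plane
console.log(insideNDC(clipToNDC([0.5, 0.5, 2, 4]))); // true: the divide pulls z to 0.5
```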

There probably are scenarios you could contrive where it wouldn't be necessary to divide z, but given how GPU math works, it's probably the same cost (or cheaper) to divide the whole vector rather than only one component of it.
