Sunday, September 28, 2008

Light Pre-Pass: More Blood

I spent some more time with the Light Pre-Pass renderer. Here are my assumptions:

N.H^n = (N.L * N.H^n * Att) / (N.L * Att)

This division happens in the forward rendering path. The light source contributes its own shininess value, the exponent n. With the specular component extracted, I can apply the material shininess value mn like this:

(N.H^n)^mn

Then I can re-construct the Blinn-Phong lighting equation. The data stored in the Light Buffer is treated like one light source. As a reminder, the first three channels of the light buffer hold:

N.L * Att * DiffuseColor

The final color is then:

Color = Ambient + (LightBuffer.rgb * MatDiffInt) + MatSpecInt * (N.H^n)^mn * N.L * Att


So how could I do this :-)

N.H^n = (N.L * N.H^n * Att) / (N.L * Att)

N.L * Att is not in any channel of the Light buffer. How can I get this? The trick here is to convert the first three channels of the Light Buffer to luminance. The value should be pretty close to N.L * Att.
This also opens up a bunch of ideas for different materials. Every time you need the N.L * Att term you replace it with luminance. This should give you a wide range of materials.
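
As a rough sketch of the luminance trick, the Python below (a stand-in for shader code) builds one light buffer texel for a single light and recovers N.H^n from it. The Rec. 709 luminance weights and the names light_pass, recover_spec and compose are assumptions for illustration, not taken from the renderer. Note that luminance(LightBuffer.rgb) equals N.L * Att exactly only for a white light; a colored light scales the recovered value by 1 / luminance(light color).

```python
# Hypothetical helper names; the Rec. 709 weights are an assumed luminance choice.
def luminance(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def light_pass(n_dot_l, n_dot_h, att, light_color, n):
    """One light buffer texel: rgb = N.L * Att * color, a = N.L * N.H^n * Att."""
    rgb = tuple(n_dot_l * att * c for c in light_color)
    spec = n_dot_l * (n_dot_h ** n) * att
    return rgb + (spec,)

def recover_spec(light_buffer, eps=1e-6):
    """Approximate N.H^n by dividing the alpha channel by luminance(rgb)."""
    r, g, b, a = light_buffer
    return a / max(luminance((r, g, b)), eps)

def compose(light_buffer, ambient, mat_diff_int, mat_spec_int, mn):
    """Color = Ambient + rgb * MatDiffInt + MatSpecInt * (N.H^n)^mn * N.L * Att,
    with N.L * Att replaced by luminance(rgb)."""
    rgb = light_buffer[:3]
    lum = luminance(rgb)
    spec = mat_spec_int * (recover_spec(light_buffer) ** mn) * lum
    return tuple(ambient + c * mat_diff_int + spec for c in rgb)

white = light_pass(0.8, 0.9, 0.5, (1.0, 1.0, 1.0), n=8)
tinted = light_pass(0.8, 0.9, 0.5, (1.0, 0.5, 0.25), n=8)
print(recover_spec(white), 0.9 ** 8)  # nearly identical for a white light
print(recover_spec(tinted))           # larger, because the tint lowers luminance
```

For strongly tinted lights the error can be sizeable, so whether "pretty close" is close enough depends on the light colors in the scene.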
The results I get are very exciting. Here is a list of advantages over a Deferred Renderer:
- less cost per light (you calculate much less in the Light pass)
- easier MSAA
- more material variety
- less read memory bandwidth -> fetches only two instead of the four textures it takes in a Deferred Renderer
- runs on hardware without ps_3_0 and MRT -> runs on DX8.1 hardware

30 comments:

Dunhill Hwang said...

I still don't know why you don't directly store N.H^n but store N.H^n = (N.L * N.H^n * Att) / (N.L * Att) for specular.

Wolfgang Engel said...

I have to store N.L * N.H^n * Att to preserve the locality of the specular reflection in the light buffer.
I also extend the specular reflection model by having two shininess values. One coming from the light source and one coming from the material. This is a more accurate representation of specular than what we usually do in games.

Dunhill Hwang said...

Ok, I understood what you want to do.

I think that in the z/normal fill pass (also called the mini G-buffer pass), since the normal only needs 2 or 3 components (view space or world space), we have one component free to store something else. We could store the material's specular power (mn) or a material ID (a texcoord reference into a 3D texture, as standard deferred shading does) in the mini G-buffer, and then do the full specular calculation ((N.H^n)^mn * N.L * Att, or another approach that uses the material ID) in the light pre-pass.

Since we can only store luminance for specular, maybe we can recover the light color from the diffuse lighting (normalize(LightBuffer.rgb)) in the composite pass.

This method is correct only when there is a single light. But for multiple lights it still gives a rather eye-pleasing result.
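
A small sketch of the normalize(LightBuffer.rgb) idea, written in Python as a stand-in for shader code. The function names and light values are made up for illustration. With one light the normalized buffer matches the normalized light color; with two lights it becomes a blend of both colors.

```python
import math

def normalize(v):
    """Scale a vector to unit length; return zeros for a zero vector."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)

def accumulate(lights):
    """Sum N.L * Att * color over all lights, as the light buffer rgb does."""
    total = [0.0, 0.0, 0.0]
    for n_dot_l_att, color in lights:
        for i in range(3):
            total[i] += n_dot_l_att * color[i]
    return tuple(total)

warm = (1.0, 0.5, 0.25)
one_light = accumulate([(0.4, warm)])
two_lights = accumulate([(0.5, (1.0, 0.0, 0.0)), (0.5, (0.0, 0.0, 1.0))])

print(normalize(one_light), normalize(warm))  # same direction
print(normalize(two_lights))                  # a red/blue blend, not either light
```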

Wolfgang Engel said...

Oh yes you can store the material shininess always in the normal buffer or in the stencil part of the depth buffer ... if you want that.

David said...

Wolfgang,

Let me first thank you for sharing this great method. I was the one who asked about this method during Devcon 08 and you gave the answer. I have since implemented Light Pre-Pass in an actual game engine running real game assets. It works great. There is an Achilles heel to it though. For engines that rely solely on the z pre-pass for occlusion culling, rendering out the normals at the same time as the z pre-pass can blow performance, especially on hardware that supports a double-speed z-only pass. I still like this method a lot, but would reserve it for engines that have some kind of CPU occlusion culling.

David Lam

Wolfgang Engel said...

David, if you compare it to a Deferred renderer: there you fill up the G-Buffer and that results in the same dilemma.
In general, the Z-buffer update and Hierarchical Z update work like a Z Pre-Pass, but you lose the 2x-8x fast depth write because you write more than just depth. But this is the same for the Deferred Renderer as for the Light Pre-Pass renderer. I actually wrote an article for ShaderX7 about the Light Pre-Pass renderer and I can send it to you. In this article I also show a few more ideas on what you can do with it. Send an e-mail to my private e-mail address if you are interested.

David said...

Wolfgang,
With a Deferred renderer, if you want to take advantage of a z pre-pass, you would first render a z-only pre-pass and then follow it with an MRT render to (normal, diffuse, etc.). The MRT pass would take advantage of the already laid-down z-buffer and have no redundant pixels (while benefiting from the high-speed z pass at the same time).

Whereas in Light Pre-Pass, if I understand correctly, the order needs to be as follows:

1) depth,normal
2) light passes
3) final material pass

So either you render z-only first to take advantage of the fast z-only pass, followed by the normal pass and the final pass, requiring a total of 3 geometry passes, or you render z and normals at the same time and lose the high-speed z-only pass. Either way, you lose performance compared to Deferred (assuming you don't have any CPU occlusion culling).

David Lam

Wolfgang Engel said...

Well, you can lay out a Z Pre-Pass in the same way as you described it: first the Z Pre-Pass, then fill up the normals, then the light buffer, then forward rendering. You want to do this on DirectX 8.1 hardware like the Wii.

David said...

Right, and for this reason (rendering 3 passes: depth, normal, forward) Light Pre-Pass can lose on performance compared to Deferred.

Wolfgang Engel said...

Yes, you always have to render geometry once more with the Light Pre-Pass, but because the cost per light is lower, you can achieve more lights :-) ... run MSAA as it is meant to be, have more material variety (real character shaders like skin, cloth etc.), and you can run it on lower-end hardware because the memory bandwidth usage is much better and it does not require MRT ... so overall a win I would say :-)

David Lam said...

And I agree with those points. But in our case Deferred is currently winning the performance race for the hardware we are working on, for the reason I mentioned.

I sure would like to peek into those articles about Prepass Lighting, but I can't find your email address.

Wolfgang Engel said...

My target platforms are PS3 and XBOX 360. On the PS3 you can check out the game Resistance 2 to see what a Light Pre-Pass renderer can look like (it is not out so far).
I based the design of the Light Pre-Pass renderer on the experience I gained by helping to ship a Deferred renderer on this platform and the 360, a Z Pre-Pass renderer (Midnight Club Los Angeles) and hopefully soon a Light Pre-Pass renderer (unannounced game) (and there was another game on the 360, "Table-Tennis", that had none of those designs).

David Lam said...

Do those games that use Light Prepass have CPU occlusion culling?

Wolfgang Engel said...

Yes, they all have CPU/SPU occlusion culling.

Thanh Nguyen said...

Wolfgang,

First off, thank you for an awesome renderer design. I've implemented it for our current project and am loving it so far. I'm curious though about how you're encoding your view-space depth. From one of your earlier posts, it looks like you are encoding the depth in 2 channels (16-bit) in the 'normal-depth buffer'. Do you get any precision issues with that at all? I'm doing the same thing in my implementation and I'm seeing quite a bit of banding in my depth.
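
For reference, a minimal sketch of what 16 bits of linear depth provides, assuming depth normalized to [0, 1] and split across two 8-bit channels. The function names and the 1000-unit far plane are hypothetical, and this is not necessarily the encoding used in either implementation.

```python
def pack_depth16(d):
    """Quantize depth in [0, 1] to 16 bits and split into (high, low) bytes."""
    v = min(int(d * 65535.0 + 0.5), 65535)
    return v >> 8, v & 0xFF

def unpack_depth16(hi, lo):
    """Rebuild the normalized depth from the two 8-bit channels."""
    return ((hi << 8) | lo) / 65535.0

far_plane = 1000.0                 # assumed far plane distance in world units
step = far_plane / 65535.0         # smallest representable depth difference
d = 0.3
restored = unpack_depth16(*pack_depth16(d))
print(step, abs(restored - d) * far_plane)
```

With these assumptions the depth step is roughly 0.015 units, which can show up as banding in any position reconstructed from it.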

Thanh Nguyen

Wolfgang Engel said...

My target platforms are the XBOX 360 and the PS3 ... so I can access the depth buffer that is available there.
This is how I re-construct position. In other words I do not have any precision errors.

Benjamin said...

Wolfgang,
I'm in the midst of implementing the lighting prepass renderer on the XB360 as a possible solution to our engine being pixel shader bound in lighting/PCF shadow-map taps. PS3 has its own intricacies because we use HDR RGBE encoding and I'll tackle that separately.

Regardless, we're hoping it will be a win on both platforms.

I have a few questions / comments.
1) Because we want to preserve accurate specular color per pixel we are factoring our lighting equations as follows:

float4 finalColor(0,0,0,0);
For each light:
finalColor += N.L * LightColor.rgb * (MaterialDiffuseColor.rgb + MaterialSpecColor.rgb * specular)

Where specular = (R dot V)^MaterialSpecPower and
half3 R = N * (2 * N.L).xxx - ToLight;

For our lighting prepass deferred rendering implementation we factor out the MaterialDiffuseColor and MaterialSpecColor so that you precompute:

N.L* LightColor in one buffer
N.L* LightColor * specular.xxx in another buffer
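
The factoring above can be checked with a short sketch (Python standing in for shader code; the function names and the sample values are made up for illustration). It accumulates the two light buffers per light, applies the material colors afterwards, and compares the result with the per-light forward equation.

```python
# Each light is (n_dot_l, light_color_rgb, specular); all values are illustrative.
def forward_shade(lights, mat_diff, mat_spec):
    """finalColor += N.L * LightColor * (MatDiffuse + MatSpec * specular)."""
    out = [0.0, 0.0, 0.0]
    for n_dot_l, color, spec in lights:
        for i in range(3):
            out[i] += n_dot_l * color[i] * (mat_diff[i] + mat_spec[i] * spec)
    return out

def light_buffers(lights):
    """Light pass: accumulate N.L * LightColor and N.L * LightColor * specular."""
    diff_buf = [0.0, 0.0, 0.0]
    spec_buf = [0.0, 0.0, 0.0]
    for n_dot_l, color, spec in lights:
        for i in range(3):
            diff_buf[i] += n_dot_l * color[i]
            spec_buf[i] += n_dot_l * color[i] * spec
    return diff_buf, spec_buf

def material_pass(diff_buf, spec_buf, mat_diff, mat_spec):
    """Forward pass: apply the material colors to the accumulated buffers."""
    return [diff_buf[i] * mat_diff[i] + spec_buf[i] * mat_spec[i] for i in range(3)]

lights = [(0.8, (1.0, 0.9, 0.7), 0.30), (0.4, (0.2, 0.4, 1.0), 0.05)]
mat_diff = (0.5, 0.4, 0.3)
mat_spec = (0.9, 0.9, 0.8)

reference = forward_shade(lights, mat_diff, mat_spec)
factored = material_pass(*light_buffers(lights), mat_diff, mat_spec)
print(reference, factored)
```

The two results agree because the material colors are constant per pixel and can be pulled out of the sum over lights; the specular term itself still needs the material's specular power during the light pass.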

In our engine, the prelighting deferred render passes would work as follows:

Pass 1) Depth only prepass
Pass 2) For each object render normals to RGB and Material Spec power to Alpha
Pass 3) For each light render N.L * LightColor and N.L* LightColor * specular and store in separate light buffers using MRT
Pass 4) Render Decal
Pass 5) Render reflections
Pass 6) Forward render looking up values stored in passes 3-5

2) One question we have is how you handle 2x MSAA? Our current thought is that Pass 1, 2 & 6 are rendered using regular 2x MSAA.
Pass 3 light volumes are a little more tricky because we lose the information about the geometry we're going to shade with them so we're thinking we'll have to do two MRT passes per light using a multi-sample mask. In Pass 6 we'll use centroid sampling so that we can guarantee all the fragments we render will be within polygon boundaries.

Pass 3 A) mask off fragment 1 and write out fragment 0 samples using MRT for diffuse and specular light color.

Pass 3 B) mask off fragment 0 and write out fragment 1 samples using MRT for diffuse and specular light color.

The other alternative is to super sample:

Blit MSAA depth buffer to a non MSAA depth buffer and use MRT, render supersampled with width*2

Can you describe how you ended up using MSAA?

3) Currently because we're so heavily pixel bound we're thinking of going in two directions - either the deferred lighting prepass approach or forward rendering with better subdivision of our geometry (above and beyond material/vertex type and sector boundaries) with the goal of getting the minimum number of lights to affect a pixel.

The argument we haven't been able to make is how offline precomputed visibility works well with a lighting prepass deferred renderer in light of the fact that our engine is entirely pixel bound. It seems that for the deferred approach any subdivision beyond vertex type, material and sector boundaries would only help the CPU which would result in a zero net gain for scenes that are pixel bound. Can you comment on this?

Wolfgang Engel said...

<<<
accurate specular color
<<<
There is no such thing as accurate specular (light) color. You want to have material specular color ... light does not have a specular color. The specular color is created through material interaction.
I don't know who came up with the idea of using a specular light color but it is substantially wrong and I would think it was just an optimization for hardware that was not able to use material specular color.

Currently I MSAA everything on my target platforms. I did not spend much time with this because I consider it not critical. If MSAA'ing the light buffer is too expensive I just won't do it :-)

PS3 HDR: there is Quasi HDR and then there is the LUV color model. If you want the Light Pre-Pass to use the LUV color model you might check out Pat Wilson's article in ShaderX7. The LUV model gives you even more accurate color quality.

Benjamin said...

Wolfgang,

You're right. We have modified the Blinn-Phong equation to apply the light color to the specular component.
Accurate in this sense is in support of this shading model.

What about the benefits of precomputed visibility with light pre-pass ? Have you seen this help GPU bound scenes at all?

multisample said...

I find this technique interesting but since (ignoring the n exponent for now) ...

(N.H + N2.H2)^m != N.H^m + N2.H2^m

I realize we are just trying to approximate the result, but what happens if your accumulated N.H^n values go over 1.0 before you apply your material specular exponent (m) ? It seems it would break down quickly in that case. What am I missing here ?
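
A quick numeric check of that inequality, as a Python sketch with made-up N.H values and exponent:

```python
def spec_per_light(n_dot_h_values, m):
    """Apply the exponent to each light first, then add: sum(N.H_i ^ m)."""
    return sum(x ** m for x in n_dot_h_values)

def spec_accumulated(n_dot_h_values, m):
    """Add the lights first, then apply the exponent: (sum(N.H_i)) ^ m."""
    return sum(n_dot_h_values) ** m

values = [0.9, 0.8]   # illustrative N.H values for two lights
m = 4                 # illustrative material exponent
print(spec_per_light(values, m))    # about 1.0657
print(spec_accumulated(values, m))  # about 8.3521
```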

Wolfgang Engel said...

You are not missing anything. You can do it this way, or you can just do it as in a Deferred renderer (storing spec in the G-Buffer) or in about four other ways. I think I counted six ways to handle specular. I describe all this in my ShaderX7 article.

multisample said...

Ok, so what are you doing to deal with that? (Anything?) In our games it's quite easy to go over 1.0 with multiple lights, and even just a bit over 1.0 can quickly degenerate.

You could probably rescale the values to fall below 1.0 with a smooth function before applying the exponent. This seems pretty reasonable, and better than saturating as that can cause nasty banding.
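
One possible shape for such a rescale, sketched in Python. The choice of x / (1 + x) is an assumption made here for illustration (the comment does not name a function), and it also darkens values below 1.0, so it is a trade-off rather than a fix.

```python
def soft_clamp(x):
    """Smoothly map [0, inf) into [0, 1); an assumed example curve."""
    return x / (1.0 + x)

def spec_saturate(total, m):
    """Hard clamp before the exponent: everything above 1.0 collapses to 1.0."""
    return min(total, 1.0) ** m

def spec_soft(total, m):
    """Soft rescale before the exponent: stays below 1.0 and keeps ordering."""
    return soft_clamp(total) ** m

for total in (0.6, 1.2, 3.0):
    print(total, spec_saturate(total, 32), spec_soft(total, 32))
```

With the hard clamp, accumulated values of 1.2 and 3.0 give the same result, while the soft curve keeps them distinct.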

Note: Just trying to work out most of the issues in my head before I attempt the same in our codebase.
When is ShaderX7 out? I'd like to read more on this and the LUV method of storage.

Wolfgang Engel said...

Well, this problem shows up in any multi-light solution. As soon as you have several lights occupying the same spot, it starts adding up.
I just ignore it for now because in the game environment I have this running I can do this. The LUV solution is probably more forgiving. Having a 16-bit per channel buffer would be better as well :-)

Wolfgang Engel said...

oh and if you write an e-mail to Pat Wilson from Garagegames (he contributed to one of the Light Pre-Pass threads on this blog) he might share his article for ShaderX7 with you for proof-reading.

multisample said...

Thanks for the info....
To be clear, I am not referring to the limited range of RGBA8; I am referring to the exponential of an accumulated value being out of the usual 0->1 range.

ie..

(N.H)^n, where N.H total is > 1.0.

pow(0.6, 32) == 0.0000000795
pow(1.2, 32) == 341.82

If you have two lights each with N.H values of 0.6, the sum reaches 1.2 before the exponential in this method. If you were to do the exponential inline, then the result would not expand.

If you aren't going out of 0-1 range, then of course this is not an issue.

Messiah Andrew said...

I came up with a similar light-prepass idea then stumbled upon your other blog post. You've provided inspiration for me to follow it through to integrate it into my engine.

Anyway, couldn't you use HSL colour instead of RGB and save a channel by using:
Lightness * R.V^n * N.L * Att

?

Wolfgang Engel said...

Sounds good to me. Have you tried it? Pat Wilson described in his ShaderX7 article how to use CIE Luv color space to do something similar.

Messiah Andrew said...

To save conversion between colour spaces, since you're using additive rendering you could multiply each colour channel by the intensity and free the alpha channel to store extra data in.

Wolfgang Engel said...

I can't remember exactly but can you add up any component of HSL easily? It seems to me that wouldn't work. You would have to go through a compression / decompression phase to add up lights. What do you think?
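
A small check of that concern, using Python's standard colorsys module (which orders the components as hue, lightness, saturation). The two light values are made up. Adding the HLS components directly does not match adding the same lights in RGB, so accumulation would need a convert, add, convert-back step.

```python
import colorsys

def add_components(a, b):
    """Component-wise sum, which is what additive blending does per channel."""
    return tuple(x + y for x, y in zip(a, b))

light_a = (0.5, 0.0, 0.0)   # dim red contribution
light_b = (0.0, 0.5, 0.0)   # dim green contribution

# Reference: additive accumulation in RGB.
expected = add_components(light_a, light_b)

# Naive: convert each light to HLS, add the components, convert back.
hls_a = colorsys.rgb_to_hls(*light_a)
hls_b = colorsys.rgb_to_hls(*light_b)
naive = colorsys.hls_to_rgb(*add_components(hls_a, hls_b))

print(expected)
print(naive)   # differs from the RGB sum; saturation has left the 0..1 range
```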