Calculating screen-space texture coordinates for the 2D projection of a volume is more complicated than for an already transformed full-screen quad. Here is a step-by-step approach:
1. The position is transformed into projection space in the vertex shader by multiplying it by the concatenated World-View-Projection matrix.
2. The Direct3D run-time then divides those values by W (the view-space Z that the projection matrix stores in the W component). The resulting position is in clip space, where the x and y values are clipped to the [-1.0, 1.0] range.
xclip = xproj / wproj
yclip = yproj / wproj
3. The Direct3D run-time then transforms the position into viewport space, mapping the [-1.0, 1.0] range to [0.0, ScreenWidth] for x and [0.0, ScreenHeight] for y.
xviewport = xclipspace * ScreenWidth / 2 + ScreenWidth / 2
yviewport = -yclipspace * ScreenHeight / 2 + ScreenHeight / 2
This can be simplified to:
xviewport = (xclipspace + 1.0) * ScreenWidth / 2
yviewport = (1.0 - yclipspace ) * ScreenHeight / 2
The result represents the position on the screen. The y component needs to be inverted because in world / view / projection space it increases in the opposite direction to screen coordinates.
4. Because the result should be in texture space and not in screen space, the coordinates need to be transformed from clip space to texture space, in other words from the range [-1.0, 1.0] to the range [0.0, 1.0].
u = (xclipspace + 1.0) * 1 / 2
v = (1.0 - yclipspace ) * 1 / 2
5. Due to the texturing algorithm used by Direct3D, we need to adjust texture coordinates by half a texel:
u = (xclipspace + 1.0) * ½ + ½ / TargetWidth
v = (1.0 - yclipspace ) * ½ + ½ / TargetHeight
Plugging in the x and y clip-space coordinates from step 2:
u = (xproj / wproj + 1.0) * ½ + ½ / TargetWidth
v = (1.0 - yproj / wproj ) * ½ + ½ / TargetHeight
6. Because the final calculation of this equation should happen in the vertex shader, the results are sent down through the texture coordinate interpolator registers. Interpolated 1/wproj is not the same as 1 / interpolated wproj, so the term 1/wproj needs to be extracted and applied in the pixel shader.
u = 1/wproj * ((xproj + wproj) * ½ + ½ * wproj / TargetWidth)
v = 1/wproj * ((wproj - yproj) * ½ + ½ * wproj / TargetHeight)
The vertex shader source code looks like this:
float4 vPos = float4(0.5 * (float2(p.x + p.w, p.w - p.y) + p.w * inScreenDim.xy), p.zw); // p = position in projection space; inScreenDim.xy is assumed to hold (1.0 / TargetWidth, 1.0 / TargetHeight)
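Expanded into a complete vertex / pixel shader pair, the same derivation could look like the following sketch. The constant, sampler, and struct names are illustrative assumptions, not code from the article; inScreenDim.xy is again assumed to hold the reciprocal render target size.

// Sketch only: matWVP, inScreenDim and screenTex are assumed names.
float4x4 matWVP;       // concatenated World-View-Projection matrix (row-vector convention assumed)
float2   inScreenDim;  // assumed (1.0 / TargetWidth, 1.0 / TargetHeight)
sampler  screenTex;    // screen-space texture to fetch (e.g. a light or shadow buffer)

struct VS_OUT
{
    float4 hpos    : POSITION;
    float4 texProj : TEXCOORD0; // screen-space texcoord, still scaled by wproj
};

VS_OUT ScreenCoordVS(float4 inPos : POSITION)
{
    VS_OUT o;
    float4 p = mul(inPos, matWVP);   // step 1: transform into projection space
    o.hpos = p;
    // steps 2-6 folded together, including the half-pixel offset;
    // the divide by wproj is deferred to the pixel shader
    o.texProj = float4(0.5 * (float2(p.x + p.w, p.w - p.y) + p.w * inScreenDim.xy), p.zw);
    return o;
}

float4 ScreenCoordPS(float4 texProj : TEXCOORD0) : COLOR
{
    // apply the deferred 1/wproj after interpolation
    float2 uv = texProj.xy / texProj.w;
    return tex2D(screenTex, uv);
}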
The equation without the half-pixel offset would start at step 4 like this:
u = (xclipspace + 1.0) * 1 / 2
v = (1.0 - yclipspace ) * 1 / 2
Plugging in the x and y clip-space coordinates from step 2:
u = (xproj / wproj + 1.0) * ½
v = (1.0 - yproj / wproj ) * ½
Moving 1 / wproj to the front leads to:
u = 1/ wproj * ((xproj + wproj) * ½)
v = 1/ wproj * ((wproj - yproj) * ½)
Because the pixel shader applies the 1 / wproj, this leads to the following vertex shader code:
float4 vPos = float4(0.5 * float2(p.x + p.w, p.w - p.y), p.zw);
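Since the only remaining per-pixel work is the divide by wproj, this variant maps directly onto tex2Dproj, which performs that divide internally. A sketch, reusing the hypothetical names from the block above:

// Vertex shader (sketch):
o.texProj = float4(0.5 * float2(p.x + p.w, p.w - p.y), p.zw);
// Pixel shader: tex2Dproj divides texProj.xy by texProj.w before sampling
float4 color = tex2Dproj(screenTex, texProj);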
All this is based on a response by mikaelc in the thread "Lighting in a Deferred Renderer" and a response by Frank Puig Placeres in the thread "Reconstructing Position from Depth Data".
8 comments:
You should also mention that the half pixel offset is specific to D3D9 as D3D10/OpenGL/consoles do not need to do this.
Code I have been using for years to do this (seen in my Light Index Deferred Rendering code)
Vertex Shader:
projectSpace = gl_ModelViewProjectionMatrix * gl_Vertex;
gl_Position = projectSpace;
projectSpace.xy = (projectSpace.xy + vec2(projectSpace.w)) * 0.5;
Fragment shader:
vec4 texValue = texture2DProj( TextureID, projectSpace);
I am doing my world space reconstruction using mostly comments found here: http://forum.beyond3d.com/showthread.php?t=45628
To store, in HLSL:
float3 wsPos = IN.pos.xyz / IN.pos.w;
float depth = dot( vEye, wsPos - eyePos );
Where IN.pos comes from the VShader and is:
OUT.pos = mul( objToWorldMat, IN.position );
vEye is a shader constant, and is the world-space view-vector normalized to 1/zFar
eyePos is a shader constant, and is the world-space eye position
I am storing depth in 16 bits as an integer and this seems to be plenty.
To reconstruct:
float3 worldPos = eyePos + eyeRay * depth;
eyePos is a shader constant, world-space eye position.
eyeRay is:
-For a full-screen quad:
Calculate in vertex shader:
OUT.wsEyeRay = float4( IN.wsFrustCoord - eyePos, 1.0 );
Calculate in pixel shader:
OUT.wsEyeRay = float4( IN.normal - eyePos, 1.0 );
In the vertex shader, it is a full screen quad, and each vertex has the world-space co-ordinate of the far-frustum plane. I am calculating like this:
Point3F farFrustumCorners[4];
farFrustumCorners[0].set( frustLeft * zFarOverNear, zFar, frustBottom * zFarOverNear );
farFrustumCorners[1].set( frustLeft * zFarOverNear, zFar, frustTop * zFarOverNear );
farFrustumCorners[2].set( frustRight * zFarOverNear, zFar, frustTop * zFarOverNear );
farFrustumCorners[3].set( frustRight * zFarOverNear, zFar, frustBottom * zFarOverNear );
MatrixF camToWorld = thisFrame.worldToCamera;
camToWorld.inverse();
for( int i = 0; i < 4; i++ )
camToWorld.mulP( farFrustumCorners[i] );
-For convex geometry:
In Pixel shader:
float3 eyeRay = getDistanceVectorToPlane( negFarPlaneDotEye, IN.wsPos.xyz / IN.wsPos.w, farPlane );
'negFarPlaneDotEye' is a shader constant which is:
-dot( worldSpaceFarPlane, eyePosition )
'farPlane' is a shader constant which is the world-space far-plane.
This function is from that thread:
inline float3 getDistanceVectorToPlane( in float negFarPlaneDotEye, in float3 direction, in float4 plane )
{
    float denum = dot( plane.xyz, direction.xyz );
    float t = negFarPlaneDotEye / denum;
    return direction.xyz * t;
}
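Putting the convex-geometry pieces together, a minimal sketch (the G-buffer sampler, the screen-space UV, and which channel holds depth are assumptions, not the commenter's exact code):

// Sketch only: gBufferTex, IN.screenUV and the depth channel are assumed names.
float  depth    = tex2D( gBufferTex, IN.screenUV ).a;  // linear depth, 0..1 (1 = zFar)
float3 eyeRay   = getDistanceVectorToPlane( negFarPlaneDotEye, IN.wsPos.xyz / IN.wsPos.w, farPlane );
float3 worldPos = eyePos + eyeRay * depth;             // world-space reconstruction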
-----
This works well for me. I am sure it can be optimized further.
Hi Damian,
Yes, without the DX9 offset this is the same. So you can consider it trivial, but in my specific case we forgot about the half-pixel offset :-~ so I had to figure out why there was light leaking around a person :-) (we also use this to fetch shadow maps that are in screen space).
- Wolfgang
BTW: didn't you forget the divide by z? I would think there is something like
projectSpace.xy /= projectSpace.w
in there as well.
Hi Pat,
I think this is the Crytek approach that was covered in a SIGGRAPH 2007 session by Carsten Wenzel. This looks very cool to me. Do you have to generate a dedicated depth buffer for this?
- Wolfgang
Wolfgang,
(Pat Wilson from GarageGames)
It doesn't require a dedicated depth buffer. I am using these formats for g-buffers:
8:8:8:8
normal.theta|normal.phi|depthHi|depthLo
16:16:16:16
normal.theta|normal.phi|foo|depth
The reason I chose this method for world-space reconstruction is that it is very cheap, requiring only 1 mad in the case of a FS quad.
The z-data that is stored is also very good because it is linear, and it is in the range 0..1 where 1 is zFar in camera space. I like integer formats over FP16 formats for the G-buffer because I can control the ranges of the data.
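For the 8:8:8:8 layout, a minimal sketch of how a linear [0, 1] depth could be split into and recombined from the two 8-bit depthHi | depthLo channels (an assumption about the exact packing, not the commenter's code):

// Pack a linear 0..1 depth into two 8-bit channels (depthHi, depthLo).
float2 packDepth16( float depth )
{
    float hi = floor( depth * 255.0 ) / 255.0;  // coarse 8 bits
    float lo = frac( depth * 255.0 );           // remaining 8 bits
    return float2( hi, lo );
}

// Recombine the two channels into a single depth value.
float unpackDepth16( float2 hiLo )
{
    return hiLo.x + hiLo.y / 255.0;
}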
I haven't done enough profiling to know for sure, but I think that using an 8:8:8:8 g-target may hit light shader performance significantly (it is slower on high-bandwidth cards, but not as much on low-bandwidth cards). The first thing the light does is sample from the G-buffer, but then every subsequent thing that it does is dependent on knowing the depth.
Sorry I just re-read that comment and realized none of those thoughts were really complete. I was distracted.
For the g-buffer I am storing world-space normals using spherical coordinates. For the 8:8:8:8 target case, if you store normal.xy and reconstruct z, you need to know the sign of z, since it's really +/-sqrt(1 - dot(normal.xy, normal.xy)). So before I switched to spherical storage, I had to store stuff like this:
8:8:8:8
normal.xy_8_8|sign(z)_1|depthHi_7|depthLo_8
This added unpack time to retrieving the depth value, and the normal value. It also only gave me 15 bits to store depth, instead of 16. Switching to spherical coordinates got me back that bit, and removed another op from getting the depth value.
The actual format for the spherical is generated like this:
inline float2 cartesianToSpGPU( in float3 normalizedVec )
{
    float atanYX = atan2( normalizedVec.y, normalizedVec.x );
    float2 ret = float2( atanYX / PI, normalizedVec.z );
    return (ret + 1.0) * 0.5;
}
and retrieved like this:
inline float3 spGPUToCartesian( in float2 spGPUAngles )
{
    float2 expSpGPUAngles = spGPUAngles * 2.0 - 1.0;
    float2 scTheta;
    sincos( expSpGPUAngles.x * PI, scTheta.x, scTheta.y );
    float2 scPhi = float2( sqrt( 1.0 - expSpGPUAngles.y * expSpGPUAngles.y ), expSpGPUAngles.y );
    // Renormalization not needed
    return float3( scTheta.y * scPhi.x, scTheta.x * scPhi.x, scPhi.y );
}
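As a usage sketch (variable names are illustrative): the encode runs when the G-buffer is written, and the decode when a light shader reads it back.

// Writing the G-buffer: theta / phi go into the first two channels (sketch).
gBufferOut.xy = cartesianToSpGPU( normalize( worldSpaceNormal ) );
// Reading it back in a light shader (sketch):
float3 worldNormal = spGPUToCartesian( gBufferSample.xy );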
It is slightly more expensive to re-construct than the cartesian, but I think (this may not be true) that because the light shaders use the surface normal last, the GPU can do the work whenever it has time.
On lower-end cards, the atan2 and sincos functions take longer. Some of the ATI boards with unified shaders have 4 shader cores per ALU which can't do transcendental functions, and 1 which can. NVIDIA cards have 4 cores per ALU, and each can do all ops. I encoded sincos and atan2 into A8 lookup textures for that case, and it works better.
Hi Wolfgang,
The "w" divide is done automatically in the texture2DProj call. I believe the D3D version is called tex2Dproj.
Oh and I am sure you know, that consoles provide a render state (ones that are D3D9 based anyway) to turn off this annoying half pixel thing.