Monday, June 29, 2009

Ambient Occlusion in Screen-Space

Screen-Space Ambient Occlusion (SSAO) is quite popular in the moment. ShaderX7 had several articles and there are lots of approaches to gradually improve the effect.
A good way to look at SSAO or any similar approach is to consider it part of a whole pipeline of effects that can share resources and extend the idea to include one diffuse (and specular) indirect bounce of light by re-using resources.
The overall issues with SSAO are:
1. quite expensive for the image quality improvement. Using the astonishing high amount of frame-time for other effects is an intriguing idea. In other words the performance / quality-improvement ratio is not very good compared to e.g. PostFX where a bunch of effects consumes a similar amount of time.
2. a typical problem is that lighting is ignored by SSAO. Using the classical SSAO implementation under varying illumination introduces objectionable artifacts because the ambient term is darkened equally (obviously you can apply SSAO to the diffuse and specular term like a shadow term ... but then it isn't ambient anymore). If you have a "global ambient" light term like skylights, SSAO will diminish the effect. It also leads to problems with dynamic shadows.

Overall I believe a fundamental shift to more generic method is necessary to solve those issues. This is one of the things I am looking into ... so expect an update at some point in the future.

Wednesday, June 17, 2009

MSAA on the PS3 with Deferred Lighting / Shading / Light Pre-Pass

The Killzone 2 team came up with an interesting way to use MSAA on the PS3. You can find it on page 39 of the following slides:

What they do is read both samples in the multisampled render target, do the lighting calculations for both of them and then average the result and write it into the multi-sampled (... I assume it has to be multi-sampled because the depth buffer is multisampled) accumulation buffer. That somehow decreases the effectiveness of MSAA because the pixel averages all samples regardless of whether they actually pass the depth-stencil test. The multisampled accumulation buffer may therefore contain different values per sample when it was supposed to contain a unique value representing the average of all sample. Then on the other side they might only store a value in one of the samples and resolve afterwards ... which would mean the pixel shader runs only once.
This is also called "on-the-fly resolves".

It is better to write into each sample a dedicated value by using the sampling mask but then you run in case of 2xMSAA your pixel shader 2x ... DirectX10.1+ has the ability to run the pixel shader per sample. That doesn't mean it fully runs per sample. The MSAA unit seems to replicate the color value accordingly. That's faster but not possible on the PS3. I can't remember if the XBOX 360 has the ability to run the pixel shader per-sample but this is possible.

Saturday, June 13, 2009

Multisample Anti-Aliasing

Utilizing the Multisample Anti-Aliasing (MSAA) functionality of graphics hardware for deferred lighting can be challenging. Nicolas Thibieroz wrote an excellent article about MSAA published in ShaderX7 with the title "Deferred Shading with Multisampling Anti-Aliasing in DirectX10".
The following figure from the ShaderX7 article shows how MSAA works:

The pixel represented by a square has two triangles (blue and yellow) crossing some of its sample points. The black dot represents the pixel sample location (pixel center); this is were the pixel shader is executed. The cross symbol corresponds to the location of the multisamples where the depth tests are performed. Samples passing the depth test receive the output of the pixel shader. Those samples are replicated by the MSAA back-end into a multisampled render target that represents each pixel with -in that case- four samples. That means the render target size for an intended resolution of 1280x720 would be 2560x1440 representing each pixel with four samples but the pixel shader only writes 1280x720 times (assuming there is no overdraw) while the MSAA back-end replicates for each pixel four samples into the multisampled render target.
With deferred lighting there can be several of those multi-sampled render targets as part of a Multiple-Render-Target (MRT). In the so called Geometry stage, data is written into this MRT; therefore called G-Buffer. In case of 4xMSAA each of the render targets of the G-Buffer would be 2560x1440 in size.
In case of Deferred Lighting / Light Pre-Pass the G-Buffer holds normal and depth data. This data can never be resolved because resolving it would lead to incorrect results as shown by Nicolas in his article.
After the Geometry phase comes the Lighting or Shading phase in a Deferred Lighting/Light Pre-Pass/Deferred Shading renderer. In an ideal world you could blit each sample (not pixel) into the multisampled render target -that holds the result of the Shading phase- by reading the G-Buffer sample and performing all the calculations necessary on it.
In other words to achieve the best possible MSAA quality with those renderer designs, lighting equations would need to be applied on a per-sample basis into a multisampled render target and then later resolved.
This is possible with DirectX 10.1 graphics hardware (AMD's 10.1 capable cards; didn't try if S3 cards that support 10.1 can do this as well) that allows to execute a pixel shader at sample frequency.
To make this a viable option, this operation needs to be restricted to samples that belong to pixel edges. There are two passes necessary to make this work. One pass will use the pixel shader that runs operations performed on samples and in a second pass the pixel shader is run that performs operations per-pixel, which means the result of the pixel shader calculation is output to all samples passing the depth-stencil test.
To restrict the pixel shader that performs operations per-sample, a stencil test is used.
One interesting idea covered in the article is to detect edges with centroid sampling (available already on DirectX9 class graphics hardware). During the G-Buffer phase the vertex shader writes a variable unique to every pixel (e.g. pixel position data) into two outputs, while the associated pixel shader declares two inputs: one without and one with centroid sampling enabled. The pixel shader then compares the centroid-enabled input with the one without it. Differing values mean that samples were only partially covered by the triangle, indicating an edge pixel. A "centroid value" of 1.0 is then written out to a selected area of the G-Buffer (previously cleared to 0.0) to indicate that the covered samples belong to an edge pixel. Those values are then averaged while being resolved to find out the value per pixel. If the result is not exactly 0, then the current pixel is an edge pixel. This is shown in the following image from the article.
On the left the pixel shader input will always be evaluated at the center of the pixel regardless of whether it is covered by the triangle. On the right with centroid sampling, the two rightmost depth samples are covered by the triangle. The comparison of the values in the pixel shader will lead to the result that the samples were only partially covered by the triangle, indicating an edge pixel.
Because DirectX10 capable graphics hardware does not support the pixel shader running at sample frequency, a different solution needs to be developed here.
The best MSAA quality in that case is achieved by running the pixel shader multiple times per pixel, only enabling output to a single sample each pass. This can be achieved by using the OMSetBlendState() API. The results of this method would be identical to the DirectX 10.1 method but obviously due to the increased number of rendering passes and slightly reduced texture cache effectiveness more expensive.