Sunday, November 29, 2009

Order-Independent Transparency

Transparent objects that require alpha blending cannot simply be rendered into a G-Buffer on top of what is already there. Blending two or more normal, depth or position values leads to wrong results.
In other words, deferred lighting of objects that need to be visible through each other is not easily possible, because the data for the object that is visible through another object is lost in a G-Buffer that can only store one layer of data for normals, depth and position.
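To make the problem concrete, here is a minimal CPU-side sketch in Python of what alpha blending would do to G-Buffer values. The two surfaces, their normals and depths, and the 50% alpha are made-up illustration values, not data from a real scene.

```python
import math

def lerp(a, b, t):
    """Component-wise linear blend, as alpha blending does per channel."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))

def length(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical surfaces: a window facing the camera, a wall behind it facing sideways.
front_normal, front_depth = (0.0, 0.0, 1.0), 0.2
back_normal, back_depth = (1.0, 0.0, 0.0), 0.8
alpha = 0.5  # assumed opacity of the window

blended_normal = lerp(back_normal, front_normal, alpha)
blended_depth = (1.0 - alpha) * back_depth + alpha * front_depth

# The blended normal belongs to neither surface and is no longer unit length.
print(blended_normal, length(blended_normal))
# The blended depth lies in empty space between the two surfaces.
print(blended_depth)
```

Lighting computed from these values would shade a surface that does not exist, which is why each transparent layer needs its own G-Buffer data.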
The traditional way to work around this is to have a separate rendering path that deals with rendering and lighting of transparent objects that need to be alpha blended. In essence that means there is a second lighting system that can be forward rendered and usually has a lower quality than the deferred lights.
This system breaks down as soon as the number of lights exceeds a few dozen, because forward rendering can't handle that many lights. In that case it would be an advantage to use the same deferred lighting system for opaque objects and for transparent objects that require alpha blending.
The simple case is, for example, windows where you can look through one window and maybe two more windows behind each other and see what is behind them. For example, you look through a window from the outside into a house, and in the house is another glass wall through which you can look, and behind that glass wall is a freshwater tank that is lit ... etc. You get the idea.
This would be the "light" case to solve. Much harder are scenarios in which the number of transparent objects that can be behind each other is much higher ... like with particles or a room of transparent teapots :-).

On DirectX 9 and DirectX 10 class hardware, one of the solutions mentioned for the problem of order-independent transparency is called Depth Peeling. It seems this technique was first described by Abraham Mammen ("Transparency and antialiasing algorithms implemented with the virtual pixel maps technique", IEEE Computer Graphics and Applications, vol. 9, no. 4, pp. 43-55, July/Aug. 1989) and Paul Diefenbach ("Pipeline rendering: Interaction and realism through hardware-based multi-pass rendering", Ph.D., University of Pennsylvania, 1996, 152 pages) (I don't have access to those papers). A description of the implementation was given by Cass Everitt here. The idea is to extract each unique depth in a scene into layers. Those layers are then composited in depth-sorted order to produce the correct blended image.
In other words: the standard depth test gives us the nearest fragment/pixel. The next pass over the scene gives us the second nearest fragment/pixel; the pass after that the third nearest fragment/pixel. Each pass after the first is rendered by using the depth buffer computed in the previous pass to "peel away" depth values that are less than or equal to the values in that depth buffer. All the values that are not "peeled away" are stored in another depth buffer. Pseudo code might look like this:

const float bias = 0.0000001;

// peel away pixels from previous layers
// use a small bias to avoid precision issues.
clip(In.pos.z - PreviousPassDepth - bias);

By using the depth values from the previous pass for the following pass, multiple layers of depth can be stored. As soon as all the depth layers are generated, for each of the layers the G-Buffer data needs to be generated. This might be the color and normal render targets. In case we want to store three layers of depth, color and normal data also need to be stored for those three depth layers.
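The peeling loop can be simulated for a single pixel on the CPU. The following Python sketch is only an illustration under simplifying assumptions: one pixel, a plain list of fragment depths, and a hypothetical helper name `peel_layers`. The inner test mirrors the clip() call from the pseudo code.

```python
BIAS = 1e-7  # same role as the bias in the pseudo code

def peel_layers(fragment_depths, max_layers):
    """Return up to max_layers depths for one pixel, nearest first, one per pass."""
    layers = []
    previous = None  # depth stored by the previous pass; None on the first pass
    for _ in range(max_layers):
        nearest = None
        for z in fragment_depths:
            # Equivalent of clip(z - previous - BIAS): drop already peeled fragments.
            if previous is not None and z - previous - BIAS < 0.0:
                continue
            # Standard less-than depth test keeps the nearest surviving fragment.
            if nearest is None or z < nearest:
                nearest = z
        if nearest is None:
            break  # nothing left to peel
        layers.append(nearest)
        previous = nearest
    return layers

fragments = [0.7, 0.2, 0.9, 0.4]  # arbitrary submission order
print(peel_layers(fragments, max_layers=3))
```

Each pass touches every fragment again, which is why the number of geometry passes grows with the number of layers.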
In a scene where many transparent objects overlap each other, the number of layers increases substantially, and with it the memory consumption.
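A rough estimate shows how quickly this adds up. The numbers below are assumptions chosen for illustration: a 1280x720 target, three render targets per layer (depth, color, normal) and 4 bytes per texel in each. Real formats and resolutions will differ.

```python
def layered_gbuffer_bytes(width, height, layers, targets_per_layer=3, bytes_per_texel=4):
    """Memory needed to keep G-Buffer data for every peeled layer at once."""
    return width * height * layers * targets_per_layer * bytes_per_texel

for layer_count in (1, 3, 8):
    mib = layered_gbuffer_bytes(1280, 720, layer_count) / (1024 * 1024)
    print(f"{layer_count} layer(s): {mib:.1f} MiB")
```

The cost is linear in the number of layers, so a technique that does not need to keep all layers around has a clear advantage on memory-constrained platforms.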

A more advanced depth peeling technique was named Dual Depth Peeling and described by Louis Bavoil et al. here. The main advantage of this technique is that it peels a layer from the front and a layer from the back at the same time. This way four layers can be peeled away in two geometry passes.
On hardware that doesn't support independent blending equations in MRTs, the two layers per pass are generated by using MAX blending and writing out each component of a float2(-depth, depth) variable into a dedicated render target that is part of an MRT.
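The float2(-depth, depth) trick works because the maximum of the negated depths is the negated minimum, so a single MAX blend yields both the nearest and the furthest depth. Below is a simplified single-pixel simulation in Python. It only tracks depths (the real technique also accumulates color in the same passes), and the function names are hypothetical.

```python
INF = float("inf")

def max_blend(dst, src):
    """MAX blending applied per component."""
    return tuple(max(d, s) for d, s in zip(dst, src))

def dual_peel_pass(fragment_depths, prev_min, prev_max):
    """One geometry pass: nearest and furthest depth strictly inside (prev_min, prev_max)."""
    target = (-INF, -INF)  # cleared render target
    for z in fragment_depths:
        if z <= prev_min or z >= prev_max:
            continue  # peeled in an earlier pass
        target = max_blend(target, (-z, z))  # float2(-depth, depth)
    return -target[0], target[1]

def dual_depth_peel(fragment_depths, passes):
    """Return (front layers, back layers) peeled in the given number of passes."""
    front, back = [], []
    lo, hi = -INF, INF
    for _ in range(passes):
        lo, hi = dual_peel_pass(fragment_depths, lo, hi)
        if lo > hi:
            break  # no fragments left between the peeled layers
        front.append(lo)
        if hi != lo:
            back.append(hi)
    return front, back

print(dual_depth_peel([0.7, 0.2, 0.9, 0.4], passes=2))
```

With four fragments, two passes are enough: the first pass peels the outermost pair and the second pass peels the pair in between.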

In "Robust Order-Independent Transparency via Reverse Depth Peeling in DirectX 10" in ShaderX6, Nicolas Thibieroz describes a technique called Reverse Depth Peeling. While depth peeling extracts layers in front-to-back order and stores them for later usage, his technique peels the layers in back-to-front order and can blend with the backbuffer immediately. Unlike depth peeling, there is no need to store all the layers. Especially on console platforms this is a huge advantage.
The order of operations is:

1. Determine furthest layer
2. Fill-up depth buffer texture
3. Fill-up normal and color buffer
4. Do lighting & shadowing
5. Blend in backbuffer
6. Go to 1 for the next layer
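The loop above can be sketched for a single pixel as follows. This Python simulation makes several assumptions: each fragment is a hypothetical (depth, lit color, alpha) tuple, steps 2 to 4 are collapsed into "the lit color is already known", and blending uses the standard "over" operator.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard alpha blend of a source color over the backbuffer color."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d for s, d in zip(src_color, dst))

def reverse_depth_peel(fragments, background):
    """fragments: list of (depth, color, alpha) in any order. Returns the final color."""
    backbuffer = background
    previous = None  # depth of the layer peeled in the previous iteration
    while True:
        # Step 1: find the furthest layer that is nearer than the last peeled one.
        furthest = None
        for frag in fragments:
            if previous is not None and frag[0] >= previous:
                continue
            if furthest is None or frag[0] > furthest[0]:
                furthest = frag
        if furthest is None:
            break  # all layers peeled
        depth, color, alpha = furthest
        # Steps 2-4 (fill buffers, light, shadow) are assumed to produce 'color'.
        # Step 5: blend into the backbuffer right away; the layer is not stored.
        backbuffer = blend_over(backbuffer, color, alpha)
        previous = depth  # Step 6: continue with the next layer
    return backbuffer

red_glass = (0.3, (1.0, 0.0, 0.0), 0.5)
blue_glass = (0.8, (0.0, 0.0, 1.0), 0.5)
print(reverse_depth_peel([red_glass, blue_glass], background=(0.0, 0.0, 0.0)))
```

Only one layer of G-Buffer data is alive at any time, and the result does not depend on the order in which the fragments were submitted.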

Another technique gives up MSAA and uses the samples to store up to eight layers of data. In the article "Stencil Routed A-Buffer", Kevin Myers et al. use the stencil buffer to do sub-pixel routing of fragments. This way eight layers can be written in one pass. Because the layers are not ordered by depth, they need to be sorted afterwards, here with a bitonic sort. The drawbacks are that the algorithm is limited to eight layers, allocates lots of memory (depending on the underlying implementation, an 8xMSAA render target can take eight times the memory of a screen-size render target), requires hardware that supports 8xMSAA, and the bitonic sort might be expensive. If giving up MSAA is acceptable, the "light" case described above should be easily possible with this technique with satisfying performance, but it won't work on scenes where many objects are visible behind several other objects.
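The two stages, unsorted capture followed by a sort, can be illustrated on the CPU. In this Python sketch the stencil routing is reduced to "the n-th fragment lands in the n-th sample", which is an assumption about the routing order, and empty samples are assumed to hold a far depth. The sort is a plain bitonic sorting network over the eight samples.

```python
MAX_LAYERS = 8  # one layer per sample of an 8xMSAA target
FAR = float("inf")  # assumed clear value for unused samples

def route_fragments(fragment_depths):
    """Store fragments in arrival order, one per sample; extra fragments are lost."""
    samples = [FAR] * MAX_LAYERS
    for slot, z in enumerate(fragment_depths[:MAX_LAYERS]):
        samples[slot] = z
    return samples

def bitonic_sort(values):
    """Sort in place, ascending. len(values) must be a power of two."""
    n = len(values)
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (values[i] > values[partner]) == ascending:
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return values

samples = route_fragments([0.7, 0.2, 0.9, 0.4])
print(bitonic_sort(samples))
```

A bitonic network always performs the same compare-and-swap sequence regardless of the data, which suits shader hardware but also means the full cost is paid even when only a few samples are occupied.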

Another technique extends Dual Depth Peeling by attaching a sorted bucket list. The article "Efficient Depth Peeling via Bucket Sort" by Fang Liu et al. describes an adaptive scheme that requires two geometry passes to store depth value ranges in a bucket list, sorted with the help of a depth histogram. An implementation will be described in the upcoming book GPU Pro. The following image from this article shows the required passes.

The Initial Pass is similar to Dual Depth Peeling. Like other techniques that use eight render targets with 32:32:32:32 each, this technique has huge memory requirements.
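To give a feeling for the bucket idea and the memory cost, here is a deliberately simplified Python sketch. It divides the depth range uniformly and is not the adaptive, histogram-driven scheme from the article. The mapping of one bucket per 32-bit channel, the MAX-style collision rule and all names are assumptions for illustration only.

```python
NUM_TARGETS = 8        # render targets in the MRT
CHANNELS = 4           # 32:32:32:32
BYTES_PER_CHANNEL = 4  # 32 bits

NUM_BUCKETS = NUM_TARGETS * CHANNELS  # assumption: one bucket per channel
BYTES_PER_PIXEL = NUM_BUCKETS * BYTES_PER_CHANNEL

def bucket_index(z, z_near, z_far):
    """Map a depth to a bucket by uniformly dividing [z_near, z_far]."""
    t = (z - z_near) / (z_far - z_near)
    return min(int(t * NUM_BUCKETS), NUM_BUCKETS - 1)

def route_to_buckets(fragment_depths, z_near, z_far):
    """One pass: each bucket keeps a single depth; colliding fragments are lost."""
    buckets = [None] * NUM_BUCKETS
    collisions = 0
    for z in fragment_depths:
        i = bucket_index(z, z_near, z_far)
        if buckets[i] is None:
            buckets[i] = z
        else:
            collisions += 1
            buckets[i] = max(buckets[i], z)  # assumed MAX-style blend
    return buckets, collisions

buckets, collisions = route_to_buckets([0.05, 0.06, 0.5, 1.0], 0.0, 1.0)
print(collisions, BYTES_PER_PIXEL)
```

Fragments that are close together in depth collide in the same bucket under a uniform division, which hints at why an adaptive division based on a depth histogram is attractive.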

To my knowledge those are the widely known techniques for order-independent transparency on DirectX 10 today. Do you know of any newer techniques suitable for DirectX 10 or DirectX 11 hardware?


Unknown said...

You forgot to mention the technique from the inferred lighting paper.

Matt Enright said...

The latest ATI demo uses Direct Compute to sort blended layers:

sebh said...

A simple method to take into account the closest transparent layer to the virtual camera in a deferred renderer was also presented by Evans in "Graphics Engine Postmortem from LittleBigPlanet" (Siggraph 2009).

Wolfgang Engel said...

Yep, I forgot stippling and Alex Evans' technique for two depth layers.
For my purposes both techniques are not generic enough.
ATI's approach sounds interesting though.

Johannes Totz said...

Hi Wolfgang,
for our non-game (medical) rendering we do depth-peeling front-to-back and shade each layer with standard deferred-shading. Shading blends into the final buffer (i.e. on screen).

Works quite nicely with scenes of around 1M triangles.

The trouble we have is with the z-bias. It appears to require values of around 0.00003, which is much more than I was expecting... (we use OpenGL btw).

Victor Coda said...

Yes, compute shader + scattered writes seems to be the best technique for OIT.

Anonymous said...

Paul Diefenbach's thesis can be found here: