I am going to teach GPU Programming in the upcoming quarter at UCSD. Look out for course CSE 190. Here is the announcement:
Course Objectives:
This course will cover how to implement 3D graphics techniques efficiently on the Graphics Processing Unit (GPU).
Course Description:
This course focuses on algorithms and approaches for programming a
GPU, including vertex, hull, tessellator, domain, geometry, pixel and
compute shaders. After an introduction into each of the algorithms,
the students will learn step-by-step on how to implement those
algorithms on the GPU. Particular subjects may include geometry
manipulations, lighting, shadowing, real-time global illumination,
image space effects and 3D Engine design.
Example Textbook(s):
A list of reading assignments will be given out each week.
Laboratory work:
Programming assignments.
Very exciting :-)
Sunday, November 29, 2009
Order-Independent Transparency
Transparent objects that require alpha blending cannot simply be rendered on top of a G-Buffer. Blending two or more normal, depth or position values leads to wrong results.
In other words, deferred lighting of objects that need to be visible through each other is not easily possible, because the data for the object that is visible through another object is lost: a G-Buffer can only store one layer of normal, depth and position data.
The traditional way to work around this is a separate rendering path that deals with rendering and lighting of transparent objects that need to be alpha blended. In essence that means there is a second, forward-rendered lighting system that usually has a lower quality than the deferred lights.
This system breaks down as soon as the number of lights grows beyond a few dozen, because forward rendering can't handle that many lights. In that case it would be an advantage to apply the same deferred lighting system that is used on opaque objects to the transparent objects that require alpha blending.
The simple case is, for example, windows, where you can look through one window and maybe two more windows behind it and see what is behind them. For example, you look through a window from the outside into a house, inside the house there is another glass wall you can look through, and behind that glass wall is a lit freshwater tank ... you get the idea.
This would be the "light" case to solve. Much harder are scenarios in which the number of transparent objects that can be behind each other is much higher ... like with particles or a room full of transparent teapots :-).
On DirectX 9 and DirectX 10 class hardware, one of the solutions mentioned to solve the problem of order-independent transparency is called Depth Peeling. It seems this technique was first described by Abraham Mammen ("Transparency and antialiasing algorithms implemented with the virtual pixel maps technique", IEEE Computer Graphics and Applications, vol. 9, no. 4, pp. 43-55, July/Aug. 1989) and Paul Diefenbach ("Pipeline rendering: Interaction and realism through hardware-based multi-pass rendering", Ph.D. thesis, University of Pennsylvania, 1996, 152 pages) (I don't have access to those papers). A description of the implementation was given by Cass Everitt here. The idea is to extract each unique depth in a scene into layers. Those layers are then composited in depth-sorted order to produce the correctly blended image.
In other words: the standard depth test gives us the nearest fragment/pixel. The next pass over the scene gives us the second nearest fragment/pixel, and the pass after that the third nearest fragment/pixel. Each pass after the first one uses the depth buffer computed in the previous pass and "peels away" depth values that are less than or equal to the values in that depth buffer. All the values that are not "peeled away" are stored in another depth buffer. Pseudo code might look like this:
const float bias = 0.0000001;
// peel away pixels from previous layers;
// use a small bias to avoid precision issues
clip(In.pos.z - PreviousPassDepth - bias);
By using the depth values from the previous pass in the following pass, multiple layers of depth can be stored. As soon as all the depth layers are generated, the G-Buffer data, for example the color and normal render targets, needs to be generated for each of those layers. In case we want to store three layers of depth, color and normal data also need to be stored for those three depth layers.
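As a rough sketch, a single peeling pass that also writes the layer's G-Buffer data could look like the following DX10-style HLSL pixel shader. The resource and structure names (PreviousPassDepth as a texture holding the last peeled layer's depth, a two-target MRT for color and normal) are assumptions for illustration, not code from any of the referenced papers; the regular depth test on a second depth buffer still selects the nearest of the surviving fragments, which is exactly the next layer.

// depth of the layer peeled in the previous pass (assumed resource name)
Texture2D PreviousPassDepth : register(t0);

struct PS_INPUT
{
    float4 pos    : SV_Position; // pos.z holds the depth used for peeling
    float3 normal : NORMAL;
    float4 color  : COLOR0;
};

struct PS_OUTPUT
{
    float4 color  : SV_Target0;  // color layer of this peel
    float4 normal : SV_Target1;  // normal layer of this peel
};

PS_OUTPUT PeelLayerPS(PS_INPUT In)
{
    // use a small bias to avoid precision issues
    const float bias = 0.0000001;

    // depth stored for this pixel by the previous peeling pass
    float prevDepth = PreviousPassDepth.Load(int3(In.pos.xy, 0)).r;

    // peel away everything that belongs to already extracted layers
    clip(In.pos.z - prevDepth - bias);

    // surviving fragments form the next layer; write their G-Buffer data
    PS_OUTPUT Out;
    Out.color  = In.color;
    Out.normal = float4(normalize(In.normal) * 0.5f + 0.5f, 0.0f);
    return Out;
}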
In a scene where many transparent objects overlap each other, the number of layers and therefore the memory consumption increase substantially.
A more advanced depth peeling technique, named Dual Depth Peeling, was described by Louis Bavoil et al. here. Its main advantage is that it peels a layer from the front and a layer from the back at the same time, so four layers can be peeled away in two geometry passes.
On hardware that doesn't support independent blending equations in MRTs, the two layers per pass are generated by using MAX blending and writing out each component of a float2(-depth, depth) variable into a dedicated render target that is part of an MRT.
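Where a two-channel float render target is available, the min-max part of this idea can be sketched in a few lines of HLSL. The pass below assumes an R32G32_FLOAT target with the blend operation set to MAX on the application side, so that after all geometry is drawn the target holds (-minDepth, maxDepth) per pixel:

// min-max depth pass for dual depth peeling (sketch)
// MAX blending on the render target yields (-minDepth, maxDepth),
// because maximizing -depth is the same as minimizing depth
float2 MinMaxDepthPS(float4 pos : SV_Position) : SV_Target
{
    float depth = pos.z;
    return float2(-depth, depth);
}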
Nicolas Thibieroz describes in "Robust Order-Independent Transparency via Reverse Depth Peeling in DirectX 10" in ShaderX6 a technique called Reverse Depth Peeling. While depth peeling extracts layers in front-to-back order and stores them for later usage, his technique peels the layers in back-to-front order and can blend with the backbuffer immediately. Compared to depth peeling, there is no need to store all the layers, which is a huge advantage especially on console platforms.
The order of operations is (a sketch of the reversed peeling test follows the list):
1. Determine furthest layer
2. Fill-up depth buffer texture
3. Fill-up normal and color buffer
4. Do lighting & shadowing
5. Blend in backbuffer
6. Go to 1 for the next layer
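Step 1 is the mirror image of regular depth peeling: instead of keeping fragments behind the previously peeled layer, only fragments in front of it survive, and a GREATER depth test then selects the furthest of them. A minimal sketch, assuming the previously extracted layer's depth is available in a texture called PreviousPassDepth, might look like this:

// depth of the layer extracted in the previous (further) peeling pass
Texture2D PreviousPassDepth : register(t0);

float4 ReversePeelPS(float4 pos : SV_Position, float4 color : COLOR0) : SV_Target
{
    const float bias = 0.0000001;
    float prevDepth = PreviousPassDepth.Load(int3(pos.xy, 0)).r;

    // reject everything at or behind the previously extracted layer;
    // a GREATER depth test keeps the furthest surviving fragment,
    // which is the next layer in back-to-front order
    clip(prevDepth - pos.z - bias);

    return color;
}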
Another technique is to give up MSAA and use the samples to store up to eight layers of data. Kevin Myers et al. use the stencil buffer in the article "Stencil Routed A-Buffer" to do sub-pixel routing of fragments. This way eight layers can be written in one pass. Because the layers are not ordered by depth, they need to be sorted afterwards. The drawbacks are that the algorithm is limited to eight layers, allocates lots of memory (depending on the underlying implementation, 8xMSAA can mean an 8x screen-size render target), requires hardware that supports 8xMSAA, and the bitonic sort might be expensive. Giving up MSAA, the "light" case described above would be easily possible with this technique at satisfying performance, but it won't work in scenes where many objects are visible behind several other objects.
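The resolve step of such an A-buffer could be sketched as follows. The code assumes an 8-sample MSAA texture that stores the fragment color in rgb and its depth in the alpha channel (this packing, the constant alpha and a cleared depth of 0.0 for unused samples are assumptions for illustration), reads all eight routed samples, sorts them with a simple insertion sort instead of the bitonic sort, and blends them back to front over the opaque scene color:

// 8xMSAA target filled by stencil routing: rgb = fragment color,
// a = fragment depth (assumed packing for this sketch)
Texture2DMS<float4, 8> LayerBuffer : register(t0);
// already lit opaque scene color to composite the layers onto
Texture2D OpaqueColor : register(t1);

float4 ResolveOITPS(float4 pos : SV_Position) : SV_Target
{
    int2 coord = int2(pos.xy);
    float4 frags[8];

    // fetch all eight routed fragments of this pixel
    for (int i = 0; i < 8; i++)
        frags[i] = LayerBuffer.Load(coord, i);

    // insertion sort by depth, furthest first (the paper uses a bitonic sort)
    for (int j = 1; j < 8; j++)
    {
        float4 key = frags[j];
        int k = j - 1;
        while (k >= 0 && frags[k].a < key.a)
        {
            frags[k + 1] = frags[k];
            k--;
        }
        frags[k + 1] = key;
    }

    // blend back to front over the opaque color; unused samples are
    // assumed to be cleared to a depth of 0.0 and are skipped
    const float alpha = 0.5f;
    float3 result = OpaqueColor.Load(int3(coord, 0)).rgb;
    for (int n = 0; n < 8; n++)
    {
        if (frags[n].a > 0.0f)
            result = lerp(result, frags[n].rgb, alpha);
    }
    return float4(result, 1.0f);
}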
Another technique extends Dual Depth Peeling by attaching a sorted bucket list. The article "Efficient Depth Peeling via Bucket Sort" by Fang Liu et al. describes an adaptive scheme that requires two geometry passes to store depth value ranges in a bucket list, sorted with the help of a depth histogram. An implementation will be described in the upcoming book GPU Pro. The following image from this article shows the required passes.
The Initial Pass is similar to Dual Depth Peeling. Like other techniques that utilize eight render targets with a 32:32:32:32 format each, the technique has huge memory requirements.
To my knowledge those are the widely known techniques for order-independent transparency on DirectX 10 today. Do you know of any newer techniques suitable for DirectX 10 or DirectX 11 hardware?
Sunday, November 15, 2009
You want to become a Graphics Programmer ...
I regularly receive e-mails asking what kind of books I recommend to someone who wants to become a graphics programmer. Here is my current list (maybe some of you can add to it?):
First of all math is required:
- Vector Calculus
- Vector Calculus, Linear Algebra, and Differential Forms (I have the 1999 version of this book)
- Computer Graphics: Mathematical First Steps
- Mathematics for Computer Graphics
For general knowledge about programming the CPU:
- Write Great Code Volume 1: Understanding the Machine
For a better understanding of how to program the GPU:
- DirectX documentation
- NVIDIA GPU Programming Guide
- ATI GPU Programming Guide
To learn how to program certain effects in an efficient way:
- ShaderX - ShaderX7
- GPU Gems - GPU Gems 3
- GPU Pro and GPU Pro Blog
To start learning DirectX 10 API + Shader Programming:
- Introduction to 3D Game Programming with DirectX 10
- Programming Vertex, Geometry and Pixel Shaders
To start learning OpenGL & OpenGL ES:
- Khronos group
For general overview:
- Real-Time Rendering
- Fundamentals of Computer Graphics (this one also belongs in the math section)
To get started with C:
- The C Programming Language
To learn C++:
- C++ for Game Developers
- C++ Cookbook
- there is a long list of more advanced C++ books ...