Benualdo posted a cool trick in the Light Pre-Pass thread for detecting edges, so that a per-sample shader can be run for MSAA (just in case centroid sampling doesn't work for you). Here it is:
----------
another stupid trick for edge detection pass on platforms that support sampling the MSAA surface with linear sampling: sample the normal buffer twice, once with POINT sampling and once with LINEAR sampling. Use clip(-abs(L-P)+eps). The linear sampled value should be used to compute the lighting of "non-MSAA" texels in the same shader to avoid an extra pass.
----------
eps is a small threshold value that biases the texkill test: when the multisampled normals differ only slightly, the averaged value can still be used to perform the lighting at non-MSAA resolution during the first pass, as an optimization.
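To make the test concrete, here is a rough CPU-side sketch in Python of what the clip() call decides for a single pixel. It is only an illustration, not Benualdo's shader: the function names are made up, and it assumes that LINEAR sampling returns the average of the MSAA samples and that POINT sampling returns the first sample, which may differ from platform to platform.

```python
# CPU-side illustration only; the real test runs in a pixel shader.
# Assumptions (not from the original post): LINEAR = average of all samples,
# POINT = first sample. All names here are hypothetical.

def resolve_linear(samples):
    """Assumed LINEAR result: component-wise average of all MSAA samples."""
    count = len(samples)
    return tuple(sum(s[i] for s in samples) / count for i in range(3))

def fetch_point(samples):
    """Assumed POINT result: the first MSAA sample."""
    return samples[0]

def is_clipped(samples, eps=0.01):
    """Mimic clip(-abs(L - P) + eps): the pixel is discarded
    if any component of the argument is negative."""
    L = resolve_linear(samples)
    P = fetch_point(samples)
    return any(-abs(l - p) + eps < 0.0 for l, p in zip(L, P))

# Four samples per pixel, each holding a normal.
interior = [(0.0, 0.0, 1.0)] * 4
edge = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
nearly_flat = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.004, 1.0), (0.0, 0.004, 1.0)]
```

Under these assumptions the interior and nearly flat pixels survive the clip and can be lit once with the averaged normal, while the edge pixel is killed and left for the per-sample pass. Lowering eps makes the test stricter, so more pixels end up in the per-sample pass.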
Saturday, March 20, 2010
Saturday, February 27, 2010
GPU Pro
There is a blog concerning the upcoming book GPU Pro at http://gpupro.blogspot.com/.
I posted the Table of Contents for GPU Pro. You can pre-order it on Amazon here.
There is another blog for GPU Pro 2 with a call for authors, in case you want to see your name written in golden letters in a book :-)
Sunday, January 31, 2010
Hardware Tessellation
I was thinking about the advantages of hardware tessellation. I can see three main ones:
- Compression
Reduces on-disk storage as well as system and video memory usage -> only the coarse mesh is stored
Animation data is stored only for the coarse mesh
- Memory bandwidth
The GPU fetches only the vertex data of the coarse mesh over the PCI-E bus -> higher vertex cache and fetch performance
- Scalability
Subdivision is recursive -> offers automatic LOD with adaptive metrics
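To get a feeling for the compression argument, here is a small back-of-the-envelope sketch in Python. It assumes a uniform 1-to-4 split of every triangle per subdivision level, which is only an illustration of recursive subdivision; the actual DirectX 11 tessellator is driven by per-patch tessellation factors, so real numbers will differ. The function names are made up for this example.

```python
# Illustration only: uniform 1-to-4 triangle subdivision, not the DX11 tessellator.

def subdivided_triangles(coarse_triangles, levels):
    """Triangle count after `levels` rounds of splitting each triangle into four."""
    return coarse_triangles * 4 ** levels

def patch_vertices(levels):
    """Vertex count of one triangular patch after `levels` uniform splits."""
    segments = 2 ** levels  # segments along each patch edge
    return (segments + 1) * (segments + 2) // 2

def stored_fraction(levels):
    """Fraction of the final triangles that has to be stored as coarse mesh."""
    return 1.0 / 4 ** levels
```

With these assumptions, a coarse mesh of 1,000 triangles at three levels expands to 64,000 triangles on the GPU, while only 1/64 of that triangle count has to be stored and fetched.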
With the DirectX 11 implementation it might also reduce the workload of the vertex shader, because the shader transforms or animates only the coarse mesh. But once we add the extra workload of the hull and domain shaders, it might be a wash.
For console developers, being able to store more world geometry on disc and in memory would be a great advantage. The reduced read bandwidth when fetching that data from memory would also increase efficiency.
The main question is whether tessellating the geometry puts such a heavy workload on the GPU that it is not feasible. I would love to have some real-world data here ...