## Friday, January 10, 2014

### Visual Studio 2013 - C99 support

I think using C99 in game development could be useful for large teams, especially if they are distributed over several locations.
So I thought I would take a closer look at the C99 support in Visual Studio 2013 (we also use VS 2013 with C99 now in my UCSD class).
The new features that are supported in VS 2013 are:

- variable declarations anywhere in a block
- _Bool
- compound literals
- designated initializers

Variadic macros, long long, __pragma, __FUNCTION__, and __restrict were already supported in earlier versions.

What is missing:
- variable-length arrays (VLAs)
- reserved keywords: C99 has a few reserved keywords that are not recognized by C++:
  restrict
  _Bool -> this is now implemented ... see above
  _Complex
  _Imaginary
  _Pragma
- restrict keyword
C99 supports the restrict keyword, which allows for certain optimizations involving pointers. For example:

void copy(int *restrict d, const int *restrict s, int n)
{
    while (n-- > 0)
        *d++ = *s++;
}
C++ does not recognize this keyword.
A simple work-around for code that is meant to be compiled as either C or C++ is to use a macro for the restrict keyword:

#ifdef __cplusplus
#define restrict    /* nothing */
#endif
(This feature is likely to be provided as an extension by many C++ compilers. If it is, it is also likely to be allowed as a reference modifier as well as a pointer modifier.)

I don't know whether hexadecimal floating-point literals made it in:
float pi = 0x3.243F6A88p+03;
- C99 adds a few header files that are not included as part of the standard C++ library:
<complex.h>
<fenv.h>
<inttypes.h>
<stdbool.h>
<stdint.h>
<tgmath.h>


## Thursday, January 2, 2014

### CSE 190 - GPU Programming UCSD class Winter 2014

GPU Programming
With the new console generation and the advances in PC hardware, compute support is becoming more important in games. The new course in 2014 will therefore start with compute, and we will spend about a third of the whole course talking about how it is used on next-gen consoles and in next-gen games. We will also look into several case studies and discuss the feasibility of "re-factoring" existing game algorithms so that they run in compute. An emphasis is put on effects that are traditionally implemented as post-processing effects.

The remaining two thirds of the course will focus on the DirectX 11.2 graphics API and how it is used in games to create a rendering engine for a next-gen game. We will cover most of the fundamental concepts like the HLSL language, renderer design, lighting in games and how to generate shadows, and we will also discuss how transparency can be mimicked with techniques other than alpha blending.
The course will end with a survey of different real-time Global Illumination algorithms that are used in different types of games.

First Class
Overview
-- DirectX 11.2 Graphics
-- DirectX 11.2 Compute
-- Tools of the Trade - how to setup your development system
Introduction to DirectX 11.2 Compute
-- Memory Model
-- DirectX 10.x support

Second Class
Simple Compute Case Studies
- PostFX Color Filters
- PostFX Parallel Reduction
- DirectX 11 Mandelbrot
- DirectX 10 Mandelbrot

Third Class
DirectCompute performance optimization
- Histogram optimization case study

Fourth Class
Direct3D 11.2 Graphics Pipeline Part 1
- Direct3D 9 vs. Direct3D 11
- Direct3D 11 vs. Direct3D 11.1
- Direct3D 11.1 vs. Direct3D 11.2
- Resources (typeless memory arrays)
- Resource Views
- Resources Access Intention
- State Objects
- Pipeline Stages
-- Input Assembler
-- Tessellation
-- Stream Out
-- Setup / Rasterizer
-- Output Merger
-- Video en- / decoder access

Fifth Class
Direct3D 11.2 Graphics Pipeline Part 2
-- HLSL
--- Keywords
--- Basic Data Types
--- Vector Data Types
--- Swizzling
--- Matrices
--- Type Casting
--- SamplerState
--- Texture Objects
--- Intrinsics
--- Flow Control
-- Case Study: implementing Blinn-Phong lighting with DirectX 11.2
--- Physically / Observational Lighting Models
--- Local / Global Lighting
--- Lighting Implementation
---- Ambient
---- Diffuse
---- Specular
---- Normal Mapping
---- Point Light
---- Spot Light

Sixth Class
Physically Based Lighting
- Normalized Blinn-Phong Lighting Model
- Cook-Torrance Reflectance Model

Seventh Class
Deferred Lighting, AA
- Rendering Many Lights History
- Light Pre-Pass (LPP)
- LPP Implementation
- Efficient Light rendering on DX 9, 10, 11
- Balance Quality / Performance
- MSAA Implementation on DX 10.0, 10.1, XBOX 360, 11
Screen-Space Materials
- Skin

Eighth Class
- “Attaching” a Shadow Map frustum around a view frustum
- CSM Challenges
- Softening the Penumbra

Ninth Class
Order-Independent Transparency
- Depth Peeling
- Reverse Depth Peeling

Tenth Class
Global Illumination Algorithms in Games
- Requirement for Real-Time GI
- Ambient Cubes
- Diffuse Cube Mapping
- Screen-Space Ambient Occlusion
- Screen-Space Global Illumination
- Splatting Indirect Illumination (SII)

Prerequisite
Each student should bring a DirectX 11.0 or higher capable notebook with Windows 7 or 8 to class. All the examples accompanying the class are built in C/C++ in Visual Studio 2013.

## Thursday, November 7, 2013

### Visual Studio 2013 / Demo Skeleton Programming

I updated my demo skeleton in the Google Code repository. It now uses Visual Studio 2013, which partially supports C99 and can therefore compile the code. I updated the compute shader code a bit and upgraded Crinkler to version 1.4. The compute shader example now also compiles the shader into a header file, and Crinkler then compresses this file as part of the data compression. Overall it now packs to 2,955 bytes.

If you have fun with this code, let me know ... :-)

## Monday, September 30, 2013

### Call for a new Post-Processing Pipeline - KGC 2013 talk

This is the text version of my talk at KGC 2013.
The main motivation for the talk was the idea of looking for fundamental changes that can bring a modern Post-Processing Pipeline to the next level.
Let's look first into the short history of Post-Processing Pipelines, where we are at the moment and where we might be going in the near future.

History
Probably one of the first Post-Processing Pipelines appeared in the DirectX SDK around 2004. It was a first attempt to implement HDR rendering. I believe from then on we called a collection of image-space effects at the end of the rendering pipeline a Post-Processing Pipeline.
The idea was to re-use resources like render targets and data with as many image space effects as possible in a Post-Processing Pipeline.
A typical collection of screen-space effects included:
• Tone-mapping + HDR rendering: the tone-mapper can be considered a dynamic contrast operator
• Camera effects like Depth of Field with shaped Bokeh, Motion Blur, lens flare etc..
• Full-screen color filters like contrast, saturation, color additions and multiplications etc..
One of the first coverages of a whole collection of effects in a Post-Processing Pipeline running on XBOX 360 / PS3 was done in [Engel2007].
Since then, numerous new tone mapping operators have been introduced [Day2012] and more advanced Depth of Field algorithms with shaped Bokeh have been covered, but there has been no fundamental change to the concept of the pipeline.

Call for a new Post-Processing Pipeline
Let's start with the color space: RGB is not a good color space for a post-processing pipeline. It is well known that luminance variety is more important than color variety, so it makes sense to pick a color space that has luminance in one of the channels. With 11:11:10 render targets it would be cool to store luminance in one of the 11-bit channels. Having luminance available in the pipeline without having to go through color conversions opens up many new possibilities, a few of which we will cover below.

Global tone mapping operators didn't work out well in practice. We looked at numerous engines in the last four years, and a common decision by artists was to limit the luminance values by clamping them. The reasons for this lay partially in the fact that the textures didn't provide enough quality to survive a "light adaptation" without blowing out; sometimes most of their resolution was in the low-end greyscale values, and there just wasn't enough resolution to mimic light adaptation.
Another reason for this limitation was that the available resolution in the rendering pipeline with the RGB color space was not enough. Yet another reason is the fact that we limited ourselves to global tone mapping operators, because local tone mapping operators are considered too expensive.

A fixed global gamma adjustment at the end of the pipeline partially does "the same thing" as the tone mapping operator. It applies a contrast and might counteract what the tone-mapper already does.
So the combination of a tone mapping operator and then the commonly used hardware gamma correction, which are both global, is odd.

On a lighter note, a new Post-Processing Pipeline can add more stages. In the last couple of years, screen-space ambient occlusion, screen-space skin and screen-space reflections for dynamic objects became popular. Adding those to the Post-Processing Pipeline while trying to re-use existing resources needs to be considered in the architecture of the pipeline.

Last, one of the best targets for the new compute capabilities of GPUs is the Post-Processing Pipeline. Saving memory bandwidth by merging "render target blits" and re-factoring blur kernels for thread group shared memory (GSM) are considerations not covered further in the following text, but they are among the most obvious design decisions.

Let's start by looking at an old Post-Processing Pipeline design. This is an overview I used in 2007:

A Post-Processing Pipeline Overview from 2007

A few notes on this pipeline: the tone mapping operation happens in two places. At the "final" stage for tone-mapping the final result, and in the bright-pass filter for tone mapping the values before they can be considered "bright".
The "right" way to apply tone mapping independent of the tone mapping operator you choose is to convert into a color space that exposes luminance, apply the tone mapper to luminance and then convert back to RGB. In other words: you had to convert between RGB and a different color space back and forth twice.
In some pipelines it was decided that this is a bit much, and the tone mapper was applied to the RGB value directly. Tone mapping an RGB value with a luminance contrast operator led to "interesting" results.
Obviously this overview doesn't cover the latest Depth of Field effects with shaped Bokeh and separated near and far field Circle of Confusion calculations; nevertheless, it already shows a large number of render-target to render-target blits that can be merged with compute support.

All modern rendering pipelines calculate color values in linear space; meaning every texture that is loaded is converted into linear space by the hardware, then all the color operations like lighting, shadowing and post-processing are applied, and at the end the color values are converted back by applying the gamma curve.
This separate gamma control is located at the end of the pipeline, after tone mapping and color filters, because the GPU hardware can apply a global gamma correction to the image after everything is rendered.

The following paragraphs will cover some of the ideas we had to improve a Post-Processing Pipeline on a fundamental level. We implemented them into our Post-Processing Pipeline PixelPuzzle. Some of the research activities like finally replacing the "global tone mapping concept" with a better way of calculating contrast and color will have to wait for a future column.

Yxy Color Space
The first step to change a Post-Processing Pipeline in a fundamental way is to switch it to a different color space. Instead of running it in RGB we decided to use CIE Yxy through the whole pipeline. That means we convert RGB into Yxy at the beginning of the pipeline and convert back to RGB at the end. In-between all operations run on Yxy.
With CIE Yxy, the Y channel holds the luminance value. With an 11:11:10 render target, the Y channel will have 11 bits of resolution.
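For illustration, here is a minimal HLSL sketch of the conversion RGB -> CIE XYZ -> CIE Yxy and back, the way it could run at the beginning and the end of the pipeline (the matrices are the standard sRGB/Rec.709 ones; the function names are mine, not taken from the talk):

// sketch: RGB <-> CIE Yxy, assuming linear sRGB/Rec.709 primaries (D65 white point)
static const float3x3 RGB2XYZ =
{
    0.4124f, 0.3576f, 0.1805f,
    0.2126f, 0.7152f, 0.0722f,
    0.0193f, 0.1192f, 0.9505f
};
static const float3x3 XYZ2RGB =
{
     3.2406f, -1.5372f, -0.4986f,
    -0.9689f,  1.8758f,  0.0415f,
     0.0557f, -0.2040f,  1.0570f
};

float3 RGBtoYxy(float3 rgb)
{
    float3 XYZ = mul(RGB2XYZ, rgb);
    // Y holds the luminance; x and y are the chromaticity coordinates
    float invSum = 1.0f / max(XYZ.x + XYZ.y + XYZ.z, 1e-6f);
    return float3(XYZ.y, XYZ.x * invSum, XYZ.y * invSum);
}

float3 YxytoRGB(float3 Yxy)
{
    // rebuild XYZ from the luminance Y and the chromaticities x and y
    float3 XYZ;
    XYZ.y = Yxy.x;
    XYZ.x = Yxy.x * Yxy.y / max(Yxy.z, 1e-6f);
    XYZ.z = Yxy.x * (1.0f - Yxy.y - Yxy.z) / max(Yxy.z, 1e-6f);
    return mul(XYZ2RGB, XYZ);
}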

Instead of converting RGB to Yxy and back each time for the final tone mapping and the bright-pass stage, running the whole pipeline in Yxy means that this conversion might be only done once to Yxy and once or twice back to RGB.
Tone mapping then still happens with the Y channel in the same way it happened before. Confetti's PostFX pipeline offers eight different tone mapping operators and each of them works well in this setup.
A nice side effect of using Yxy is that you can run the bright-pass filter as a one-channel operation, which saves some cycles on modern scalar GPUs.
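With the luminance already separated, a bright-pass only has to look at the Y channel; a minimal sketch (BrightPassThreshold is an assumed tweakable, not a value from our pipeline):

// sketch: one-channel bright-pass on the Y channel of a Yxy color
float BrightPassY(float3 Yxy, float BrightPassThreshold)
{
    // keep only the luminance above the threshold; the chromaticity stays untouched
    return max(Yxy.x - BrightPassThreshold, 0.0f);
}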

One other thing that Yxy allows is treating the occlusion term from Screen-Space Ambient Occlusion as part of the Y channel. So you can mix in this term and use it in interesting ways. Similar ideas apply to any other occlusion term your pipeline might be able to use.
The choice of CIE Yxy as the color space was somewhat arbitrary. In 2007 I evaluated several different color spaces, and we ended up with Yxy at the time. Here is my old table:

Pick a Color Space Table from 2007

Compared to CIE Yxy, HSV doesn't easily allow running a blur filter kernel. The target was to leave the pipeline as unchanged as possible when picking a color space. With Yxy, all the common Depth of Field algorithms and any other blur kernel run unchanged. HSV conversions also seem to be more expensive than RGB -> CIE XYZ -> CIE Yxy and vice versa.
There might be other color spaces similarly tailored to the task.

Dynamic Local Gamma
As mentioned above, the fact that we apply a tone mapping operator and then later on a global gamma operator appears to be a bit odd. Here is what the hardware is supposed to do when it applies the gamma "correction".

Gamma Correction
The main take-away from this curve is that the same curve is applied to every pixel on screen. In other words: this curve applies an emphasis on dark areas regardless of whether the pixel is very bright or very dark.
Whatever curve the tone-mapper applies, the gamma correction might be counteracting it.

It appears to be a better idea to move the gamma correction closer to the tone mapper, making it part of the tone mapper and, at the same time, applying gamma locally per pixel.
In fact, gamma correction should depend on the light adaptation level of the human visual system. The "gamma correction" applied by the eye changes the perceived luminance based on the eye's adaptation level [Bartleson 1967] [Kwon 2011].
When the eye is adapted to dark lighting conditions, the exponent for the gamma correction is supposed to increase. If the eye is adapted to bright lighting conditions, the exponent for the gamma correction is supposed to decrease. This is shown in the following image taken from [Bartleson 1967]:
Changes in Relative Brightness Contrast [Bartleson 1967]

A local gamma value can vary with the eye's adaptation level. The equation that adjusts the gamma correction to the current adaptation level of the eye can be found in [Kwon 2011]:
γv = 0.444 + 0.045 * ln(Lan + 0.6034)
For this presentation, this equation was taken from the paper by Kwon et al. Depending on the type of game, there is an opportunity to build your own local gamma operator.
The input luminance value is generated by the tone mapping operator and then stored in the Y channel of the Yxy color space:
YYxy = L^γv
γv changes based on the luminance value of the current pixel. That means each pixel's luminance value might be gamma corrected with a different exponent. For the equation above, the exponent value is in the range of 0.421 to 0.465.
Applied gamma curve per pixel, based on the luminance of the pixel:
eye's adaptation == low -> blue curve
eye's adaptation == high -> green curve

L^γv works with any tone mapping operator; L is the luminance value coming from the tone mapping operator.
With a dynamic local gamma value, the dynamic lighting and shadowing information that is introduced in the pipeline is considered for the gamma correction. The changes when going from bright to dark areas appear more natural. Textures hold up better to the challenges of light adaptation. Overall, lights and shadows look better.
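A minimal sketch of how this could look in the shader; the per-pixel use of the tone-mapped luminance as input is my assumption, the equation itself is the one from [Kwon 2011] quoted above:

// sketch: dynamic local gamma applied to the Y channel of the Yxy color
// L = per-pixel luminance coming out of the tone mapping operator
float ApplyLocalGamma(float L)
{
    // exponent follows the adaptation level; for L in [0..1] this stays roughly
    // in the 0.421 .. 0.465 range mentioned in the text (log() is the natural log)
    float gammaV = 0.444f + 0.045f * log(L + 0.6034f);
    return pow(max(L, 0.0f), gammaV);
}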

Depth of Field
As a proof of concept of the usage of the Yxy color space and the local dynamic gamma correction, this section shows screenshots of a modern Depth of Field implementation with separated near and far field calculations and a shaped Bokeh, implemented in compute.

Producing an image through a lens leads to a "spot" that will vary in size depending on the position of the original point in the scene:
Circle of Confusion (image taken from Wikipedia)

The Depth of Field is the region where the CoC is less than the resolution of the human eye (or, in our case, the resolution of our display medium). The equation for calculating the CoC can be found in [Potmesil1981].

Following the variables in this equation, Confetti demonstrated in a demo at GDC 2011 [Alling2011] the following controls:
• F-stop - ratio of focal length to aperture size
• Focal length – distance from lens to image in focus
• Focus distance – distance to plane in focus
Because the CoC is negative in the far field and positive in the near field, separate results are commonly generated for the near field and the far field of the effect [Sousa13].
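As a hedged sketch, a per-pixel signed CoC based on the standard thin-lens relation and the three controls listed above could look like this (the aperture diameter is derived as focalLength / fStop; the exact form used in the demo may differ):

// sketch: signed Circle of Confusion from a thin-lens model
// sceneDepth is the linear view-space depth of the pixel
float CircleOfConfusion(float sceneDepth, float focalLength, float fStop, float focusDistance)
{
    float aperture = focalLength / fStop;   // aperture diameter
    // negative in the far field, positive in the near field
    return aperture * focalLength * (focusDistance - sceneDepth) /
           (sceneDepth * (focusDistance - focalLength));
}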
Usually the CoC is calculated for each pixel in a down-sampled buffer or texture. Then the near and far field results are generated. The far and focus field results are combined first, and this result is then combined with the near field, based on a near field coverage value. The following screenshots show the results of those steps, with the first screenshot showing the near and far field calculations:

Red = max CoC(near field CoC)
Green = min CoC(far field CoC)

Here is a screenshot of the far field result in Yxy:

Far field result in Yxy

Here is a screenshot of the near field result in Yxy:
Near field result in Yxy

Here is a screenshot of the resulting image after it was converted back to RGB:
Resulting Image in RGB

Conclusion
A modern Post-Processing Pipeline can benefit greatly from running in a color space that offers a separable luminance channel. This opens up new opportunities for efficient implementations of many new effects.
With the long-term goal of removing any global tone mapping from the pipeline, a dynamic local gamma control can offer more intelligent, per-pixel gamma control and a stronger contrast between bright and dark areas, considering all the dynamic additions in the pipeline.
Any future development in the area of Post-Processing Pipelines can be focused on a more intelligent luminance and color harmonization.

References
[Alling2011] Michael Alling, "Post-Processing Pipeline", http://www.conffx.com/GDC2011.zip
[Bartleson 1967] C. J. Bartleson and E. J. Breneman, “Brightness function: Effects of adaptation,” J. Opt. Soc. Am., vol. 57, pp. 953-957, 1967.
[Day2012] Mike Day, “An efficient and user-friendly tone mapping operator”, http://www.insomniacgames.com/mike-day-an-efficient-and-user-friendly-tone-mapping-operator/
[Engel2007] Wolfgang Engel, “Post-Processing Pipeline”, GDC 2007 http://www.coretechniques.info/index_2007.html
[Kwon 2011] Hyuk-Ju Kwon, Sung-Hak Lee, Seok-Min Chae, Kyu-Ik Sohng, “Tone Mapping Algorithm for Luminance Separated HDR Rendering Based on Visual Brightness Function”, online at http://world-comp.org/p2012/IPC3874.pdf
[Potmesil1981] Potmesil M., Chakravarty I. “Synthetic Image Generation with a Lens and Aperture Camera Model”, 1981
[Reinhard] Erik Reinhard, Michael Stark, Peter Shirley, James Ferwerda, "Photographic Tone Reproduction for Digital Images", http://www.cs.utah.edu/~reinhard/cdrom/
[Sousa13] Tiago Sousa, "CryEngine 3 Graphics Gems", SIGGRAPH 2013, http://www.crytek.com/cryengine/presentations/cryengine-3-graphic-gems

## Friday, September 13, 2013

### KGC 2013

I will be a speaker at the Korean Game Developer Conference this year. This is my third time and I enjoy it very much.
This year I want to talk about building a next-gen Post-Processing Pipeline. Most people haven't changed their PostFX pipeline algorithms in six or seven years ( ... no, re-writing it in compute doesn't count ... and replacing your Reinhard operator with an approximated Hable operator (check out Insomniac's website) doesn't count either :-) ).

Please come by and say hi if you are around.

## Monday, July 29, 2013

### TressFX - Crystal Dynamics and AMD cover TressFX at SIGGRAPH

There were more talks about Confetti's work on TressFX at SIGGRAPH: one talk, by Jason Lacroix, was "Adding More Life to Your Characters With TressFX".

Activision's head demo uses TressFX as well: "Digital Ira: High-Resolution Facial Performance Playback".

If you are a registered developer and you need XBOX One or PS4 implementations, send me an e-mail.

## Thursday, July 25, 2013

### SIGGRAPH 2013

I would like to highlight the talk "Crafting a Next-Gen Material Pipeline for The Order: 1886":

The 3D Fabric Scanner is a fantastic idea and the results are awesome. Those are next-gen characters. Great work!

## Monday, July 22, 2013

### Tiled Resources / Partially Resident Textures / MegaTextures

One of the new features of DirectX 11.2, and now also OpenGL 4.4, is Tiled Resources. Tiled Resources allow managing one large texture in "hardware" tiles and implementing a megatexture approach. The advantages of using the hardware for this, compared to the software solution that was used before, are:
- no dependent texture read necessary
- hardware filtering works including anisotropic filtering
AMD offers an OpenGL extension for this as well, and it is available on all newer AMD GPUs. NVIDIA has shown it running on DirectX 11.2 at the Build conference. So there is a high chance that it will be available on a large part of the console and PC market soon.
Let's take a step back and see what challenge a megatexture is supposed to solve. In open-world games, we solve the challenge of having high detail in textures with two techniques:
- on-going texture streaming: on a console you keep streaming from physical media all the time. This requires careful preparation of the layout of the physical media and a multi-core/multi-threaded texture streaming pipeline with -for example- priority queues.
- procedural generation of "large" textures: generating a large terrain texture is best done by generating it on the fly. That means stitching a "large" texture together out of smaller textures with one "control texture" that then also requires a dependent texture read.
The advantage of procedural texture generation is that it doesn't require a lot of "streaming" memory bandwidth, while one large texture or also many small textures eat into the amount of available "streaming" memory bandwidth.
Now, with a MegaTexture there is the ability to store much more detail in the large texture, but it comes with the streaming cost. If you have an implementation that doesn't generate the terrain texture procedurally on the fly and you have to stream the terrain data, then the streaming cost might be similar to your current solution, so the MegaTexture might be a win here.
The biggest drawback of Partially Resident Textures / MegaTextures seems to be forgotten in the articles that I have seen so far: someone has to generate them. There might need to be an artist who fills a very large texture with a high amount of detail, pixel by pixel. To relieve the workload, a technique called "stamping" is used. As the name implies, a kind of "stamp" is applied at several places onto the texture. Stamping also means giving up the opportunity to create unique pixels everywhere. In other words, the main advantage of a MegaTexture, offering a huge amount of detail, is counteracted by stamping.
In practice this might lead to a situation where your MegaTexture doesn't hold much detail, because artists would have to work a long time to add it and that would be too expensive. Instead, the level of detail that is applied to the texture is reduced to an economically feasible amount.
The overall scenario changes, when data exists that -for example- is generated from satellite images of the earth with high resolution. In that case a MegaTexture solution will offer the best possible quality with less art effort and you can build a workflow that directly gets the pre-generated data and brings it into your preferred format and layout.
For many game teams, the usage of MegaTextures will be too expensive. They can't afford the art time to generate the texture in case they can't rely on existing data.

## Tuesday, July 2, 2013

I was looking through some of the links I saved for further reading today.

An article explaining BC compression formats with a lot of detail and clarity can be found here:

There is an interesting blog post by Sebastien Sylvan. He writes about R-trees, a data structure that allows you, for example, to do spatial indexing of objects in your game.

He also has other cool articles on hash maps and vector replacements.

We still need desktop PCs in the office to swap discrete GPUs whenever we need to. Because we also need them as portable as possible, we decided to build the following setup ourselves:

So far we have built two and they work well.

For Blackfoot Blade, we worked with a composer in Finland. I love the music he made and I wanted to share his website here:

Our friends at Bitsquid released a useful open-source library:

I quote from the description that describes the design of the library:

Library Design

foundation has been written with data-oriented programming in mind (POD data is preferred over complicated classes, flat arrays are the preferred data structure, etc). foundation is written in a "back-to-C" style of C++ programming. This means that there is a clear separation between data and code. Data definitions are found in _types.h header files. Function definitions are found in .h header files.

If you haven't found the DirectXTex texture library yet, check it out at

### MVP Award 2013

Yesterday Microsoft awarded me an MVP award for Visual C++. Now that DirectX is part of Visual C++, I was moved into the Visual C++ category. I am super proud of that, especially now that Visual C++ finally gets C99 support :-)

## Sunday, June 30, 2013

### Google Console / Visual Studio 2013 will support C99

Google making a console is an interesting news item. Like Apple they can utilize standard mobile phone parts and extend Android to support controllers.
What does it take to make this work:

1. High-end good looking apps: there is no need to have a fallback rendering path, so you can optimize until the last cycle
2. Dedicated section in the app store to highlight the controller-capable apps
3. The NDK needs to be better supported: I mentioned it here in the past, it is good that the NDK exists. This is the most important basic requirement to get existing tech to Android phones ...
4. A good controller with good support goes a long way ...

In other news, Visual Studio 2013 will finally support C99. This is something I always wished for, not only because C99 is a perfect game development language and mighty portable but also because open-source projects quite often favor C99 ... so now we can finally move our code base from C++ to C99 and it will still compile in a C++ environment like Visual Studio. For people who actually write engine code that is cross-platform or shared between teams, this is good news ...

http://arstechnica.com/information-technology/2013/06/c99-acknowledged-at-last-as-microsoft-lays-out-its-path-to-c14/

## Monday, June 24, 2013

### Lighting a Game / Lighting Artists / Physically / Observational Lighting models / Bounce Lighting

Here is a way modern game engines can light scenes; I was just describing this in a forum post. The idea is to follow what the CG movie industry is doing. Placing real-time lights happens similarly to CG movies: in CG movies there are thousands of lights, and in games we now have dozens or, most of the time, 100+ lights in scenes. Compared to switching from observational to physical lighting models, this makes the biggest difference. Each of those lights can also cast bounce lighting, which is another switch for the artist. So in essence artists can place lots of lights, switch shadows on / off per light and switch real-time GI on / off per light. The light / shadow part was already possible on XBOX 360 / PS3, but on the next-gen consoles we now also have bounce lighting per light. That gives lighting artists a wide range of options to light scenes.

A lighting setup like this would be overkill in a game like Blackfoot Blade, where you fly in a helicopter high above the ground. There we have only a few dozen real-time lights on screen without shadows (each rocket casts a light, as do the machine gun of the helicopter and even the projectiles from the tanks and the flares). The game also runs on tablets. With any ground-based game like an open-world game, lots of lights make a huge difference in lighting corners and the environment. It is one of those "better than real" options that lighting artists like.

My point is comparing the switch from observational to physically based lighting models with the switch from a few lights to lots of lights. The latter gives you much more "bang for the buck", so you want to do it first. A scene in shadow will not look much better with a physically based lighting model if you only use one or a few light sources, but with lots of lights you can make it look "better than real".

## Sunday, May 19, 2013

### Re-cap of Deferred Lighting

This is part of a response I wrote in a discussion forum recently, and it summarizes a lot of the things I did a few years ago with renderer design on XBOX 360 / PS3 (more details in the articles below). So I thought I would share it here as well:

----------------------------------------------------
Before I start talking about memory bandwidth, let me first outline my favorite renderer design that shipped now in a couple of games: from a high-level view, you want to end up with mainly three stages:

Geometry -> Light / Shadow data -> Materials

Geometry
In this stage you render your objects into a G-Buffer. It might be the most expensive or one of the more expensive parts of your rendering pipeline. You want your G-Buffer to be as small as possible. When we moved from Deferred Shading to Deferred Lighting, one of the motivations was to decrease the size of the G-Buffer. In a typical Deferred Lighting scenario you have three render targets in the G-Buffer: the depth buffer, a color buffer and a normal buffer (those might hold specular information and a material index as well). There are all kinds of ways to compress data like using two channel color formats, two channel normal data etc..
One reason why the G-Buffer needs to be small is what I mentioned above. Every mesh you render in there will be expensive. Obviously I am leaving out a lot of stuff here like pre-Z pass etc..
The other reason why the G-Buffer needed to be small was the fact that you have to read it for each light. On XBOX 360 and PS3 that was a major memory bandwidth challenge and the ultimate reason to move from Deferred Shading to Deferred Lighting. With the smaller G-Buffer you were now able to render many more lights.
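As an illustration of such a small G-Buffer, here is a minimal HLSL sketch of the geometry-pass output for a Deferred Lighting setup (the exact packing -where the material index and the specular information go, two-channel normals etc.- is engine-specific and just an assumption here):

// sketch: Deferred Lighting G-Buffer; the depth buffer itself is the third "render target"
Texture2D    AlbedoMap     : register(t0);
SamplerState LinearSampler : register(s0);

struct PSInput
{
    float4 position : SV_Position;
    float3 normal   : NORMAL;
    float2 uv       : TEXCOORD0;
};

struct GBufferOutput
{
    float4 color  : SV_Target0;   // albedo; alpha is free for a material index
    float4 normal : SV_Target1;   // normal packed to [0..1]; alpha holds the specular power
};

GBufferOutput GeometryPS(PSInput input)
{
    GBufferOutput output;
    output.color  = float4(AlbedoMap.Sample(LinearSampler, input.uv).rgb, 0.0f);
    output.normal = float4(normalize(input.normal) * 0.5f + 0.5f, 0.25f /* packed spec power */);
    return output;
}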

Rendering lights into the light / shadow buffer had the advantage of using just two of the three render targets in the G-Buffer: the depth buffer and the normal buffer with the specular information. With that setup you could increase the number of lights substantially compared to Deferred Shading.
The light / shadow buffer holds the data for all lights and shadows, in other words: brightness data separated for diffuse and specular together with the light color. Please note the third render target in the G-Buffer that holds color is not used here.

Material Stage
Splitting up the high-level view into Geometry -> Light / Shadows -> Materials is done because you want to apply expensive materials like skin, hair, cloth, leather, car paint etc.. and you can't store much data in the G-Buffer to describe those materials. So you apply them in screen or image space like a PostFX.
One of the reasons to move to Deferred Lighting was the increased material variety it offers. In a Deferred Shading setup you have to apply the material terms while you do the lighting calculations, which sometimes made those really expensive, and with overlapping lights you evaluated them too often.
----------------------------------------------------

A lot of the recent work is about materials in screen-space. In the last few years my focus moved away from renderer design to global illumination and re-thinking the current Post-Processing pipelines; while solving other challenges for our customers. I hope I have something to share in those areas very soon ...

### Update of the Link Section

I added the blogs of Angelo Pesce, Fabian Giesen, Christian Schueler, Ignacio Castaño, Morten Mikkelsen and Sebastien Lagarde to the link list on the right.

## Tuesday, May 14, 2013

### GPU Programming at UCSD

My first outline for a new GPU Programming class at UCSD. Let me know what you think:

First Class
Overview
-- DirectX 11.1 Graphics
-- DirectX 11.1 Compute
-- Tools of the Trade - how to setup your development system

Introduction to DirectX 11.1 Compute
-- Memory Model
-- DirectX 10.x support

Second Class
Simple Compute Case Studies
- PostFX Color Filters
- PostFX Parallel Reduction
- DirectX 11 Mandelbrot
- DirectX 10 Mandelbrot

Third Class
DirectCompute performance optimization
- Histogram optimization case study

Fourth Class
Direct3D 11.1 Graphics Pipeline Part 1
- Direct3D 9 vs. Direct3D 11
- Direct3D 11 vs. Direct3D 11.1
- Resources (typeless memory arrays)
- Resource Views
- Resources Access Intention
- State Objects
- Pipeline Stages
-- Input Assembler
-- Tessellation
-- Stream Out
-- Setup / Rasterizer
-- Output Merger
-- Video en- / decoder access

Fifth Class
Direct3D 11.1 Graphics Pipeline Part 2
-- HLSL
--- Keywords
--- Basic Data Types
--- Vector Data Types
--- Swizzling
--- Matrices
--- Type Casting
--- SamplerState
--- Texture Objects
--- Intrinsics
--- Flow Control
-- Case Study: implementing Blinn-Phong lighting with DirectX 11.1
--- Physically / Observational Lighting Models
--- Local / Global Lighting
--- Lighting Implementation
---- Ambient
---- Diffuse
---- Specular
---- Normal Mapping
---- Point Light
---- Spot Light

Sixth Class
Physically Based Lighting
- Normalized Blinn-Phong Lighting Model
- Cook-Torrance Reflectance Model

Seventh Class
Deferred Lighting, AA
- Rendering Many Lights History
- Light Pre-Pass (LPP)
- LPP Implementation
- Efficient Light rendering on DX 9, 10, 11
- Balance Quality / Performance
- MSAA Implementation on DX 10.0, 10.1, XBOX 360, 11
Screen-Space Materials
- Skin

Eighth Class
- “Attaching” a Shadow Map frustum around a view frustum
- CSM Challenges
- Softening the Penumbra

Ninth Class
Order-Independent Transparency
- Depth Peeling
- Reverse Depth Peeling

Tenth Class
Global Illumination Algorithms in Games
- Requirement for Real-Time GI
- Ambient Cubes
- Diffuse Cube Mapping
- Screen-Space Ambient Occlusion
- Screen-Space Global Illumination
- Splatting Indirect Illumination (SII)

### GPU Pro 4 available on Amazon.com

Here is the latest GPU Pro 4 book.

I have been helping to create these books for 12 years now.

For GPU Pro 5 we will have a huge amount of mobile device techniques. Many GPU vendors now put in extensions that allow more modern stuff to happen on mobile devices, like Deferred Lighting. Overall lots of stuff is happening ... I am always surprised by the amount of innovation happening in rendering in just a year. With OpenGL ES 3.0 and the new extensions, we have lots of opportunities to make beautiful-looking games. You can port an XBOX 360 game to this platform easily. If you want to contribute an article to a book, drop me an e-mail.

## Thursday, April 26, 2012

### DirectCompute on Intel® Ivy Bridge

Here is a blog entry I wrote on DirectCompute for Intel, working with an Ivy Bridge machine.

Let me know what you think.

## Sunday, April 15, 2012

### Tile-based Deferred and Forward Lighting

DirectCompute opened up new abilities to apply lighting to a scene. In the last three years, dealing with many lights in screen-tiles on DirectX 11 GPUs became a popular discussion topic, following implementations in major PS3 games like "Blur" and "Uncharted".
As long as you look only at the cost of light rendering, dealing with lights per screen tile can make a huge difference in memory bandwidth consumption. In the best case, data is read from the G-Buffer once and written into the light buffer or framebuffer once. This is a major improvement compared to the Deferred Lighting approaches on DirectX 9/10 GPUs.
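A heavily reduced DirectCompute sketch of the idea: one thread group per screen tile, the per-tile light list is built in group shared memory, the G-Buffer is read once and the result is written once. The per-tile cull below only uses the tile's min/max depth and the shading term is a placeholder; this illustrates the general approach and is not code from any of the mentioned titles:

// sketch: tile-based deferred lighting, one 16x16 thread group per screen tile
#define TILE_SIZE  16
#define MAX_LIGHTS 256

struct PointLight
{
    float3 positionVS;   // light position in view space
    float  radius;
    float3 color;
    float  pad;
};

Texture2D<float4>            NormalBuffer : register(t0);
Texture2D<float4>            ColorBuffer  : register(t1);
Texture2D<float>             LinearDepth  : register(t2);   // linear view-space depth
StructuredBuffer<PointLight> Lights       : register(t3);
RWTexture2D<float4>          Output       : register(u0);

cbuffer Constants : register(b0)
{
    uint  LightCount;
    uint3 padding;
};

groupshared uint MinDepthBits;
groupshared uint MaxDepthBits;
groupshared uint TileLightCount;
groupshared uint TileLightList[MAX_LIGHTS];

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void TiledLightingCS(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    float depth = LinearDepth[dtid.xy];

    if (gi == 0)
    {
        MinDepthBits   = 0x7F7FFFFF;   // FLT_MAX as bits
        MaxDepthBits   = 0;
        TileLightCount = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    // min/max depth of the tile; positive floats compare correctly as uints
    InterlockedMin(MinDepthBits, asuint(depth));
    InterlockedMax(MaxDepthBits, asuint(depth));
    GroupMemoryBarrierWithGroupSync();

    float minDepth = asfloat(MinDepthBits);
    float maxDepth = asfloat(MaxDepthBits);

    // each thread culls a slice of the light list against the tile's depth range;
    // a full implementation would also test the four side planes of the tile frustum
    for (uint i = gi; i < LightCount; i += TILE_SIZE * TILE_SIZE)
    {
        PointLight light = Lights[i];
        if (light.positionVS.z + light.radius >= minDepth &&
            light.positionVS.z - light.radius <= maxDepth)
        {
            uint index;
            InterlockedAdd(TileLightCount, 1, index);
            if (index < MAX_LIGHTS)
                TileLightList[index] = i;
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // the G-Buffer is read once per pixel ...
    float3 albedo = ColorBuffer[dtid.xy].rgb;
    float3 normal = normalize(NormalBuffer[dtid.xy].xyz * 2.0f - 1.0f);

    float3 result = 0.0f;
    uint   count  = min(TileLightCount, (uint)MAX_LIGHTS);
    for (uint j = 0; j < count; ++j)
    {
        PointLight light = Lights[TileLightList[j]];
        // placeholder shading: a real shader reconstructs the view-space position
        // from depth and evaluates the full lighting model here
        float falloff = saturate(1.0f - abs(light.positionVS.z - depth) / light.radius);
        result += albedo * light.color * falloff * saturate(normal.z);
    }

    // ... and the result is written once
    Output[dtid.xy] = float4(result, 1.0f);
}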
As soon as you add shadows to the equation, the line gets more blurry. Assuming that a next-gen game that requires many dynamic lights also requires many dynamic shadows (see my blog entry below "Shadows - Thoughts on Ellipsoid Light Shadow Rendering"), all the Tiled-based approaches come with a higher cost of shadow rendering compared to the Deferred Lighting approaches for lower-end hardware.
With old-style Deferred Lighting, the shadows are applied with each light source. That means -from a memory bandwidth standpoint- writing into the light or framebuffer happens only once for the light and the shadow and many arithmetic instructions can be shared.
Any tile-based approach will want to create all the shadow data before lighting. To do this, G-Buffer data needs to be read for each light that is supposed to cast a shadow. Additionally, for each light that is supposed to cast a shadow, the data needs to be written into the light buffer or framebuffer.
In other words, if each light casts a shadow, any Tiled-Based approach won't offer much gain anymore when compared to Deferred Lighting on DirectX 9/10 GPUs.
Following this train of thought, it would be better to stick with old-style Deferred Lighting, because it offers a fall-back path that is consistent throughout many hardware platforms.

That being said, I consider the introduction of Tiled-Based Forward rendering a major step forward, because it offers a consistent way to deal with "alpha-blended" objects. So far everyone created a separate lighting system to deal with objects that are not in the depth buffer. This lighting system was usually simpler and not capable of rendering many lights.
With Tiled-Based Forward rendering, we can replace the simpler system with a lighting system that deals with many lights on "alpha-blended" objects and make it more consistent with the lighting on objects that can be captured in the depth buffer.
That is exciting. Unfortunately this will still be only available on DirectX 11 hardware but it is a huge step forward.

## Friday, February 10, 2012

### Masterclass - GPU Programming

Game connection will offer another Masterclass with me on GDC:

http://www.game-connection.com/gameconn/content/gpu-programming

### GDC / Catching up

Lots of things are happening. Confetti will have three demos running on a PC at the booth:

1. RawK III that shows our latest version of our Global Illumination middleware, PostFX pipeline and Particle system
2. SpeedTree - we teamed up with SpeedTree and integrated our Skydome middleware into their demo
3. Sponza that demos our Global Illumination middleware in the typical Sponza scene

We updated our website with new movies and screenshots in preparation for GDC.

I will be offering my third Masterclass for graphics programmers at GDC (previous ones were in 2010 and 2011 in Paris).

Confetti helps to organize and sponsors an online conference at

http://altdevconf.org/

It is free and you can sign up for our talk -that will be given on Sunday 2pm- at

https://www4.gotomeeting.com/register/846958367

GPU Pro 3 should be available at GDC.

## Monday, January 30, 2012

### Reviving this blog

I started to revive this blog after posting for a while on AltDevBlogADay. I first brought my old posts over from AltDev and started working on a few new ones.
This also gives me a chance to differ between Confetti's blog and this private blog. Confetti's (www.conffx.com) blog will hold most of the time company related information and this blog will cover more technical information that might not have anything to do with what we do at Confetti.

## Sunday, September 18, 2011

### Graphics Demo Programming

About eight or ten years ago I started thinking about doing a graphics demo for the demo scene. I started to prepare a minimum skeleton that should compile to the smallest possible exe. Over the years I kept this skeleton alive, going from Windows XP to Windows 7 and from DirectX 8 to DirectX 10.
More than three years ago I put the source code up on Google Code here and kept updating it:

Although the source code is rather short, I played around with many ideas over the years. I read articles from the demo scene about getting smaller exes just by using Visual Studio. After realizing that my exe got bigger with every new Visual Studio version, I switched to Pelles C, a free development environment with a compiler that follows the C99 standard:

http://www.smorgasbordet.com/pellesc/

My exe is now 838 bytes in size without violating Windows rules about releasing occupied resources. I tried to replace some of the code with assembly code, especially the entry points of the D3D functions and saved a few bytes at some point in time but removed it again because it was too inconvenient.
At some point (probably while it was running on DirectX 9) I implemented a small GPU particle system that didn't add much to the size, which was pretty cool.
One of the interesting things I found out was that HLSL code in some cases packed smaller than C code for the CPU. I found this remarkable, and I thought it would be a cool idea to write a small CPU stub and then go from there in HLSL.
I know there will be times when I go back to this piece of code and wonder what else I can do with it and spend half an hour looking through it. It was certainly the project with some of the lowest priorities in the last ten years ... maybe you can take the source and do something cool with it :-)

There is also a whole demo framework released by Inigo Quilez here:

http://www.iquilezles.org/www/material/isystem1k4k/isystem1k4k.htm

http://yupferris.blogspot.com/
http://4klang.untergrund.net/

## Wednesday, September 14, 2011

### DirectX 11.1 Notes

The upgrade from DirectX 11 to DirectX 11.1 is targeting the following areas:
• Enabling Metro-style applications (there’s some additional access control for this which impacted almost every API in the OS so there’s no Direct3D 9 in Metro-style, but Direct3D 11.x is fully supported)
• Technical computing (optional extended double-precision instructions (division, etc.) and optional Sum of Absolute Differences instruction; enabling hardware access via Direct3D from session 0; HLSL compiler improvements)
• Low-power device optimizations (reduced shader precision hints for HLSL/driver, Tiled Graphics Rendering optimizations, 16 bit-per-pixel DXGI formats 565, 5551, 4444)
• Various graphics stack unification efforts (Direct3D 11 Video, Direct3D interop improvements for Direct2D & Media Foundation, etc.)
• Windows on ARM

The main new feature targeting consumers is stereoscopic rendering. The new DirectX 11.1 features are described on the MSDN website. For me the most remarkable things on this page are:
1. Use UAVs at every pipeline stage
2. Use logical operations in a render target
3. Shader tracing: looks like a new way to measure shader performance

The functions (ID3D11Device::CheckFeatureSupport / ID3D11Device::CheckFormatSupport) that check for supported features and formats were extended as well. D3D11_FEATURE_DATA_ARCHITECTURE_INFO seems to be for tile-based rendering hardware commonly used in mobile GPUs, and D3D11_FEATURE_DATA_SHADER_MIN_PRECISION_SUPPORT too. DXGI_FORMAT was extended to support video formats that can now be processed with shaders by using resource views (SRV/RTV/UAV).
There is a new D3D_FEATURE_LEVEL_11_1 defined (i.e. a minor revision of the hardware feature set), but I don't (yet) have a good link to give you that summarizes the required features. Of course, there are no public drivers or hardware out yet that expose FL 11.1 anyhow. As before, DirectX 11.1 (the API) works with a range of Feature Levels (the hardware). WARP on Windows 8 supports FL 11.0 (and 10.1 as before) and includes support for the DXGI 1.2 16bpp formats (565, 5551, 4444).
The Windows 8 Developer Preview SDK includes the latest HLSL compiler FXC tool and D3DCompiler.DLL, the Debug Layers DLL, and the REF device DLL for DirectX 11.1 on Windows 8. The MSDN documentation now includes details on SM 4.x and SM 5.0 shader assembly (for deciphering the compiler's disassembly output) plus details on BC6H/BC7 compression formats:
http://msdn.microsoft.com/en-us/library/bb943998(v=VS.85).aspx
http://msdn.microsoft.com/en-us/library/hh447232(v=VS.85).aspx
http://msdn.microsoft.com/en-us/library/hh308955(v=VS.85).aspx

What is different from DirectX 11 on PC:
• D3DX9, D3DX10, D3DX11 are not supported for Metro-style applications.
• The Texconv sample includes the "DirectXTex" library which has all the texture processing functionality, WIC-based IO, DDS codec, BC software compression/decompression, etc. as shared source that was in D3DX11.
• D3DCSX_44.DLL is in the Windows SDK for redist with applications, and I believe is supported in Metro style applications.
• D3DCompiler_44.DLL is available for REDIST with Desktop applications and for development, but is not supported for REDIST in Metro-style applications. We've long recommended not doing run-time compilation, and Metro style enforces this at deployment time.
• XINPUT1_4.DLL and XAUDIO2_8.DLL are included in the OS and are fully supported for Metro style applications.
Here are reference links:
Outlines all the new features: http://msdn.microsoft.com/en-us/library/windows/desktop/hh404457.aspx
DirectX 11.1 Features: http://msdn.microsoft.com/en-us/library/hh404562(v=VS.85).aspx
DXGI 1.2: http://msdn.microsoft.com/en-us/library/hh404490(v=VS.85).aspx
WDDM 1.2: http://go.microsoft.com/fwlink/?LinkId=226814

## Sunday, July 31, 2011

### Even Error-Distribution: Rules for Designing Graphics Sub-Systems (Part III)

The design rule of "Even Error-Distribution" is very common in everything we do as graphics/engine programmers. Compared to the "No Look-up Tables" and "Screen-Space" rules, it is probably easier to agree on this principle in general. The idea is that whatever technique you implement, you always present the observer with a consistent "error" level. The word error here describes the difference between what we consider the real-world experience and the visual experience in a game. Obvious examples are toon and hatch shading, where we do not even try to render anything that resembles the real world but something that is considered beautiful. More complex examples are the penumbras of shadow maps, ambient occlusion or a real-time global illumination approach that has a rather low granularity.
The idea behind this design rule is that whatever you do, do it consistently and hope that the user will adjust to the error and not recognize it after a while anymore. Because the error is evenly distributed throughout your whole game, it is tolerated easier by the user.

To look at it from a different perspective: at Confetti we target most of the available gaming platforms. We can render very similar geometry and textures on different platforms, for example iOS/Android with OpenGL ES 2.0, Windows with DirectX 11, or XBOX 360 with its Direct3D. For iOS / Android you want to pick different lighting and shadowing techniques than for the higher-end platforms. For shadows it might be stencil shadow volumes on low-end platforms and shadow maps on high-end platforms. Those two shadowing techniques have very different performance and visual characteristics. The "error" resulting from stencil shadow volumes is that the shadows are -by default- very sharp and pronounced, while shadow maps on the higher-end platforms can be softer and more like real-life shadows.
A user who watches the same game running on those platforms will adjust to the "even" error of each of those shadowing techniques, as long as they do not change on the fly. If you mix the sharp and the soft shadows, users will complain that the shadow quality changes. If you provide only one or the other, there is a high chance that people will just get used to the shadow appearance.
Similar ideas apply to all the graphics programming techniques we use. Light mapping might be a viable option on low-end platforms and provide pixel-perfect lighting; a dynamic solution replacing those light maps might have a higher error level and not be pixel perfect. As long as the lower-quality version always looks consistent, there is a high chance that users won't complain. If we change the quality level in-game, we are probably faced with reviews that say the quality is changing.

Following this idea, one can exclude techniques that change the error level on the fly during game play. There were certainly a lot of shadow map techniques in the past that had different quality levels based on the angle between the camera and the sun. Although in many cases they looked better than the competing techniques, users perceived the cases when their quality was lowest as a problem.
Any technique based on re-projection, where the quality of shadows, Ambient Occlusion or Global Illumination changes while the user watches a scene, would violate the "Even Error-Distribution" rule.
A game that mixes light maps holding shadow and/or light data with dynamic lights and/or regular shadow maps might have the challenge of making sure that there is no visible difference between the light and shadow quality. Quite often the light-mapped data looks better than the dynamic techniques, and the experience is inconsistent. Evenly distributing the error into the light map data would improve the user experience, because the user is able to adjust better to an even error distribution. The same is true for any form of megatexture approach.
A common problem with mixing light-mapped and generated light and shadow data is that in many cases dynamic objects like cars or characters do not receive the light-mapped data. Users seem to have adjusted to the difference in quality here because it was consistent.

## Sunday, July 3, 2011

### No Look-up Tables: Rules for Designing Graphics Sub-systems (Part II)

One of the design principles I apply to graphics systems nowadays is to avoid any look-up table. That means not only mathematical look-up tables -the way we used them in the '80s and '90s- but also cached lighting, shadow and other data that is sometimes stored in light or radiosity maps.

This design principle follows the development of GPUs. While GPUs offer a larger number of arithmetic instructions with each new iteration, memory bandwidth has been stagnating for several years. Additionally, transporting data to the GPU might be slow due to several bottlenecks like DVD speed, the PCI Express bus etc.. In many cases it might be more efficient to calculate a result with arithmetic instructions instead of doing a look-up in a texture or any other memory area. Saving streaming bandwidth throughout the hardware is also a good motivation to avoid look-up textures like this. Quite often a look-up technique doesn't allow a 24-hour game cycle, where the light and shadows have to move accordingly over time.
In many cases, using pre-baked textures to store lighting, shadowing or other data also leads to a scenario where the geometry the texture is applied to is not destructible anymore.

Typical examples are:
- Pre-calculating a lighting equation and storing results in a texture like a 2D, Cube or 3D map
- Large terrain textures, like megatextures. Texture synthesis might be more efficient here
- Light and radiosity maps and other pre-calculated data for Global illumination approaches
- Signed distance fields (if they don't allow 24 hour cycle lights and shadows)
- Voxels as long as they require a large amount of data to be re-read each frame and don't allow dynamic lighting and shadows (24 hour cycle)
- ... and more ...

Following the "No Look-up Table" design principle, one of the options to store intermediate data is to cache it in GPU memory, so that data doesn't need to be generated on the fly. This might be a good option depending on the amount of memory available on the GPU.
Depending on the underlying hardware platform or the requirements of the game, the choice between different caching schemes makes a system like this very flexible.

Here is a collection of ideas that might help to make an informed decision on when to apply a caching scheme that keeps data around in GPU memory:
- Whenever the data is not visible to the user, it doesn't need to be generated. For example, color, light and shadow data only needs to be generated if the user can see it. That requires that it is on-screen with sufficient size. Projecting an object into screen-space allows calculating its size; if it is too small or not visible, any data attached to it doesn't need to be generated (see the sketch after this list). This idea does not only apply to geometric objects but also to lights and shadows. If a shadow is too small on the screen, we do not have to re-generate it.
- Cascaded Shadow Maps introduced a "level of shadowing" system that distributes shadow resolution along the view frustum in a way that dedicates less resolution to objects farther away, while close-up objects receive relatively more shadow resolution. Similarly, lighting quality should increase and decrease based on distance. Cascaded Reflective Shadow Maps extend the idea to any global illumination data, like one-bounce diffuse lighting and ambient occlusion.
- If the quality requirements are rather low because the object is far away or small, screen-space techniques might allow storing data at a higher density. For example, Screen-Space Global Illumination approaches that are based on the G-Buffer -which is already used to apply Deferred Lights in a scene- can offer an efficient way to light far-away objects.
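As a small sketch of the projected-size test mentioned in the first bullet above (the function, its parameters and the threshold in the usage comment are assumptions, just to illustrate the idea):

// sketch: estimate the on-screen radius of an object's bounding sphere to decide
// whether cached data (lighting, shadows, ...) needs to be (re-)generated at all
// worldRadius   - bounding sphere radius of the object
// distanceToCam - distance from the camera to the sphere center
// fovY          - vertical field of view in radians
// screenHeight  - render target height in pixels
float ProjectedRadiusInPixels(float worldRadius, float distanceToCam,
                              float fovY, float screenHeight)
{
    // project the sphere radius onto the image plane and scale to pixels
    float projectedRadius = worldRadius / (tan(fovY * 0.5f) * distanceToCam);
    return projectedRadius * screenHeight * 0.5f;
}

// usage idea: if (ProjectedRadiusInPixels(r, d, fovY, h) < someThreshold) skip regeneration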

## Monday, June 13, 2011

### Screen-Space: Rules for Designing Graphics Sub-systems (Part I)

Since programmable GPUs allow one to design more complex graphics systems, I started to develop a few simple rules that have survived the test of time while designing graphics sub-systems like Skydome, PostFX, Vegetation, Particle, Global Illumination, Light & Shadow systems etc..
Here are three of them:
1. Screen-Space (Part I - this part)
2. No Look-up Tables (Part II)
3. Even Error Distribution (Part III)

Today we focus on the Screen-Space design rule. It says: "do everything you can in Screen-Space because it is more efficient most of the time". This is easy to say for the wide range of effects that are part of a Post-Processing Pipeline like Depth of Field, Motion Blur, Tone Mapping and color filters, light streaks and others (read more in [Engel07]), as well as anti-aliasing techniques like MLAA that anti-alias the image in screen-space.
With the increased number of arithmetic instructions available and the stagnating growth of memory bandwidth, two new groups of sub-systems can be moved into screen-space.
Accompanying Deferred Lighting systems, more expensive materials like skin and hair can now be applied in screen-space; this way a screen-space material system is possible [Engel], solving some of the bigger challenges to implementing a Deferred Lighting pipeline.
Global Illumination and Shadow filter kernels can be moved into screen-space as well. For example, for a large number of Point or Ellipsoidal Shadow Maps, all the shadow data can be stored in a shadow collector in screen-space and then an expensive filter kernel can be applied to this screen-space texture [Engel2010].

The wide range of abilities available with screen-space filter kernels makes it valuable to look at the challenges while implementing them in general. The common challenges to applying materials or lights and shadows with the help of large-scale filter kernels in screen-space are mostly:
1. Scale filter kernel based on camera distance
2. Add anisotropic "behavior" to the screen-space filter kernel
3. Restricting the filter kernel based on the Z value of the Tap

Scaling Filter Kernel based on Camera Distance
Using a screen-space filter kernel for filtering shadows or GI, emulating sub-surface scattering for skin, or rendering hair requires at some point scaling the filter kernel based on the distance from the camera - or, better yet, from the near plane - to the pixel in question. What has worked in the past is:
// linear depth, read more in [Gilham]
// Q = FarClip / (FarClip - NearClip)
// Depth = value from a hyperbolic depth buffer
float depthLin = (-NearClip * Q) / (Depth - Q);
// scale based on distance to the viewer
// renderer->setShaderConstant4f("TexelSize", vec4(width, height, 1.0f / width, width / height));
sampleStep.xy = float2(1.0f, TexelSize.w) * sqrt(1.0f / ((depthLin.xx * depthLin.xx) * bias));
Scaling is based on linearized depth values that run from 0.0 to 1.0 between the near and far plane, so the camera's near and far plane settings are taken into account. The bias value is a user-defined "magic" value. The last channel of the TexelSize variable holds the ratio between the x and y dimensions of a pixel (width / height). The inner term of the equation - 1.0 / distance² - resembles a simple light attenuation function. We will improve this equation in the near future.

Anisotropic Screen-Space Filter Kernel
Following [Geusebroek], anisotropy can be added to a screen-space filter kernel by projecting it into an ellipse that follows the orientation of the geometry.

Image 1 - Anisotropic Screen-Space Filter Kernel

Normals that are stored in world-space in a G-Buffer can be compared to the view vector. The elliptical "response" is achieved by taking the square root of this operation.
float Aniso = saturate(sqrt(dot( viewVec, normal )));

Restricting the filter kernel based on the Z value of the Tap
One of the challenges with any screen-space filter kernel is the fact that the wide filter kernel can smear values into the penumbra around "corners" of geometry (read more in [Gumbau]).

Image 2 - Error introduced by running a large filter kernel in screen-space

A common way to solve this problem is to compare the depth value at the center of the filter kernel with the depth values of the filter kernel taps and define a threshold beyond which we consider the difference between the depth values large enough to early-out. A source code snippet for this might look like this:

// compare the depth of the filter tap with the depth at the kernel center
bool isValidSample = abs(sampleDepth - d) < errDepth;
if (isValidSample)
{
// the sample is considered valid
sumWeightsOK += weights[i + 1];     // accumulate valid weights
}
Acknowledgements
I would like to thank Carlos Dominguez for the discussions about how to scale filter kernels based on camera distance.

References
[Engel] Wolfgang Engel, "Deferred Lighting / Shadows / Materials", FMX 2011, http://www.confettispecialfx.com/confetti-on-fmx-in-stuttgart-ii
[Engel07] Wolfgang Engel, "Post-Processing Pipeline", GDC 2007, http://www.coretechniques.info/index_2007.html
[Geusebroek] Jan-Mark Geusebroek, Arnold W. M. Smeulders, J. van de Weijer, “Fast anisotropic Gauss filtering”, IEEE Transactions on Image Processing, Volume 12 (8), page 938-943, 2003
[Gilham] David Gilham, "Real-Time Depth-of-Field Implemented with a Post-Processing only Technique", ShaderX5: Advanced Rendering, Charles River Media / Thomson, pp 163 - 175, ISBN 1-58450-499-4
[Gumbau] Jesus Gumbau, Miguel Chover, and Mateu Sbert, “Screen-Space Soft Shadows”, GPU Pro, pp. 477 - 490

## Sunday, May 29, 2011

### Points, Vertices and Vectors

This post covers some facts about points, vertices and vectors that might be useful. It is a collection of ideas for a short math primer for engineers who want to explore computer graphics. The resulting material will be used in future computer graphics classes. Your feedback is highly welcome!

Points
A 3D point is a location in space, in a 3D coordinate system. We can find a point P with coordinates [Px, Py, Pz] by starting from the origin at [0, 0, 0] and moving the distances Px, Py and Pz along the x, y and z axes.

Two points define a line segment between them, three points define a triangle with corners at those points, and several interconnected triangles can be used to define the surface of an object, sometimes also called a mesh.

Points that are used to define geometric entities like triangles are often called vertices. In graphics programming, vertices are stored in an array of structures or a structure of arrays, and they not only describe a position but also include other data, for example a color, a normal vector or texture coordinates.
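
As a small illustration of the array-of-structures layout mentioned above, a vertex might be declared in C++ like this; the exact members are an assumption and differ from engine to engine:

// one vertex in an array-of-structures layout
struct Vertex
{
    float position[3];  // x, y, z location
    float normal[3];    // surface normal used for lighting
    float color[4];     // r, g, b, a
    float uv[2];        // texture coordinates
};

// a triangle mesh is then an array of vertices plus an index list:
// Vertex vertices[vertexCount];
// unsigned int indices[indexCount];  // three indices per triangle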

The difference of two points is a vector: V = P - Q

Vectors

While a point is a reference to a location, a vector is the difference between two points; it describes a direction and a distance -its length-, or a displacement.

Like points, vectors can be represented by three coordinates. Those three values are retrieved by subtracting the tail of the vector from its head.

Δx = (xh - xt)
Δy = (yh - yt)
Δz = (zh - zt)

Figure 1 - Vector components Δx, Δy and Δz

Two vectors are equal if they have the same component values. Since a vector is defined only as the difference of two points, there are any number of vectors with the same direction and length.

Figure 2 - Instances of one vector

The difference between points and vectors is reiterated by saying they live in different spaces, the Euclidean space $\ \mathbb{E}^3$ and the vector space $\ \mathbb{R}^3$. Read more in [Farin].

The primary reason for differentiating between points and vectors is to achieve geometric constructions which are coordinate independent. Such constructions are manipulations applied to objects that produce the same result regardless of the location of the coordinate origin.

Scalar Multiplication, Addition and Subtraction of Vectors

A vector V can be multiplied by a scalar. Multiplying by 2 doubles the vector's components.

$\ v = \left[ {\begin{array}{*{20}{c}} 2\\ 3\\ 4\\ 0\\ \end{array}} \right] \;\text{then}\; 2v = \left[ {\begin{array}{*{20}{c}} 4\\ 6\\ 8\\ 0\\ \end{array}}\right]$

$\ v = \left[ {\begin{array}{*{20}{c}} n_1\\ n_2\\ n_3\\ 0\\ \end{array}} \right] \;\text{then}\; \lambda v = \left[ {\begin{array}{*{20}{c}} \lambda n_1\\ \lambda n_2\\ \lambda n_3\\ 0\\ \end{array}}\right] \;\text{where}\; \lambda \in \mathbb{R}$

Similarly dividing the vector by 2 halves its components. The direction of the vector remains unchanged, only its magnitude changes.

The result of adding two vectors V and W can be obtained geometrically.

Figure 3 - Adding two vectors

Placing the tail of W at the head of V leads to the resulting vector, which goes from V's tail to W's head. In a similar manner, vector subtraction can be visualized.

Figure 4 - Subtracting two vectors

Similar to addition, the vector that should be subtracted -W- is negated first, and its tail is then placed at the head of V. The resulting vector runs from V's tail to the head of the negated W.

Alternatively, by the parallelogram law, the vector sum can be seen as the diagonal of the parallelogram formed by the two vectors.

Figure 5 - Parallelogram rule

The vectors V - W and V + W are the diagonals of the parallelogram defined by V and W. Arithmetically, vectors are added or subtracted by adding or subtracting the components of each vector.

All the vector additions and subtractions are coordinate-independent operations, since vectors are defined as differences of points.

Homogeneous Coordinates

Representing both points and vectors with three coordinates can be confusing. Homogeneous coordinates are a useful tool to make the distinction explicit. Adding a fourth coordinate, named w, allows us to describe a direction or a vector by setting this coordinate to 0. In all other cases we have a point or location.

Dividing a homogeneous point [Px, Py, Pz, Pw] by the w component leads to the corresponding 3D point. If the w component equals zero, the point would be infinitely far away, which is then interpreted as a direction. Scaling all coordinates by any non-zero value leads to homogeneous points that all correspond to the same 3D point. For example, the point (3, 4, 5) has homogeneous coordinates (6, 8, 10, 2) or (12, 16, 20, 4).

The reason why this coordinate system is called "homogeneous" is that it is possible to transform functions f(x, y, z) into the form f(x/w, y/w, z/w) without disturbing the degree of the curve. This is useful in the field of projective geometry. For example, a collection of 2D homogeneous points (x/t, y/t, t) exists on an xy-plane where t is the z-coordinate, as illustrated in figure 6.

Figure 6 - 2D homogeneous coordinates can be visualized as a plane in 3D space

Figure 6 shows a triangle on the t = 1 plane, and a similar triangle much larger on a distant plane. This creates an arbitrary xy plane in three dimensions. The t- or z-coordinate of the plane is immaterial because the x- and y-coordinates are eventually scaled by t.
Homogeneous coordinates are also used to create a translation transform.

In game development, some math libraries have dedicated point and vector classes. The main distinction is made by setting the fourth component to zero for vectors and one for points [Eberly].
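
A minimal sketch of such a distinction in C++ might look like the following; the class names and operators are assumptions, not taken from a specific library:

// homogeneous 4-component tuples: w = 1 for points, w = 0 for vectors
struct Point3  { float x, y, z; float w = 1.0f; };
struct Vector3 { float x, y, z; float w = 0.0f; };

// point - point = vector (w: 1 - 1 = 0)
inline Vector3 operator-(const Point3& p, const Point3& q)
{
    return Vector3{ p.x - q.x, p.y - q.y, p.z - q.z };
}

// point + vector = point (w: 1 + 0 = 1)
inline Point3 operator+(const Point3& p, const Vector3& v)
{
    return Point3{ p.x + v.x, p.y + v.y, p.z + v.z };
}

// usage: Vector3 v = p - q;  Point3 r = p + v;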

Pythagorean Theorem

The length or magnitude of a vector can be obtained by applying the Pythagorean Theorem. The opposite side -b- and the adjacent side -a- of a right-angled triangle represent orthogonal directions. The hypotenuse is the shortest path between them.

$\ a^2 + b^2 = c^2$

Figure 7 - Pythagorean Theorem

It helps to think of the Pythagorean Theorem as a tool to compare "things" moving at right angles. For example, if a is 3 and b equals 4, then c equals 5 [Azad].

The Pythagorean Theorem can also be applied to right-angled triangles chained together.

Figure 8 - Pythagorean Theorem with two triangles chained together

$\ a^2 + b^2 = c^2$

$\ c^2 + d^2 = e^2$

Replacing $\ c^2$ with $\ a^2 + b^2$ leads to

$\ a^2 + b^2 + d^2 = e^2$

$\ e^2$ is now written in terms of three orthogonal components. Instead of laying the triangles out flat, we can now tilt the green one a bit and thereby consider an additional dimension.

Figure 9 - Pythagorean Theorem in 3D

Renaming the sides to x, y and z instead of a, b and d we get:

$\ x^2 + y^2 + z^2 = distance^2$

This works with any number of dimensions.

The Pythagorean Theorem is the basis for computing distance between two points. Consider the following two triangles:

Figure 10 - Pythagorean Theorem used for distance calculations

The distance from the tip of the blue triangle at coordinate (4, 3) to the tip of the green triangle at coordinate (8, 5) can be calculated by creating a virtual triangle between those points. Subtracting the points leads to a 2D vector.

$\ |v| = \sqrt{(\Delta x)^2 + (\Delta y)^2}$

In this case

Δx = 8 - 4 = 4
Δy = 5 - 3 = 2

$\ |v| = \sqrt{4^2 + 2^2}$

$\ |v| = \sqrt {20}$

$\ |v| = 4.47$

Extending the idea to three dimensions shows the well-known equation:

$\ |v| = \sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2}$
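
These equations translate directly into a small C++ helper, first for 2D using the example above and then extended to 3D; the function names are illustrative:

#include <cmath>

// distance between two 2D points, as in the example above: (4, 3) to (8, 5)
float Distance2D(float x1, float y1, float x2, float y2)
{
    const float dx = x2 - x1;   // delta x = 8 - 4 = 4
    const float dy = y2 - y1;   // delta y = 5 - 3 = 2
    return std::sqrt(dx * dx + dy * dy);   // sqrt(20) = 4.47
}

// extended to three dimensions
float Distance3D(float x1, float y1, float z1, float x2, float y2, float z2)
{
    const float dx = x2 - x1;
    const float dy = y2 - y1;
    const float dz = z2 - z1;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}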

Unit Vectors

A unit vector has a length or magnitude of 1. This is a useful property for vector multiplications, because those take the magnitude of a vector into account, and the computation can be simplified if this magnitude is one (more on this later). A unit column vector might look like this:

$\ v = \left[ {\begin{array}{*{20}{c}} 1\\ 0\\ 0\\ 0\\ \end{array}}\right]$

and

$\ |v| = 1$

Converting a vector into a unit form is called normalizing and is achieved by dividing a vector's components by its magnitude. Its magnitude is retrieved by applying the Pythagorean Theorem.

$|v| = \sqrt {{x^2} + {y^2} + {z^2}}$

$\ {v_{unit}} = \frac{1}{{|v|}}\left[ {\begin{array}{*{20}{c}} x\\ y\\ z\\ \end{array}} \right]$

An example might be:

$\ v = \left[ {\begin{array}{*{20}{c}} 1\\ 2\\ 3\\ 0\\ \end{array}}\right]$

$|v| = \sqrt {{1^2} + {2^2} + {3^2}} = \sqrt {14}$

$\ {v_{unit}} = \frac{1}{{\sqrt{14}}}\left[ {\begin{array}{*{20}{c}} 1\\ 2\\ 3\\ \end{array}} \right] \approx \left[ {\begin{array}{*{20}{c}} 0.267\\ 0.535\\ 0.802\\ 0 \end{array}}\right]$
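
The normalization step maps directly to code; a minimal C++ sketch, using the example vector (1, 2, 3):

#include <cmath>

struct Vector3 { float x, y, z; };

// divide each component by the vector's magnitude (assumes a non-zero vector)
Vector3 Normalize(const Vector3& v)
{
    const float magnitude = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); // Pythagorean Theorem
    return Vector3{ v.x / magnitude, v.y / magnitude, v.z / magnitude };
}

// Normalize(Vector3{1.0f, 2.0f, 3.0f}) returns roughly (0.267, 0.535, 0.802)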

Cartesian Unit Vectors

Now that we have investigated the scalar multiplication of vectors, vector addition and subtraction and unit vectors, we can combine those to permit the algebraic manipulation of vectors (read more at [Vince][Lengyel]). A tool that helps to achieve this is called Cartesian unit vectors. The three Cartesian unit vectors i, j and k are aligned with the x-, y- and z-axes.

$\ i = \left[ {\begin{array}{*{20}{c}} 1\\ 0\\ 0\\ 0\\ \end{array}}\right] j = \left[ {\begin{array}{*{20}{c}} 0\\ 1\\ 0\\ 0\\ \end{array}}\right] k = \left[ {\begin{array}{*{20}{c}} 0\\ 0\\ 1\\ 0\\ \end{array}}\right]$

Any vector aligned with the x-, y- and z-axes can be defined by a scalar multiple of the unit vectors i, j and k. For example a vector 15 units long aligned with the y-axis is simply 15j. A vector 25 units long aligned with the z axis is 25k.

By employing the rules of vector addition and subtraction, we can compose a vector R by summing three Cartesian unit vectors as follows.

$\ R = ai + bj + ck$

This is equivalent to writing R as

$\ R = \left[ {\begin{array}{*{20}{c}} a\\ b\\ c\\ 0\\ \end{array}}\right]$

The magnitude of R would then be computed as

$|R| = \sqrt {{a^2} + {b^2} + {c^2}}$

Any pair of Cartesian vectors such as R and S can be combined as follows

$\ R = ai + bj + ck$

$\ S = di + ej + fk$

$\ R \pm S = (a \pm d)i + (b \pm e)j + (c \pm f)k$

An example would be

$\ R = 2i + 3j + 4k$

$\ S = 5i + 6j + 7k$

$\ R + S = 7i + 9j + 11k$

$\ |R + S| = \sqrt {{7^2} + {9^2} + {11^2}} \approx 15.84$

Vector Multiplication

Vector multiplication provides some powerful ways of computing angles and surface orientations. While the multiplication of two scalars is a familiar operation, the multiplication of vectors is a multiplication of two 3D lines, which is not an easy operation to visualize. In vector analysis, there are generally two ways to multiply vectors: one results in a scalar value and the other one in a vector.

Scalar or Dot Product

Multiplying the magnitude of two vectors |R| and |S| is a valid operation but it ignores the orientation of the vectors, which is one of their important features. Therefore we want to include the angles between the vectors. In case of the scalar product, this is done by projecting one vector onto the other.

Figure 11 - Projecting R on S

The projection of R on S creates the basis for the scalar product, because it takes into account their relative orientation. The length of R on S is

$\ |R|cos\beta$

Then we can multiply the projected length of R with the magnitude of S

$\ R \cdot S = |S||R|cos\beta$

or commonly written

$\ R \cdot S = |R||S|cos\beta$

The $\cdot$ symbol is used to represent the scalar product and to distinguish it from the vector product, which employs the $\ \times$ symbol. Because of this symbol, the scalar product is often referred to as the dot product. This geometric interpretation of the scalar product shows that if the magnitudes of R and S are one -in other words, if they are unit vectors- the result of the scalar product only depends on $\ cos \beta$. The following figure shows a number of dot product scenarios.

Figure 12 - Dot product

The geometric representation of the dot product is useful to imagine how it works but it doesn't map well to computer hardware. The algebraic representation maps better to computer hardware and is calculated with the help of Cartesian components:
$\ R = R_xi + R_yj + R_zk$

$\ S = S_xi + S_yj + S_zk$

$\\ R \cdot S = (R_xi + R_yj + R_zk) \cdot (S_xi + S_yj + S_zk) \\ = R_xi \cdot (S_xi + S_yj + S_zk) + R_yj \cdot (S_xi + S_yj + S_zk) + R_zk \cdot (S_xi + S_yj + S_zk)$

$\\ R \cdot S = R_xS_xi \cdot i + R_xS_yi \cdot j + R_xS_zi \cdot k \\ + R_yS_xj \cdot i + R_yS_yj \cdot j + R_yS_zj \cdot k \\ + R_zS_xk \cdot i + R_zS_yk \cdot j + R_zS_zk \cdot k$

There are various dot product terms such as $i \cdot i, i \cdot j, i \cdot k$ etc. in this equation. With the help of the geometric representation of the dot product, it can be determined that terms with mutually perpendicular vectors like $i \cdot j, i \cdot k, j \cdot k$ are zero, because the cosine of 90 degrees is zero. This leads to

$\\ R \cdot S = R_xS_xi \cdot i + R_yS_yj \cdot j + R_zS_zk \cdot k$

Finally, terms in which a vector is multiplied by itself lead to a value of one, because the cosine of an angle of zero degrees is one. Additionally, the Cartesian vectors are all unit vectors, which leads to

$\\ i \cdot i = |i||i|cos(0)= 1$

So we end up with the familiar equation

$\\ R \cdot S = R_xS_x + R_yS_y + R_zS_z$

An example:

$\ R = \left[ {\begin{array}{*{20}{c}} 2\\ 0\\ 4\\ 0\\ \end{array}}\right] S = \left[ {\begin{array}{*{20}{c}} 5\\ 6\\ 10\\ 0\\ \end{array}}\right]$

The algebraic representation results in:

$\\ R \cdot S = R_xS_x + R_yS_y + R_zS_z$

$\ 2 *5 + 0 * 6 + 4 * 10 = 50$

The geometric representation starts out with:

$\\ R \cdot S= |R||S|cos \beta$

$|R| = \sqrt {{2^2} + {0^2} + {4^2}} \approx 4.472$

$|S| = \sqrt {{5^2} + {6^2} + {10^2}} \approx 12.689$

Solving for the angle between the vectors by plugging in the result of the algebraic representation:

$\\ R \cdot S= |R||S|cos \beta = 2 *5 + 0 * 6 + 4 * 10 = 50$

$\\ R \cdot S= 12.689 * 4.472 cos \beta = 50$

$\\ cos \beta = \frac{50}{12.689 * 4.472} \approx 0.8811$

Solving for $\\beta$ leads to the angle between the two vectors:

$\\ \beta = cos^{-1} (0.8811) \approx 28.22^\circ$

The resulting angle will always be between $\\ 0^\circ$ and $\\ 180^\circ$, because as the angle between two vectors increases beyond $\\ 180^\circ$, the returned angle $\\ \beta$ is always the smaller of the two angles associated with the geometry.
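
The worked example above can be reproduced with a few lines of C++; the function names are illustrative:

#include <cmath>
#include <cstdio>

struct Vector3 { float x, y, z; };

// algebraic representation of the scalar (dot) product
float Dot(const Vector3& r, const Vector3& s)
{
    return r.x * s.x + r.y * s.y + r.z * s.z;
}

float Length(const Vector3& v)
{
    return std::sqrt(Dot(v, v));
}

int main()
{
    const Vector3 R{ 2.0f, 0.0f, 4.0f };
    const Vector3 S{ 5.0f, 6.0f, 10.0f };

    const float dot     = Dot(R, S);                               // 50
    const float cosBeta = dot / (Length(R) * Length(S));           // ~0.8811
    const float betaDeg = std::acos(cosBeta) * 180.0f / 3.14159265f; // ~28.2 degrees

    std::printf("dot = %f, angle = %f degrees\n", dot, betaDeg);
    return 0;
}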

Scalar Product in Lighting Calculations

Many games utilize the Blinn-Phong lighting model (see Wikipedia; ignore the code on this page). A part of the diffuse component of this lighting model is the Lambert's Law term published in 1760. Lambert stated that the intensity of illumination on a diffuse surface is proportional to the cosine of the angle between the surface normal vector and the light source direction.

Let's assume our light source is located in our reference space for lighting at (20, 30, 40), while our normal vector is normalized and located at (0, 11, 0). The point where the intensity of illumination is measured is located at (0, 10, 0).

Figure 13 - Lighting Calculation

The light and normal vector are calculated by subtracting the position of the point where the intensity is measured -representing their tails- from their heads.

$\ L = \left[ {\begin{array}{*{20}{c}} 20 - 0\\ 30 - 10\\ 40 - 0\\ 0\\ \end{array}}\right] N = \left[ {\begin{array}{*{20}{c}} 0\\ 11 - 10\\ 0\\ 0\\ \end{array}}\right]$

$\\ L \cdot N= |L||N|cos \beta = 20 * 0 + 20 * 1 + 40 * 0 = 20$

$|L| = \sqrt {{20^2} + {20^2} + {40^2}} \approx 48.9898$

$|N| = 1$

$\\ L \cdot N= 48.9898 * 1.0 * cos \beta = 20$

$\\ cos \beta = \frac{20}{48.9898 * 1.0} \approx 0.4082$

Instead of using the original light vector, the following scalar product normalizes the light vector first, before using it in the lighting equation.

$\ {L_{unit}} = \frac{1}{{|L|}}\left[ {\begin{array}{*{20}{c}} x\\ y\\ z\\ \end{array}} \right]$

$\ {L_{unit}} = \frac{1}{{48.9898}}\left[ {\begin{array}{*{20}{c}} 20\\ 20\\ 40\\ 0\\ \end{array}} \right] \approx \left[ {\begin{array}{*{20}{c}} 0.4082\\ 0.4082\\ 0.8165\\ 0 \end{array}}\right]$

To test if the light vector's magnitude is one:
$|L| = \sqrt {{0.4082^2} + {0.4082^2} + {0.8165^2}} \approx 1.0$

Plugging the unit light vector and the unit normal vector into the algebraic representation of the scalar product.

$\\ L \cdot N= |L||N|cos \beta = 0.4082 * 0 + 0.4082 * 1 + 0.8165 * 0 = 0.4082$

Now solving the geometrical representation for the cosine of the angle.

$\\ L \cdot N= |L||N|cos \beta = 0.4082$

$\\ cos \beta = \frac{0.4082}{1.0 * 1.0} = 0.4082$

If the light and the normal vector are unit vectors, the result of the algebraic scalar product calculation equals the cosine of the angle between them. The algebraic scalar product is implemented in the dot product intrinsic available on CPUs and GPUs. In other words, if the involved vectors are unit vectors, a processor can calculate the cosine of the angle faster. This is one reason why normalized vectors can be more efficient when programming computer hardware.

Following Lambert's law, the intensity of illumination on a diffuse surface is proportional to the cosine of the angle between the surface normal and the light source direction. That means that the point at (0, 10, 0) receives about 0.4082 of the original light intensity coming from (20, 30, 40) (attenuation is not considered in this example).

Coming back to figure 12: if the unit light vector had a y component of one or minus one (and therefore x and z components of zero), it would point in the same or the opposite direction as the normal, and the last equation would result in one or minus one. If the unit light vector had an x or z component of one (and therefore all other components of zero), the equation would result in zero.
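
Putting the Lambert example into code, with the same numbers as the calculation above; the helper names are assumptions:

#include <cmath>

struct Vector3 { float x, y, z; };

float Dot(const Vector3& a, const Vector3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vector3 Normalize(const Vector3& v)
{
    const float len = std::sqrt(Dot(v, v));
    return Vector3{ v.x / len, v.y / len, v.z / len };
}

// Lambert's law: intensity is proportional to the cosine of the angle
// between the unit surface normal and the unit light direction.
float LambertTerm(const Vector3& surfacePoint, const Vector3& lightPosition, const Vector3& unitNormal)
{
    const Vector3 toLight{ lightPosition.x - surfacePoint.x,
                           lightPosition.y - surfacePoint.y,
                           lightPosition.z - surfacePoint.z };
    const float nDotL = Dot(Normalize(toLight), unitNormal);
    return nDotL > 0.0f ? nDotL : 0.0f;   // clamp back-facing light to zero
}

// LambertTerm({0,10,0}, {20,30,40}, {0,1,0}) returns roughly 0.4082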

The Vector Product

Like the scalar product, the vector or cross product depends on the modulus of two vectors and the angle between them, but the result of the vector product is essentially different: it is another vector, at right angles to both the original vectors.

$\\ R \times S = T$

and

$\\ |T| = |R||S|sin\theta$

For an understanding of the vector product of R and S, it helps to imagine a plane through those two vectors, as shown in figure 14.

Figure 14 - Vector Product

The angle $\\ \theta$ between the directions of the vectors satisfies $\\ 0 \leq \theta \leq 180^\circ$. There are two possible choices for the direction of the resulting vector, each the negation of the other; the one chosen here is determined by the right-hand rule. Hold your right hand so that your forefinger points forward, your middle finger points out to the left, and your thumb points up. If you roughly align your forefinger with R and your middle finger with S, then the cross product will point in the direction of your thumb.

Figure 15 - Right-Hand rule Vector Product

The resulting vector of the cross product is perpendicular to R and S, that is

$\\ R \cdot T = 0$

and

$\\ S \cdot T = 0$

This makes the vector product an ideal way of computing normals. The two vectors R and S can be orthogonal but do not have to be. A property of the vector product that will be covered later is that the magnitude of T is the area of the parallelogram defined by R and S.

Let's multiply two vectors together using the vector product.

$\ R = R_xi + R_yj + R_zk$
$\ S = S_xi + S_yj + S_zk$

$\\ R \times S = (R_xi + R_yj + R_zk) \times (S_xi + S_yj + S_zk) \\ = R_xi \times (S_xi + S_yj + S_zk) + R_yj \times (S_xi + S_yj + S_zk) + R_zk \times (S_xi + S_yj + S_zk)$

$\\ R \times S = R_xS_xi \times i + R_xS_yi \times j + R_xS_zi \times k \\ + R_yS_xj \times i + R_yS_yj \times j + R_yS_zj \times k \\ + R_zS_xk \times i + R_zS_yk \times j + R_zS_zk \times k$

There are various vector product terms such as $\\ i \times i, i \times j, i \times k$ etc. in this equation. The terms $\\ i \times i, j \times j, k \times k$ will result in a vector whose magnitude is zero, because the angle between those vectors is $\\ 0^\circ$, and sin$\\ 0^\circ = 0$. This leaves

$\\ R \times S = R_xS_yi \times j + R_xS_zi \times k + R_yS_xj \times i + R_yS_zj \times k + R_zS_xk \times i + R_zS_yk \times j$

The other products between the unit vectors can be reasoned as:

$\\i \times j = k \\ j \times i = -k \\ j \times k = i \\ k \times j = -i \\ k \times i = j \\ i \times k = -j$

Those results show that the commutative law of multiplication is not applicable to vector products. In other words

$\\ i \times j \neq j \times i$

Applying those findings reduces the vector product term to

$\\ R \times S = R_xS_yk - R_xS_zj - R_yS_xk + R_yS_zi + R_zS_xj - R_zS_yi$

Now re-grouping the equation to bring like terms together leads to:

$\\ R \times S = (R_yS_z - R_zS_y)i + (R_zS_x - R_xS_z)j + (R_xS_y - R_yS_x)k$

To achieve a visual pattern for remembering the vector product, some authors reverse the sign of the j scalar term.

$\\ R \times S = (R_yS_z - R_zS_y)i - (R_xS_z - R_zS_x)j + (R_xS_y - R_yS_x)k$

Re-writing the vector product as determinants might help to memorize it as well.

$\\ R \times S = \begin{vmatrix} R_y & R_z \\ S_y & S_z \end{vmatrix} i - \begin{vmatrix} R_x & R_z \\ S_x & S_z\end{vmatrix}j + \begin{vmatrix} R_x & R_y \\ S_x & S_y \end{vmatrix}k$

A 2x2 determinant is the difference between the products of the diagonal terms. Using determinants, a "recipe" for a vector product consists of the following steps:

1. Write the two vectors that should be multiplied as Cartesian vectors

$\ R = R_xi + R_yj + R_zk$

$\ S = S_xi + S_yj + S_zk$

2. Write the cross product of those two vectors in determinant form, if this helps to memorize the process; otherwise skip to step 3.

$\\ R \times S = \begin{vmatrix} R_y & R_z \\ S_y & S_z \end{vmatrix} i - \begin{vmatrix} R_x & R_z \\ S_x & S_z\end{vmatrix}j + \begin{vmatrix} R_x & R_y \\ S_x & S_y \end{vmatrix}k$

3. Then compute by plugging in the numbers into

$\\ R \times S = (R_yS_z - R_zS_y)i - (R_xS_z - R_zS_x)j + (R_xS_y - R_yS_x)k$
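
Step 3 maps directly to code; a C++ sketch of the vector product following the component formula above might look like this:

struct Vector3 { float x, y, z; };

// vector (cross) product R x S
Vector3 Cross(const Vector3& r, const Vector3& s)
{
    return Vector3{
        r.y * s.z - r.z * s.y,   // i component
        r.z * s.x - r.x * s.z,   // j component (the sign reversal is already folded in)
        r.x * s.y - r.y * s.x    // k component
    };
}

// Cross({1,0,0}, {0,0,1}) returns (0,-1,0), i.e. i x k = -j, matching the example below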

A simple example of a vector product calculation is to show that the assumptions made above while simplifying the vector product hold up.

$\\i \times j = k \\ j \times i = -k \\ j \times k = i \\ k \times j = -i \\ k \times i = j \\ i \times k = -j$

To show that there is a sign reversal when the vectors are reversed $\\i \times k = -j, k \times i = j$, let's calculate the cross product of those terms.

$\ i = 1i + 0j + 0k$

$\ k = 0i + 0j + 1k$

$\\ i \times k = \begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} i - \begin{vmatrix} 1 & 0 \\ 0 & 1\end{vmatrix}j + \begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix}k$

$\\ i \times k = (0 * 1 - 0 * 0)i - (1 * 1 - 0 * 0)j + (1 * 0 - 0 * 0)k$

The i and k terms are both zero, but the j term is -1, which makes $\\i \times k = -j$. Now reversing the vector product

$\ k = 0i + 0j + 1k$

$\ i = 1i + 0j + 0k$

$\\ k \times i = \begin{vmatrix} 0 & 1 \\ 0 & 0 \end{vmatrix} i - \begin{vmatrix} 0 & 1 \\ 1 & 0\end{vmatrix}j + \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix}k$

$\\ k \times i = (0 * 0 - 1 * 0)i - (0 * 0 - 1 * 1)j + (0 * 0 - 0 * 1)k$

Which shows $\\k \times i = j$

Deriving a Unit Normal Vector for a Triangle

Figure 16 shows a triangle with vertices defined in anti-clockwise order. The side pointing towards the viewer is defined as the visible side in this scene. That means that the normal is expected to point roughly in the direction of where the viewer is located.

Figure 16 - Deriving a Unit Normal Vector
The vertices of the triangle are:

P1 (0, 2, 1)
P2 (0, 1, 4)
P3 (2, 0, 1)

The two vectors R and S are retrieved by subtracting the vertex at the tail from the vertex at the head; here R runs from P3 to P1 and S runs from P3 to P2.

Δx = (xh - xt)
Δy = (yh - yt)
Δz = (zh - zt)

Bringing the result into the Cartesian form

$\ R = (0-2)i + (2-0)j + (1-1)k = -2i + 2j + 0k$

$\ S = (0-2)i + (1-0)j + (4-1)k = -2i + 1j + 3k$

$\\ R \times S = \begin{vmatrix} 2 & 0 \\ 1 & 3 \end{vmatrix} i - \begin{vmatrix} -2 & 0 \\ -2 & 3\end{vmatrix}j + \begin{vmatrix} -2 & 2 \\ -2 & 1 \end{vmatrix}k$

$\\ R \times S = (2 * 3 - 0 * 1)i - (-2 * 3 - 0 * -2)j + (-2 * 1 - 2 * -2)k$
$\\ N = 6i + 6j + 2k$

$|N| = \sqrt {{6^2} + {6^2} + {2^2}}$
$|N| = \sqrt {{76}} = 8.7178$

$\ {N_{unit}} = \frac{1}{{|N|}}\left[ {\begin{array}{*{20}{c}} 6\\ 6\\ 2\\ \end{array}} \right]$
$\ {N_{unit}} = \frac{1}{{8.7178}}\left[ {\begin{array}{*{20}{c}} 6\\ 6\\ 2\\ \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} 0.6882\\ 0.6882\\ 0.2294\\ \end{array}} \right]$
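
The same calculation in C++, reusing a cross product like the one sketched earlier; the vertex values are taken from the example:

#include <cmath>

struct Vector3 { float x, y, z; };

Vector3 Cross(const Vector3& r, const Vector3& s)
{
    return Vector3{ r.y * s.z - r.z * s.y,
                    r.z * s.x - r.x * s.z,
                    r.x * s.y - r.y * s.x };
}

// unit normal of a triangle with counter-clockwise vertices p1, p2, p3
Vector3 TriangleUnitNormal(const Vector3& p1, const Vector3& p2, const Vector3& p3)
{
    const Vector3 r{ p1.x - p3.x, p1.y - p3.y, p1.z - p3.z };   // R = P1 - P3
    const Vector3 s{ p2.x - p3.x, p2.y - p3.y, p2.z - p3.z };   // S = P2 - P3
    const Vector3 n = Cross(r, s);
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return Vector3{ n.x / len, n.y / len, n.z / len };
}

// TriangleUnitNormal({0,2,1}, {0,1,4}, {2,0,1}) returns roughly (0.6882, 0.6882, 0.2294)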

It is a common mistake to believe that if R and S are unit vectors, the cross product will also be a unit vector. The vector product equation shows that this is only true when the angle between the two vectors is 90 degrees, so that the sine of the angle theta is 1.

$\\ |T| = |R||S|sin\theta$

Areas
The vector product can be used to determine the area of a parallelogram or a triangle (with the vertices at P1 - P3). Figure 17 shows the two vectors that form a parallelogram and a triangle.

Figure 17 - Deriving the Area of a Parallelogram / Triangle with the Vector Product

The height h is $\\ h = |S|sin\theta$, therefore the area of the parallelogram is

$\\ area = |R|*h = |R||S|sin\theta$

This equals the magnitude of the cross product vector T. Thus when we calculate the vector product of R and S, the length of the normal vector equals the area of the parallelogram formed by those vectors. The triangle forms half of the parallelogram and therefore half of the area.

area of parallelogram = $\\ |T|$
area of triangle =$\\ \frac{1}{{2}}|T|$

or

area of triangle =$\\ \frac{1}{{2}}|R \times S|$

To compute the surface area of a mesh constructed from triangles or parallelograms, the magnitudes of its non-normalized face normals can be summed up like this:

$\\ \frac{MagnitudeOfAllNormals }{{2}}$
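
A sketch of how this could look for a triangle mesh in C++; the mesh representation (vertex and index arrays) is an assumption:

#include <cmath>
#include <cstddef>
#include <vector>

struct Vector3 { float x, y, z; };

Vector3 Cross(const Vector3& r, const Vector3& s)
{
    return Vector3{ r.y * s.z - r.z * s.y,
                    r.z * s.x - r.x * s.z,
                    r.x * s.y - r.y * s.x };
}

// surface area of a triangle mesh: half the sum of the magnitudes of the
// non-normalized face normals (one cross product per triangle)
float MeshSurfaceArea(const std::vector<Vector3>& vertices, const std::vector<unsigned>& indices)
{
    float sumOfMagnitudes = 0.0f;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        const Vector3& p1 = vertices[indices[i]];
        const Vector3& p2 = vertices[indices[i + 1]];
        const Vector3& p3 = vertices[indices[i + 2]];
        const Vector3 r{ p2.x - p1.x, p2.y - p1.y, p2.z - p1.z };
        const Vector3 s{ p3.x - p1.x, p3.y - p1.y, p3.z - p1.z };
        const Vector3 n = Cross(r, s);                     // non-normalized face normal
        sumOfMagnitudes += std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    }
    return sumOfMagnitudes * 0.5f;                         // MagnitudeOfAllNormals / 2
}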

The magnitude itself is always positive; whether the vertices are ordered clockwise or counter-clockwise is indicated by the direction of the resulting normal, for example whether it points towards or away from the viewer.

References