Wednesday, December 30, 2009

Direct3D 11 Overview

Here is a first draft for the data flow in the DirectX 11 rendering pipeline:

And here is the DirectCompute overview:

I would now consider those beta.

Tuesday, December 29, 2009

Direct3D 10 Overview

I started working on a Direct3D 10 overview that covers only one page. Here is the latest version.

Please note that this overview has nothing to do with the way the hardware works. It is just a diagram that shows the data flow and the usage of the Direct3D 10 API to stream the data through several logical stages that might be represented in hardware by one unit. If you are interested in the actual hardware design, I would recommend reading

A Closer Look at GPUs

Thursday, December 24, 2009

New Links

I updated my list of links on the right side with some of the websites I keep an eye on.
I never met Brian Karis but he has a few very forward-thinking posts on his blog. The same is true for Pierre Terdiman. He covers many non-graphics-related topics, and I believe I have been reading his blog and former website for about seven years now. Aurelio Reis has some cool procedural stuff on his blog. Simon Green worked on some of the coolest stuff that you can find in the NVIDIA SDK. His blog has some interesting entries on how GPUs nowadays can render CG movie content in real-time while CPUs still need a lot more time to do the same. Then I also added Mike Acton's blog. I wonder how I could have forgotten it, as often as Mike and I have met in the last few months. He is certainly one of the SPU and multi-core programming authorities in the industry. I especially like his opinions regarding C++ and data-centric design. Lots of people have repeated this mantra in the last two years, but I heard it from him first.

Sunday, November 29, 2009

CSE 190 GPU Programming UCSD

I am going to teach GPU Programming in the upcoming quarter at UCSD. Look out for course CSE 190. Here is the announcement:

Course Objectives:
This course will cover techniques on how to implement 3D graphics
techniques in an efficient way on the Graphics Processing Unit (GPU).

Course Description:
This course focuses on algorithms and approaches for programming a
GPU, including vertex, hull, tessellator, domain, geometry, pixel and
compute shaders. After an introduction into each of the algorithms,
the students will learn step-by-step how to implement those
algorithms on the GPU. Particular subjects may include geometry
manipulations, lighting, shadowing, real-time global illumination,
image space effects and 3D Engine design.

Example Textbook(s):
A list of reading assignments will be given out each week.

Laboratory work:
Programming assignments.

Very exciting :-)

Order-Independent Transparency

Transparent objects that require alpha blending cannot be rendered on top of a G-Buffer. Blending two or more normal, depth or position values leads to wrong results.
In other words, deferred lighting of objects that need to be visible through each other is not easily possible, because the data for an object that is visible through another object is lost in a G-Buffer that can only store one layer of normal, depth and position data.
The traditional way to work around this is a separate rendering path that deals with the rendering and lighting of transparent objects that need to be alpha blended. In essence that means there is a second lighting system that is forward rendered and usually has a lower quality than the deferred lights.
This system breaks down as soon as the number of lights rises above a few dozen, because forward rendering cannot handle that many lights. In that case it would be an advantage to use the same deferred lighting system that is used on opaque objects for transparent objects that require alpha blending.
The simple case is, for example, windows, where you can look through one window and maybe two more windows behind it and see what is behind them: you look through a window from the outside into a house, inside the house is another glass wall through which you can look, and behind that glass wall is a freshwater tank that is lit ... you get the idea.
This would be the "light" case to solve. Much harder are scenarios in which the number of transparent objects that can be behind each other is much higher ... like with particles or a room full of transparent teapots :-).

On DirectX 9 and DirectX 10 class hardware, one of the solutions mentioned to solve the problem of order-independent transparency is called Depth Peeling. It seems this technique was first described by Abraham Mammen ("Transparency and antialiasing algorithms implemented with the virtual pixel maps technique", IEEE Computer Graphics and Applications, vol. 9, no. 4, pp. 43-55, July/Aug. 1989) and Paul Diefenbach ("Pipeline rendering: Interaction and realism through hardware-based multi-pass rendering", Ph.D. thesis, University of Pennsylvania, 1996, 152 pages) (I don't have access to those papers). A description of the implementation was given by Cass Everitt here. The idea is to extract each unique depth in a scene into layers. Those layers are then composited in depth-sorted order to produce the correct blended image.
In other words: the standard depth test gives us the nearest fragment/pixel. The next pass over the scene gives us the second nearest fragment/pixel, and the pass after that the third nearest. The passes after the first pass are rendered using the depth buffer computed in the previous pass to "peel away" depth values that are less than or equal to the values in that depth buffer. All the values that are not "peeled away" are stored in another depth buffer. Pseudo code might look like this:

const float bias = 0.0000001;

// peel away pixels from previous layers;
// use a small bias to avoid precision issues
clip(In.pos.z - PreviousPassDepth - bias);

By using the depth values from the previous pass for the following pass, multiple layers of depth can be stored. As soon as all the depth layers are generated, for each of the layers the G-Buffer data needs to be generated. This might be the color and normal render targets. In case we want to store three layers of depth, color and normal data also need to be stored for those three depth layers.
In a scene where many transparent objects overlap each other, the number of layers and therefore the memory consumption increases substantially.

A more advanced depth peeling technique was named Dual Depth Peeling and described by Louis Bavoil et al. here. The main advantage of this technique is that it peels a layer from the front and a layer from the back at the same time. This way four layers can be peeled away in two geometry passes.
On hardware that doesn't support independent blending equations in MRTs, the two layers per pass are generated by using MAX blending and writing out each component of a float2(-depth, depth) variable into a dedicated render target that is part of an MRT.

Nicolas Thibieroz describes in "Robust Order-Independent Transparency via Reverse Depth Peeling in DirectX 10" in ShaderX6 a technique called Reverse Depth Peeling. While depth peeling extracts layers in a front-to-back order and stores them for later usage, his technique peels the layers in back-to-front order and can blend with the backbuffer immediately. There is no need to store all the layers compared to depth peeling. Especially on console platforms this is a huge advantage.
The order of operations is:

1. Determine furthest layer
2. Fill-up depth buffer texture
3. Fill-up normal and color buffer
4. Do lighting & shadowing
5. Blend in backbuffer
6. Go to 1 for the next layer

Another technique gives up MSAA and uses the samples to store up to eight layers of data. In the article "Stencil Routed A-Buffer", Kevin Myers et al. use the stencil buffer to do sub-pixel routing of fragments. This way eight layers can be written in one pass. Because the layers are not ordered by depth, they need to be sorted afterwards. The drawbacks are that the algorithm is limited to eight layers, allocates lots of memory (depending on the underlying implementation, 8xMSAA can mean an 8x screen-size render target), requires hardware that supports 8xMSAA, and the bitonic sort might be expensive. Giving up MSAA, the "light" case described above would be easily possible with this technique with satisfying performance, but it won't work in scenes where many objects are visible behind several other objects.

Another technique extends Dual Depth Peeling by attaching a sorted bucket list. The article "Efficient Depth Peeling via Bucket Sort" by Fang Liu et al. describes an adaptive scheme that requires two geometry passes to store depth value ranges in a bucket list, sorted with the help of a depth histogram. An implementation will be described in the upcoming book GPU Pro. The following image from this article shows the required passes.

The Initial Pass is similar to Dual Depth Peeling. Like other techniques that utilize eight render targets with 32:32:32:32 bits each, this technique has huge memory requirements.

To my knowledge those are the widely known techniques for order-independent transparency on DirectX 10 today. Do you know of any newer techniques suitable for DirectX 10 or DirectX 11 hardware?

Sunday, November 15, 2009

You want to become a Graphics Programmer ...

I regularly receive e-mails with the question what kind of books I recommend if someone wants to become a graphics programmer. Here is my current list (maybe some of you guys can add to this list?):
First of all math is required:
- Vector Calculus
- Vector Calculus, Linear Algebra, and Differential Forms (I have the 1999 version of this book)
- Computer Graphics Mathematical First Steps
- Mathematics for Computer Graphics

For a general knowledge in programming the CPU:
- Write Great Code Volume 1: Understanding the Machine

For a better knowledge on how to program the GPU:
- DirectX documentation
- NVIDIA GPU Programming Guide
- ATI GPU Programming Guide

To learn about how to program certain effects in an efficient way:
- ShaderX - ShaderX7
- GPU Gems - GPU Gems 3
- GPU Pro and GPU Pro Blog

To start learning DirectX 10 API + Shader Programming:
- Introduction to 3D Programming with DirectX 10
- Programming Vertex, Geometry and Pixel Shaders

To start learning OpenGL & OpenGL ES:
- Khronos group

For general overview:
- Real-Time Rendering
- Fundamentals of Computer Graphics (this one also belongs in the math section)

To get started with C:
- C Programming Language

To learn C++
- C++ for Game Developers
- C++ Cookbook
- there is a long list of more advanced C++ books ...

Thursday, October 22, 2009

River of Lights II

More work-in-progress shots.

Thursday, October 15, 2009

BitMasks / Packing Data into fp Render Targets

Recently I had the need to pack bit fields into 32-bit channels of a 32:32:32:32 fp render target.
First of all we can assume that all registers in the pixel shader operate in 32-bit precision and output data is written into a 32-bit fp render target. The 32-bit (or single-precision) floating point format uses 1 sign, 8-bits of exponent, and 23 bits of mantissa following the IEEE 754 standard.

To maintain maximum precision during floating-point computations, most computations use normalized values. Keeping floating-point numbers normalized is beneficial because it maintains the maximum number of bits of precision in a computation. If several higher-order bits of the mantissa are all zero, the mantissa has that many fewer bits of precision available for computation. Therefore a floating-point computation will be more accurate if it involves only normalized values whose higher-order mantissa bit contains one.

The IEEE 754 32-bit floating-point format specifies special cases when the bits in the exponent are all zeros or all ones. If all exponent bits are set, then the number represents either +/- infinity or a NaN (not-a-number), depending on the mantissa value. If all exponent bits are zero, then the number is denormalized and automatically gets flushed to zero as specified in the Direct3D 10 single-precision floating-point specifications (see Nicolas Thibieroz, "Packing Arbitrary Bit Fields into 16-bit Floating-Point Render Targets in DirectX10", ShaderX7).

When packing bit values, those cases need to be avoided.

// Pack three positive normalized numbers between 0.0 and 1.0 into a 32-bit fp
// channel of a render target
float Pack3PNForFP32(float3 channel)
{
    // layout of a 32-bit fp register:
    // 1 sign bit, 8 bits for the exponent and 23 bits for the mantissa
    uint uValue;

    // pack x
    uValue = (uint)(channel.x * 65535.0 + 0.5); // goes from bit 0 to 15

    // pack y into bits 16 to 23 (EMMMMMMM)
    uValue |= ((uint)(channel.y * 255.0 + 0.5)) << 16;

    // pack z into bits 24 to 31 (SEEEEEEE)
    // the range is 1..254: adding 1.0 prevents an exponent that is all zeros,
    // and the maximum value 11111110 == 254 prevents the exponent bits
    // from becoming all ones
    uValue |= ((uint)(channel.z * 253.0 + 1.5)) << 24;

    return asfloat(uValue);
}

// Unpack three positive normalized values from a 32-bit float
float3 Unpack3PNFromFP32(float fFloatFromFP32)
{
    float a, b, c;
    uint uInputFloat = asuint(fFloatFromFP32);

    // unpack a: mask out everything above 16 bits with 0xFFFF
    a = (uInputFloat & 0xFFFF) / 65535.0;
    b = ((uInputFloat >> 16) & 0xFF) / 255.0;
    // extract the 1..254 value range and subtract 1, ending up with 0..253
    c = (((uInputFloat >> 24) & 0xFF) - 1.0) / 253.0;

    return float3(a, b, c);
}

Wednesday, September 30, 2009

River of Lights

Work-in-progress shot here. More than 8000 lights attached to particles in this hallway. The resolution is 1280x720 and the GPU still runs at 158 frames per second. The whole level has about 16k lights.

Tuesday, August 11, 2009

SIGGRAPH 2009 Impressions: Inferred Lighting

There is a new lighting approach that extends the Light Pre-Pass idea. It is called Inferred Lighting and it was presented by Scott Kircher and Alan Lawrence from Volition. Here is the link

They assume a Light Pre-pass concept as covered here on this blog with three passes. The geometry pass where they fill up the buffer, the lighting pass where light properties are rendered into a light buffer and a material pass in which the whole scene is rendered again, this time re-constructing different materials.
Their approach adds several new techniques to the toolset used to do deferred lighting / Light Pre-Pass.

1. They use a much smaller G-Buffer and Light buffer with a size of 800x540 on the XBOX 360. This way their memory bandwidth usage and pixel shading cost should be greatly reduced.

2. To upscale the final light buffer, they use Discontinuity Sensitive Filtering. During the geometry pass, one 16-bit channel of the DSF buffer is filled with the linear depth of the pixel; the other 16-bit channel is filled with an ID value that semi-uniquely identifies continuous regions. The upper 8 bits are an object ID, assigned per object (renderable instance) in the scene. Since 8 bits only allow 256 unique object IDs, scenes with more than this number of objects will have some objects sharing the same ID.
The lower 8 bits of the channel contain a normal-group ID. This ID is pre-computed and assigned to each face of the mesh. Anywhere the mesh has continuous normals, the ID is also continuous. A normal is continuous across an edge if and only if the two triangles share the same normal at both vertices of the edge.
By comparing normal-group IDs the discontinuity sensitive filter can detect normal discontinuities without actually having to reconstruct and compare normals. Both the object ID and the normal-group ID must exactly match the material pass polygon being rendered before the light buffer sample can be used (depth must also match within an adjustable threshold).
During the material pass, the pixel shader computes the locations of the four light buffer texels that would normally be accessed if regular bilinear filtering were used. These four locations are point sampled from the DSF buffer. The depth and ID values retrieved from the DSF buffer are compared against the depth and ID of the object being rendered. The results of this comparison are used to bias the usual bilinear filtering weights so as to discard samples that do not belong to the surface currently being rendered. These biased weights are then used in custom bilinear filtering of the light buffer. Since the filter only uses the light buffer samples that belong to the object being rendered, the resulting lighting gives the illusion of being at full resolution. This same method works even when the framebuffer is multisampled (hardware MSAA); however, sub-pixel artifacts can occur, due to the pixel shader only being run once per pixel, rather than once per sample.
The authors report that such sub-pixel artifacts are typically not noticeable.

3. The authors of this paper also implemented a technique that allows rendering alpha polygons with the Light Pre-Pass / Deferred Lighting. It is based on stippling and the use of DSF filtering.
During the geometry pass the alpha polygons are rendered using a stipple pattern, so that their G-Buffer samples are interleaved with opaque polygon samples.
In the material pass the DSF for opaque polygons will automatically reject stippled alpha pixels, and alpha polygons are handled by finding the four closest light buffer samples in the same stipple pattern, again using DSF to make sure the samples were not overwritten by some other geometry.
Since the stipple pattern is a 2x2 regular pattern, the effect is that the alpha polygon gets lit at half the resolution of opaque objects. Opaque objects covered by one layer of alpha have a slightly reduced lighting resolution (one out of every four samples cannot be used).

Tuesday, July 28, 2009


SIGGRAPH is next week and I am still preparing my talk. If you are around, please come by and say hi. My talk's title is "Light Pre-Pass Renderer Mark III" and it is part of the "Advances in Real-Time Rendering in 3D Graphics and Games" day on Monday next week:

I collected all the new development in this area, and added a few new things I found out while working on DirectX 10 / 11 implementations and will post a link to the slides here. Especially on the PS3 there is lots of new and interesting development (Judging from the number of games that will ship with this approach I want to believe that it is the most popular way to apply lots of lights in games now). I received a first draft of an article for ShaderX8 / GPU Pro from Steven Tovey about how they implemented the Light Pre-Pass in the upcoming game Blur on the PS3. They based their approach on work done by Matt Swoboda. The results look very cool. You can check out the screenshots on their website.

There is lots of progress happening with the Oolong Engine for the iPhone / iPod Touch. Check out the change list on

We got OpenGL ES 2.0 running and there is a new tutorial series that looks really cool.

In other news somehow my name was mentioned on "The Escapist". Here is the link for your entertainment:

Friday, July 3, 2009

MSAA on the PS3 with Light Pre-Pass on the SPU

In the previous "MSAA on the PS3" thread Matt Swoboda jumped in and mentioned that they implemented MSAA on the SPU in the Phyre Engine. I knew that they implemented the Light Pre-Pass on the SPU but I completely forgot that they also had a solution to do MSAA on the SPU.
You can find the presentation "Deferred Lighting and Post Processing on PLAYSTATION®" here.
Because it is possible to read and write per sample with the SPU, they can achieve a similar functionality as the per-sample frequency of DirectX 10.1-class graphics hardware where each sample can be treated separately. So they can calculate the lighting for each of the sample values and write the results into each of the samples in the light buffer.

Monday, June 29, 2009

Ambient Occlusion in Screen-Space

Screen-Space Ambient Occlusion (SSAO) is quite popular at the moment. ShaderX7 had several articles on it, and there are lots of approaches that gradually improve the effect.
A good way to look at SSAO or any similar approach is to consider it part of a whole pipeline of effects that can share resources and extend the idea to include one diffuse (and specular) indirect bounce of light by re-using resources.
The overall issues with SSAO are:
1. It is quite expensive for the image-quality improvement it provides. Spending such an astonishingly high amount of frame time on other effects instead is an intriguing idea. In other words, the performance / quality-improvement ratio is not very good compared to e.g. PostFX, where a bunch of effects consumes a similar amount of time.
2. A typical problem is that SSAO ignores lighting. Using the classical SSAO implementation under varying illumination introduces objectionable artifacts because the ambient term is darkened equally (obviously you can apply SSAO to the diffuse and specular terms like a shadow term ... but then it isn't ambient anymore). If you have a "global ambient" light term like a skylight, SSAO will diminish its effect. It also leads to problems with dynamic shadows.

Overall I believe a fundamental shift to a more generic method is necessary to solve those issues. This is one of the things I am looking into ... so expect an update at some point in the future.

Wednesday, June 17, 2009

MSAA on the PS3 with Deferred Lighting / Shading / Light Pre-Pass

The Killzone 2 team came up with an interesting way to use MSAA on the PS3. You can find it on page 39 of the following slides:

What they do is read both samples in the multisampled render target, do the lighting calculations for both of them, average the result, and write it into the multisampled (... I assume it has to be multisampled because the depth buffer is multisampled) accumulation buffer. That somewhat decreases the effectiveness of MSAA because the pixel averages all samples regardless of whether they actually pass the depth-stencil test. The multisampled accumulation buffer may therefore contain different values per sample where it was supposed to contain a unique value representing the average of all samples. On the other hand, they might only store a value in one of the samples and resolve afterwards ... which would mean the pixel shader runs only once.
This is also called "on-the-fly resolves".

It is better to write a dedicated value into each sample by using the sample mask, but then with 2xMSAA your pixel shader runs twice per pixel ... DirectX 10.1+ has the ability to run the pixel shader per sample. That doesn't mean it fully runs per sample; the MSAA unit seems to replicate the color value accordingly. That's faster, but not possible on the PS3. I can't remember if the XBOX 360 has the ability to run the pixel shader per sample, but this might be possible.

Saturday, June 13, 2009

Multisample Anti-Aliasing

Utilizing the Multisample Anti-Aliasing (MSAA) functionality of graphics hardware for deferred lighting can be challenging. Nicolas Thibieroz wrote an excellent article about MSAA published in ShaderX7 with the title "Deferred Shading with Multisampling Anti-Aliasing in DirectX10".
The following figure from the ShaderX7 article shows how MSAA works:

The pixel represented by a square has two triangles (blue and yellow) crossing some of its sample points. The black dot represents the pixel sample location (pixel center); this is where the pixel shader is executed. The cross symbols correspond to the locations of the multisamples where the depth tests are performed. Samples passing the depth test receive the output of the pixel shader. Those samples are replicated by the MSAA back-end into a multisampled render target that represents each pixel with - in this case - four samples. That means the render target size for an intended resolution of 1280x720 would be 2560x1440, representing each pixel with four samples, but the pixel shader only writes 1280x720 times (assuming there is no overdraw) while the MSAA back-end replicates four samples per pixel into the multisampled render target.
With deferred lighting there can be several of those multisampled render targets as part of a Multiple-Render-Target (MRT) setup. In the so-called Geometry stage, data is written into this MRT, which is therefore called a G-Buffer. In case of 4xMSAA each of the render targets of the G-Buffer would be 2560x1440 in size.
In case of Deferred Lighting / Light Pre-Pass the G-Buffer holds normal and depth data. This data can never be resolved because resolving it would lead to incorrect results as shown by Nicolas in his article.
After the Geometry phase comes the Lighting or Shading phase in a Deferred Lighting/Light Pre-Pass/Deferred Shading renderer. In an ideal world you could blit each sample (not pixel) into the multisampled render target -that holds the result of the Shading phase- by reading the G-Buffer sample and performing all the calculations necessary on it.
In other words to achieve the best possible MSAA quality with those renderer designs, lighting equations would need to be applied on a per-sample basis into a multisampled render target and then later resolved.
This is possible with DirectX 10.1 graphics hardware (AMD's 10.1-capable cards; I have not tried whether S3 cards that support 10.1 can do this as well), which allows executing a pixel shader at sample frequency.
To make this a viable option, this operation needs to be restricted to samples that belong to pixel edges. Two passes are necessary to make this work: one pass uses a pixel shader that performs its operations per sample, and a second pass runs a pixel shader that performs its operations per pixel, which means the result of the pixel shader calculation is output to all samples passing the depth-stencil test.
To restrict the pixel shader that performs operations per-sample, a stencil test is used.
One interesting idea covered in the article is to detect edges with centroid sampling (available already on DirectX9 class graphics hardware). During the G-Buffer phase the vertex shader writes a variable unique to every pixel (e.g. pixel position data) into two outputs, while the associated pixel shader declares two inputs: one without and one with centroid sampling enabled. The pixel shader then compares the centroid-enabled input with the one without it. Differing values mean that samples were only partially covered by the triangle, indicating an edge pixel. A "centroid value" of 1.0 is then written out to a selected area of the G-Buffer (previously cleared to 0.0) to indicate that the covered samples belong to an edge pixel. Those values are then averaged while being resolved to find out the value per pixel. If the result is not exactly 0, then the current pixel is an edge pixel. This is shown in the following image from the article.
On the left the pixel shader input will always be evaluated at the center of the pixel regardless of whether it is covered by the triangle. On the right with centroid sampling, the two rightmost depth samples are covered by the triangle. The comparison of the values in the pixel shader will lead to the result that the samples were only partially covered by the triangle, indicating an edge pixel.
Because DirectX10 capable graphics hardware does not support the pixel shader running at sample frequency, a different solution needs to be developed here.
The best MSAA quality in that case is achieved by running the pixel shader multiple times per pixel, only enabling output to a single sample each pass. This can be achieved by using the OMSetBlendState() API. The results of this method are identical to the DirectX 10.1 method, but obviously more expensive due to the increased number of rendering passes and the slightly reduced texture cache effectiveness.

Saturday, May 23, 2009

Deferred Lighting / Particle System

Here is a shot of a GPU based particle system with lights attached to each particle. I used Emil Persson's example Deferred Shading program as a basis to implement a Light Pre-Pass renderer with 4k lights and 4k particles. It runs fairly well on a GeForce 9600 GT here:

Monday, May 18, 2009

Light Pre-Pass: Knee-Deep

Several companies adopted the Light Pre-Pass idea, modified it or came up with similar ideas:
  • Crytek: they call it Deferred lighting contrary to Deferred shading. The technique is mentioned in the new Cry Engine 3 presentation here
  • Garagegames in their new Torque 3D engine currently in beta. Read the article from Pat Wilson in ShaderX7 and the garagegames website
  • Insomniac came up with a Pre-lighting approach that is similar to this. See Mark Lee's presentation from GDC 2009 here
  • DICE has been using it for a long time already
  • I believe EA used it in Dead Space :-)
  • Carsten Dachsbacher described a similar idea in his article "Splatting of Indirect Illumination" here and in ShaderX5
One of the interesting areas in this context is the ability to implement a one-bounce global illumination effect with the data in the G-Buffer and the light buffer ...

Thursday, April 30, 2009

3D Supershape

Over the last few years I have been looking into the 3D Supershape formula described by Paul Bourke here and originally developed by Johan Gielis. I love the shapes that result from it, and I always wanted to use it to create my own demos after I saw the one from Jetro Lauha. Here is my first attempt to generate C source out of the equations:

Suitable C pseudo code could be:

float r = pow(pow(fabs(cos(m * o / 4)) / a, n2) + pow(fabs(sin(m * o / 4)) / b, n3), 1 / n1);

The result of this calculation is in polar coordinates. Please note the difference between the equation and the C code: the equation has a negative power value, the C code doesn't. To extend this result into 3D, the spherical product of several superformulas is used. For example, a 3D parametric surface is obtained by multiplying two superformulas S1 and S2. The coordinates are defined by the relations:

The sphere mapping code uses two r values:

point->x = (float)(cosf(t) * cosf(p) / r1 / r2);
point->y = (float)(sinf(t) * cosf(p) / r1 / r2);
point->z = (float)(sinf(p) / r2);

Because r1 and r2 had a positive power value in the C code above we have to divide by those variables here. Here is a Mathematica render of this code:

Wednesday, April 29, 2009

Rockstar Games

GTA IV was launched a year ago today, and it is my last day as an employee at Rockstar Games. After more than four fantastic years I felt I should take a break to go back to some research topics and watch my kids grow for a while :-), so I gave my notice two weeks ago.

Beagle Board

I got the whole development environment going and wrote a few small graphics demos for it. All the PowerVR demos I tried ran nicely on it. Very cool!
If you are interested in a next-gen mobile development platform I would definitely recommend looking into this at

Any further development has now moved to lowest priority ... maybe at some point I will play around more with Angstroem. There is an online image builder

Monday, April 20, 2009

Ubuntu 8.04

In the last few days I set up a development environment for a BeagleBoard. I wanted to hold the next-gen environment for future phones and the OpenPandora in my hands today. Overall, the size of the board is astonishingly small, and you can power it over the USB port. The board runs Angstroem - a Linux OS - and has the OMAP3530 processor. It has a dedicated video decode DSP, the PowerVR SGX chipset, a sound chip and a few other things that I haven't used so far. You can even plug in a keyboard and a mouse and you have a full-blown computer with 256 MB RAM and 256 MB SDRAM.
To get this going I had to install a Linux OS on one of my PCs; Ubuntu 8.04. To relieve the pain of having to google all the Linux commands again and again I try to write down a few notes for myself here:
- minicom is not installed by default; you have to install it yourself. Open Applications -> Add/Remove and refresh the package list (you need an internet connection for this), then install the build essentials first and then minicom by typing into a terminal:
sudo apt-get install build-essential
sudo apt-get install minicom
- to look for the RS232 serial device you can use
dmesg | grep tty
Adding environment variables to the PATH works differently on Ubuntu 8.04 than I was used to. You can set an environment variable by using
export VARNAME=some_string
export PATH=$PATH:some/other/path
To check if it is set you can use
echo $PATH
For the PLATFORM variable you set it by typing
export PLATFORM=LinuxOMAP3
and use
echo $PLATFORM
to check if it is correct.
Similarly, for library paths you type
export LIBDIR=$PWD
from the directory where the lib files are. To check that this works you can use
echo $LIBDIR
To make all those variable values persistent you can copy those statements to the end of the .bashrc file. Some other things I found convenient:
gksudo gedit
starts the editor with root privileges.
Copying a file from one directory into another can be done by using the cp command like this:
$ cp -i goulash recipes/hungarian
cp: overwrite recipes/hungarian/goulash (y/n)?

You can copy a directory path in the terminal by dragging the file from the file browser into the terminal command line.

Saturday, March 21, 2009

ShaderX7 on Sale

ShaderX7 has more than 800 pages.
ShaderX8 is already announced; proposals are due by May 19th, 2009. Please send them to wolf at ... An example proposal, writing guidelines, and a FAQ can be downloaded, and the schedule is available as well.

Thanks to Eric Haines for reminding me to add this to this page :-)

Wednesday, March 18, 2009

Mathematica

I switched from Maple to Mathematica last week. One of my small side projects is to store all the graphics algorithms I have liked to visualize over the last few years in one file; a kind of condensed memory of the things I worked on. Here is an example for a simple Depth of Field effect (as already covered in my GDC 2007 talk):

Distance runs along the axis labeled Z value, so 0 is close to the camera and 1.0 is far away. You can see how the near and far blur planes fade in and out as the value called Range increases. The equation to plot this in Mathematica is rather simple, and in practice it is quite an efficient approach to achieve the effect.

Plot3D[R*Abs[0.5 - z], {z, 0, 1}, {R, 0, 1},
 PlotStyle -> Directive[Pink, Specularity[White, 50], Opacity[0.8]],
 PlotLabel -> "Depth of Field", AxesLabel -> {"Z value", "Range"}]
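The same falloff can be written as a tiny C helper, for example to precompute a blur-factor lookup. The clamp to 1.0 (full blur) is my own addition and an assumption about how the factor would be used; the core term Range * |0.5 - z| is exactly the equation plotted above.

```c
#include <math.h>

/* Blur factor for depth z in [0,1]; the focal plane sits at z = 0.5.
   "range" controls how fast the near/far blur planes fade in. */
static float dofBlurFactor(float z, float range)
{
    float f = range * fabsf(0.5f - z);
    return f > 1.0f ? 1.0f : f; /* clamp to full blur (my assumption) */
}
```

A depth exactly at the focal plane yields 0 (fully sharp), while the near and far ends of the depth range reach full blur faster as range grows.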

My plan is to develop a few new algorithms and show the results here. It will be an exercise in thinking about new things for me. If you have any suggestions on what I should cover, please do not hesitate to post them in the comments.

Sunday, February 22, 2009

Team Leadership in the Game Industry

A few of my friends contributed to the book "Team Leadership in the Game Industry" by Seth Spaulding II, so I was curious what one can write about leaders in this industry. Having spent most of my professional life outside of the game industry, I believe I have developed a different frame of reference than many of my colleagues.

First of all: the book is great and definitely worth a read. It is written in a very informative, instructive and entertaining way (... if you know the guys that contributed to it you know that it is worth it :-) ).

With that being said, let's start the review by looking at the Table of Contents. I know that I usually spend more time than other people reading the TOC. This is the best way for me to figure out what a book has to offer. A good TOC shows you the big picture of a book and allows you to see the pattern the author chose for approaching the topic. In most cases it even allows you to probe the underlying logic.
The book consists of 9 chapters. Each chapter consists of an analysis of facts by the author followed by an interview with a game industry veteran. The topics range from "How We Got Here" over "Anatomy of a Game-Dev Company", "How Leaders Are Chosen ...", "A Litmus Test for Leads", and "Leadership Types and Traits ..." before going into more detail with "The Project Team Leader ...", "The Department Leader ...", "Difficult Employees ...", and "The Effect of Great Team Leadership", followed by a "Sample Skill Ladder" for artists in the appendix.

You might feel the need to discuss some of the details covered in each chapter, but it is clear that this is the right formal approach to slice up the delicate topic of leadership in our industry.

When I first skimmed through the book I wanted to figure out what kind of values the author has. After all, a good leader makes it clear what values he/she follows. I found it in the introduction. Here is the quote: "As will be seen, a major cause of people leaving a company is the perceived poor quality of their supervisors and senior management. The game business is a talent-based industry -the stronger and deeper your talent is, the better the chances are of creating a great game. It is very difficult, in any hiring environment, to build the right mix of cross-disciplinary talent who function as a team at a high level; indeed, most companies never manage it. Once you get talented individuals on board, it's critical not to lose them. Finding and nurturing competent leaders who have the trust of the team will generate more retention than any addition of pool tables, movie nights, or verbal commitments to the value of "quality of life"."
You might think this is the most obvious thing to say in the game industry.

Obviously the book wants to cover the process of setting up a creative and great environment for all humans involved in creating great games. Creating a great working environment starts with picking the right leaders, who enable people by helping them give their best. A great leader serves his/her people, sees the best in everyone, and has the ability to expose this talent. Many interviewees in the book also mention that humor is a leadership skill. I trained junior managers for BMW, Daimler, ABB and other companies back in Germany for two years on weekends, and I always thought this was a strong skill. Making people laugh starts a lot of processes in the body that make people more relaxed and in general brighten up their day. Whoever can do this can certainly improve the morale, and therefore the efficiency, of a team in seconds ... priceless.

Managing a creative team is a completely different story than managing, for example, a sales team; the human factor in the relationships between people plays an important role, because they have to create something together. While a sales person is on his own out in the field, comes back with a number, and relies on a relationship with a potential customer that only lasts a few hours of face-to-face time, a creative team stays together for years and has to overcome all the things that come up when humans have to live in a small space together. There is a complex social network in place that defines the relationships between those humans, and it is important to keep the team running through all the constantly changing love/hate (and in-between) relationships on board. People on the team might even deal with difficult personal relationships, and you end up with a mixture of chaos and randomness typical for families or close friends. In that context it was interesting to see what the interviewees thought about the question of whether leaders are born and/or can be trained to be successful in the game industry. Obviously someone who was active as a boy-scout leader, was speaker/president of the student association at his university, or volunteered to work with other people in general already showed a level of social commitment that is a good starting point for a leadership role in our industry.

So defining and following the right values is a fundamental requirement for a book on leadership. After the values are set comes the part where they need to be applied, and this is where the book shines. It is hands-on, and even if you do not agree with the author in every detail, the fact that he wrote all this down earns the highest respect.

So now that I have made it obvious that I am excited about this book, let's think about how it might be improved in the future. A potential improvement I could see is to start the book with a target description. Not that the author fails to describe a target, but I would have appreciated more detail in this area.
What is the company you would want to work for? What is the environment you want to offer to make people as productive as possible? Obviously it is a chicken-and-egg problem: good people want to work in good teams and good teams consist of good people ... there are social (soft) skills and knowledge (hard) skills attached to each person on that team.
A good team starts with a good leader who sets values and standards and hires the right people.

Assuming you are the leader of this future team, how would you create the environment for your dream team? How do you want people to feel when they are part of this team? What should they take home every night when they are exhausted? What do you want them to tell their wives / better halves about how it is to work with you as their leader?
A happy employee, fully empowered to be creative :-), should tell his wife/girlfriend that he works very hard but is treated fairly and enjoys the family-related benefits of the company.
He should tell his friends that he is working in a team where information is shared and where his potential is not only used as much as possible but also amplified. He needs to feel like he is growing with the team and the tasks.
He should tell his colleagues that he enjoys working with them and the team, that he enjoys coming in to work every day, and that he is excited about the project he is working on ...

If we turn that into a list of items, we can describe how an employee should feel about working in a company with good leaders. That might be a great starting point for discussing core leadership abilities.

Monday, February 2, 2009

Larrabee on GDC

I am really looking forward to Mike Abrash's and Tom Forsyth's talks at GDC about Larrabee:

Talking about the Larrabee instruction set will be super cool ... can't wait to see this.

Sunday, February 1, 2009

ShaderX7 Update

I updated the ShaderX7 website at

There is now a first draft of the cover and the Table of Contents. Enjoy! :-)

As before, I will rest for a second when the new book comes out and think about what has happened since I founded the series eight years ago ... my perception of time slows down for this second :-) and I hear myself saying: "Chewbacca, start the hyperdrive, let's go to the next planet, I need to play cards, drink alcohol and find some entertainment ... how about Tatooine?"

Sunday, January 25, 2009

iP* programming tip #9

This issue of the iPhone / iPod Touch programming tips series focuses on some aspects of VFP assembly programming. My friend Noel Llopis brought an oversight in the VFP math library to my attention that I still need to fix, so I start with a description of the problem here and promise to fix it soon in the VFP library :-)
First let's start with the references. My friend Aaron Leiby has a blog entry on how to start programming the VFP unit here:

A typical inline assembly template might look like this:
asm ( assembler template
    : output operands             /* optional */
    : input operands              /* optional */
    : list of clobbered registers /* optional */
);
The last two lines hold the input operands and the so-called clobbers, which are used to inform the compiler which registers are used.
Here is a simple GCC assembly example -that doesn't use VFP assembly- that shows how the input and output operands are specified:

asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));

The idea is that "=r" holds the result and "r" is the input; %0 refers to "=r" and %1 refers to "r".
Operands are referenced by number: the first output operand is numbered 0, continuing in increasing order. There is a maximum number of operands ... I don't know what that maximum is for the iPhone platform.

Some instructions clobber hardware registers. We have to list those registers in the clobber list, i.e. the field after the third ':' in the asm statement, so GCC will not assume that the values it loaded into these registers are still valid.
In other words, a clobber list tells the compiler which registers were used but not passed as operands. If a register is used as a scratch register, it needs to be mentioned there. Here is an example:
asm volatile("ands    r3, %1, #3"     "\n\t"
             "eor     %0, %0, r3"     "\n\t"
             "addne   %0, #4"
             : "=r" (len)
             : "0" (len)
             : "cc", "r3"
);
r3 is used as a scratch register here. The "cc" pseudo register tells the compiler that the condition code flags are clobbered. If the asm code changes memory, the "memory" pseudo register informs the compiler about this.

asm volatile("ldr     %0, [%1]"       "\n\t"
             "str     %2, [%1, #4]"   "\n\t"
             : "=&r" (rdv)
             : "r" (&table), "r" (wdv)
             : "memory"
);
This special clobber informs the compiler that the assembler code may modify any memory location. By the way, the volatile attribute instructs the compiler not to optimize your assembler code away.

If you want to add something to this tip, please do not hesitate to write it in the comments. I will then add it with your name.

Friday, January 9, 2009

Partial Derivative Normal Maps

To make my collection of normal map techniques on this blog more complete, I also have to mention a special normal mapping technique that Insomniac's Mike Acton brought to my attention a long time ago (I wasn't sure if I was allowed to publish it ... but now they have slides on their website).
The idea is to store the partial derivatives of the normal in two channels of the map like this:

dx = (-nx/nz);
dy = (-ny/nz);

Then you can reconstruct the normal like this:

nx = -dx;
ny = -dy;
nz = 1;

The advantage is that you do not have to reconstruct Z, so you can skip one instruction in each pixel shader that uses normal maps.
This is especially cool on the PS3, while on the Xbox 360 you can also create a custom texture format to let the texture fetch unit do the scale and bias and save a cycle there.
More details can be found at

Look for Partial Derivative Normal Maps.

Sunday, January 4, 2009

Handling Scene Geometry

I recently bumped into a post by Roderic Vicaire on the forums.
Obviously there is no generic solution to handle all scene geometry the same way, but depending on the game his naming conventions make a lot of sense (read "Scenegraphs say no" in Tom Forsyth's blog).
- SpatialGraph: used for finding out what is visible and should be drawn. Should make culling fast
- SceneTree: used for hierarchical animations, e.g. skeletal animation or a sword held in a character's hand
- RenderQueue: is filled by the SpatialGraph. Renders visible stuff fast. It sorts sub arrays per key, each key holding data such as depth, shaderID etc. (see Christer Ericson's blog entry "Sort based-draw call bucketing" for this)
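To illustrate the RenderQueue idea, here is a hypothetical 64-bit sort key in C. The bit layout (layer, depth, shader ID) and the field widths are my own assumptions for illustration, not from the post or from Christer Ericson's entry; the point is that sorting draw calls by one integer groups them by layer first, then depth, then shader:

```c
#include <stdint.h>

/* Hypothetical sort key layout (an assumption, not from the post):
   bits 63..48: layer/pass, bits 47..24: quantized depth,
   bits 23..0:  shader/material ID. */
static uint64_t makeSortKey(uint16_t layer, uint32_t depth24,
                            uint32_t shaderID)
{
    return ((uint64_t)layer << 48)
         | ((uint64_t)(depth24 & 0xFFFFFFu) << 24)
         | (uint64_t)(shaderID & 0xFFFFFFu);
}
```

A plain integer sort on these keys then yields the desired draw order; for opaque geometry the depth field can be quantized front to back to help early-z rejection.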