Friday, July 3, 2009
MSAA on the PS3 with Light Pre-Pass on the SPU
You can find the presentation "Deferred Lighting and Post Processing on PLAYSTATION®" here.
Because the SPU can read and write individual samples, they can achieve functionality similar to the per-sample frequency execution of DirectX 10.1-class graphics hardware, where each sample can be treated separately. They can therefore calculate the lighting for each sample value and write the result into the corresponding sample of the light buffer.
Monday, June 29, 2009
Ambient Occlusion in Screen-Space
A good way to look at SSAO or any similar approach is to consider it part of a whole pipeline of effects that can share resources and extend the idea to include one diffuse (and specular) indirect bounce of light by re-using resources.
The overall issues with SSAO are:
1. quite expensive for the image quality improvement. Spending the astonishingly high amount of frame time on other effects instead is an intriguing idea. In other words, the performance / quality-improvement ratio is not very good compared to e.g. PostFX, where a bunch of effects consumes a similar amount of time.
2. a typical problem is that lighting is ignored by SSAO. Using the classical SSAO implementation under varying illumination introduces objectionable artifacts because the ambient term is darkened equally (obviously you can apply SSAO to the diffuse and specular term like a shadow term ... but then it isn't ambient anymore). If you have a "global ambient" light term like skylights, SSAO will diminish the effect. It also leads to problems with dynamic shadows.
Overall I believe a fundamental shift to a more generic method is necessary to solve those issues. This is one of the things I am looking into ... so expect an update at some point in the future.
Wednesday, June 17, 2009
MSAA on the PS3 with Deferred Lighting / Shading / Light Pre-Pass
The Killzone 2 team came up with an interesting way to use MSAA on the PS3. You can find it on page 39 of the following slides:
http://www.dimension3.sk/mambo/Articles/Deferred-Rendering-In-Killzone/View-category.php
What they do is read both samples in the multisampled render target, do the lighting calculations for both of them, then average the result and write it into the multisampled (... I assume it has to be multisampled because the depth buffer is multisampled) accumulation buffer. That somewhat decreases the effectiveness of MSAA because the pixel averages all samples regardless of whether they actually pass the depth-stencil test. The multisampled accumulation buffer may therefore contain different values per sample when it was supposed to contain a unique value representing the average of all samples. On the other hand, they might only store a value in one of the samples and resolve afterwards ... which would mean the pixel shader runs only once.
This is also called "on-the-fly resolves".
It is better to write a dedicated value into each sample by using the sample mask, but then in the case of 2xMSAA you run your pixel shader twice ... DirectX 10.1+ has the ability to run the pixel shader per sample. That doesn't mean it fully runs per sample; the MSAA unit seems to replicate the color value accordingly. That's faster but not possible on the PS3. I can't remember whether the XBOX 360 can run the pixel shader per sample, but it may be possible.
Saturday, June 13, 2009
Multisample Anti-Aliasing
The following figure from the ShaderX7 article shows how MSAA works:
The pixel represented by a square has two triangles (blue and yellow) crossing some of its sample points. The black dot represents the pixel sample location (pixel center); this is where the pixel shader is executed. The cross symbols correspond to the locations of the multisamples where the depth tests are performed. Samples passing the depth test receive the output of the pixel shader. Those samples are replicated by the MSAA back-end into a multisampled render target that represents each pixel with (in this case) four samples. That means the render target size for an intended resolution of 1280x720 would be 2560x1440, representing each pixel with four samples, but the pixel shader only writes 1280x720 times (assuming there is no overdraw) while the MSAA back-end replicates four samples for each pixel into the multisampled render target.
With deferred lighting there can be several of those multi-sampled render targets as part of a Multiple-Render-Target (MRT). In the so called Geometry stage, data is written into this MRT; therefore called G-Buffer. In case of 4xMSAA each of the render targets of the G-Buffer would be 2560x1440 in size.
In case of Deferred Lighting / Light Pre-Pass the G-Buffer holds normal and depth data. This data can never be resolved because resolving it would lead to incorrect results as shown by Nicolas in his article.
After the Geometry phase comes the Lighting or Shading phase in a Deferred Lighting/Light Pre-Pass/Deferred Shading renderer. In an ideal world you could blit each sample (not pixel) into the multisampled render target -that holds the result of the Shading phase- by reading the G-Buffer sample and performing all the calculations necessary on it.
In other words to achieve the best possible MSAA quality with those renderer designs, lighting equations would need to be applied on a per-sample basis into a multisampled render target and then later resolved.
This is possible with DirectX 10.1 graphics hardware (AMD's 10.1-capable cards; I didn't check whether S3 cards that support 10.1 can do this as well), which allows executing a pixel shader at sample frequency.
To make this a viable option, this operation needs to be restricted to samples that belong to pixel edges. There are two passes necessary to make this work. One pass will use the pixel shader that runs operations performed on samples and in a second pass the pixel shader is run that performs operations per-pixel, which means the result of the pixel shader calculation is output to all samples passing the depth-stencil test.
To restrict the pixel shader that performs operations per-sample, a stencil test is used.
One interesting idea covered in the article is to detect edges with centroid sampling (available already on DirectX9 class graphics hardware). During the G-Buffer phase the vertex shader writes a variable unique to every pixel (e.g. pixel position data) into two outputs, while the associated pixel shader declares two inputs: one without and one with centroid sampling enabled. The pixel shader then compares the centroid-enabled input with the one without it. Differing values mean that samples were only partially covered by the triangle, indicating an edge pixel. A "centroid value" of 1.0 is then written out to a selected area of the G-Buffer (previously cleared to 0.0) to indicate that the covered samples belong to an edge pixel. Those values are then averaged while being resolved to find out the value per pixel. If the result is not exactly 0, then the current pixel is an edge pixel. This is shown in the following image from the article.
On the left the pixel shader input will always be evaluated at the center of the pixel regardless of whether it is covered by the triangle. On the right with centroid sampling, the two rightmost depth samples are covered by the triangle. The comparison of the values in the pixel shader will lead to the result that the samples were only partially covered by the triangle, indicating an edge pixel.
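The resolve step of this edge detection can be sketched on the CPU as follows. This is only an illustration under assumptions: the 4x sample count, the box-filter resolve and the function names are not taken from the article, and on real hardware the resolve would be done by the GPU.

```c
#define SAMPLES_PER_PIXEL 4

/* Averages the per-sample "centroid values" of one pixel, like a box-filter resolve.
   Each sample holds 1.0 if it was written by an edge pixel shader invocation,
   or the clear value 0.0 otherwise. */
float resolve_centroid_value(const float samples[SAMPLES_PER_PIXEL])
{
    float sum = 0.0f;
    for (int i = 0; i < SAMPLES_PER_PIXEL; ++i)
        sum += samples[i];
    return sum / (float)SAMPLES_PER_PIXEL;
}

/* A pixel is an edge pixel if the resolved value is not exactly 0. */
int is_edge_pixel(const float samples[SAMPLES_PER_PIXEL])
{
    return resolve_centroid_value(samples) != 0.0f;
}
```

The resulting per-pixel flag is what would then be used to set up the stencil mask that restricts the per-sample pass to edge pixels.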
Because DirectX10 capable graphics hardware does not support the pixel shader running at sample frequency, a different solution needs to be developed here.
The best MSAA quality in that case is achieved by running the pixel shader multiple times per pixel, enabling output to only a single sample in each pass. This can be achieved with the sample mask parameter of the OMSetBlendState() API. The results of this method would be identical to the DirectX 10.1 method, but it is obviously more expensive due to the increased number of rendering passes and slightly reduced texture cache effectiveness.
Saturday, May 23, 2009
Deferred Lighting / Particle System

Monday, May 18, 2009
Light Pre-Pass: Knee-Deep
- Crytek: they call it Deferred lighting contrary to Deferred shading. The technique is mentioned in the new Cry Engine 3 presentation here
- Garagegames in their new Torque 3D engine currently in beta. Read the article from Pat Wilson in ShaderX7 and the garagegames website
- Insomniac came up with a Pre-lighting approach that is similar to this. See Mark Lee's presentation from GDC 2009 here
- DICE has been using it for a long time already
- I believe EA used it in Dead Space :-)
- Carsten Dachsbacher described a similar idea in his article "Splatting of Indirect Illumination" here and in ShaderX5
Thursday, April 30, 2009
3D Supershape

Suitable C pseudo code could be:
float r = pow(pow(fabs(cos(m * o / 4)) / a, n2) + pow(fabs(sin(m * o / 4)) / b, n3), 1 / n1);
The result of this calculation is in polar coordinates. Please note the difference between the equation and the C code. The equation has a negative power value, the C code doesn't. To extend this result into 3D, the spherical product of several superformulas is used. For example, the 3D parametric surface is obtained by multiplying two superformulas S1 and S2. The coordinates are defined by the relations:
The sphere mapping code uses two r values:
point->x = (float)(cosf(t) * cosf(p) / r1 / r2);
point->y = (float)(sinf(t) * cosf(p) / r1 / r2);
point->z = (float)(sinf(p) / r2);
Wednesday, April 29, 2009
Rockstar Games
Beagle Board
Any further development has now moved to lowest priority ... maybe at some point I will play around more with Angstroem. There is an online image builder
http://amethyst.openembedded.
Monday, April 20, 2009
BeagleBoard.org Ubuntu 8.04
To get this going I had to install a Linux OS on one of my PCs; Ubuntu 8.04. To relieve the pain of having to google all the Linux commands again and again I try to write down a few notes for myself here:
- minicom is not installed by default. You have to install it yourself. To do this you have to open up Applications -> Add/Remove and refresh the package list (you need an internet connection for this) and then install the build essentials first and then minicom by typing into a terminal:
sudo apt-get install build-essential
sudo apt-get install minicom
- to look for the RS232 serial device you can use
dmesg | grep tty
I found adding environment variables to the PATH statement different on Ubuntu 8.04. You can set an environment variable by using
export VARNAME=some_string
e.g.
export PATH=$PATH:some/other/path
To check if it is set you can use
echo $PATH
To set the PLATFORM variable, type
export PLATFORM=LinuxOMAP3
and use
echo $PLATFORM
to check if it is correct.
Similarly, for library paths you type
export LIBDIR=$PWD
from the directory where the lib files are. To check that this works you can use
echo $LIBDIR
To make all those variable values persistent you can copy those statements to the end of the .bashrc file. Some other things I found convenient were:
gksudo gedit
starts the editor with sudo.
Copying a file from one directory into another can be done by using the cp command like this
$ cp -i goulash recipes/hungarian
cp: overwrite recipes/hungarian/goulash (y/n)?
You can copy a directory path in the terminal by dragging the file from the file browser into the terminal command line.
Saturday, March 21, 2009
ShaderX7 on Sale
ShaderX8 is already announced. Proposals are due by May 19th, 2009. Please send them to wolf at shaderx.com. An example proposal, writing guidelines and a FAQ can be downloaded from www.shaderx6.com/ShaderX6.zip. The schedule is available on http://www.shaderx8.com/. Thanks to Eric Haines for reminding me to add this to this page :-)
Wednesday, March 18, 2009
Mathematica
Distance runs along the axis called Z value, so 0 is close to the camera and 1.0 is far away. You can see how the near and far blur planes fade in and out as the value called Range increases. The equation to plot this in Mathematica is rather simple. In practice it is a quite efficient approach to achieve the effect.
Plot3D[R*Abs[0.5 - z], {z, 1, 0}, {R, 0, 1},
PlotStyle -> Directive[Pink, Specularity[White, 50], Opacity[0.8]],
PlotLabel -> "Depth of Field", AxesLabel -> {"Z value", "Range"}]
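The plotted expression translates directly into a small function. The sketch below is written in C; the function name and the FOCUS_PLANE constant are assumptions made for this example, with the constant simply naming the 0.5 that appears in the plot expression.

```c
#include <math.h>

/* Depth value that stays sharp; 0.5 is the constant used in the plot. */
#define FOCUS_PLANE 0.5f

/* Blur amount for a depth value z in [0, 1] and a Range value in [0, 1]. */
float dof_blur(float z, float range)
{
    return range * fabsf(FOCUS_PLANE - z);
}
```

The blur is 0 at the focus plane and grows linearly towards the near and far ends of the depth range, scaled by Range.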
My plan is to develop a few new algorithms and show the results here. It will be an exercise in thinking about new things for me. If you have any suggestions on what I should cover, please do not hesitate to post them in the comment line.
Sunday, February 22, 2009
Team Leadership in the Game Industry
First of all: the book is great and definitely worth a read. It is written in a very informative, instructive and entertaining way (... if you know the guys that contributed to it you know that it is worth it :-) ).
With that being said, let's start the review by looking at the Table of Contents. I know that I usually spend more time reading the TOC than other people do. This is the best way for me to figure out what a book has to offer. A good TOC shows you the big picture of a book and allows you to see the pattern the author chose to approach the topic. In most cases it even allows you to verify the underlying logic.
The book consists of 9 chapters. Each chapter consists of an analysis of facts by the author followed by an interview with a game industry veteran. The topics span from "How We got here" over "Anatomy of a Game-Dev Company", "How Leaders are Chosen ...", "A Litmus Test for Leads", "Leadership Types and Traits ..." and then they go into more detail with "The Project Team Leader ...", "The Department Leader ...", "Difficult Employees ...", "The Effect of Great Team Leadership" followed by a "Sample Skill Ladder" for artists in the appendix.
You might feel the need to discuss some of the details covered in each chapter but it is clear that this is the right formal approach to slice up the delicate topic of leadership in our industry.
When I first skimmed through the book I wanted to figure out what kind of values the author has. After all a good leader makes it clear what kind of values he/she follows. I found it in the introduction. Here is the quote: "As will be seen, a major cause of people leaving a company is the perceived poor quality of their supervisors and senior management. The game business is a talent-based industry -the stronger and deeper your talent is, the better chances are of creating a great game. It is very difficult, in any hiring environment, to build the right mix of cross-disciplinary talent who function as a team at a high level; indeed, most companies never manage it. Once you get talented individuals on board, it's critical not to lose them. Finding and nurturing competent leaders who have the trust of the team will generate more retention than any addition of pool tables, movie nights, or verbal commitments to the value of "quality of life"."
You might think this is the most obvious thing to say in the game industry.
Obviously the book wants to cover the process to setup a creative and great environment for all humans involved in the process of creating great games. Creating a great working environment starts with picking the right leaders that enable people by helping them to give their best. A great leader serves his/her people. He/she sees the best in everyone and has the ability to expose this talent. Many interviewees in the book also mention that humor is a leadership skill. I trained junior managers for BMW, Daimler, ABB and other companies back in Germany for two years on weekends and I always thought this is a strong skill. Making people laugh starts a lot of processes in the body that make people more relaxed and in general brighten up their day. Whoever can do this can certainly improve the morale and therefore efficiency of a team in seconds ... priceless.
Managing a creative team is a completely different story than -for example- a sales team. The human factor in the relationship between people plays an important role. They have to create something together. While a sales person is on his own out in the field, comes back with a number and relies on a relationship with a potential customer that only lasts a few hours of face-to-face time, a creative team stays together for years and has to overcome all the things that come up when humans have to live in a small space together. There is a complex social network in place that defines the relationships between those humans and it is important to keep the team running with all the constantly changing love/hate -and in-between- relationships on board. People on the team might even deal with difficult personal relationships and you end up with a mixture of chaos and randomness typical for family or close friends scenarios. In that context it was interesting to see what the interviewees thought about the question of whether leaders are born and / or can be trained to be successful in the game industry. Obviously someone who was active as a boy-scout leader, speaker/president of the students association at his university or volunteered to work with other people in general, already showed some level of social commitment that is a good starting point for a leadership role in our industry.
So defining and following the right values is a fundamental requirement for a book on leadership. Obviously after having set the values comes the part where those values need to be applied and used and this is where the book shines. It is hands-on and even if you do not agree with the author in every detail the fact that he wrote all this down earns the highest respect.
So now that I have made it obvious that I am excited about this book, let's think about how it might be improved in the future. A potential improvement I could see is to start the book with a target description. Not that the author fails to describe a target but I would appreciate more detail in this area.
What is the company you would want to work for? What is the environment you want to offer to make people as productive as possible? Obviously it is a chicken-and-egg problem. Good people want to work in good teams and good teams consist of good people ... there are social -soft skills- and knowledge -hard skills- attached to each person of that team.
A good team starts with a good leader who sets values and standards and hires the right people.
Assuming you are the leader of this future team, how would you create the environment for your dream team? How do you want people to feel when they are part of this team? What should they take home every night when they are exhausted? What do you want them to tell their wives / better halves about how it is to work with you as their leader?
A happy employee -fully enforced to be creative :-) - should tell his wife/girlfriend that he works very hard but is treated fairly and enjoys the family related benefits of the company.
He should tell his friends that he is working in a team where information is shared and where his potential is not only used as much as possible but also amplified. He needs to feel like he is growing with the team and the tasks.
He should tell his colleagues that he enjoys working with them and the team and that he enjoys coming into work every day and that he is excited about the project he is working on ...
So if we make that into a list of items we could describe how an employee should feel about working in a company with good Leaders. Might be a great starting point for discussing leader core abilities.
Monday, February 2, 2009
Larrabee on GDC
Sunday, February 1, 2009
ShaderX7 Update
http://www.shaderx7.com/
There is now the first draft of the cover and the Table of Contents. Enjoy! :-)
As before I will rest for a second when the new book comes out and think about what happened since I founded the series now eight years ago ... my perception of time slows down for this second :-) and I hear myself saying: "Chewbacca, start the hyperdrive, let's go to the next planet, I need to play cards, drink alcohol and find some entertainment ... how about Tatooine?"
Sunday, January 25, 2009
iP* programming tip #9
First let's start with the references. My friend Aaron Leiby has a blog entry on how to start programming the VFP unit here:
A typical inline assembly template might look like this:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
The last three lines of the template hold the output and input operands and the so-called clobbers, which inform the compiler about which registers are used.
Here is a simple GCC assembly example -that doesn't use VFP assembly- that shows how the input and output operands are specified:
asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
The idea is that "=r" holds the result and "r" is the input. %0 refers to "=r" and %1 refers to "r".
Each operand is referenced by numbers. The first output operand is numbered 0, continuing in increasing order. There is a max number of operands ... I don't know what the max number is for the iPhone platform.
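Wrapped into a function, the rotate example could look like the following sketch. The function name and the plain C fallback are assumptions added here so that the snippet also compiles on non-ARM hosts (or in Thumb mode); only the asm statement itself comes from the example above.

```c
#include <stdint.h>

/* Rotates a 32-bit value right by one bit. */
uint32_t rotate_right_1(uint32_t value)
{
    uint32_t result;
#if defined(__arm__) && !defined(__thumb__)
    /* %0 is the output operand (result), %1 is the input operand (value). */
    __asm__("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
#else
    /* Portable fallback with the same result. */
    result = (value >> 1) | (value << 31);
#endif
    return result;
}
```

Keeping a C fallback next to the assembly is also a convenient way to check the assembly path against a known-good reference while developing.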
Some instructions clobber some hardware registers. We have to list those registers in the clobber list, i.e. the field after the third ':' in the asm statement, so that GCC will not assume that the values it loaded into these registers are still valid.
In other words a clobber list tells the compiler which registers were used but not passed as operands. If a register is used as a scratch register, this register needs to be mentioned in there. Here is an example:
asm volatile("ands r3, %1, #3" "\n\t"
"eor %0, %0, r3" "\n\t"
"addne %0, #4"
: "=r" (len)
: "0" (len)
: "cc", "r3"
);
r3 is used as a scratch register here. The cc pseudo register tells the compiler that the assembly code modifies the condition code flags. If the asm code changes memory, the "memory" pseudo register informs the compiler about this.
asm volatile("ldr %0, [%1]" "\n\t"
"str %2, [%1, #4]" "\n\t"
: "=&r" (rdv)
: "r" (&table), "r" (wdv)
: "memory"
);
This special clobber informs the compiler that the assembler code may modify any memory location. Btw. the volatile attribute instructs the compiler not to optimize your assembler code away.
If you want to add something to this tip ... please do not hesitate to write it in the comment line. I will add it then with your name.
Friday, January 9, 2009
Partial Derivative Normal Maps
Sunday, January 4, 2009
Handling Scene Geometry
Obviously there is no generic solution to handle all scene geometry in the same way but depending on the game his naming conventions make a lot of sense (read "Scenegraphs say no" in Tom Forsyth's blog).
- SpatialGraph: used for finding out what is visible and should be drawn. Should make culling fast
- SceneTree: used for hierarchical animations, e.g. skeletal animation or a sword held in a character's hand
- RenderQueue: is filled by the SpatialGraph. Renders visible stuff fast. It sorts sub arrays per key, each key holding data such as depth, shaderID etc. (see Christer Ericson's blog entry "Sort based-draw call bucketing" for this)
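As a rough sketch of the RenderQueue idea, the sort key can be packed into a single integer so that one sort orders the draw calls by shader first and by depth second. The RenderItem struct, the bit layout of the key and the function names below are assumptions made for this illustration, not the layout from the referenced blog entry.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t key;       /* packed sort key */
    int      drawIndex; /* index of the draw call this item refers to */
} RenderItem;

/* Hypothetical layout: shader ID in the upper bits, depth in the lower 32 bits. */
uint64_t make_sort_key(uint16_t shaderID, uint32_t depth)
{
    return ((uint64_t)shaderID << 32) | (uint64_t)depth;
}

static int compare_items(const void *a, const void *b)
{
    uint64_t ka = ((const RenderItem *)a)->key;
    uint64_t kb = ((const RenderItem *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sorts the queue so that items with the same shader end up next to each other,
   ordered front to back within each shader. */
void sort_render_queue(RenderItem *items, size_t count)
{
    qsort(items, count, sizeof(RenderItem), compare_items);
}
```

Which field goes into the most significant bits decides what is minimized: shader changes in this layout, or overdraw if the depth were placed on top instead.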
Sunday, December 28, 2008
Major Oolong Update
http://www.oolongengine.com
I updated the memory manager and the math library, upgraded to the latest POWERVR POD format and added VBO support to each example. Please also note that in previous updates a new memory manager was added, the VFP math library was added and a bunch of smaller changes were done as well.
The things on my list are: looking into the sound manager ... it seems like the current version allocates memory during the frame ... and adding the DOOM III level format as a game format. Obviously zip support would be nice as well ... let's see how far I get.
Thursday, December 25, 2008
Programming Vertex, Geometry and Pixel Shaders
http://wiki.gamedev.net/index.php/D3DBook:Book_Cover
If you have any suggestions, comments or additions to this book, please let me know or write them into the book comment pages.
Wednesday, December 24, 2008
Good Middleware
Tuesday, December 23, 2008
Quake III Arena for the iPhone
http://code.google.com/p/quake3-iphone/
There is a list of issues. If you have more spare time than me, maybe you can help out.
iP* programming tip #8
The problem is that each event is defined by the region of the screen in which it happens. When the user slides his finger, he leaves this region. In other words, if you handle on-screen touches so that a touch is on and a lifted finger is off, and the finger is moved away and then lifted, the event is still on.
The workaround is that if the user slides his finger away, the previous location of the finger is used to identify the event region, and the current location is checked against it. If it is no longer inside, the event defaults to off.
Touch-screen support for a typical shooter might work like this:
In touchesBegan, touchesMoved and touchesEnded there is a function call like this:
// Enumerates through all touch objects
for (UITouch *touch in touches)
{
[self _handleTouch:touch];
touchCount++;
}
_handleTouch might look like this:
- (void)_handleTouch:(UITouch *)touch
{
CGPoint location = [touch locationInView:self];
CGPoint previousLocation;
// if we are in a touchMoved phase use the previous location but then check if the current
// location is still in there
if (touch.phase == UITouchPhaseMoved)
previousLocation = [touch previousLocationInView:self];
else
previousLocation = location;
...
// fire event
// lower right corner .. box is 40 x 40
if (EVENTREGIONFIRE(previousLocation))
{
if (touch.phase == UITouchPhaseBegan)
{
// only trigger once
if (!(_bitMask & Q3Event_Fire))
{
[self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:1];
_bitMask|= Q3Event_Fire;
}
}
else if (touch.phase == UITouchPhaseEnded)
{
if (_bitMask & Q3Event_Fire)
{
[self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:0];
_bitMask^= Q3Event_Fire;
}
}
else if (touch.phase == UITouchPhaseMoved)
{
if (!(EVENTREGIONFIRE(location)))
{
if (_bitMask & Q3Event_Fire)
{
[self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:0];
_bitMask^= Q3Event_Fire;
}
}
}
}
...
Tracking if the switch is on or off can be done with a bit mask. The event is sent off to the game with a separate _queueEventWithType method.
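The bit mask bookkeeping on its own can be sketched in plain C like this. The event flags and function names are hypothetical and only mirror the on/off logic of the touch handler; the return value says whether an event should be queued. Note that the "only trigger once" check tests the bit with &, because an XOR against the flag is also non-zero whenever other bits in the mask are set.

```c
#include <stdint.h>

/* Hypothetical event flags, one bit per on-screen button. */
enum { EVENT_FIRE = 1 << 0, EVENT_JUMP = 1 << 1 };

/* Switches an event on. Returns 1 if a key-down should be queued, 0 if it was already on. */
int event_press(uint32_t *mask, uint32_t event)
{
    if (*mask & event)
        return 0;       /* already on: only trigger once */
    *mask |= event;
    return 1;
}

/* Switches an event off. Returns 1 if a key-up should be queued, 0 if it was already off. */
int event_release(uint32_t *mask, uint32_t event)
{
    if (!(*mask & event))
        return 0;
    *mask &= ~event;
    return 1;
}
```

Both the "finger lifted" case and the "finger slid out of the region" case can then call event_release() without having to care whether the other one already fired.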
Sunday, December 14, 2008
iP* programming tip #7
- glEnable(GL_POINT_SPRITE_OES) - this is the global switch that turns point sprites on. Once enabled, all points will be drawn as point sprites.
- glTexEnvi(GL_POINT_SPRITE_OES, GL_COORD_REPLACE_OES, GL_TRUE) - this enables [0..1] texture coordinate generation for the four corners of the point sprite. It can be set per-texture unit. If disabled, all corners of the quad have the same texture coordinate.
- glPointParameterfv(GLenum pname, const GLfloat * params) - this is used to set the point attenuation as described below.
user_clamp represents the GL_POINT_SIZE_MIN and GL_POINT_SIZE_MAX settings of glPointParameterfv(). impl_clamp represents an implementation-dependent point size range. GL_POINT_DISTANCE_ATTENUATION is used to pass in params as an array containing the distance attenuation coefficients a, b, and c, in that order.
In case multisampling is used (not officially supported), the point size is clamped to have a minimum threshold, and the alpha value of the point is modulated by the following equation:
GL_POINT_FADE_THRESHOLD_SIZE specifies the point alpha fade threshold.
Check out the Oolong engine example Particle System for an implementation. It renders 600 point sprites at nearly 60 fps. Increasing the number of point sprites to 3000 drops the frame rate to around 20 fps.
Friday, December 12, 2008
Free ShaderX Books
http://tog.acm.org/resources/shaderx/

