Thursday, April 30, 2009

3D Supershape

Over the last few years I have been looking into the 3D Supershape formula described by Paul Bourke here and originally developed by Johan Gielis. I love the shapes of the objects that result from it, and ever since I saw the demo from Jetro Lauha (http://jet.ro/creations) I have wanted to use it to create my own demos. Here is my first attempt to generate C source from the equations:

Suitable C pseudo code could be:

float r = pow(pow(fabs(cos(m * o / 4)) / a, n2) + pow(fabs(sin(m * o / 4)) / b, n3), 1 / n1);

The result of this calculation is in polar coordinates. Please note the difference between the equation and the C code: the equation has a negative power value, the C code doesn't. To extend this result into 3D, the spherical product of several superformulas is used. For example, the 3D parametric surface is obtained by multiplying two superformulas S1 and S2. The coordinates are defined by the relations:

The sphere mapping code uses two r values:

point->x = (float)(cosf(t) * cosf(p) / r1 / r2);
point->y = (float)(sinf(t) * cosf(p) / r1 / r2);
point->z = (float)(sinf(p) / r2);
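
Putting the pieces together, a minimal C sketch could look like this (the parameter values for m, a, b, n1, n2, n3 are arbitrary and only chosen for illustration):

#include <math.h>

typedef struct { float x, y, z; } Point3;

// 2D superformula in polar coordinates, as in the snippet above
static float superformula(float angle, float m, float a, float b,
                          float n1, float n2, float n3)
{
    return powf(powf(fabsf(cosf(m * angle / 4.0f)) / a, n2) +
                powf(fabsf(sinf(m * angle / 4.0f)) / b, n3), 1.0f / n1);
}

// t (longitude) runs over [-pi, pi], p (latitude) over [-pi/2, pi/2]
static void supershapePoint(float t, float p, Point3 *point)
{
    float r1 = superformula(t, 7.0f, 1.0f, 1.0f, 0.2f, 1.7f, 1.7f);
    float r2 = superformula(p, 7.0f, 1.0f, 1.0f, 0.2f, 1.7f, 1.7f);

    point->x = cosf(t) * cosf(p) / r1 / r2;
    point->y = sinf(t) * cosf(p) / r1 / r2;
    point->z = sinf(p) / r2;
}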

Because r1 and r2 had a positive power value in the C code above, we have to divide by those variables here. Here is a Mathematica render of this code:


Wednesday, April 29, 2009

Rockstar Games

One year ago today GTA IV launched, and today is my last day as an employee of Rockstar Games. After more than four fantastic years I felt I should take a break to go back to some research topics and watch my kids grow for a while :-), so I gave my notice two weeks ago.


Beagle Board

I got the whole development environment going and wrote a few small graphics demos for it. All the PowerVR demos I tried ran on it nicely. Very cool!
If you are interested in a next-gen mobile development platform I would definitely recommend looking into this at

http://beagleboard.org/

Any further development has now moved to lowest priority ... maybe at some point I will play around more with Angstroem. There is an online image builder at

http://amethyst.openembedded.net/~koen/narcissus/


Monday, April 20, 2009

BeagleBoard.org Ubuntu 8.04

In the last few days I set up a development environment for a BeagleBoard (see beagleboard.org). I wanted to hold the next-gen environment for future phones and the OpenPandora in my hands today. The board is astonishingly small and you can power it over the USB port. It runs Angstroem -a Linux OS- on an OMAP3530 processor. There is a dedicated video decode DSP, the PowerVR SGX chipset, a sound chip and a few other things that I haven't used so far. You can even plug in a keyboard and a mouse and you have a full-blown computer with 256 MB RAM and 256 MB SDRAM.
To get this going I had to install a Linux OS on one of my PCs: Ubuntu 8.04. To relieve the pain of having to google all the Linux commands again and again, I am writing down a few notes for myself here:
- minicom is not installed by default; you have to install it yourself. To do this, open Applications -> Add/Remove, refresh the package list (you need an internet connection for this), then install the build essentials first and minicom afterwards by typing into a terminal:
sudo apt-get install build-essential
sudo apt-get install minicom
- to look for the RS232 serial device you can use
dmesg | grep tty
I found that adding environment variables to the PATH works a bit differently on Ubuntu 8.04. You can set an environment variable by using
export VARNAME=some_string
e.g.
export PATH=$PATH:some/other/path
To check if it is set you can use
echo $PATH
For the PLATFORM you set it by typing
export PLATFORM=LinuxOMAP3
you use
echo $PLATFORM
to check if it is correct.
Similarly, for library paths you type
export LIBDIR=$PWD
from the directory where the lib files are. To check that this works you can use
echo $LIBDIR
To make all those variable values persistent you can append those statements to the end of the .bashrc file. Some other things I found convenient:
gksudo gedit
starts the editor with root privileges.
Copying a file from one directory into another can be done with the cp command like this:
$ cp -i goulash recipes/hungarian
cp: overwrite recipes/hungarian/goulash (y/n)?

You can copy a directory path in the terminal by dragging the file from the file browser into the terminal command line.

Saturday, March 21, 2009

ShaderX7 on Sale

ShaderX7 has more than 800 pages. I like the following screenshot from Amazon.com:
ShaderX8 is already announced. Proposals are due by May 19th, 2009. Please send them to wolf at shaderx.com. An example proposal, writing guidelines and a FAQ can be downloaded from www.shaderx6.com/ShaderX6.zip. The schedule is available on http://www.shaderx8.com/.

Thanks to Eric Haines for reminding me to add this to this page :-)

Wednesday, March 18, 2009

Mathematica

I switched from Maple to Mathematica last week. One of my small projects is to store all the graphics algorithms I have liked to visualize over the last few years in one file -a kind of condensed memory of the things I worked on. Here is an example for a simple Depth of Field effect (as already covered in my GDC 2007 talk):

Distance runs along the axis called Z value, so 0 is close to the camera and 1.0 is far away. You can see how the near and far blur planes fade in and out as the value called Range increases. The equation to plot this in Mathematica is rather simple, and in practice it is a quite efficient way to achieve the effect.

Plot3D[R*Abs[0.5 - z], {z, 1, 0}, {R, 0, 1},
PlotStyle -> Directive[Pink, Specularity[White, 50], Opacity[0.8]],
PlotLabel -> "Depth of Field", AxesLabel -> {"Z value", "Range"}]
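
The same falloff can be written as a tiny C helper (a minimal sketch; the 0.5 focal plane and the clamp are assumptions about how the value is used later):

#include <math.h>

// z is the normalized depth in [0,1], range matches the Range axis in the plot above.
float DepthOfFieldBlur(float z, float range)
{
    float blur = range * fabsf(0.5f - z);   // 0 at the focal plane, grows towards near and far
    return blur > 1.0f ? 1.0f : blur;       // clamp so it can be used as a blend factor
}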

My plan is to develop a few new algorithms and show the results here. It will be an exercise in thinking about new things for me. If you have any suggestions on what I should cover, please do not hesitate to post them in the comments.

Sunday, February 22, 2009

Team Leadership in the Game Industry

A few of my friends contributed to the book "Team Leadership in the Game Industry" by Seth Spaulding II, so I was curious what one can write about leaders in this industry. Having spent most of my professional life outside of the game industry, I believe I have developed a different frame of reference than many of my colleagues.

First of all: the book is great and definitely worth a read. It is written in a very informative, instructive and entertaining way (... if you know the guys that contributed to it you know that it is worth it :-) ).

With that being said, let's start the review by looking at the Table of Contents. I usually spend more time than other people reading the TOC. This is the best way for me to figure out what a book has to offer. A good TOC shows you the big picture of a book and lets you see the pattern the author chose for approaching the topic. In most cases it even allows you to check the underlying logic.
The book consists of 9 chapters. Each chapter consists of an analysis of facts by the author followed by an interview with a game industry veteran. The topics span from "How We Got Here" over "Anatomy of a Game-Dev Company", "How Leaders Are Chosen ...", "A Litmus Test for Leads" and "Leadership Types and Traits ...", and then go into more detail with "The Project Team Leader ...", "The Department Leader ...", "Difficult Employees ..." and "The Effect of Great Team Leadership", followed by a "Sample Skill Ladder" for artists in the appendix.

You might feel the need to argue with some of the details covered in each chapter, but it is clear that this is the right formal approach to slicing up the delicate topic of leadership in our industry.

When I first skimmed through the book I wanted to figure out what kind of values the author has. After all, a good leader makes it clear what values he/she follows. I found it in the introduction. Here is the quote: "As will be seen, a major cause of people leaving a company is the perceived poor quality of their supervisors and senior management. The game business is a talent-based industry -the stronger and deeper your talent is, the better chances are of creating a great game. It is very difficult, in any hiring environment, to build the right mix of cross-disciplinary talent who function as a team at a high level; indeed, most companies never manage it. Once you get talented individuals on board, it's critical not to lose them. Finding and nurturing competent leaders who have the trust of the team will generate more retention than any addition of pool tables, movie nights, or verbal commitments to the value of "quality of life"."
You might think this is the most obvious thing to say in the game industry.

Obviously the book wants to cover the process of setting up a creative and great environment for all the humans involved in creating great games. Creating a great working environment starts with picking the right leaders who enable people by helping them give their best. A great leader serves his/her people, sees the best in everyone and has the ability to bring that talent out. Many interviewees in the book also mention that humor is a leadership skill. I trained junior managers for BMW, Daimler, ABB and other companies back in Germany on weekends for two years, and I always thought this is a strong skill. Making people laugh starts a lot of processes in the body that make people more relaxed and in general brighten up their day. Whoever can do this can improve the morale and therefore the efficiency of a team in seconds ... priceless.

Managing a creative team is a completely different story than managing -for example- a sales team. The human factor in the relationships between people plays an important role because they have to create something together. While a sales person is out in the field on his own, comes back with a number and relies on a relationship with a potential customer that only lasts a few hours of face-to-face time, a creative team stays together for years and has to overcome all the things that come up when humans have to live in a small space together. There is a complex social network in place that defines the relationships between those humans, and it is important to keep the team running with all the constantly changing love/hate -and in-between- relationships on board. People on the team might even deal with difficult personal relationships, and you end up with a mixture of chaos and randomness typical for families or close friends. In that context it was interesting to see what the interviewees thought about the question of whether leaders are born and / or can be trained to be successful in the game industry. Obviously someone who was active as a boy-scout leader, as speaker/president of the student association at his university, or who volunteered to work with other people in general, already showed a level of social commitment that is a good starting point for a leadership role in our industry.

So defining and following the right values is a fundamental requirement for a book on leadership. Obviously, after the values are set comes the part where those values need to be applied and used, and this is where the book shines. It is hands-on, and even if you do not agree with the author in every detail, the fact that he wrote all this down earns the highest respect.

So now that I have made it obvious that I am excited about this book, let's think about how it might be improved in the future. A potential improvement I could see is to start the book with a target description. Not that the author fails to describe a target, but I would have appreciated more detail in this area.
What is the company you would want to work for? What is the environment you want to offer to make people as productive as possible? Obviously it is a chicken-and-egg problem: good people want to work in good teams and good teams consist of good people ... there are social -soft skills- and knowledge -hard skills- attached to each person on that team.
A good team starts with a good leader who sets values and standards and hires the right people.

Assuming you are the leader of this future team, how would you create the environment for your dream team? How do you want people to feel when they are part of this team? What should they take home every night when they are exhausted? What do you want them to tell their wives / better halves about what it is like to work with you as their leader?
A happy employee -fully empowered to be creative :-) - should tell his wife/girlfriend that he works very hard but is treated fairly and enjoys the family-related benefits of the company.
He should tell his friends that he is working in a team where information is shared and where his potential is not only used as much as possible but also amplified. He needs to feel like he is growing with the team and the tasks.
He should tell his colleagues that he enjoys working with them and the team, that he enjoys coming into work every day and that he is excited about the project he is working on ...

If we turn that into a list of items, we could describe how an employee should feel about working in a company with good leaders. That might be a great starting point for discussing core leadership abilities.

Monday, February 2, 2009

Larrabee on GDC

I am really looking forward to Mike Abrash's and Tom Forsyth's talks at GDC about Larrabee:


Talking about the Larrabee instruction set will be super cool ... can't wait to see this.

Sunday, February 1, 2009

ShaderX7 Update

I updated the ShaderX7 website at

http://www.shaderx7.com/

There is now a first draft of the cover and the Table of Contents. Enjoy! :-)

As before, I will rest for a second when the new book comes out and think about what has happened since I founded the series eight years ago ... my perception of time slows down for this second :-) and I hear myself saying: "Chewbacca, start the hyperdrive, let's go to the next planet, I need to play cards, drink alcohol and find some entertainment ... how about Tatooine?"

Sunday, January 25, 2009

iP* programming tip #9

This issue of the iPhone / iPod touch programming tips series focuses on some aspects of VFP assembly programming. My friend Noel Llopis brought an oversight in the VFP math library to my attention that I still need to fix, so I start with a description of the problem here and promise to fix it soon in the VFP library :-)
First let's start with the references. My friend Aaron Leiby has a blog entry on how to start programming the VFP unit here:


A typical inline assembly template might look like this:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
The last two lines of the template hold the input and output operands and the so-called clobbers, which are used to inform the compiler which registers are used.
Here is a simple GCC assembly example -that doesn't use VFP assembly- that shows how the input and output operands are specified:

asm("mov %0, %1, ror #1" : "=r" (result) " : "r" (value));

The idea is that "=r" holds the result and "r" is the input. %0 refers to "=r" and %1 refers to "r".
Each operand is referenced by number: the first output operand is numbered 0, continuing in increasing order. There is a maximum number of operands ... I don't know what that maximum is for the iPhone platform.

Some instructions clobber hardware registers. We have to list those registers in the clobber list, i.e. the field after the third ':' in the asm statement, so that GCC will not assume that the values it loaded into these registers are still valid.
In other words, the clobber list tells the compiler which registers were used but not passed as operands. If a register is used as a scratch register, it needs to be mentioned there. Here is an example:
asm volatile("ands    r3, %1, #3"     "\n\t"
"eor %0, %0, r3" "\n\t"
"addne %0, #4"
: "=r" (len)
: "0" (len)
: "cc", "r3"
);
r3 is used as a scratch register here. The "cc" pseudo register in the clobber list tells the compiler that the condition code flags are modified. If the asm code changes memory, the "memory" pseudo register informs the compiler about this:

asm volatile("ldr     %0, [%1]"         "\n\t"
"str %2, [%1, #4]" "\n\t"
: "=&r" (rdv)
: "r" (&table), "r" (wdv)
: "memory"
);
This special clobber informs the compiler that the assembler code may modify any memory location. Btw. the volatile attribute instructs the compiler not to optimize away or move your assembler code.

If you want to add something to this tip ... please do not hesitate to write it in the comments. I will then add it with your name.

Friday, January 9, 2009

Partial Derivative Normal Maps

To make my collection of normal map techniques on this blog more complete, I also have to mention a special normal mapping technique that Insomniac's Mike Acton brought to my attention a long time ago (I wasn't sure if I was allowed to publish it ... but now they have slides on their website).
The idea is to store the partial derivatives of the normal in two channels of the map like this:

dx = (-nx/nz);
dy = (-ny/nz);

Then you can reconstruct the normal like this:

nx = -dx;
ny = -dy;
nz = 1;
normalize(n);

The advantage is that you do not have to reconstruct Z, so you can skip one instruction in each pixel shader that uses normal maps.
This is especially cool on the PS3, while on the XBOX 360 you can also create a custom texture format to let the texture fetch unit do the scale and bias and save a cycle there.
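
For reference, here is a minimal C sketch of the encode/decode round trip (the struct and function names are made up; in a real pipeline the decode would live in the pixel shader):

#include <math.h>

typedef struct { float x, y, z; } Vec3;

// Encode: store the partial derivatives in two channels (assumes nz > 0, i.e. tangent space).
static void EncodePartialDerivative(Vec3 n, float *dx, float *dy)
{
    *dx = -n.x / n.z;
    *dy = -n.y / n.z;
}

// Decode: rebuild the normal from the two channels without reconstructing Z.
static Vec3 DecodePartialDerivative(float dx, float dy)
{
    Vec3 n = { -dx, -dy, 1.0f };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    n.x /= len; n.y /= len; n.z /= len;
    return n;
}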
More details can be found at


Look for Partial Derivative Normal Maps.

Sunday, January 4, 2009

Handling Scene Geometry

I recently bumped into a post by Roderic Vicaire on the www.gamedev.net forums. It is here.
Obviously there is no generic solution that handles all scene geometry the same way, but depending on the game his naming conventions make a lot of sense (read "Scenegraphs say no" in Tom Forsyth's blog).
- SpatialGraph: used for finding out what is visible and should be drawn. Should make culling fast
- SceneTree: used for hierarchical animations, e.g. skeletal animation or a sword held in a character's hand
- RenderQueue: is filled by the SpatialGraph and renders the visible stuff fast. It sorts sub-arrays by key, with each key holding data such as depth, shader ID etc. (see Christer Ericson's blog entry "Sort based-draw call bucketing" for this; a small key-packing sketch follows below)
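
A minimal sketch of such a sort key in C (the field widths and their order are arbitrary assumptions; pick them to match your renderer):

#include <stdint.h>

typedef uint64_t SortKey;

// Pack the sort criteria into one integer so a plain sort orders the draw calls.
// Assumed layout: 2 bits layer/pass, 16 bits shader ID, 22 bits quantized depth.
static SortKey MakeSortKey(uint32_t layer, uint32_t shaderID, uint32_t depth)
{
    return ((SortKey)(layer    & 0x3)     << 62) |
           ((SortKey)(shaderID & 0xFFFF)  << 46) |
            (SortKey)(depth    & 0x3FFFFF);
}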

Sunday, December 28, 2008

Major Oolong Update

Two days ago I committed a major Oolong update. Please check out the Oolong Engine blog at

http://www.oolongengine.com

I updated the memory manager and the math library, upgraded to the latest POWERVR POD format and added VBO support to each example. Please also note that in previous updates a new memory manager was added, the VFP math library was added and a bunch of smaller changes were made as well.
The things on my list are: looking into the sound manager ... it seems like the current version allocates memory during the frame; and adding the DOOM III level format as a game format. Obviously zip support would be nice as well ... let's see how far I get.

Thursday, December 25, 2008

Programming Vertex, Geometry and Pixel Shaders

A Christmas present: we just went public with "Programming Vertex, Geometry and Pixel Shaders". I am a co-author of this book and we published it for free on www.gamedev.net at

http://wiki.gamedev.net/index.php/D3DBook:Book_Cover

If you have any suggestions, comments or additions to this book, please give me a sign or write them into the book comment pages.

Wednesday, December 24, 2008

Good Middleware

Kyle Wilson wrote up a summary about what good middleware should look like:

http://gamearchitect.net/2008/09/19/good-middleware/

An interesting read.

Tuesday, December 23, 2008

Quake III Arena for the iPhone

Just realized that one of the projects I contributed some code to has gone public in the meantime. You can get the source code at

http://code.google.com/p/quake3-iphone/

There is a list of open issues. If you have more spare time than I do, maybe you can help out.

iP* programming tip #8

This is the Christmas issue of the iPhone / iPod touch programming tips. This time we deal with the touch interface. The main challenge I found with the touch screen is that it is hard to track, for example, forward / backward / left / right and fire at the same time. Let's say the user presses fire and then presses forward; what happens when he accidentally slides his finger a bit?
The problem is that each event is defined by the region of the screen in which it happens. When the user slides his finger, he leaves this region. In other words, if you treat a touch inside the region as on and a lifted finger as off, then a finger that is moved out of the region and then lifted leaves the event stuck on.
The workaround is that if the user slides away with his finger, the previous location of the finger is used to check whether the current location is still in the event region. If it is not, the event defaults to off.
Touch-screen support for a typical shooter might work like this:
In touchesBegan, touchesMoved and touchesEnded there is a function call like this:

// Enumerates through all touch objects
for (UITouch *touch in touches)
{
    [self _handleTouch:touch];
    touchCount++;
}

_handleTouch might look like this:

- (void)_handleTouch:(UITouch *)touch
{
    CGPoint location = [touch locationInView:self];
    CGPoint previousLocation;

    // if we are in a touchMoved phase use the previous location but then check if the current
    // location is still in there
    if (touch.phase == UITouchPhaseMoved)
        previousLocation = [touch previousLocationInView:self];
    else
        previousLocation = location;

    ...
    // fire event
    // lower right corner .. box is 40 x 40
    if (EVENTREGIONFIRE(previousLocation))
    {
        if (touch.phase == UITouchPhaseBegan)
        {
            // only trigger once: check that the fire bit is not already set
            if (!(_bitMask & Q3Event_Fire))
            {
                [self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:1];

                _bitMask |= Q3Event_Fire;
            }
        }
        else if (touch.phase == UITouchPhaseEnded)
        {
            if (_bitMask & Q3Event_Fire)
            {
                [self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:0];

                _bitMask ^= Q3Event_Fire;
            }
        }
        else if (touch.phase == UITouchPhaseMoved)
        {
            if (!(EVENTREGIONFIRE(location)))
            {
                if (_bitMask & Q3Event_Fire)
                {
                    [self _queueEventWithType:Q3Event_Fire value1:K_MOUSE1 value2:0];

                    _bitMask ^= Q3Event_Fire;
                }
            }
        }
    }
    ...

Tracking whether the switch is on or off can be done with a bit mask. The event is sent off to the game with a separate _queueEventWithType method.

Sunday, December 14, 2008

iP* programming tip #7

This time I will cover point sprites in the iPhone / iPod touch programming tip. The idea is that a set of points -the simplest primitive in OpenGL ES rendering- describes the positions of point sprites, and their appearance comes from the current texture map. This way, point sprites are screen-aligned sprites that offer a reduced geometry footprint and transform cost because each one is represented by a single point == vertex. This is useful for particle systems, lens flares, light glows and other 2D effects.
  • glEnable(GL_POINT_SPRITE_OES) - this is the global switch that turns point sprites on. Once enabled, all points will be drawn as point sprites.
  • glTexEnvi(GL_POINT_SPRITE_OES, GL_COORD_REPLACE_OES, GL_TRUE) - this enables [0..1] texture coordinate generation for the four corners of the point sprite. It can be set per texture unit. If disabled, all corners of the quad have the same texture coordinate.
  • glPointParameterfv(GLenum pname, const GLfloat * params) - this is used to set the point attenuation as described below.
The point size of a point sprite is derived roughly as (per the OpenGL ES 1.1 spec):
derived_size = impl_clamp(user_clamp(size * sqrt(1 / (a + b*d + c*d^2))))
where d is the eye-space distance to the point. user_clamp represents the GL_POINT_SIZE_MIN and GL_POINT_SIZE_MAX settings of glPointParameterfv(); impl_clamp represents an implementation-dependent point size range.
GL_POINT_DISTANCE_ATTENUATION is used to pass params as an array containing the distance attenuation coefficients a, b, and c, in that order.
In case multisampling is used (not officially supported), the point size is clamped to a minimum threshold and the alpha value of the point is faded by the squared ratio of the derived size to that threshold. GL_POINT_FADE_THRESHOLD_SIZE specifies the point alpha fade threshold.
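
A minimal OpenGL ES 1.1 setup sketch could look like this (the attenuation coefficients and sizes are arbitrary values; a texture and a point-position vertex array are assumed to be bound already, and numParticles is a placeholder):

#include <OpenGLES/ES1/gl.h>
#include <OpenGLES/ES1/glext.h>

void DrawPointSprites(GLsizei numParticles)
{
    // distance attenuation coefficients a, b, c
    const GLfloat attenuation[3] = { 1.0f, 0.0f, 0.01f };

    glEnable(GL_POINT_SPRITE_OES);
    glTexEnvi(GL_POINT_SPRITE_OES, GL_COORD_REPLACE_OES, GL_TRUE);

    glPointParameterfv(GL_POINT_DISTANCE_ATTENUATION, attenuation);
    glPointParameterf(GL_POINT_SIZE_MIN, 1.0f);
    glPointParameterf(GL_POINT_SIZE_MAX, 64.0f);
    glPointSize(32.0f);

    glDrawArrays(GL_POINTS, 0, numParticles);
}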
Check out the Oolong engine example Particle System for an implementation. It runs 600 point sprites at nearly 60 fps. Increasing the number of point sprites to 3000 lets the frame rate drop to around 20 fps.

Friday, December 12, 2008

Free ShaderX Books

Eric Haines provided a home for the three ShaderX books that are now available for free. Thanks so much for this! Here is the URL

http://tog.acm.org/resources/shaderx/

Thursday, December 11, 2008

iP* programming tip #6

This time we are covering another fixed-function technique from DirectX 7/8 times: Matrix Palette support, an OpenGL ES 1.1 extension that is supported on the iPhone.
It allows the usage of a set of matrices to transform the vertices and the normals. Each vertex has a set of indices into the palette and a corresponding set of n weights.
The vertex is transformed by the modelview matrices specified by the vertex's indices. These results are then scaled by the respective weights and summed up to create the eye-space vertex.

A similar procedure is followed for normals, except that they are transformed by the inverse transpose of the modelview matrix.

The main OpenGL ES functions that support the matrix palette are
  • glMatrixMode(GL_MATRIX_PALETTE_OES) - set the matrix mode to the palette
  • glCurrentPaletteMatrixOES(n) - set the currently active palette matrix, then load each matrix in the palette
  • To enable the vertex arrays
    glEnableClientState(GL_MATRIX_INDEX_ARRAY_OES)
    glEnableClientState(GL_WEIGHT_ARRAY_OES)
  • To load the index and weight per-vertex data
    glWeightPointerOES()
    glMatrixIndexPointerOES()
On the iPhone up to nine bones per sub-mesh are supported (check GL_MAX_PALETTE_MATRICES_OES). Check out the Oolong example MatrixPalette for an implementation; a rough setup sketch follows below.
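
Here is a rough setup sketch with the OES entry points (boneCount, boneMatrix, stride and the two pointers are assumptions; three bones per vertex are used here):

#include <OpenGLES/ES1/gl.h>
#include <OpenGLES/ES1/glext.h>

void SetupMatrixPalette(int boneCount, const GLfloat boneMatrix[][16],
                        GLsizei stride, const GLvoid *indexPtr, const GLvoid *weightPtr)
{
    glEnable(GL_MATRIX_PALETTE_OES);

    // load one modelview matrix per bone into the palette
    glMatrixMode(GL_MATRIX_PALETTE_OES);
    for (int i = 0; i < boneCount; ++i)
    {
        glCurrentPaletteMatrixOES(i);
        glLoadMatrixf(boneMatrix[i]);
    }

    // per-vertex palette indices and weights, three bones per vertex
    glEnableClientState(GL_MATRIX_INDEX_ARRAY_OES);
    glEnableClientState(GL_WEIGHT_ARRAY_OES);
    glMatrixIndexPointerOES(3, GL_UNSIGNED_BYTE, stride, indexPtr);
    glWeightPointerOES(3, GL_FLOAT, stride, weightPtr);
}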

GDC Talk

My GDC talk was accepted. I am happy ... yeaaahhh :-)

Tuesday, December 9, 2008

Cached Shadow Maps

A friend just asked me how to design a shadow map system for many shadow-casting lights. A quite good explanation was given in the following post back in 2003:

http://www.gamedev.net/community/forums/viewreply.asp?ID=741199

Yann Lombard explains how to pick the light sources that should cast shadows first. He uses distance, intensity, influence and other parameters to pick them.

He has a cache of shadow maps that can have different resolutions. His cache solution is pretty generic; I would build a more dedicated cache just for shadow maps.
After having picked the light sources that should cast shadows, I would only update those shadow maps in the cache that actually change, which depends on whether there is an object with a dynamic flag in the shadow view frustum.
Think about what happens when you approach a scene with lights that cast shadows (a rough sketch follows after this list):
1. the lights that are close enough and appropriate to cast shadows are picked -> their shadow maps are rendered
2. then, while we move on, for the lights from 1. we only update shadow maps if there is an object in the shadow view that is moving / dynamic; we then start with the next bunch of shadows while the shadows from 1. are still in view
3. and so on.
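
In very rough C-style code the update policy could look like this (every type and function here is hypothetical; it only illustrates the flow described above):

#include <stddef.h>

// Every type and function below is hypothetical; this only illustrates the flow.
typedef struct Light Light;
typedef struct ShadowMapSlot ShadowMapSlot;

ShadowMapSlot *CacheLookup(Light *light);
ShadowMapSlot *CacheInsert(Light *light);
int            FrustumContainsDynamicObject(Light *light);
void           RenderShadowMap(ShadowMapSlot *slot, Light *light);

void UpdateShadowCache(Light **pickedLights, int count)
{
    for (int i = 0; i < count; ++i)
    {
        Light *light = pickedLights[i];
        ShadowMapSlot *slot = CacheLookup(light);

        if (slot == NULL)
            slot = CacheInsert(light);                 // newly picked light: render its map once
        else if (!FrustumContainsDynamicObject(light))
            continue;                                  // nothing moved in its frustum: reuse the cached map

        RenderShadowMap(slot, light);
    }
}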

Saturday, December 6, 2008

Dual-Paraboloid Shadow Maps

Here is an interesting post on Dual-Paraboloid Shadow Maps. Pat Wilson describes a single-pass approach here:

http://www.gamedev.net/community/forums/topic.asp?topic_id=517022

This is pretty cool. Culling objects into the two hemispheres is not needed here. Other than that, the usual comparison between cube maps and dual-paraboloid maps applies:

  • the number of draw calls is the same ... so you do not save on this front
  • you lose memory bandwidth with cube maps because in the worst case you render everything into six maps that are probably bigger than 256x256 ... in reality you won't render all six faces and therefore end up with fewer draw calls than with dual-paraboloid maps
  • the quality is much better with cube maps
  • the speed difference is not that huge because dual-paraboloid maps use things like texkill or alpha test to pick the right map, and that kind of rendering is pretty slow without Hierarchical Z.

I think both techniques are equivalent for environment maps ... for shadows you might prefer cube maps; if you want to save memory, dual-paraboloid maps are the only way to go.

Update: just saw this article on dual-paraboloid shadow maps:

http://osman.brian.googlepages.com/dpsm.pdf

The basic idea is that you do the world-space -> paraboloid transformation in the pixel shader during your lighting pass. That avoids having the paraboloid coordinates interpolated incorrectly.
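
For reference, the forward mapping itself is tiny. A minimal C sketch (assuming v is a normalized light-space direction and we are mapping into the front hemisphere map):

typedef struct { float x, y, z; } Vec3;

// Project a normalized light-space direction onto the front paraboloid map.
// The back map is handled the same way with the sign of v.z flipped.
static void ParaboloidProject(Vec3 v, float *u, float *t)
{
    float d = 1.0f + v.z;   // denominator of the paraboloid mapping
    *u = v.x / d;           // resulting coordinates are in [-1, 1]
    *t = v.y / d;
}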

iP* programming tip #5

Let's look today at the "pixel shader" level of the hardware functionality. The iPhone Application Programming Guide says that an application should not use more than 24 MB for textures and surfaces. It seems like those 24 MB are not in video card memory; I assume that all of the data is stored in system memory and the graphics card memory is not used.
Overall the iP* platform supports:
  • a maximum texture size of 1024x1024
  • 2D textures; other texture targets are not supported
  • no stencil buffer
As far as I know stencil buffer support is available in hardware, it just isn't exposed. That means the Light Pre-Pass renderer can only be implemented with the help of the scissor (hopefully available). As a side note: one of the other things that does not seem to be exposed is MSAA rendering. With the unofficial SDK it seems like you can use MSAA.
Texture filtering is described on page 99 of the iPhone Application Programming Guide. There is also an extension for anisotropic filtering, which I haven't tried.

The pixel shader of the iP* platform is programmed via texture combiners. There is an overview of all OpenGL ES 1.1 calls at

http://www.khronos.org/opengles/sdk/1.1/docs/man/

The texture combiners are described on the glTexEnv page. Per-pixel lighting is a popular example:

glTexEnvf(GL_TEXTURE_ENV,
// N.L
.. GL_TEXTURE_ENV_MODE, GL_COMBINE);
.. GL_COMBINE_RGB, GL_DOT3_RGB); // Blend0 = N.L

.. GL_SOURCE0_RGB, GL_TEXTURE); // normal map
.. GL_OPERAND0_RGB, GL_SRC_COLOR);
.. GL_SOURCE1_RGB, GL_PRIMARY_COLOR); // light vec
.. GL_OPERAND1_RGB, GL_SRC_COLOR);

// N.L * color map
.. GL_TEXTURE_ENV_MODE, GL_COMBINE);
.. GL_COMBINE_RGB, GL_MODULATE); // N.L * color map

.. GL_SOURCE0_RGB, GL_PREVIOUS); // previous result: N.L
.. GL_OPERAND0_RGB, GL_SRC_COLOR);
.. GL_SOURCE1_RGB, GL_TEXTURE); // color map
.. GL_OPERAND1_RGB, GL_SRC_COLOR);



Check out the Oolong example "Per-Pixel Lighting" in the folder Examples/Renderer for a full implementation.

Friday, December 5, 2008

iP* programming tip #4

All of the source code presented in this series is based on the Oolong engine. I will refer to the examples where appropriate so that everyone can look the code up or try it on their own. This tip covers the very simple basics of an iP* app. Here is the most basic piece of code to start a game:

// “View” for games in applicationDidFinishLaunching
// get screen rectangle
CGRect rect = [[UIScreen mainScreen] bounds];

// create one full-screen window
_window = [[UIWindow alloc] initWithFrame:rect];

// create OpenGL view
_glView = [[EAGLView alloc] initWithFrame: rect pixelFormat:GL_RGB565_OES depthFormat:GL_DEPTH_COMPONENT16_OES preserveBackBuffer:NO];

// attach the view to the window
[_window addSubview:_glView];

// show the window
[_window makeKeyAndVisible];

The screen dimensions are retrieved from a screen object. Erica Sadun compares the UIWindow functionality to a TV set and the UIView to actors in a TV show. I think this is a good way to memorize the functionality. In our case EAGLView, which comes with the Apple SDK, inherits from UIView and adds all the OpenGL ES functionality to it. We then attach this view to the window and make everything visible.
Oolong assumes a full-screen window that does not rotate; it is always in widescreen view. The reason for this is that otherwise accelerometer usage -to drive a camera with the accelerometer, for example- wouldn't be possible.
There is a corresponding dealloc method to this code that frees all the allocated resources again.
The anatomy of an Oolong engine example uses mainly two files: a file with "delegate" in the name and the main application file. The main application file has the following methods:
- InitApplication()
- QuitApplication()
- UpdateScene()
- RenderScene()
The first pair of methods does one-time device-dependent resource allocations and deallocations, while UpdateScene() prepares scene rendering and the last method does what the name says. If you would like to extend this framework to handle orientation changes, you would add a pair of methods with names like InitView() and ReleaseView() and handle all orientation-dependent code in there. Those methods would be called whenever the orientation changes -only once- and at the start of the application.

One other basic topic is the usage of C++. In Apple speak this is called Objective-C++. Cocoa Touch wants to be addressed with Objective-C, so using only native C or C++ code is not possible. For game developers there is lots of existing C/C++ code to be re-used, and its usage makes games easier to port to several platforms (it is quite common to launch an IP on several platforms at once). The best solution to this dilemma is to use Objective-C where necessary and then wrap to C/C++.
If a file has the postfix *.mm, the compiler can handle Objective-C, C and C++ code pieces at the same time, to a certain degree. If you look in Oolong for files with such a postfix you will find many of them. There are whitepapers and tutorials available for Objective-C++ that describe the limitations of the approach. Because garbage collection is not used on the iP* devices, I want to believe that the challenges of making this work on this platform are smaller. Here are a few examples of how the bridge between Objective-C and C/C++ is built in Oolong. In the main application class of every Oolong example we bridge from the Objective-C code used in the "delegate" file to the main application file like this:

// in Application.h
class CShell
{
..
bool UpdateScene();

// in Application.mm
bool CShell::UpdateScene()
..

// in Delegate.mm
static CShell *shell = NULL;

if(!shell->UpdateScene()) printf("Update error\n");

An example on how to call an Objective-C method from C++ can look like this (C wrapper):

// in PolarCamera.mm -> C wrapper
void UpdatePolarCamera()
{
[idFrame UpdateCamera];
}
-(void) UpdateCamera
{
..
// in Application.mm
bool CShell::UpdateScene()
{
UpdatePolarCamera();
..

The idea is to retrieve the id of the class instance and then use this id to address a method of the class from the outside.
If you want to see all this in action, open up the skeleton example in the Oolong Engine source code. You can find it at
Examples/Renderer/Skeleton
Now that we are at the end of this tip I would like to refer to a blog that my friend Canis writes. He talks about memory management here; this blog entry applies to the iP* platforms quite well:

http://www.wooji-juice.com/blog/cocoa-6-memory.html

Wednesday, December 3, 2008

iP* programming tip #3

Today I will cover the necessary files of an iP* application and the folders that potentially hold data of your application on the device.
  • the .app folder holds everything without a required hierarchy
  • .lproj - language support
  • Executable
  • Info.plist – an XML property list that holds the product identifier > allows the app to communicate with other apps and register with Springboard
  • Icon.png (57x57) - set UIPrerenderedIcon to true in Info.plist to not receive the gloss / shiny effect
  • Default.png … should match the game background; no “Please wait” sign ... smooth fade
  • XIB (NIB) files - precooked, addressable user interface classes > remove the NSMainNibFile key from Info.plist if you do not use it
  • your files; for example in demoq3/quake3.pak
If the game boots very fast, a good mobile phone experience could be achieved by taking a screenshot when the user quits the app and then showing that screenshot while the game boots up again and restores the state it was in before.
Every iP* app is sandboxed. That means that only certain folders, network resources and hardware can be accessed. Here is a list of folders that might be affected by your application:
  • preferences files are in /var/mobile/Library/Preferences, named after the product identifier (e.g. com.engel.Quake.plist); they are updated when you use something like NSUserDefaults to add persistence to game data like save and load
  • app plug-ins go to /System/Library (not available)
  • documents in /Documents
  • each app has a tmp folder
  • sandbox specs e.g. in /usr/share/sandbox > don’t touch
The sandbox paradigm is also responsible for a mechanism that stops your game if it eats up too many resources of the iPhone. I wonder under which conditions this happens.

Tuesday, December 2, 2008

HLSL 5.0 OOP / Dynamic Shader Linking

I just happened to bump into a few slides on the new HLSL 5.0 syntax. The slides are at

http://www.microsoft.com/downloads/details.aspx?FamilyId=32906B12-2021-4502-9D7E-AAD82C00D1AD&displaylang=en

I thought I'd comment on those slides because I do not get the main idea. The slides mention a combinatorial explosion for shaders. Slide 19 shows three arrows that go in three directions: one is called Number of Lights, another Environmental Effects and the third Number of Materials.
Regarding the first one: even someone who has never worked on a game knows the words Deferred Lighting. If you want many lights, you want to do the lighting in a way that re-uses the same shader for each light type. Assuming that we have a directional, a point and a spot light, this brings me to three shaders (I currently use three but I might increase this to six).
One arrow talks about Environmental Effects. Most environmental effects nowadays are part of PostFX or a dedicated sky dome system. That adds two more shaders.
The last arrow says Number of Materials. Usually we have up to 20 different shaders for different materials.
This brings me to -let's say 30 to 40- different shaders in a game. I can't consider this a combinatorial explosion so far.
Slide 27 mentions that the major driving point for introducing OOP is dynamic shader linkage. It seems like there is a need for dynamic shader linkage because of the combinatorial explosion of shaders.
So in essence the design of the HLSL language is driven by the assumption that we have too many shaders and can't cope with the sheer quantity. To fix this we need dynamic shader linkage, and to make this happen we need OOP in HLSL.

It is hard for me to follow this logic. It looks to me like we are doing a huge step back here. Not focusing on the real needs and adding code bloat.

Dynamic shader linkers have been proven useless in game development for a long time; the previous attempts in this area were buried with the DirectX 9 SDKs. The reason is that they do not allow you to hand-optimize code, which is very important if you want your title to be competitive. As soon as you change one of the shader fragments, this has an impact on the performance of other shaders. Depending on whether you hit a performance sweet spot or not, you can get very different performance out of graphics cards.
Because the performance of your code base becomes less predictable, you do not want to use a dynamic shader linker if you want to create competitive games in the AAA segment.

Game developers need more control over the performance of the underlying hardware. We are already forced to use NVAPI and other native APIs to ship games on the PC platform with an acceptable feature set and performance (especially for SLI configs) because DirectX does not expose the functionality. For the DirectX 9 platform we look into CUDA and CAL support for PostFX.
This probably does not have much impact on the HLSL syntax, but in general I would prefer more ways to squeeze performance out of graphics cards over any OOP extension that does not sound like it increases performance. At the end of the day the language is a tool to squeeze as much performance as possible out of the hardware. What else would you want to do with it?

iP* programming tip #2

Today's tip deals with the setup of your development environment. As a Mac newbie I had a hard time getting used to the environment more than a year ago -when I started Mac development- and I still suffer from windowitis. I know that Apple does not want to copy MS's Visual Studio, but most people who are used to working with Visual Studio would put that on their holiday wishlist :-)
Here are a few starting points to get used to the environment:
  • To work in one window only, use the "All-in-One" mode if you miss Visual Studio (http://developer.apple.com/tools/xcode/newinxcode23.html)
    You have to load Xcode, but not load any projects. Go straight to Preferences/General Tab, and you'll see "Layout: Default". Switch that to "Layout: All-In-One". Click OK. Then, you can load your projects.
  • Apple+tilde – cycle between windows in the foreground
  • Apple+w - closes the front window in most apps
  • Apple+tab – cycle through windows
Please note that Apple did a revolutionary thing on the new MacBook Pros (probably also the new MacBooks) ... there is no Apple key anymore. It is now called the command key.

For everyone who prefers hotkeys to start applications, check out Quicksilver. Automatically hiding and showing the Dock gives you more workspace. If you are giving presentations about your work, check out Stage Hand for the iPod touch / iPhone.

For reference you should download the POWERVR SDK for Linux. It is a very helpful reference regarding the MBX chip in your target platforms.

Not very game or graphics programming related but very helpful is Erica Sadun's book "The iPhone Developer's Cookbook". She does not waste your time with details you are not interested in and comes straight to the point. Just reading the first section of the book is already pretty cool.
You want to have this book if you want to dive into any form of Cocoa interface programming.
The last book I want to recommend is Andrew M. Duncan's "Objective-C Pocket Reference". I usually have it lying on my table for when I stumble over Objective-C syntax. If you are a C/C++ programmer you probably do not need more than this. There are also Objective-C tutorials on the iPhone developer website and on the general Apple website.

If you have any other tip that I can add here, I will mention it with your name.

Update: PpluX sent me the following link:

He describes there how he disables deep sleep mode and modifies the usage of Spaces.

The next iP* programming tip will be more programming related ... I promise :-)

Sunday, November 30, 2008

iP* programming tip #1

This is the first of a series of iPhone / iPod touch programming tips.
Starting iPhone development first requires knowledge of the underlying hardware and what it can do for you. Here are the latest hardware specs I am aware of (a rumour was talking about iPods that run the CPU at 532 MHz ... I haven't found any evidence for this).
  • GPU: PowerVR MBXLite with VGPLite at 103 MHz
  • ~DX8 hardware with vs_1_1 and ps_1_1 functionality
  • Vertex shader is not exposed
  • Pixel shader is programmed with texture combiners
  • 16 MB VRAM – not mentioned anywhere
  • CPU: ARM 1176 at 412 MHz (can do 600 MHz)
  • VFP unit, a 128-bit multimedia unit ~= SIMD unit
  • 128 MB RAM; only 24 MB allowed for apps
  • 320x480 px at 163 ppi screen
  • LIS302DL, a 3-axis accelerometer with 412 MHz (?) update rate
  • Multi-Touch: up to five fingers
  • PVRTC texture compression: color maps 2 bits per pixel and normal maps 4 bits per pixel
The interesting part is that the CPU can do up to 600 MHz, so it would be possible to increase performance here in the future.
I wonder how the 16 MB of VRAM are handled. I assume that this is where VBOs and textures are stored. Regarding the maximum app size of 24 MB: I wonder what happens if an application generates geometry and textures dynamically ... when does the sandbox of the iPhone / iPod touch stop the application? I did not find any evidence for this.

WARP - Running DX10 and DX11 Games on CPUs

As an MVP I was involved in testing this new Windows Advanced Rasterization Platform. They just published the first numbers:

http://msdn.microsoft.com/en-us/library/dd285359.aspx

Running Crysis on an 8-core CPU at a resolution of 800x600 at 7.2 fps is an achievement. If this were hand-optimized very well, it would be a great target to write code for: 4 - 8 cores will be a common target platform in the next two years. Because it can be switched off if there is a GPU, this is a perfect target for game developers. What this means is that you can write a game with the DirectX 10 API and target not only all the GPUs out there but also machines without a GPU ... this is one of the best developments for the PC market in a long time. I am excited!

The other interesting consequence of this development is: if INTEL's bread-and-butter chips run games with the most important game API, it would be a good idea for INTEL to put a bunch of engineers behind this and optimize WARP (in case they haven't already done so). This is the big game market, consisting of games like "The Sims", "World of Warcraft" and similar, that we are talking about here. The high-end PC gaming market is much smaller.

Thursday, November 6, 2008

iPhone ARM VFP code

The iPhone has a kind of SIMD unit. It is called the VFP unit and it is pretty hard to figure out how to program it. Here is a place where you will soon find lots of VFP asm code.

With help from Matthias Grundmann I wrote my first piece of VFP code. Here it is:

void MatrixMultiplyF(
MATRIXf &mOut,
const MATRIXf &mA,
const MATRIXf &mB)
{
#if 0
MATRIXf mRet;

/* Perform calculation on a dummy matrix (mRet) */
mRet.f[ 0] = mA.f[ 0]*mB.f[ 0] + mA.f[ 1]*mB.f[ 4] + mA.f[ 2]*mB.f[ 8] + mA.f[ 3]*mB.f[12];
mRet.f[ 1] = mA.f[ 0]*mB.f[ 1] + mA.f[ 1]*mB.f[ 5] + mA.f[ 2]*mB.f[ 9] + mA.f[ 3]*mB.f[13];
mRet.f[ 2] = mA.f[ 0]*mB.f[ 2] + mA.f[ 1]*mB.f[ 6] + mA.f[ 2]*mB.f[10] + mA.f[ 3]*mB.f[14];
mRet.f[ 3] = mA.f[ 0]*mB.f[ 3] + mA.f[ 1]*mB.f[ 7] + mA.f[ 2]*mB.f[11] + mA.f[ 3]*mB.f[15];

mRet.f[ 4] = mA.f[ 4]*mB.f[ 0] + mA.f[ 5]*mB.f[ 4] + mA.f[ 6]*mB.f[ 8] + mA.f[ 7]*mB.f[12];
mRet.f[ 5] = mA.f[ 4]*mB.f[ 1] + mA.f[ 5]*mB.f[ 5] + mA.f[ 6]*mB.f[ 9] + mA.f[ 7]*mB.f[13];
mRet.f[ 6] = mA.f[ 4]*mB.f[ 2] + mA.f[ 5]*mB.f[ 6] + mA.f[ 6]*mB.f[10] + mA.f[ 7]*mB.f[14];
mRet.f[ 7] = mA.f[ 4]*mB.f[ 3] + mA.f[ 5]*mB.f[ 7] + mA.f[ 6]*mB.f[11] + mA.f[ 7]*mB.f[15];

mRet.f[ 8] = mA.f[ 8]*mB.f[ 0] + mA.f[ 9]*mB.f[ 4] + mA.f[10]*mB.f[ 8] + mA.f[11]*mB.f[12];
mRet.f[ 9] = mA.f[ 8]*mB.f[ 1] + mA.f[ 9]*mB.f[ 5] + mA.f[10]*mB.f[ 9] + mA.f[11]*mB.f[13];
mRet.f[10] = mA.f[ 8]*mB.f[ 2] + mA.f[ 9]*mB.f[ 6] + mA.f[10]*mB.f[10] + mA.f[11]*mB.f[14];
mRet.f[11] = mA.f[ 8]*mB.f[ 3] + mA.f[ 9]*mB.f[ 7] + mA.f[10]*mB.f[11] + mA.f[11]*mB.f[15];

mRet.f[12] = mA.f[12]*mB.f[ 0] + mA.f[13]*mB.f[ 4] + mA.f[14]*mB.f[ 8] + mA.f[15]*mB.f[12];
mRet.f[13] = mA.f[12]*mB.f[ 1] + mA.f[13]*mB.f[ 5] + mA.f[14]*mB.f[ 9] + mA.f[15]*mB.f[13];
mRet.f[14] = mA.f[12]*mB.f[ 2] + mA.f[13]*mB.f[ 6] + mA.f[14]*mB.f[10] + mA.f[15]*mB.f[14];
mRet.f[15] = mA.f[12]*mB.f[ 3] + mA.f[13]*mB.f[ 7] + mA.f[14]*mB.f[11] + mA.f[15]*mB.f[15];

/* Copy result in pResultMatrix */
mOut = mRet;
#else
#if (TARGET_CPU_ARM)
const float* src_ptr1 = &mA.f[0];
const float* src_ptr2 = &mB.f[0];
float* dst_ptr = &mOut.f[0];

asm volatile(
// switch on ARM mode
// involves an unconditional jump and mode switch (opcode bx)
// the lowest bit in the address signals whether ARM (bit cleared)
// or Thumb (bit set) should be selected
".align 4 \n\t"
"mov r0, pc \n\t"
"bx r0 \n\t"
".arm \n\t"

// set vector length to 4
// example fadds s8, s8, s16 means that the content s8 - s11
// is added to s16 - s19 and stored in s8 - s11
"fmrx r0, fpscr \n\t" // loads fpscr status reg to r4
"bic r0, r0, #0x00370000 \n\t" // bit clear stride and length
"orr r0, r0, #0x00030000 \n\t" // set length to 4 (11)
"fmxr fpscr, r0 \n\t" // upload r4 to fpscr
// Note: this stalls the FPU

// result[0][1][2][3] = mA.f[0][0][0][0] * mB.f[0][1][2][3]
// result[0][1][2][3] = result + mA.f[1][1][1][1] * mB.f[4][5][6][7]
// result[0][1][2][3] = result + mA.f[2][2][2][2] * mB.f[8][9][10][11]
// result[0][1][2][3] = result + mA.f[3][3][3][3] * mB.f[12][13][14][15]
// s0 - s31
// if Fd == s0 - s7 -> treated as scalar, all others treated as vectors
// load the whole matrix into memory - transposed -> second operand first
"fldmias %2, {s8-s23} \n\t"
// load first column to scalar bank
"fldmias %1!, {s0 - s3} \n\t"
// first column times matrix
"fmuls s24, s8, s0 \n\t"
"fmacs s24, s12, s1 \n\t"
"fmacs s24, s16, s2 \n\t"
"fmacs s24, s20, s3 \n\t"
// save first column
"fstmias %0!, {s24-s27} \n\t"

// load second column to scalar bank
"fldmias %1!, {s4-s7} \n\t"
// second column times matrix
"fmuls s28, s8, s4 \n\t"
"fmacs s28, s12, s5 \n\t"
"fmacs s28, s16, s6 \n\t"
"fmacs s28, s20, s7 \n\t"
// save second column
"fstmias %0!, {s28-s31) \n\t"

// load third column to scalar bank
"fldmias %1!, {s0-s3} \n\t"
// third column times matrix
"fmuls s24, s8, s0 \n\t"
"fmacs s24, s12, s1 \n\t"
"fmacs s24, s16, s2 \n\t"
"fmacs s24, s20, s3 \n\t"
// save third column
"fstmias %0!, {s24-s27} \n\t"

// load fourth column to scalar bank
"fldmias %1!, {s4-s7} \n\t"
// fourth column times matrix
"fmuls s28, s8, s4 \n\t"
"fmacs s28, s12, s5 \n\t"
"fmacs s28, s16, s6 \n\t"
"fmacs s28, s20, s7 \n\t"
// save fourth column
"fstmias %0!, {s28-s31} \n\t"

// reset vector length to 1
"fmrx r0, fpscr \n\t" // loads fpscr status reg to r4
"bic r0, r0, #0x00370000 \n\t" // bit clear stride and length
"fmxr fpscr, r0 \n\t" // upload r4 to fpscr


// switch to thumb mode
// lower bit of destination is set to 1
"add r0, pc, #1 \n\t"
"bx r0 \n\t"
".thumb \n\t"

// binds variables to registers
: "=r" (dst_ptr), "=r" (src_ptr1), "=r" (src_ptr2)
: "0" (dst_ptr), "1" (src_ptr1), "2" (src_ptr2)
: "r0"
);
#endif
#endif
}


Monday, October 20, 2008

Midnight Club: Los Angeles

Tomorrow is the day: Midnight Club: Los Angeles launches. This is the third game I worked on for Rockstar. If you are into racing games you need to check it out :-)

Thursday, October 16, 2008

Hardware GPU / SPU / CPU

I follow all the discussions about the future of game hardware, with talks about Larrabee and GPUs and the death of 3D APIs and -depending on the viewpoint- different hardware designs.

The thing I figure is that all this is quite interesting and inspiring, but our cycles of change in computer graphics and graphics programming are pretty long. Most of the stuff we do is based on research papers that were released more than 30 years ago and written on typewriters.
Why should any new piece of hardware change all this in a very short amount of time?

There is a game market out there that grows at double-digit percentage rates on all kinds of hardware. How much of this market and its growth would be influenced by any new hardware?

Some of the most widely distributed game hardware is pretty old and, by most standards, under-powered. Nevertheless it offers entertainment that people enjoy.

So how important is it whether we program a CPU, SPU, GPU or whatever we call the next thing? Give me a washing machine with a display and I will make an entertainment machine with robo rumble out of it.

Thursday, October 2, 2008

S3 Graphics Chrome 440 GTX

I bought a new S3 Chrome 440 GTX in the S3 online store. I wanted to know how this card is doing, especially because it is DirectX 10.1 compatible. The other reason I bought it was that it has an HDMI output. Just putting it into my desktop machine was interesting: I removed an 8800 GTS, which was really heavy, and replaced it with this card that is so small it doesn't even need an extra power supply. It looks like some of my graphics cards from the end of the '90s, when they started to put fans on the cards. The fan is so small that it should be possible to cool the card passively without much effort.

I just went through the DirectX 10 SDK examples. Motion Blur runs at about 5.8 fps and NBodyGravity at about 1.8 fps. The instancing example runs at 11.90 fps. I use the VISTA 64-bit beta drivers 7.15.12.0217-18.05.03. The other examples run fast enough. The CPU does not seem to become overly busy.
I just saw that there is a newer driver. The latest WHQL'ed driver has the version number 248. With it the Motion Blur example runs at 6.3 fps with some artefacts (the beta driver had those as well), Instancing ran at 11.77 fps and the NBodyGravity example at 1.83 fps ... probably not an accurate way to measure this stuff at all, but at least it gives a rough idea.

The integrated INTEL 4500 MHD chip in my notebook is slower than this, but then it supports at least DX10 and the notebook is super light :-) ... for development it just depends on the feature support for me (most of the time I prototype effects on PCs).
While playing around with the two chipsets I found out that the mobile INTEL chip also runs the new DirectX 10.1 SDK example Depth of Field at more than 20 fps, which is quite impressive. The Chrome 440 GTX runs this example at more than 100 fps. The new Raycast Terrain example runs at 19.6 fps on the Chrome and at 7.6 fps on the mobile INTEL chipset. The example that does not run on the mobile INTEL chip is the ProceduralMaterial example; it runs at less than 1 fps on the Chrome 440 GTX.
Nevertheless it seems like both companies did their homework with the DirectX SDK.
I also ran a bunch of ShaderX7 example programs on the cards. While the INTEL mobile chip shows errors in some of the DirectX 9 examples and crashes in some of the DirectX 10 stuff, the Chrome even takes the DirectX 10.1 examples that I have, which usually only run on ATI hardware ... nice!
One thing that I hadn't thought of is GLSL support. I thought that only ATI and NVIDIA have GLSL support but S3 seems to have it as well. INTEL's mobile chip does not have it, though ...

I will try out the Futuremark 3DMark Vantage benchmark. It seems a Chrome 400 Series is in there with a score of 222. Probably not too bad considering the fact that they probably do not pay Futuremark to be a member of their program.
Update October 4th: the S3 Chrome 440 GTX scored 340 as the graphics score in the trial version of 3DMark Vantage.

Wednesday, October 1, 2008

Old Interview

Just bumped into an old interview I gave to Gamedev.net. I still think everything in there is valid


While reading it I thought it is kind of boring; many of my answers are so obvious ... maybe this is just my perception. How can you make it into the game industry? Probably the same way you can make it into any industry: lots of education or luck or just being at the right place at the right time, and then being creative, a good thinker etc. There is no magic trick I think ... it all comes with lots of sweat.

Tuesday, September 30, 2008

64-bit VISTA Tricks

I got a new notebook today with 64-bit VISTA pre-installed. It will replace a desktop that had 64-bit VISTA on it. My friend Andy Firth provided me with the following tricks to make my life easier (it has a 64 GB solid-state drive in there, so no hard-drive optimizations):

Switch Off User Account Control
This gets rid of the ongoing "are you sure" questions.
Go to Control Panel, click on User Accounts and switch it off.

Disable Superfetch
Press Windows key + R. Start services.msc and scroll down until you find Superfetch. Double click on it and change the startup type to Disabled.

Sunday, September 28, 2008

Light Pre-Pass: More Blood

I spent some more time with the Light Pre-Pass renderer. Here are my assumptions:

N.H^n = (N.L * N.H^n * Att) / (N.L * Att)

This division happens in the forward rendering path. The light source has its own shininess value in there == the power value n. With the specular component extracted, I can apply the material shininess value like this:

(N.H^n)^nm

Then I can re-construct the Blinn-Phong lighting equation. The data stored in the Light Buffer is treated like one light source. As a reminder, the first three channels of the light buffer hold:

N.L * Att * DiffuseColor

Color = Ambient + (LightBuffer.rgb * MatDiffInt) + MatSpecInt * (N.H^n)^nm * N.L * Att


So how could I do this :-)

N.H^n = (N.L * N.H^n * Att) / (N.L * Att)

N.L * Att is not in any channel of the Light buffer. How can I get this? The trick here is to convert the first three channels of the Light Buffer to luminance. The value should be pretty close to N.L * Att.
This also opens up a bunch of ideas for different materials. Every time you need the N.L * Att term you replace it with luminance. This should give you a wide range of materials.
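A minimal sketch of what the forward rendering pass could look like with this trick (my own pseudo-HLSL, not production code; LightBuffer, MatDiffInt, MatSpecInt, Ambient and the material power mn are placeholder names):

// fetch the light buffer laid out as described above
float4 lightBuffer = tex2D(LightBuffer, screenUV);

// approximate N.L * Att by converting the diffuse channels to luminance
float NdotL_Att = dot(lightBuffer.rgb, float3(0.2126, 0.7152, 0.0722));

// re-construct N.H^n from the fourth channel and apply the material shininess mn
float specular = pow(lightBuffer.a / max(NdotL_Att, 0.0001), mn);

float3 color = Ambient + lightBuffer.rgb * MatDiffInt + MatSpecInt * specular * NdotL_Att;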
The results I get are very exciting. Here is a list of advantages over a Deferred Renderer:
- less cost per light (you calculate much less in the Light pass)
- easier MSAA
- more material variety
- less read memory bandwidth -> it fetches only two textures instead of the four a Deferred Renderer needs
- runs on hardware without ps_3_0 and MRT -> runs on DX8.1 hardware

Sunday, September 21, 2008

Shader Workflow - Why Shader Generators are Bad

[quote]As far as I can tell from this discussion, no one has really proposed an alternative to shader permutations, merely they've been proposing ways of managing those permutations.[/quote]

If you define shader permutations as having lots of small differences but using the same code, then you have to live with the fact that whatever is sent to the hardware is a full-blown shader, even if you have exactly the same skinning code in every other shader.
So the end result is always the same ... whatever you do on the level above that.
What I describe is a practical approach to handle shaders with a high amount of material variety and a good workflow.
Shaders are some of the most expensive assets in terms of production value and time spent by the programming team. They need to be the most highly optimized code we have, because it is much harder to squeeze performance out of a GPU than out of a CPU.
Shader generators or a material editor (... or whatever you call it) are not an appropriate way to generate or handle shaders because they are hard to maintain, do not offer enough material variety, and are not very efficient, because it is hard to hand-optimize code that is generated on the fly.
This is why developers do not use them and do not want to use them. It is possible that they play a role in indie or non-profit development, because those teams are money and time constrained and do not have to compete in the AAA sector.
In general, the basic mistake made by people who think that ueber-shaders, material editors or shader generators make sense is that they do not understand how to program a graphics card. They assume it is similar to programming a CPU and therefore think they could generate code for those cards.
It would make more sense to generate code on the fly for CPUs (... which also happens in the graphics card drivers) and in other places (real-time assemblers) than for GPUs, because GPUs do not have anything close to linear performance behaviour. The difference between a performance hotspot and a point where you made something wrong can be 1:1000 in time (following a presentation by Matthias Wloka). You hand-optimize shaders to hit those hotspots, and the way you do it is to analyze the results provided by PIX and other tools to find out where the performance hotspot of the shader is.

Thursday, September 18, 2008

ARM VFP ASM development

Following Matthias Grundmann's invitation to join forces, I set up a Google Code repository for this:

here

The idea is to have a math library that is optimized for the VFP unit of an ARM processor. This should be useful on the iPhone / iPod touch.

Friday, September 12, 2008

More Mobile Development

Now that I have had so much fun with the iPhone I am thinking about new challenges in the mobile phone development area. The Touch HD looks like a cool target. It has a DX8-class ATI graphics chip in there, probably on par with the iPhone graphics chip, and you can program it in C/C++, which is important for performance.
Depending on how easy it will be to get Oolong running on this I will extend Oolong to support this platform as well.

Wednesday, September 10, 2008

Shader Workflow

I just posted a forum message about what I consider an ideal shader workflow in a team. I thought I would share it here:

Setting up a good shader workflow is easy. You just set up a folder called shaderlib, then you set up a folder called shader. In shaderlib there are files like lighting.fxh, utility.fxh, normals.fxh, skinning.fxh etc., and in the directory shader there are files like metal.fx, skin.fx, stone.fx, eyelashes.fx, eyes.fx. In each of those *.fx files there is a technique for whatever special state you need. You might have in there techniques like lit, depthwrite etc.
All the "intelligence" is in the shaderlib directory in the *.fxh files. The fx files just stitch together function calls. The HLSL compiler resolves those function calls by inlining the code.
So it is easy to just send someone the shaderlib directory with all the files in there and share your shader code this way.
In the lighting.fxh include file you will have all kinds of lighting models like Ashikhmin-Shirley, Cook-Torrance or Oren-Nayar and obviously Blinn-Phong, or just a different BRDF that can mimic a certain material especially well. In normals.fxh you have routines that can fetch normals in different ways and unpack them. Obviously all the DXT5 and DXT1 tricks are in there, but also routines that let you fetch height data to generate normals from it. In utility.fxh you have support for different color spaces and special optimizations for different platforms, like special texture fetches etc. In skinning.fxh you have all code related to skinning and animation ... etc.
If you give this library to a graphics programmer he obviously has to put together the shader on his own, but he can start by looking at what is requested and try different approaches to see what fits the job best. He does not have to come up with ways to generate a normal from height or color data or how to deal with different color spaces.
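To make this more concrete, here is a minimal sketch of what such a stone.fx could look like; the include files, function names and parameters are made up for illustration and are not from an actual library:

// stone.fx - a sketch; the "intelligence" lives in the *.fxh files
#include "..\shaderlib\normals.fxh"   // e.g. a hypothetical FetchNormalDXT5()
#include "..\shaderlib\lighting.fxh"  // e.g. a hypothetical BlinnPhong()

float4 PS_Lit(VSOutput In) : COLOR0
{
    // the fx file only stitches together calls; the HLSL compiler inlines them
    float3 N = FetchNormalDXT5(NormalMap, In.UV);
    return float4(BlinnPhong(N, In.LightVec, In.ViewVec, MaterialShininess), 1.0);
}

technique lit
{
    pass p0
    {
        PixelShader = compile ps_3_0 PS_Lit();
    }
}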
For a good, efficient and high quality workflow in a game team, this is what you want.

Tuesday, September 9, 2008

Calculating Screen-Space Texture Coordinates for the 2D Projection of a Volume

Calculating screen space texture coordinates for the 2D projection of a volume is more complicated than for an already transformed full-screen quad. Here is a step-by-step approach on how to achieve this:

1. Transforming the position into projection space is done in the vertex shader by multiplying it with the concatenated World-View-Projection matrix.

2. The Direct3D run-time will now divide those values by the homogeneous W component (which holds the view-space depth). The resulting position is then considered in clipping space, where the x and y values are clipped to the [-1.0, 1.0] range.

xclip = xproj / wproj
yclip = yproj / wproj

3. Then the Direct3D run-time transforms the position into viewport space, from the value range [-1.0, 1.0] to the ranges [0.0, ScreenWidth] and [0.0, ScreenHeight].

xviewport = xclipspace * ScreenWidth / 2 + ScreenWidth / 2
yviewport = -yclipspace * ScreenHeight / 2 + ScreenHeight / 2

This can be simplified to:

xviewport = (xclipspace + 1.0) * ScreenWidth / 2
yviewport = (1.0 - yclipspace ) * ScreenHeight / 2

The result represents the position on the screen. The y component needs to be inverted because in world / view / projection space it increases in the opposite direction compared to screen coordinates.

4. Because the result should be in texture space and not in screen space, the coordinates need to be transformed from clipping space to texture space. In other words from the range [-1.0, 1.0] to the range [0.0, 1.0].

u = (xclipspace + 1.0) * 1 / 2
v = (1.0 - yclipspace ) * 1 / 2


5. Due to the texturing algorithm used by Direct3D, we need to adjust texture coordinates by half a texel:

u = (xclipspace + 1.0) * ½ + ½ / TargetWidth
v = (1.0 - yclipspace ) * ½ + ½ / TargetHeight

Plugging in the x and y clip-space results from step 2:

u = (xproj / wproj + 1.0) * ½ + ½ / TargetWidth
v = (1.0 - yproj / wproj ) * ½ + ½ / TargetHeight

6. Because the final calculation of this equation should happen in the vertex shader, the results will be sent down through the texture coordinate interpolator registers. Interpolating 1/wproj is not the same as 1/(interpolated wproj). Therefore the term 1/wproj needs to be extracted and applied in the pixel shader.

u = 1/wproj * ((xproj + wproj) * ½ + ½ / TargetWidth * wproj)
v = 1/wproj * ((wproj - yproj) * ½ + ½ / TargetHeight * wproj)

The vertex shader source code looks like this:

float4 vPos = float4(0.5 * (float2(p.x + p.w, p.w - p.y) + p.w * inScreenDim.xy), p.zw);
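This assumes inScreenDim.xy holds the reciprocal render target size (1.0 / TargetWidth, 1.0 / TargetHeight). On the pixel shader side the only thing left to do is the division by w; a minimal sketch of the matching code (my own illustration, the variable and sampler names are assumptions):

// vPos is the interpolated value written by the vertex shader above
float2 texCoord = vPos.xy / vPos.w;               // apply 1/wproj per pixel, not per vertex
float4 sceneColor = tex2D(SceneSampler, texCoord);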



The equation without the half-pixel offset would start at step 4 like this:

u = (xclipspace + 1.0) * 1 / 2
v = (1.0 - yclipspace ) * 1 / 2

Plugging in the x and y clip-space results from step 2:

u = (xproj / wproj + 1.0) * ½
v = (1.0 - yproj / wproj ) * ½

Moving 1 / wproj to the front leads to:

u = 1/ wproj * ((xproj + wproj) * ½)
v = 1/ wproj * ((wproj - yproj) * ½)

Because the pixel shader does the division by wproj, this leads to the following vertex shader code:

float4 vPos = float4(0.5 * float2(p.x + p.w, p.w - p.y), p.zw);


All this is based on a response by mikaelc in the following thread:

Lighting in a Deferred Renderer and a response by Frank Puig Placeres in the following thread:

Reconstructing Position from Depth Data

Sunday, September 7, 2008

Gauss Filter Kernel

Just found a good tutorial on how to setup a Gauss filter kernel here:

OpenGL Bloom Tutorial

The interesting part is that he shows a way to generate the offset values, and he also mentions a trick that I have used for a long time: he reduces the number of texture fetches by utilizing the hardware's linear filtering, so he can go down from 5 to 3 taps. I usually use bilinear filtering to go down from 9 to 4 taps or from 25 to 16 taps (with non-separable filter kernels) ... you get the idea.
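As a sketch of the same idea (my own example, not the tutorial's code), a horizontal pass of a 9-tap Gaussian collapsed to 5 bilinear fetches could look like this; the weights and offsets below are one common choice, and texelSize is assumed to hold 1.0 / render target width:

// weights/offsets chosen so that one linear fetch between two texels replaces two point fetches
static const float weights[3] = { 0.2270270270, 0.3162162162, 0.0702702703 };
static const float offsets[3] = { 0.0, 1.3846153846, 3.2307692308 };

float4 color = tex2D(SrcSampler, uv) * weights[0];
for (int i = 1; i < 3; i++)
{
    color += tex2D(SrcSampler, uv + float2(offsets[i] * texelSize, 0.0)) * weights[i];
    color += tex2D(SrcSampler, uv - float2(offsets[i] * texelSize, 0.0)) * weights[i];
}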

Eric Haines just reminded me of the fact that this is also described in ShaderX2 - Tips and Tricks on page 451. You can find the -now free- book at

http://www.gamedev.net/reference/programming/features/shaderx2/Tips_and_Tricks_with_DirectX_9.pdf

BTW: Eric Haines contacted all the authors of this book to get permission to make it "open source". I would like to thank him for this.
Check out his blog at

http://www.realtimerendering.com/blog/

Monday, August 18, 2008

Beyond Programmable Shading

I was at SIGGRAPH to attend the "Beyond Programmable Shading" day. I spent the whole morning there and left during the last talk of the morning.
Here is the URL for the Larrabee day:

http://s08.idav.ucdavis.edu/

The talks are quite inspiring. I was hoping to see actual Larrabee hardware in action but they did not have any.
I liked Chas Boyd's DirectX 11 talk because he made it clear that there are different software designs for different applications. Having looked into DirectX 11 for a while now, it seems like there is a great API coming up soon that solves some of the outstanding issues we had with DirectX 9 (DirectX 10 will probably be skipped by many in the industry).

The other thing that impressed me is AMD's CAL. The source code looks very elegant for the amount of performance you can unlock with it. Together with Brook+ it lets you control a huge number of cards. It seems like CUDA will soon be able to handle many GPUs at once more easily, too. PostFX is a good candidate for those APIs. CAL and CUDA can live in harmony with DirectX 9/10, and DirectX 11 will even have a compute shader model that is the equivalent of CAL and CUDA. Compute shaders are written in HLSL ... so a consistent environment.

Thursday, July 31, 2008

ARM Assembly

So I decided to deepen my relationship with iPhone programming a bit and bought an ARM assembly book to learn how to program ARM assembly. The goal is to figure out how to program the MMX-like instruction set that comes with the processor. Then I would create a vectorized math library ... let's see how this goes.

Tuesday, July 29, 2008

PostFX - The Nx-Gen Approach

More than three years ago I wrote a PostFX pipeline (with a large number of effects) that I constantly improved up until the beginning of last year (www.coretechniques.info .. look for the outline of algorithms in the PostFX talk from 2007). Now it shipped in a couple of games. So what is nx-gen here?
On my main target platforms (360 and PS3) it will be hard to squeeze out more performance. There is probably lots of room in everything related to HDR, but overall I wouldn't expect any fundamental changes. The main challenge with the pipeline was not on a technical level, but explaining to the artists how they can use it. Especially the tone mapping functionality was hard to explain, and it was also hard to give them a starting point to work from.
So I am thinking about making it easier for the artists to use this pipeline. The main idea is to follow the camera paradigm. Most of the effects of the pipeline (HDR, Depth of Field, Motion Blur, color filters) are expected to mimic a real-world camera, so why not make it usable like a real-world camera?
The idea is to only expose functionality that is usually exposed by a camera and name all the sliders accordingly. Furthermore there will be different camera models with different basic properties as a starting point for the artists. It should also be possible to just switch between those on the fly, so a whole group of properties changes at the flip of a switch. This should make it easier to use cameras for cut scenes etc.
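Just to illustrate the direction (a sketch of my own, not the shipped pipeline; all names are assumptions), such a camera-style parameter block could look like this:

// one "camera model" preset that artists can switch on the fly
struct PostFXCamera
{
    float  ShutterSpeed;   // drives motion blur length
    float  Aperture;       // drives depth of field / bokeh size
    float  ISO;            // drives exposure / HDR tone mapping key
    float  FocalDistance;  // drives the depth of field focus plane
    float3 ColorFilter;    // drives the color filter tint
};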

iPhone development - Oolong Engine

Just read that John Carmack likes the iPhone as a dev platform. That reminds me of how I started my iPhone engine, the Oolong Engine, in September 2007. Initially I wanted to do some development for the Dreamcast. I got a Dreamcast devkit, a CD burner and all the manuals from friends to start with this. My idea behind all this was to do graphics demos on this platform because I was looking for a new challenge. When I had all the pieces together to start my Dreamcast graphics demo career, a friend told me the specs of the iPhone ... and it became obvious that this would be an even better target :-) ... at the time everyone assumed that Apple would never allow programming for this platform. This was exactly what I was looking for. What can be better than a restricted platform that can't be used by everyone, and that I can even take with me and show to the geekiest of my friends :-)
With some initial help from a friend (thank you Andrew :-)) I wrote the initial version of the Oolong engine and had lots of fun figuring out what is possible on the platform and what is not. Then at some point Steve Jobs surprised us with the announcement that there would be an SDK, and judging from Apple's history I believed that they probably would not allow games to be developed for the platform.
So now that we have an official SDK I am surprised how my initial small-scale geek project turned out :-) ... suddenly I am the maintainer of a small little engine that is used in several productions.

Light Pre-Pass - First Blood :-)

I was looking for a simple way to deal with different specular values coming from different materials. It seems that one of the most obvious ways is also the most efficient way to deal with this. If you are used to starting with a math equation first - as I do - it is not easy to see this solution.
To recap: what ends up in the four channels of the light buffer for a point light is the following:

Diffuse.r * N.L * Att | Diffuse.g * N.L * Att | Diffuse.b * N.L * Att | N.H^n * N.L * Att


So n represents the shininess value of the light source. My original idea for applying different material specular values later in the forward rendering pass was to divide by N.L * Att like this:

(N.H^n * N.L * Att) / (N.L * Att)

This way I would have re-constructed the N.H^n term and I could easily do something like this:

(N.H^n)^mn

where mn represents the material specular power. Unfortunately this requires storing the N.L * Att term in a separate render target channel. The more obvious way to deal with it is to just do this:

(N.H^n * N.L * Att)^mn

... maybe not quite right, but it looks good enough for what I want to achieve.
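In shader terms this boils down to a single pow in the forward pass (a sketch with placeholder names, following the light buffer layout above):

// lightBuffer.a holds N.H^n * N.L * Att; mn is the material specular power
float specular = pow(lightBuffer.a, mn);
float3 color = Ambient + lightBuffer.rgb * MatDiffInt + MatSpecInt * specular;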

Friday, June 13, 2008

Stable Cascaded Shadow Maps

I really like Michal Valient's article "Stable Cascaded Shadow Maps". It is a very practical approach to make Cascaded Shadow Maps more stable.
What I also like about it is the ShaderX idea. I wrote an article in ShaderX5 describing a first implementation (... btw. I have re-written that three times since then), Michal picks up from there and brings it to the next level.
There will now be a ShaderX7 article in which I describe a slight improvement to Michal's approach. Michal picks the right shadow map with a rather cool trick. Mine is a bit different, but it might be more efficient. To pick the right map, I send down the sphere that is constructed for the light view frustum of each cascade. I then check if the pixel is in the sphere. If it is, I pick that shadow map; if it isn't, I go to the next sphere. I also early-out by returning white if the pixel is not in any sphere.
At first sight it does not look like a trick but if you think about the spheres lined up along the view frustum and the way they intersect, it is actually pretty efficient and fast.
On my target platforms, especially on the one that Michal likes a lot, this makes a difference.
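A rough sketch of how such a sphere test could look in the shader (my own illustration, not the ShaderX7 code; the per-cascade sphere data and the SampleShadowMap helper are assumptions):

float shadow = 1.0;   // early out: return white if the pixel is outside all spheres
for (int i = 0; i < NUM_CASCADES; i++)
{
    // CascadeSphere[i].xyz = center, CascadeSphere[i].w = radius
    float3 toCenter = worldPos - CascadeSphere[i].xyz;
    if (dot(toCenter, toCenter) < CascadeSphere[i].w * CascadeSphere[i].w)
    {
        shadow = SampleShadowMap(i, worldPos);   // hypothetical helper
        break;   // the first sphere that contains the pixel picks the map
    }
}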

Thursday, June 12, 2008

Screen-Space Global Illumination

I am thinking a lot about Crytek's Screen-Space Ambient Occlusion (SSAO) and the idea of extending this into a global illumination term.
When combined with a Light Pre-Pass renderer, there is the light buffer with all the N.L * Att values that can be used as intensity, there is the end result of the opaque rendering pass, and we have a normal map lying around. Doing the light bounce along the normal and using the N.L * Att entry in the light buffer as intensity should do the trick. The way the values are fetched would be similar to SSAO.

Wednesday, May 28, 2008

UCSD Talk on Light Pre-Pass Renderer

So the Light Pre-Pass renderer had its first public performance :-) ... I talked yesterday at UCSD about this new renderer design. There will be a ShaderX7 article as well.

Pat Wilson from Garagegames is sharing his findings with me. He came up with an interesting way to store LUV colors.
Renaldas Zioma told me that a similar idea was used in Battlezone 2.

This is exciting :-)

The link to the slides is at the end of the March 16th post.

Thursday, May 15, 2008

DX 10 Graphics demo skeleton

I set up a Google Code website with one of my small little side projects that I worked on more than a year ago. To compete in graphics demo competitions you need a very small exe. I wanted to figure out how to do this with DX10 and this is the result :-) ... follow the link

http://code.google.com/p/graphicsdemoskeleton/

What is it: it is just a minimal skeleton to start creating your own small-size apps with DX10. At some point I had a particle system running in 1.5 kb this way (that was with DX9). If you think about the concept of small exes, there is one interesting thing I figured out: when I use DX9 and compile HLSL shader code to a header file and include it, it is smaller than the equivalent C code. So what I was thinking was: hey, let's write a math library in HLSL, use the CPU only for the stub code to launch everything, and let it run on the GPU :-)

Tuesday, April 29, 2008

Today is the day: GTA IV is released

I am really excited about this. This is the second game I worked on for Rockstar and it is finally coming out ...

Monday, April 21, 2008

RGB -> XYZ conversion

Here is the official way to do it:
http://www.w3.org/Graphics/Color/sRGB

They use

// 0.4125 0.3576 0.1805
// 0.2126 0.7152 0.0722
// 0.0193 0.1192 0.9505

to convert from RGB to XYZ and

// 3.2410 -1.5374 -0.4986
// -0.9692 1.8760 0.0416
// 0.0556 -0.2040 1.0570


to convert back.


Here is how I do it:
const float3x3 RGB2XYZ = {0.5141364, 0.3238786, 0.16036376,
0.265068, 0.67023428, 0.06409157,
0.0241188, 0.1228178, 0.84442666};


Here is how I convert back:
const float3x3 XYZ2RGB = { 2.5651,-1.1665,-0.3986,
-1.0217, 1.9777, 0.0439,
0.0753, -0.2543, 1.1892};
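Applying those in a shader is then just a matrix multiply (a quick sketch):

float3 xyz = mul(RGB2XYZ, rgbColor);   // RGB -> XYZ
float3 rgb = mul(XYZ2RGB, xyz);        // and back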


You should definitely try out different ways to do this :-)

Monday, April 14, 2008

Ported my iPhone Engine to OS 2.0

I spent three days last week porting the Oolong engine over to the latest iPhone / iPod touch OS.

http://www.oolongengine.com

My main development device is still an iPod touch because I am worried about not being able to make phone calls anymore.

Tuesday, April 8, 2008

Accepted for the iPhone Developer Program

Whooo I am finally accepted! I have access to the iPhone developer program. Now I can start to port my Oolong Engine over :-)

Tuesday, March 25, 2008

Some Great Links

I just came across some cool links today while looking for material that shows multi-core programming and how to generate an indexed triangle list from a triangle soup.
I did not know that you can set up a virtual Cell chip on your PC. This course looks interesting:

http://www.cc.gatech.edu/~bader/CellProgramming.html

John Ratcliff's Code Suppository is a great place to find fantastic code snippets:

http://www.codesuppository.blogspot.com/

Here is a great paper to help with first steps in multi-core programming:

http://www.digra.org/dl/db/06278.34239.pdf

A general graphics programming course is available here:

http://users.ece.gatech.edu/~lanterma/mpg/

I will provide this URL to people who ask me about how to learn graphics programming.