Sunday, January 25, 2009

iP* programming tip #9

This issue of the iPhone / iPod Touch programmig tips series focuses on some aspects of VFP assembly programming. My friend Noel Llopis brought an oversight in the VFP math library to my attention, that I still need to fix. So I start with the description of the problem here and promise to fix it soon in the VFP library :-)
First let's start with the references. My friend Aaron Leiby has a blog entry on how to start programming the VFP unit here:

A typical inline assembly template might look like this:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
The last two lines of code hold the input and output operands and the so called clobbers, that are used to inform the compiler on which registers are used.
Here is a simple GCC assembly example -that doesn't use VFP assembly- that shows how the input and output operands are specified:

asm("mov %0, %1, ror #1" : "=r" (result) " : "r" (value));

The idea is that "=r" holds the result and "r" is the input. %0 refers to "=r" and %1 refers to "r".
Each operand is referenced by numbers. The first output operand is numbered 0, continuing in increasing order. There is a max number of operands ... I don't know what the max number is for the iPhone platform.

Some instructions clobber some hardware registers. We have to list those registers in the clobber-list, ie the field after the third ’:’ in the asm function. So GCC will not assume that the values it loads into these registers will be valid.
In other words a clobber list tells the compiler which registers were used but not passed as operands. If a register is used as a scratch register this register need to be mentioned in there. Here is an example:
asm volatile("ands    r3, %1, #3"     "\n\t"
"eor %0, %0, r3" "\n\t"
"addne %0, #4"
: "=r" (len)
: "0" (len)
: "cc", "r3"
r3 is used as a scratch register here. It seems the cc pseudo register tells the compiler about the clobber list. If the asm code changes memory the "memory" pseudo register informs the compiler about this.

asm volatile("ldr     %0, [%1]"         "\n\t"
"str %2, [%1, #4]" "\n\t"
: "=&r" (rdv)
: "r" (&table), "r" (wdv)
: "memory"
This special clobber informs the compiler that the assembler code may modify any memory location. Btw. the volatile attribute instructs the compiler not to optimize your assembler code.

If you want to add something to this tip ... please do not hesitate to write it in the comment line. I will add it then with your name.

Friday, January 9, 2009

Partial Derivative Normal Maps

To make my collection of normal map techniques more complete on this blog I also have to mention a special normal mapping technique that Insomniac's Mike Acton brought to my attention a long time ago (I wasn't sure if I am allowed to publish it ... but now they have slides on their website).
The idea is to store the paritial derivate of the normal in two channels of the map like this

dx = (-nx/nz);
dy = (-ny/nz);

Then you can reconstruct the normal like this:

nx = -dx;
ny = -dy;
nz = 1;

The advantage is that you do not have to reconstruct Z, so you can skip one instruction in each pixel shader that uses normal maps.
This is especially cool on the PS3 while on the XBOX 360 you can also create a custom texture format to let the texture fetch unit do the scale and bias and save a cycle there.
More details can be found at

Look for Partial Derivative Normal Maps.

Sunday, January 4, 2009

Handling Scene Geometry

I recently bumped into a post by Roderic Vicaire on the forums. It is here.
Obviously there is no generic solution to handle all scene geometry in the same way but depending on the game his naming conventions make a lot of sense (read "Scenegraphs say no" in Tom Forsyth's blog).
- SpatialGraph: used for finding out what is visible and should be drawn. Should make culling fast
- SceneTree: used for hierarchical animations, e.g. skeletal animation or a sword held in a character's hand
- RenderQueue: is filled by the SpatialGraph. Renders visible stuff fast. It sorts sub arrays per key, each key holding data such as depth, shaderID etc. (see Christer Ericson's blog entry "Sort based-draw call bucketing" for this)