Sunday, January 25, 2009

iP* programming tip #9

This issue of the iPhone / iPod Touch programming tips series focuses on some aspects of VFP assembly programming. My friend Noel Llopis brought an oversight in the VFP math library to my attention that I still need to fix. So I start with the description of the problem here and promise to fix it soon in the VFP library :-)
First let's start with the references. My friend Aaron Leiby has a blog entry on how to start programming the VFP unit here:

A typical inline assembly template might look like this:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
The last three lines of the template hold the output operands, the input operands and the so-called clobbers, which tell the compiler which registers the code uses.
Here is a simple GCC inline assembly example (it doesn't use VFP assembly) that shows how the input and output operands are specified:

asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));

The idea is that the "=r" operand (result) receives the output and the "r" operand (value) is the input. %0 refers to the "=r" operand and %1 refers to the "r" operand.
Each operand is referenced by its number. The first output operand is numbered 0, and the numbering continues in increasing order through the input operands. There is a maximum number of operands ... I don't know what the maximum is for the iPhone platform.
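To make the operand numbering concrete, here is a minimal sketch that wraps the rotate example in a function. The function names ror1 and ror1_check and the plain C fallback are assumptions added for illustration; the fallback only exists so that the snippet also builds on non-ARM hosts or in Thumb mode.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by one bit.
   ror1 is a hypothetical helper name, not part of any library. */
uint32_t ror1(uint32_t value)
{
    uint32_t result;
#if defined(__arm__) && !defined(__thumb__)
    /* %0 is the first operand in the lists, "=r" (result);
       %1 is the second one, "r" (value). */
    __asm__("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
#else
    /* Portable fallback (assumption): same result in plain C. */
    result = (value >> 1) | (value << 31);
#endif
    return result;
}

/* Small self-check: bit 0 wraps around to bit 31. */
void ror1_check(void)
{
    assert(ror1(2u) == 1u);
    assert(ror1(1u) == 0x80000000u);
}
```

Since the asm statement has no side effects besides writing its output operand, it is left without volatile here, so the compiler is free to drop it when the result is unused.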

Some instructions clobber hardware registers. We have to list those registers in the clobber list, i.e. the field after the third ':' in the asm statement, so that GCC will not assume that the values it loaded into these registers are still valid afterwards.
In other words, a clobber list tells the compiler which registers were used but not passed as operands. If a register is used as a scratch register, it needs to be mentioned there. Here is an example:
asm volatile("ands r3, %1, #3" "\n\t"
"eor %0, %0, r3" "\n\t"
"addne %0, #4"
: "=r" (len)
: "0" (len)
: "cc", "r3"
);
r3 is used as a scratch register here. The "cc" entry tells the compiler that the code changes the condition code flags (the ands instruction does that here). If the asm code changes memory, the "memory" pseudo register informs the compiler about this.
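The scratch register does not have to be named by hand. A possible variant, sketched here as an illustration and not as the code used in the VFP library, lets GCC pick the scratch register through an extra early-clobber output operand ("=&r"), so that only "cc" remains in the clobber list. The names round_up4, round_up4_check and tmp are hypothetical, and the plain C branch is an assumed fallback so the snippet builds off ARM.

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of 4, like the example above.
   round_up4 is a hypothetical name used only for this sketch. */
uint32_t round_up4(uint32_t len)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t tmp; /* scratch value; the compiler chooses its register */
    __asm__ volatile(
        "ands  %1, %0, #3"   "\n\t"  /* tmp = len & 3, sets the flags */
        "eor   %0, %0, %1"   "\n\t"  /* clear the low two bits of len */
        "addne %0, %0, #4"           /* add 4 if tmp was not zero     */
        : "+r" (len), "=&r" (tmp)    /* len is read and written       */
        :
        : "cc");                     /* ands changes the flags        */
#else
    /* Portable fallback (assumption): the same arithmetic in C. */
    uint32_t low = len & 3u;
    len ^= low;
    if (low != 0u)
        len += 4u;
#endif
    return len;
}

/* Small self-check of the rounding behaviour. */
void round_up4_check(void)
{
    assert(round_up4(5u) == 8u);
    assert(round_up4(8u) == 8u);
}
```

With this form the register allocator is not forced to keep r3 free around the asm statement, which is the same idea as using temporary variables to carry values between instructions.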

asm volatile("ldr %0, [%1]" "\n\t"
"str %2, [%1, #4]"
: "=&r" (rdv)
: "r" (&table), "r" (wdv)
: "memory"
);
This special clobber informs the compiler that the assembler code may modify any memory location, so it must not keep memory values cached in registers across the asm statement. Btw. the volatile keyword instructs the compiler not to remove or move your assembler code while optimizing.
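As a rough sketch of how the load/store example could be wrapped, the function below reads the first table entry and writes the second one. The function names, the two-element table and the C fallback are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Return table[0] and store wdv into table[1].
   read_first_write_second is a hypothetical name for this sketch. */
uint32_t read_first_write_second(uint32_t table[2], uint32_t wdv)
{
    uint32_t rdv;
#if defined(__arm__) && !defined(__thumb__)
    __asm__ volatile(
        "ldr %0, [%1]"      "\n\t"   /* rdv = table[0]              */
        "str %2, [%1, #4]"           /* table[1] = wdv              */
        : "=&r" (rdv)                /* written before inputs die   */
        : "r" (table), "r" (wdv)
        : "memory");                 /* the str modifies memory     */
#else
    /* Portable fallback (assumption): the same accesses in C. */
    rdv = table[0];
    table[1] = wdv;
#endif
    return rdv;
}

/* Small self-check: the store must be visible to later C code. */
void read_first_write_second_check(void)
{
    uint32_t t[2] = { 7u, 0u };
    assert(read_first_write_second(t, 9u) == 7u);
    assert(t[1] == 9u);
}
```

The "=&r" early-clobber marker matters here because rdv is written by the first instruction while %1 and %2 are still needed by the second one; without it the compiler could place rdv in the same register as one of the inputs.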

If you want to add something to this tip ... please do not hesitate to write it in the comments. I will then add it with your name.


Dan Glastonbury said...

When dealing with GCC and inline asm on PS2, I always found it better to use one asm statement per instruction, using temporary vars to hold values between instructions.

Since GCC inserts the asm into its expression trees, the register allocation worked quite well. (Much better than MSVC, which is hands off in the face of inline asm.)

I think Dylan Cuthbert had a page on this, which probably died once he started Q-Games.

Unknown said...

It's pretty awesome that you posted this today, as I just spent my entire weekend working on implementing my vector and matrix code using VFP. :)

cc is for the conditional code register. If you use any instructions that modify it (cmp, tst, or any of the 'optional s' suffix instructions, movs, orrs, etc.) then it needs to be in the clobber list.

The constraints "=r" and "r" tell GCC what type of register you want, and how to use it. The "r" signifies a general register. GCC also supports "f" on ARM for using a floating point register, as well as a few others.

Also, the iPhone debugger (or gdb in general) does not update the register view for VFP registers. They'll always say 0x0, with a couple of odd values on 2 or 3, but nothing useful. Even using the gdb console's "info all-registers" didn't report any real values. Hopefully they fix that soon.

dopplex said...

As one other source of reference for this, I found this on the ARM site. Not something for casual reading at 972 pages, but a very useful reference.

Has decently sized segments on VFP in specific.

dmost said...

Have you experimented with RunFast mode at all?

I can't quite figure out from the docs what it does, but it seems that it makes stalls based on dependencies less of a problem, because it doesn't need to preserve source registers in case of an instruction failing.

Aaron Leiby said...

Wolfgang, did you ever describe the problem that Noel pointed out? I'm guessing it has to do with specifying clobbers for the vfp registers as well. I was initially not doing this (going off your original example, and assuming that the compiler wasn't touching them - whoops) and wound up getting some impossible to track down stack corruption. I see you've added a nice set of vpf_clobber macros to your library. I'm going to have to grab that since typing them out by hand gets tedious real quick.

Unknown said...

This may be that Dylan Cuthbert link:

dmost said...

this is also a good resource for arm inline assembler.
