YA Assembly Issue

SRLM · 2013-03-15 00:24

The Problem Summary: the attached code behaves differently when compiled with -Os vs. -O0.

Here is when it's working correctly (-O0):
attachment.php?attachmentid=99955&d=1363331977

attachment.php?attachmentid=99955&d=1363331977

Here is when it's not working correctly (-Os):
attachment.php?attachmentid=99954&d=1363331977

The only difference between the two runs was running with optimization on or off. The interesting bit is the inline assembly that I added (the full working example is attached, complete with an elf compiled with -Os):

int result;

    __asm__ volatile (
   	"       andn dira, %[SDAMask]       \n\t" /* DIRA &= ~SDAMask (float SDA high) */ 
   	"       andn dira, %[SCLMask]       \n\t" /* DIRA &= ~SCLMask (float SCL high) */
   	"       and  ina,  %[SDAMask] wz, nr\n\t" /* If != 0, ack'd, else nack */
   	"if_z   mov  %[result], #1          \n\t"
   	"if_nz  mov  %[result], #0          \n\t"
   	"       or   dira, %[SCLMask]       \n\t" /* Set scl low */
   	"       or   dira, %[SDAMask]       \n\t" /* Set sda low */
   	
    : /* outputs */
    	[result]   "=r" (result)
    : /* inputs */
    	[SDAMask] "r" (SDAMask),
    	[SCLMask] "r" (SCLMask)
    );

To compile, I use the following command (no Makefile):

propeller-elf-g++ -mlmm -Wa,-alh,-L   -O0 -m32bit-doubles -mfcache -fno-exceptions -fno-rtti -fpermissive -ffunction-sections -fno-strict-aliasing -std=gnu++0x  -o main.elf i2cbase.cpp i2cbase_temp.cpp

I have also attached the generated assembly (from the "-Wa,-alh,-L" flag). That's in the ZIP.

I am using the following version:

propeller-elf-g++ (propellergcc_v0_3_5_1920) 4.6.1

kuroneko · 2013-03-15 01:01

Interesting mixup! Can't help you with that but while you wait you might want to think about and ina, %[SDAMask] wz,nr.

jac_goudsmit · 2013-03-15 08:13

I didn't look at the difference between the two assembler outputs (.asm files) but I bet that the trap you're walking into is that on the Propeller, the INA pins are not connected to the ALU, so you can't do any logic (or arithmetic) operations on it directly. Only the "shadow" register that's in the same location is connected to the ALU.

Your code does an "AND INA, value" so it takes the value of the memory longword at location INA (not the pins) and masks it with the value.

You have to copy the value of INA to another register (or to itself, to copy the pins to the shadow register) to test the actual pins. Example:

    __asm__ volatile (
       "        andn dira, %[SDAMask]       \n\t" /* DIRA &= ~SDAMask (float SDA high) */ 
       "        andn dira, %[SCLMask]       \n\t" /* DIRA &= ~SCLMask (float SCL high) */
       "        mov  ina, ina\n\t" /* Copy INA to RAM shadow register so it can be used in Arithmetic/Logic operation */
       "        and  ina,  %[SDAMask] wz, nr\n\t" /* If != 0, ack'd, else nack */
       "if_z    mov  %[result], #1          \n\t"
       "if_nz  mov  %[result], #0          \n\t"
       "        or   dira, %[SCLMask]       \n\t" /* Set scl low */
       "       or   dira, %[SDAMask]       \n\t" /* Set sda low */
       
    : /* outputs */
        [result]   "=r" (result)
    : /* inputs */
        [SDAMask] "r" (SDAMask),
        [SCLMask] "r" (SCLMask)
    );

I bet the unoptimized code copies INA to itself or to a different cog RAM register before it does the test, so it works fine but only because of the side effect of copying-before-testing.

I know you're working on PASM documentation on the PropGCC website, and I think this fact needs to be emphasized: The INA register cannot be bit-tested directly.

I'm sure this is what all PASM programmers try in their first program, and I bet many of them have scratched their heads why their program doesn't work. I ran into it myself only recently and was very surprised that the hardware was implemented this way.

By the way, in inline assembler I much prefer to use semicolons to separate lines instead of \n; in long sequences where you want to inspect the output listing from the compiler, I use \n but I put them at the start of each line so they don't distract too much and the C source code looks prettier. But that's a matter of taste I guess. Example:

    __asm__ volatile (
"\n        andn dira,      %[SDAMask]"        // DIRA &= ~SDAMask (float SDA high)
"\n        andn dira,      %[SCLMask]"        // DIRA &= ~SCLMask (float SCL high)
"\n        mov  ina,       ina"               // Copy INA to RAM shadow register so it can be used in Arithmetic/Logic operation
"\n        and  ina,       %[SDAMask] wz, nr" // If != 0, ack'd, else nack
"\n        muxz result,    %[TRUE]"           // Result = 0xFFFFFFFF if ZF is set, 0 if ZF is clear
"\n        or   dira,      %[SCLMask]"        // Set scl to OUTA
"\n        or   dira,      %[SDAMask]"        // Set sda to OUTA

: /* outputs */
           [result]        "=r"         (result)
: /* inputs */
           [SDAMask]       "r"          (SDAMask),
           [SCLMask]       "r"          (SCLMask),
           [TRUE]          "r"          (~0)
    );

That last code snipped also demonstrates the use of MUX to make your code one instruction shorter. You can also make it even shorter by combining SDAMask and SCLMask into one parameter and do the ANDN/OR instructions for both bits at the same time, but I'm not familiar enough with I2C to know if the delay between the two output pin changes was deliberate. Also, your comments near the end of the inline assembly section say "set scl low" / "set sda low" and that's not what those instructions do of course, unless you know that the pins in OUTA are low, which, from the viewpoint of the SendByte function as it is now, is unknown.

===Jac

ersmith · 2013-03-15 08:34

From the generated assembly it looks like in the optimized case it's using r5 both to hold "result" and to hold "SDAMask", which is a problem because "SDAMask" is used after "result". I think you need to modify the constraints on the inline assembly to prevent this. I think you need to write "=&result" for the output, to indicate that result is "earlyclobber" (written before all inputs have been read), since we need to read SDAMask and SCLMask after result is written.

Eric

SRLM · 2013-03-15 11:00

ersmith wrote: »

...I think you need to modify the constraints on the inline assembly to prevent this. I think you need to write "=&result" for the output, to indicate that result is "earlyclobber"...

That was it. Thanks!

kuroneko wrote: »

Interesting mixup! Can't help you with that but while you wait you might want to think about and ina, %[SDAMask] wz,nr.

jac_goudsmit wrote: »

...
You have to copy the value of INA to another register (or to itself, to copy the pins to the shadow register) to test the actual pins. Example:
...
I know you're working on PASM documentation on the PropGCC website, and I think this fact needs to be emphasized: The INA register cannot be bit-tested directly.
...
That last code snipped also demonstrates the use of MUX to make your code one instruction shorter. You can also make it even shorter by combining SDAMask and SCLMask into one parameter and do the ANDN/OR instructions for both bits at the same time, but I'm not familiar enough with I2C to know if the delay between the two output pin changes was deliberate. Also, your comments near the end of the inline assembly section say "set scl low" / "set sda low" and that's not what those instructions do of course, unless you know that the pins in OUTA are low, which, from the viewpoint of the SendByte function as it is now, is unknown.

===Jac

The ina trap is a good one, and one that I was not aware of. Thanks for the catch!

That PASM page (link) was mostly to help me learn, and to get something out there. I don't have any unposted documentation. I just write when the spirit takes me, and post (I've done the two tutorials, and some of the FAQs). I think you (and altosack?) are working on more complete documentation, and I'm perfectly fine if it replaces what I put up.

Thanks for the MUX instruction! As for the SDAMask and SCLMask, the separation there is intentional. I may even have to add in delays to slow it down. As for the low bit, the code that I posted is the minimal working example that showed the issue that I had. The full code does set OUTA low for SCLMask and SDAMask.

Does PropGCC use the shadow registers for anything, or is it safe to overwrite them? I think that some minimal footprint tools (like Cluso's and Jazzed's debuggers) use the shadow registers to store data.

kuroneko · 2013-03-15 17:33

jac_goudsmit wrote: »

"\n        mov  ina,       ina"               // Copy INA to RAM shadow register so it can be used in Arithmetic/Logic operation
"\n        and  ina,       %[SDAMask] wz, nr" // If != 0, ack'd, else nack

Make that and %[SDAMask], ina wz,nr or test %[SDAMask], ina wz.

jazzed · 2013-03-15 18:22

SRLM wrote: »

Does PropGCC use the shadow registers for anything, or is it safe to overwrite them?

They may end up being used in one of the memory model debug kernels.

ersmith · 2013-03-16 04:52

jazzed wrote: »

They may end up being used in one of the memory model debug kernels.

You nailed it, Steve -- the INA and CNT shadow registers are used by the debug code (only linked if you ask for it with -g). Ordinary PropGCC generated code does not use the shadow registers for anything.

Eric

YA Assembly Issue

Comments