Inline assembly (PASM), injecting delays

ypapelis · 2012-09-13 08:37

I have been very successfully converting my PASM programs into C (love it!), but one place where I am consistently getting in trouble is finding ways to inject precise delays in the code. I am using COG mode for C, so the C output program is running in a cog by itself, although the same problem probably would occur in LMM (or other models) for loops that end up cached into COG memory.

Using waitcnt/waitcnt2 I can only generate delays longer than about 7 microseconds, which does not cut it for tweaking clock pulses and clock-data pulse offsets. In PASM, for 'large' delays I could call a subroutine that loops, and I got a precise delay from the known number of loop iterations. For shorter delays, simply inserting 'NOP' instructions did the trick. I tried calling a delay function in cogc, but unless the function does something 'meaty' it gets 'optimized' away, and having to look at the generated assembly to find out is inconvenient. Incrementing a volatile variable appears to do the trick but is awkward. So I was wondering, is there a way to inject PASM code inside C? Some other embedded C systems provide pseudo functions that do exactly that. Any other suggestion on how to achieve instruction-level delays would be welcome!
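For reference, the volatile-variable workaround mentioned above could look roughly like the sketch below. The name spin_delay is hypothetical (it is not part of propeller.h), and the time per iteration depends entirely on the code the compiler generates, so it would have to be measured on a scope rather than assumed.

```c
#include <stdint.h>

/* Hypothetical helper, not part of propeller.h.
 * The counter is volatile, so the compiler must perform every load and
 * store and cannot delete the loop, even when optimizing.
 * Returns the final counter value (equal to `iterations`). */
static inline uint32_t spin_delay(uint32_t iterations)
{
    volatile uint32_t counter = 0;
    while (counter < iterations) {
        counter++;   /* each pass is a real read-modify-write */
    }
    return counter;
}
```

Calling spin_delay(10) between two pin writes would stretch the gap by ten passes of the loop; the awkward part is that the duration of one pass is only known after inspecting or measuring the output.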

jazzed · 2012-09-13 10:20

ypapelis wrote: »

I have been very successfully converting my PASM programs into C (love it!), but one place where I am consistently getting in trouble is finding ways to inject precise delays in the code. I am using COG mode for C, so the C output program is running in a cog by itself, although the same problem probably would occur in LMM (or other models) for loops that end up cached into COG memory.

Using waitcnt/waitcnt2 can only generate delays higher than about 7 microseconds ....

I've been able to make pins toggle as fast as 400ns (possibly 250ns) in COGC code using waitcnt2 (with 80MHz clock).

Here is a COG mode example.

/**
 * @file cogdelay.c
 *
 * This program demonstrates creating COG mode delays with one
 * statement between delays. The minimum delay is dictated
 * by how long it takes to execute instructions between delays.
 *
 * It should be noted that if the time it takes for code to execute
 * is greater than wait, the code will appear to freeze for about
 * one minute with an 80MHz clock.
 */
#include <propeller.h>


/**
 * COG mode cogdelay main function
 *
 * The basis of this delay demo is the propeller.h waitcnt2 macro
 * which translates to __builtin_propeller_waitcnt(count, wait).
 *
 *   #define waitcnt2(a, b) __builtin_propeller_waitcnt((a),(b))
 *
 *   Wait until system counter reaches A value.
 *   waitcnt2 Parameters:
 *      a Target value
 *      b Adjust value
 *
 * Demo runs in an infinite loop and allows for scope measurement.
 */
int main(void)
{
    unsigned int count;
    unsigned int wait;
    unsigned int pin;
    // set pin to output
    pin = 1 << 15;
    DIRA |= pin;
    
    // wait as small as 250ns is possible (at 80MHz) building in COG mode.
    
    wait = (CLKFREQ/5000000); // 16 ticks at 80MHz: 200ns per waitcnt2, 400ns per full pin cycle
    for(;;) {
        // initial delay just for reasonable scope triggering
        count = wait*50+CNT;
        OUTA &= ~pin;
        count = waitcnt2(count, wait);
        // now count contains the target count for the next waitcnt2 call.
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
    }
}
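The wait values in these demos are plain tick counts, so the arithmetic can be checked off-target. The sketch below is only an illustration: the helper names are hypothetical and not part of propeller.h, and the clock frequency is passed in as a parameter instead of reading CLKFREQ so that the code builds on any host.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of propeller.h. */

/* Number of whole system-clock ticks in `ns` nanoseconds at `clkfreq` Hz. */
static uint32_t ns_to_ticks(uint32_t clkfreq, uint32_t ns)
{
    return (uint32_t)(((uint64_t)clkfreq * ns) / 1000000000u);
}

/* Nanoseconds represented by `ticks` at `clkfreq` Hz. */
static uint32_t ticks_to_ns(uint32_t clkfreq, uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000000u) / clkfreq);
}

/* Whole seconds until the 32-bit system counter wraps at `clkfreq` Hz.
 * This is roughly how long a missed waitcnt target appears to freeze. */
static uint32_t wrap_seconds(uint32_t clkfreq)
{
    return (uint32_t)((((uint64_t)1) << 32) / clkfreq);
}
```

With an 80MHz clock, CLKFREQ/5000000 is 16 ticks, which ticks_to_ns reports as 200ns, and CLKFREQ/500000 is 160 ticks, or 2us. The counter wraps after about 53 seconds, which matches the "about one minute" freeze described in the comments.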

Even in LMM mode 2us delays are possible. CMM programs are slower, so the demo below won't work by default. Change the assignment to wait = CLKFREQ/250000; to get about a 4us wait (at 80MHz).

/**
 * @file delaydemo.c
 *
 * This program demonstrates creating 2 microsecond or more delays
 * with one statement between delays. The minimum delay is dictated
 * by how long it takes to execute instructions between delays. The
 * use of fcache (FastCache) also determines how small the delays can
 * be: this demo works with fcache disabled, and enabling it allows
 * shorter delays.
 *
 * It should be noted that if the time it takes for code to execute
 * is greater than wait, the code will appear to freeze for about
 * one minute with an 80MHz clock.
 */
#include <propeller.h>


/**
 * delaydemo main function
 * HUBTEXT forces function to be in HUB RAM even with XMM modes.
 * Not designed for COG mode.
 *
 * The basis of this delay demo is the propeller.h waitcnt2 macro
 * which translates to __builtin_propeller_waitcnt(count, wait).
 *
 *   #define waitcnt2(a, b) __builtin_propeller_waitcnt((a),(b))
 *
 *   Wait until system counter reaches A value.
 *   waitcnt2 Parameters:
 *      a Target value
 *      b Adjust value
 *
 * Demo runs in an infinite loop and allows for scope measurement.
 */
HUBTEXT void main(void)
{
    unsigned int count;
    unsigned int wait;
    unsigned int pin;
    // set pin to output
    pin = 1 << 15;
    DIRA |= pin;
    
    // 2us waits for 80MHz clock assuming no fcache.
    // If the code is small enough to fit in the fcache area
    // wait as small as 500ns is possible (at 80MHz).
    wait = CLKFREQ/500000;
    for(;;) {
        // initial delay just for reasonable scope triggering
        count = wait*50+CNT;
        OUTA &= ~pin;
        count = waitcnt2(count, wait);
        // now count contains the target count for the next waitcnt2 call.
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
        OUTA ^= pin;
        count = waitcnt2(count, wait);
    }
}
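The freeze warning in the comments follows from how the target comparison works: waitcnt blocks until the counter equals the target, and the counter is a free-running 32-bit value. The host-side model below is a sketch for illustration only. All names in it are hypothetical, and on the Propeller the real work is done by the waitcnt instruction.

```c
#include <stdint.h>

/* Host-side model for illustration only; all names are hypothetical. */

/* Ticks a waitcnt would block for, given the current counter value and
 * the target. Unsigned subtraction wraps modulo 2^32, like the counter. */
static uint32_t ticks_blocked(uint32_t now, uint32_t target)
{
    return target - now;
}

/* Model of count = waitcnt2(count, wait): the simulated counter advances
 * to the target, and the next target (target + delta) is returned. */
static uint32_t model_waitcnt2(uint32_t *cnt, uint32_t target, uint32_t delta)
{
    *cnt += ticks_blocked(*cnt, target);   /* counter now equals target */
    return target + delta;
}
```

If the code between two waits takes 20 ticks but the target is only 16 ticks ahead, ticks_blocked returns a value just under 2^32, so the wait lasts for almost a full wrap of the counter. Because each returned target is derived from the previous target and not from the current counter, the toggles stay evenly spaced as long as the code in between is shorter than the wait.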

I'm not much of a GNU ASM user, so I don't feel comfortable answering that question without having time for more reading.

Dave Hein · 2012-09-13 10:47

If you want to keep a delay loop from being optimized away you could do something like this:

#pragma GCC push_options
#pragma GCC optimize("O1")
void delay(void)
{
    int i;
    for (i = 0; i < 100; i++);   // empty body: the loop itself is the delay
}
#pragma GCC pop_options

However, it looks like PropGCC doesn't like the pop_options pragma, so instead of using the push/pop pragmas you would have to explicitly set the optimization level again after the delay loop, with something like #pragma GCC optimize("Os").
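Another common GCC idiom for keeping a delay loop alive, without touching optimization pragmas, is to put an empty volatile asm statement in the loop body. This is a sketch and not something tested on PropGCC in this thread; the function name delay_loop is hypothetical, and the time per iteration still has to be measured.

```c
/* Hypothetical alternative to the pragma approach.
 * Returns the number of iterations actually executed. */
static unsigned delay_loop(unsigned iterations)
{
    unsigned i;
    for (i = 0; i < iterations; i++) {
        /* Empty volatile asm: emits no instructions, but the compiler
         * must keep every execution of it, so the loop is not removed. */
        __asm__ volatile ("");
    }
    return i;
}
```

The loop overhead (compare, increment, branch) is whatever the compiler emits at the current optimization level, so the delay per pass can change if the compiler flags change.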

ersmith · 2012-09-13 11:54

ypapelis wrote: »

Using waitcnt/waitcnt2 can only generate delays higher than about 7 microseconds

Are you sure about this? waitcnt should compile to a waitcnt instruction, so in COG mode you should be able to get accuracies of much better than a microsecond, at least with the default 80MHz clock.

So I was wondering, is there a way to inject PASM code inside C? Some other embedded C systems provide pseudo functions that do exactly that.

Yes, the __asm__ statement. There's a section on it in the GCC manual, and various documentation online. To insert two nops you would do something like:

__asm__ volatile (" nop\n nop\n");

The string inside the __asm__ is passed through directly to the assembler. To insert multiple lines, use "\n" (and remember to indent at least one space so the instruction is not interpreted as a label). The "volatile" says not to optimize around the asm (not to move instructions past it, for example).

Note that the syntax for __asm__ is GAS, not PASM. They are mostly the same; the main thing to watch out for is that GAS addresses are always byte addresses, not long addresses.
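Building on the two-nop example, fixed padding can be wrapped in macros so the intent is visible at the call site. This is a sketch under stated assumptions: the macro names and short_pulse are hypothetical, and fake_outa is a plain variable standing in for OUTA so that the code also builds off-target. In COG mode each nop should cost one instruction time (4 clocks, 50ns at 80MHz); in LMM the cost per nop is higher and depends on the kernel.

```c
/* Hypothetical padding macros built on the __asm__ volatile nop idiom. */
#define NOP1() __asm__ volatile (" nop\n")
#define NOP2() do { NOP1(); NOP1(); } while (0)
#define NOP4() do { NOP2(); NOP2(); } while (0)

/* Stand-in for the OUTA register so the sketch builds on any host.
 * On the Propeller, use OUTA from propeller.h instead. */
static volatile unsigned fake_outa;

/* Raise a pin, pad with four nops, then lower it again.
 * Returns the final register value (the pin bit is cleared). */
static unsigned short_pulse(unsigned pin_mask)
{
    fake_outa |= pin_mask;    /* pin high */
    NOP4();                   /* fixed padding between the two writes */
    fake_outa &= ~pin_mask;   /* pin low */
    return fake_outa;
}
```

The pulse width is the padding plus the time of the write instructions themselves, so the nop count needed for a given width is best confirmed on a scope.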

Eric

dnalor · 2012-09-13 12:19

Dave Hein wrote: »
If you want to keep a delay loop from being optimized away you could do something like this:
#pragma GCC push_options
#pragma GCC optimize("O1")
void delay()
{
    int i;
    for (i = 0; i < 100; i++);
}
#pragma GCC pop_options
However, it looks like PropGCC doesn't like the pop_options pragma, so you would have to explicitly set the optimization level after the delay loop with something like "#pragma GCC optimize("Os")" instead of using the push/pop pragmas.

For resetting to command line options you can use:
#pragma GCC optimize(initial)
