Inline assembly (PASM), injecting delays
I have been very successfully converting my PASM programs into C (love it!), but one place where I am consistently getting in trouble is finding ways to inject precise delays in the code. I am using COG mode for C, so the C output program is running in a cog by itself, although the same problem probably would occur in LMM (or other models) for loops that end up cached into COG memory.
Using waitcnt/waitcnt2, I can only generate delays longer than about 7 microseconds, which does not cut it for tweaking clock pulses and clock-data pulse offsets. In PASM, for 'large' delays I could call a subroutine that loops, and get a precise delay from the known number of loop iterations. For shorter delays, simply inserting NOP instructions did the trick. I tried calling a delay function in cogc, but unless the function does something 'meaty', it gets optimized away, and not knowing whether that has happened (or having to look at the generated assembly to find out) is inconvenient. Incrementing a volatile variable appears to do the trick but is awkward. So I was wondering: is there a way to inject PASM code inside C? Some other embedded C systems provide pseudo-functions that do exactly that. Any other suggestion on how to achieve instruction-level delays would also be welcome!
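For reference, the volatile-variable workaround mentioned above might look roughly like the sketch below. The names delay_sink and delay_iterations are made up for illustration and are not part of propeller.h. The time per iteration depends on the code the compiler generates, so it would still need to be checked on a scope or in the assembly listing.

```c
#include <stdint.h>

/* Hypothetical helper names, not part of propeller.h. */
static volatile uint32_t delay_sink;

/* Busy-wait for n loop iterations. Because delay_sink is volatile,
 * the compiler must perform every increment and cannot drop the loop. */
static inline void delay_iterations(uint32_t n)
{
    uint32_t i;
    for (i = 0; i < n; i++) {
        delay_sink++;
    }
}
```

A call such as delay_iterations(10) then gives a repeatable delay for a given build, but not a cycle count you can read off the C source.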

Comments
I've been able to make pins toggle as fast as 400ns (possibly 250ns) in COGC code using waitcnt2 (with 80MHz clock).
Here is a COG mode example.
/**
 * @file cogdelay.c
 *
 * This program demonstrates creating COG mode delays with one
 * statement between delays. The minimum delay is dictated
 * by how long it takes to execute instructions between delays.
 * It should be noted that if the time it takes for code to execute
 * is greater than wait, the code will appear to freeze for about
 * one minute with an 80MHz clock.
 */
#include <propeller.h>

/**
 * COG mode cogdelay main function
 *
 * The basis of this delay demo is the propeller.h waitcnt2 macro
 * which translates to __builtin_propeller_waitcnt(count, wait).
 *
 * #define waitcnt2(a, b) __builtin_propeller_waitcnt((a),(b))
 *
 * Wait until system counter reaches A value.
 * waitcnt2 Parameters:
 * a Target value
 * b Adjust value
 *
 * Demo runs in an infinite loop and allows for scope measurement.
 */
int main(void)
{
    unsigned int count;
    unsigned int wait;
    unsigned int pin;

    // set pin to output
    pin = 1 << 15;
    DIRA |= pin;

    // wait as small as 250ns is possible (at 80MHz) building in COG mode.
    wait = (CLKFREQ/5000000); // waitcnt2 delay becomes 400ns

    for(;;)
    {
        // initial delay just for reasonable scope triggering
        count = wait*50+CNT;
        OUTA &= ~pin;
        count = waitcnt2(count, wait);
        // now count contains the target count for the next waitcnt2 call.
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
    }
}

Even in LMM mode 2us delays are possible. CMM programs are slower, so the demo below won't work by default - change the wait to wait = CLKFREQ/250000; to get about a 4us wait (320 clock ticks at 80MHz).
/**
 * @file delaydemo.c
 *
 * This program demonstrates creating 2 microsecond or more delays
 * with one statement between delays. The minimum delay is dictated
 * by how long it takes to execute instructions between delays. The
 * use of fcache (FastCache) also determines how small the delays can
 * be - this demo allows fcache to be disabled, but it can be enabled.
 *
 * It should be noted that if the time it takes for code to execute
 * is greater than wait, the code will appear to freeze for about
 * one minute with an 80MHz clock.
 */
#include <propeller.h>

/**
 * delaydemo main function
 * HUBTEXT forces function to be in HUB RAM even with XMM modes.
 * Not designed for COG mode.
 *
 * The basis of this delay demo is the propeller.h waitcnt2 macro
 * which translates to __builtin_propeller_waitcnt(count, wait).
 *
 * #define waitcnt2(a, b) __builtin_propeller_waitcnt((a),(b))
 *
 * Wait until system counter reaches A value.
 * waitcnt2 Parameters:
 * a Target value
 * b Adjust value
 *
 * Demo runs in an infinite loop and allows for scope measurement.
 */
HUBTEXT void main(void)
{
    unsigned int count;
    unsigned int wait;
    unsigned int pin;

    // set pin to output
    pin = 1 << 15;
    DIRA |= pin;

    // 2us waits for 80MHz clock assuming no fcache.
    // If the code is small enough to fit in the fcache area
    // wait as small as 500ns is possible (at 80MHz).
    wait = CLKFREQ/500000;

    for(;;)
    {
        // initial delay just for reasonable scope triggering
        count = wait*50+CNT;
        OUTA &= ~pin;
        count = waitcnt2(count, wait);
        // now count contains the target count for the next waitcnt2 call.
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
        OUTA ^= pin; count = waitcnt2(count, wait);
    }
}

I'm not much of a GNU ASM user, so I don't feel comfortable answering that question without having time for more reading.
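As a rough aid for picking wait values like the ones in the demos above, the tick arithmetic can be sketched in plain C. The function names ticks_for_ns and wrap_seconds are hypothetical helpers, not part of propeller.h. The second one shows where the "freeze for about one minute" comes from: if the target count has already passed, waitcnt has to wait for the 32-bit system counter to wrap around.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Clock ticks in a delay of `ns` nanoseconds (rounds down). */
static uint32_t ticks_for_ns(uint32_t clkfreq_hz, uint32_t ns)
{
    return (uint32_t)(((uint64_t)clkfreq_hz * ns) / 1000000000u);
}

/* Whole seconds until a free-running 32-bit tick counter wraps around. */
static uint32_t wrap_seconds(uint32_t clkfreq_hz)
{
    return (uint32_t)(4294967296ull / clkfreq_hz);
}
```

With an 80MHz clock, ticks_for_ns(80000000, 2000) gives 160, the same value as CLKFREQ/500000 in the LMM demo, and wrap_seconds(80000000) gives 53.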
#pragma GCC push_options
#pragma GCC optimize("O1")
void delay()
{
    int i;
    for (i = 0; i < 100; i++);
}
#pragma GCC pop_options

However, it looks like PropGCC doesn't like the pop_options pragma, so you would have to explicitly set the optimization level after the delay loop with something like "#pragma GCC optimize("Os")" instead of using the push/pop pragmas.

Yes, the __asm__ instruction. There's a section on it in the GCC manual, and various documentation online. To insert two nops you would do something like:
__asm__ volatile (" nop\n nop\n");

The string inside the __asm__ is passed through directly to the assembler. To insert multiple lines, use "\n" (and remember to indent at least one space so the instruction is not interpreted as a label). The "volatile" says not to optimize around the asm (not to move instructions past it, for example).
Note that the syntax for __asm__ is GAS, not PASM. They are mostly the same; the main thing to watch out for is that GAS addresses are always byte addresses, not long addresses.
Eric
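Building on the __asm__ suggestion above, one possible way to package short delays is a few small macros plus a NOP loop. This is only a sketch: NOP1, NOP2, NOP4 and nop_loop are made-up names, and the loop overhead per iteration depends on the generated code. On the Propeller 1 each nop takes 4 clock cycles, which is 50ns at 80MHz.

```c
/* Hypothetical helper macros, not part of PropGCC.
 * The leading space keeps the assembler from reading "nop" as a label. */
#define NOP1()  __asm__ volatile (" nop\n")
#define NOP2()  __asm__ volatile (" nop\n nop\n")
#define NOP4()  do { NOP2(); NOP2(); } while (0)

/* Run n iterations that each contain one nop and return the number
 * of iterations executed. The volatile asm in the body is never
 * deleted, so the loop survives optimization without any pragmas. */
static unsigned nop_loop(unsigned n)
{
    unsigned i;
    for (i = 0; i < n; i++) {
        NOP1();
    }
    return i;
}
```

For a fixed offset of a few instructions, something like NOP4() between two OUTA writes keeps the delay visible in the source. For longer delays, nop_loop(n) avoids writing the nops out by hand.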
For resetting to command line options you can use:
#pragma GCC optimize(initial)