Inline PASM — Parallax Forums

Inline PASM

Spilly Posts: 26
edited 2014-07-03 06:46 in Propeller 1
I am on a mission to optimize a bit of code. Is the following the same as writing a section of inline assembly with PROPGCC C++ code?

__builtin_propeller_waitcnt((waitTime+CNT), 0)

If yes, can this be optimized to complete any faster? Making "waitTime" a constant does not fit my needs.

If no, what header/library is this making a function call to?

Comments

  • jazzed Posts: 11,803
    edited 2014-07-02 14:29
    Maybe something like this.
    __attribute__((fcache)) static void _outbyte(int bitcycles, int txmask, int value)
    {
      int j = 10;
      int waitcycles;
    
    
      waitcycles = CNT + bitcycles;
      while(j-- > 0) {
        /* C code is too big and not fast enough for all memory models.
        // waitcycles = waitcnt2(waitcycles, bitcycles); */
        __asm__ volatile("waitcnt %[_waitcycles], %[_bitcycles]"
                         : [_waitcycles] "+r" (waitcycles)
                         : [_bitcycles] "r" (bitcycles));
    
    
        /* if (value & 1) OUTA |= txmask else OUTA &= ~txmask; value = value >> 1; */
        __asm__ volatile("shr %[_value],#1 wc \n\t"
                         "muxc outa, %[_mask]"
                         : [_value] "+r" (value)
                         : [_mask] "r" (txmask));
      }
    }
    
  • Spilly Posts: 26
    edited 2014-07-02 14:44
    jazzed wrote: »
    Maybe something like this.

    I'm not sure exactly what they are doing there, but looking at that code gave me an epiphany! I was trying to use "rdbyte" or "rdlong" to pass a C++ variable's value to PASM, and that is what I have been screwing up this entire time. I'm sure this is documented somewhere, but I must have skimmed over it in my assembly language crash course.

    Thank you, Good Sir! You just helped me solve a week-long headache!
  • jazzed Posts: 11,803
    edited 2014-07-02 15:03
    The function sends value as a bit-stream. It's part of simpleserial.h in the Simple Libraries.

    As long as the function is less than 64 bytes, it will run at near PASM speeds.
    The whole loop could be written in inline PASM for better performance and size.

    Wasting a COG is not necessary for half-duplex operation.

    Solved, no? Edit the first post, click "Go Advanced", and select Solved from the combo-box.
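For anyone reading along without a Propeller on the bench, the shr/muxc pair in _outbyte can be modeled in portable C. This is only a host-side sketch: model_outbyte, the outa variable, and the trace array are hypothetical names for illustration, not part of simpleserial, and the waitcnt timing is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the shr/muxc loop; this is not Propeller code.
 * "outa" is a hypothetical stand-in for the OUTA register. */
static uint32_t outa;

/* Shift "value" out LSB first on the pins selected by txmask,
 * recording the modeled OUTA after each bit in trace[0..nbits-1]. */
static void model_outbyte(uint32_t txmask, uint32_t value,
                          int nbits, uint32_t *trace)
{
    assert(nbits >= 0 && nbits <= 32);
    for (int i = 0; i < nbits; i++) {
        uint32_t carry = value & 1u;  /* shr value,#1 wc: C = old bit 0 */
        value >>= 1;
        if (carry)                    /* muxc outa,mask: copy C into masked bits */
            outa |= txmask;
        else
            outa &= ~txmask;
        trace[i] = outa;              /* pins outside txmask are untouched */
    }
}
```

Calling model_outbyte(1u << 3, 0x5, 4, trace) with outa preset to 0x100 toggles only bit 3 and leaves bit 8 alone, which is the point of using muxc instead of a plain mov.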
  • ersmith Posts: 6,089
    edited 2014-07-03 06:34
    Spilly wrote: »
    I am on a mission to optimize a bit of code. Is the following the same as writing a section of inline assembly with PROPGCC C++ code?

    __builtin_propeller_waitcnt((waitTime+CNT), 0)

    If yes, can this be optimized to complete any faster? Making "waitTime" a constant does not fit my needs.

    Yes, the __builtin_propeller_waitcnt function maps directly to the waitcnt instruction. It waits until the cycle counter matches the first parameter, and returns the sum of the first and second parameters. So for example:
    #include <propeller.h>
    
    void toggle(unsigned delay)
    {
        unsigned int target = CNT + delay;
        for(;;) {
            // wait for "delay" ticks, and update target to be target + delay
            target = __builtin_propeller_waitcnt(target, delay);
            // toggle pin 0
            _OUTA ^= 1;
        }
    }
    
    compiled with "propeller-elf-gcc -mcog -Os -S -o foo.s foo.c" produces:
            .text
            .balign 4
            .global _toggle
    _toggle
            mov     r7, CNT
            add     r7, r0
    .L2
            waitcnt r7,r0
            xor     OUTA,#1
            jmp     #.L2
    
    which is about as good as I can get, as far as I can see.

    All of the __builtin_propeller_x functions are implemented directly in the compiler and map to the corresponding propeller instruction, where "x" can be "clkset", "cogid", "coginit", "cogstop", "rev", "waitcnt", "waitpeq", "waitpne", "waitvid", "locknew", "lockret", "lockset", "lockclr". (Some of the "lock" functions have an extra instruction to capture the value of the carry flag.)

    (So jazzed, I think the commented out waitcnt2() line in your example will produce the same result as the inline PASM, since waitcnt2 is just a #define for __builtin_propeller_waitcnt.)
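The "returns the sum of the first and second parameters" behavior is what keeps a loop like toggle() from drifting, and it can be checked on a host with a small model. This is a rough sketch, not the compiler's implementation: fake_cnt, model_waitcnt, and run_waits are hypothetical names, and the "wait" is simulated by jumping the counter to the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the 32-bit system counter. */
static uint32_t fake_cnt;

/* Model of waitcnt: "wait" until the counter equals target, then
 * return target + delta. Unsigned math wraps modulo 2^32. */
static uint32_t model_waitcnt(uint32_t target, uint32_t delta)
{
    fake_cnt = target;       /* time jumps forward to the match point */
    return target + delta;   /* next deadline, independent of loop overhead */
}

/* Run n waits starting from "start" and return the final deadline. */
static uint32_t run_waits(uint32_t start, uint32_t delta, int n)
{
    assert(n >= 0);
    uint32_t target = start + delta;
    for (int i = 0; i < n; i++)
        target = model_waitcnt(target, delta);
    return target;
}
```

Because each deadline is computed from the previous deadline rather than from a fresh read of the counter, the time spent in the loop body does not accumulate into the period, and a deadline that wraps past 2^32 is handled by the unsigned arithmetic.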

    Eric
  • Spilly Posts: 26
    edited 2014-07-03 06:46
    ersmith wrote: »
    Yes, the __builtin_propeller_waitcnt function maps directly to the waitcnt instruction.
    which is about as good as I can get, as far as I can see.

    All of the __builtin_propeller_x functions are implemented directly in the compiler and map to the corresponding propeller instruction, where "x" can be "clkset", "cogid", "coginit", "cogstop", "rev", "waitcnt", "waitpeq", "waitpne", "waitvid", "locknew", "lockret", "lockset", "lockclr". (Some of the "lock" functions have an extra instruction to capture the value of the carry flag.)

    I pretty much came to that conclusion last night. I thought I was saving a butt load of clock cycles, but I had misconfigured something. The inline assembly I had written was waiting far fewer clock ticks than the waitcnt() C++ function.

    I did, however, save about 7k clock ticks by changing the variable containing the wait time from a float to an unsigned int.
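The Propeller 1 has no floating-point hardware, so float math runs as software routines; keeping the delay in integer ticks avoids that cost. A minimal sketch of doing the conversion with integers only, assuming the delay is specified in microseconds (us_to_ticks is a hypothetical helper, and 80000000 is just an example clock frequency, not something from this thread):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delay in microseconds to clock ticks using integer math only.
 * clkfreq is the system clock in Hz. The division truncates, so a clock
 * that is not a whole number of MHz loses a little precision. */
static uint32_t us_to_ticks(uint32_t us, uint32_t clkfreq)
{
    uint32_t ticks_per_us = clkfreq / 1000000u;
    /* Guard against overflow of the 32-bit product. */
    assert(ticks_per_us == 0 || us <= UINT32_MAX / ticks_per_us);
    return us * ticks_per_us;
}
```

The result can be computed once, outside the timing loop, and then passed as the delay to waitcnt.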

    Thank you for the thorough explanation. It puts my mind at ease knowing what I have is about as good as it gets.