Inline Assembly With DJNZ — Parallax Forums

Inline Assembly With DJNZ

DavidZemon Posts: 2,973
edited 2015-04-18 18:43 in Propeller 1
I'm getting a "truncated to fit" error when compiling the following code. Google says this happens when an address is too long to fit in a jmp operand - the djnz in my case. I'm not sure how to fix it. Surely there is a way. It's probably just because I haven't done asm in quite a while :/
__asm__ ("mov %[_waitCycles], %[_bitCycles]\n"
         "1:\twaitcnt %[_waitCycles], %[_bitCycles]\n\t"
         "djnz %[_bits], #1b"
         : [_waitCycles] "+r" (waitCycles),
           [_bits] "+r" (bits)
         : [_bitCycles] "r" (bitCycles));

Comments

  • DavidZemon Posts: 2,973
    edited 2014-04-26 20:41
    Answering my own question here.

    This thread says to add .cog_ram to the top of the inline assembly. I'm trying that now...
  • DavidZemon Posts: 2,973
    edited 2014-04-26 20:48
    No luck. However, the error went from being reported 3 times to just once.

    I also tried surrounding with ".compress off" and ".compress default" which changes the error to "Relocation overflows" 4 times.
  • David Betz Posts: 14,516
    edited 2014-04-27 04:49
    You can't use DJNZ in LMM or CMM code. You'll have to put the entire function in COG memory for this to work.
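    For context, a hedged sketch of why a label in LMM or CMM code cannot be an immediate DJNZ target: Propeller 1 instructions have a 9-bit source field, so the target must be a cog address in the range 0 to 511, while LMM and CMM code is fetched from hub memory, typically at larger addresses. The helper name and the sample addresses are made up for illustration, and the sketch does not model how PropGCC's linker reports the overflow.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Propeller 1 instructions carry 9-bit source and destination fields,
       so an immediate jump target can only name cog addresses 0..511. */
    enum { SRC_FIELD_BITS = 9 };

    /* Hypothetical helper: true if `address` can be encoded as an
       immediate in the 9-bit source field. */
    static bool fits_in_src_field(uint32_t address) {
        return address < (1u << SRC_FIELD_BITS);
    }
    ```

    With these definitions, fits_in_src_field(511) is true, while a hub address such as 0x1234 (an arbitrary example) does not fit, which is consistent with the "truncated to fit" error in the first post.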
  • DavidZemon Posts: 2,973
    edited 2014-04-27 05:25
    David Betz wrote: »
    You can't use DJNZ in LMM or CMM code. You'll have to put the entire function in COG memory for this to work.
    With fcache? I'd like it to work in any memory model.
  • David Betz Posts: 14,516
    edited 2014-04-27 05:32
    DavidZemon wrote: »
    With fcache? I'd like it to work in any memory model.
    I'm not sure about fcache. Eric would have to answer that. In theory it should work but I don't know if there are any details that would prevent it.
  • DavidZemon Posts: 2,973
    edited 2014-04-27 13:48
    I'm not sure who Eric is... perhaps he'll decide to drop by this thread some time.

    In any case, I managed to get my desired result in a slightly different way. Turns out, do-while gets compiled differently (correctly) compared to while. Here's the end result (with my functional code as well, instead of just the snippet):
    /** 
     * @brief       Shift out one word of data
     */
    __attribute__ ((fcache)) void shift_out_data (register uint32_t data,
            register uint32_t bits, const register uint32_t bitCycles,
            const register uint32_t txMask) const {
        uint32_t waitCycles;
    
    
        __asm__ volatile (
                "mov %[_waitCycles], %[_bitCycles]\n\t"
                "add %[_waitCycles], CNT \n\t"
                :  // Outputs
                [_waitCycles] "+r" (waitCycles)
                :// Inputs
                [_bitCycles] "r" (bitCycles));
    
    
        do {
            __asm__ volatile(
                    "waitcnt %[_waitCycles], %[_bitCycles]\n\t"
                    "shr %[_data],#1 wc \n\t"
                    "muxc outa, %[_mask]"
                    : [_data] "+r" (data),
                    [_waitCycles] "+r" (waitCycles)
                    : [_mask] "r" (txMask),
                    [_bitCycles] "r" (bitCycles));
        } while (--bits);
    }
    

    This compiles to a perfect djnz loop with all variables loaded into cog ram before being referenced. The fcache attribute on the function was very important - without that, something wasn't getting loaded until the first call of the function. When that first call happened, the delay between initializing waitCycles and calling waitcnt was too long, and it would hang for 53 seconds. All remaining calls to the function worked great.
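    The 53-second hang is consistent with waitcnt missing its target and waiting for the 32-bit system counter to wrap all the way around. A minimal sketch of that arithmetic in plain C, assuming an 80 MHz system clock (the thread does not state the clock frequency) and using hypothetical helper names:

    ```c
    #include <stdint.h>

    /* ASSUMPTION: 80 MHz system clock; the thread does not state it. */
    #define ASSUMED_CLOCK_HZ 80000000.0

    /* Cycles a waitcnt-style wait lasts when the free-running 32-bit
       counter currently reads cnt_now and the wait target is `target`.
       Unsigned subtraction wraps modulo 2^32, like the counter itself. */
    static uint32_t cycles_until_match(uint32_t cnt_now, uint32_t target) {
        return target - cnt_now;
    }

    /* Convert a cycle count to seconds at the assumed clock frequency. */
    static double cycles_to_seconds(uint32_t cycles) {
        return (double)cycles / ASSUMED_CLOCK_HZ;
    }
    ```

    If the target is 100 cycles in the past, cycles_until_match(1200, 1100) is 2^32 - 100 cycles, which is roughly 53.7 seconds at the assumed 80 MHz.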
  • David Betz Posts: 14,516
    edited 2014-04-29 04:35
    DavidZemon wrote: »
    I'm not sure who Eric is... perhaps he'll decide to drop by this thread some time.
    Sorry, the Eric I referred to is Eric Smith, who wrote the PropGCC code generator and implemented the code to handle fcache. He often reads these threads and I thought he might offer some more detailed information about where DJNZ can be used.
  • SRLM Posts: 5,045
    edited 2014-04-29 10:54
    The I2C driver in libpropeller uses inline fcached assembly with DJNZ:

    https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h
    /** Output a byte on the I2C bus.
         * 
         * @param   byte the 8 bits to send on the bus.
         * @returns true if the device acknowledges, false otherwise.
         */
        bool SendByte(const unsigned char byte) {
            int result;
    
            int datamask, nextCNT, temp;
    
            __asm__ volatile(
                    "         fcache #(PutByteEnd - PutByteStart)\n\t"
                    "         .compress off                  \n\t"
                    /* Setup for transmit loop */
                    "PutByteStart: "
                    "         mov %[datamask], #256          \n\t" /* 0x100 */
                    "         mov %[result],   #0            \n\t"
                    "         mov %[nextCNT],  cnt           \n\t"
                    "         add %[nextCNT],  %[clockDelay] \n\t"
    
                    /* Transmit Loop (8x) */
                    //Output bit of byte
                    "PutByteLoop: "
                    "         shr  %[datamask], #1                \n\t" // Set up mask
                    "         and  %[datamask], %[databyte] wz,nr \n\t" // Move the bit into Z flag
                    "         muxz dira,        %[SDAMask]        \n\t"
    
                    //Pulse clock
                    "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                    "         andn    dira,       %[SCLMask]    \n\t" // Set SCL high
                    "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                    "         or      dira,       %[SCLMask]    \n\t" // Set SCL low
    
                    //Return for more bits
                    "         djnz %[datamask], #__LMM_FCACHE_START+(PutByteLoop-PutByteStart) nr \n\t"
    
                    // Get ACK
                    "         andn    dira,       %[SDAMask]    \n\t" // Float SDA high (release SDA)
                    "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                    "         andn    dira,       %[SCLMask]    \n\t" // SCL high (by float)
                    "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                    "         mov     %[temp],    ina           \n\t" //Sample input
                    "         and     %[SDAMask], %[temp] wz,nr \n\t" // If != 0, ack'd, else nack
                    "         muxz    %[result],  #1            \n\t" // Set result to equal to Z flag (aka, 1 if ack'd)
                    "         or      dira,       %[SCLMask]    \n\t" // Set scl low
                    "         or      dira,       %[SDAMask]    \n\t" // Set sda low 
                    "         jmp     __LMM_RET                 \n\t"
                    "PutByteEnd: "
                    "         .compress default                 \n\t"
                    : // Outputs
                    [datamask] "=&r" (datamask),
                    [result] "=&r" (result),
                    [nextCNT] "=&r" (nextCNT),
                    [temp] "=&r" (temp)
                    : // Inputs
                    [SDAMask] "r" (sda_mask_),
                    [SCLMask] "r" (scl_mask_),
                    [databyte] "r" (byte),
                    [clockDelay] "r" (clock_delay_)
                    );
    
            return result;
        }
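    A note on the loop count in the code above: datamask starts at 0x100 and is shifted right once per pass, and the djnz carries the nr (no result) effect, so it tests datamask - 1 without writing the decrement back. The loop therefore runs 8 times and ends when datamask reaches 1. A host-side model in plain C, where model_put_byte_loop is a hypothetical name and no pins or timing are simulated:

    ```c
    #include <stdint.h>

    /* Host-side model of the PutByteLoop counting scheme.
       Stores each transmitted bit (in transmit order) in out[] and
       returns the number of loop passes. */
    static int model_put_byte_loop(uint8_t databyte, int out[8]) {
        uint32_t datamask = 0x100;                 /* mov datamask, #256 */
        int passes = 0;
        do {
            datamask >>= 1;                        /* shr datamask, #1 */
            /* and datamask, databyte wz,nr: test the bit, keep datamask */
            out[passes] = (datamask & databyte) != 0;
            passes++;
        } while (datamask - 1u != 0u);             /* djnz ... nr: test only */
        return passes;
    }
    ```

    For databyte = 0xA5 the model records 1, 0, 1, 0, 0, 1, 0, 1, so the byte goes out most significant bit first.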
    
  • DavidZemon Posts: 2,973
    edited 2014-04-29 18:15
    SRLM wrote: »
    The I2C driver in libpropeller uses inline fcached assembly with DJNZ:

    https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h

    That's actually how I got as far as I did lol. Your code definitely helped a lot. However, I still couldn't get it working even with that example to go off of. It might help if I knew what "=&r" does and how it compares to "+r". I tried Googling it when I first ran across your code but found nothing that was thorough enough for me to really understand it. Something about the & telling the assembler not to move registers around and reuse them or something? Idk... I know it decreased code size by a couple hundred bytes... but it also broke the code.
  • DavidZemon Posts: 2,973
    edited 2015-04-18 18:43
    Woo hoo! Sorry to bring this back from the dead, but I finally got it working with djnz. Turns out you can use djnz in fcache, but you can not use the GCC attribute for fcache.

    This method transmits at 4.414 MBaud :)
    /**
     * @brief       Shift out one word of data (FCache function)
     *
     * @param[in]   data        A fully configured, ready-to-go, data word
     * @param[in]   bits        Number of shiftable bits in the data word
     * @param[in]   bitCycles   Delay between each bit; Unit is clock cycles
     * @param[in]   txMask      Pin mask of the TX pin
     */
    inline void shift_out_data (uint32_t data, uint32_t bits, const uint32_t bitCycles,
                                const uint32_t txMask) const {
    #ifndef DOXYGEN_IGNORE
        uint32_t waitCycles;
        __asm__ volatile (
                "        fcache #(ShiftOutDataEnd - ShiftOutDataStart)                     \n\t"
                "        .compress off                                                     \n\t"
    
                "ShiftOutDataStart:                                                        \n\t"
                "        mov %[_waitCycles], %[_bitCycles]                                 \n\t"
                "        add %[_waitCycles], CNT                                           \n\t"
    
                "loop%=:                                                                   \n\t"
                "        waitcnt %[_waitCycles], %[_bitCycles]                             \n\t"
                "        shr %[_data],#1 wc                                                \n\t"
                "        muxc outa, %[_mask]                                               \n\t"
                "        djnz %[_bits], #__LMM_FCACHE_START+(loop%= - ShiftOutDataStart)   \n\t"
    
                "        jmp __LMM_RET                                                     \n\t"
                "ShiftOutDataEnd:                                                          \n\t"
                "        .compress default                                                 \n\t"
                : [_data] "+r"(data),
                [_waitCycles] "+r"(waitCycles),
                [_bits] "+r" (bits)
                : [_mask] "r"(txMask),
                [_bitCycles] "r"(bitCycles));
    #endif
    }
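    A host-side model of the loop in plain C may help when checking bit order and iteration count without hardware. It mirrors shr ... wc (carry takes the old bit 0), muxc (the pin follows carry) and djnz (decrement, repeat while nonzero). The names model_shift_out and levels are hypothetical, and waitcnt timing is not modelled:

    ```c
    #include <stdint.h>

    /* Host-side model of the shift-out loop.
       Records the level driven on each pass in levels[] (first 32 passes
       only) and returns the number of passes. */
    static uint32_t model_shift_out(uint32_t data, uint32_t bits, int levels[32]) {
        uint32_t passes = 0;
        do {
            int carry = (int)(data & 1u);   /* shr data, #1 wc: C = old bit 0 */
            data >>= 1;
            if (passes < 32) {
                levels[passes] = carry;     /* muxc outa, mask: pin = C */
            }
            passes++;
        } while (--bits != 0);              /* djnz bits, #loop */
        return passes;
    }
    ```

    For data = 0x0D and bits = 4 the model records levels 1, 0, 1, 1, so the word goes out least significant bit first. Note that bits = 0 would wrap the counter and run the loop 2^32 times in this model; DJNZ decrements before testing, so the same would be expected of the assembly.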
    