Inline Assembly With DJNZ

DavidZemon · 2014-04-26 20:35

I'm getting a "truncated to fit" error when compiling the following code. Google says this happens when an address is too long to fit in a jmp operand (the djnz in my case). I'm not sure how to fix it. Surely there is a way; it's probably just that I haven't done asm in quite a while.

__asm__ ("mov %[_waitCycles], %[_bitCycles]\n"
         "1:\twaitcnt %[_waitCycles], %[_bitCycles]\n\t"
         "djnz %[_bits], #1b"
         : [_waitCycles] "+r" (waitCycles),
           [_bits] "+r" (bits)
         : [_bitCycles] "r" (bitCycles));

DavidZemon · 2014-04-26 20:41

Answering my own question here.

This thread says to add .cog_ram to the top of the inline assembly. I'm trying that now...

DavidZemon · 2014-04-26 20:48

No luck. However, the error went from being reported 3 times to just once.

I also tried surrounding it with ".compress off" and ".compress default", which changes the error to "Relocation overflows" (reported 4 times).

David Betz · 2014-04-27 04:49

You can't use DJNZ in LMM or CMM code. You'll have to put the entire function in COG memory for this to work.

DavidZemon · 2014-04-27 05:25

David Betz wrote: »

You can't use DJNZ in LMM or CMM code. You'll have to put the entire function in COG memory for this to work.

With fcache? I'd like it to work in any memory model.

David Betz · 2014-04-27 05:32

SwimDude0614 wrote: »

With fcache? I'd like it to work in any memory model.

I'm not sure about fcache. Eric would have to answer that. In theory it should work but I don't know if there are any details that would prevent it.

DavidZemon · 2014-04-27 13:48

I'm not sure who Eric is... perhaps he'll decide to drop by this thread some time.

In any case, I managed to get my desired result in a slightly different way. It turns out that a do-while loop gets compiled differently (correctly) compared to a while loop. Here's the end result (with my functional code as well, instead of just the snippet):

/** 
 * @brief       Shift out one word of data
 */
__attribute__ ((fcache)) void shift_out_data (register uint32_t data,
        register uint32_t bits, const register uint32_t bitCycles,
        const register uint32_t txMask) const {
    uint32_t waitCycles;


    __asm__ volatile (
            "mov %[_waitCycles], %[_bitCycles]\n\t"
            "add %[_waitCycles], CNT \n\t"
            :  // Outputs
            [_waitCycles] "+r" (waitCycles)
            :// Inputs
            [_bitCycles] "r" (bitCycles));


    do {
        __asm__ volatile(
                "waitcnt %[_waitCycles], %[_bitCycles]\n\t"
                "shr %[_data],#1 wc \n\t"
                "muxc outa, %[_mask]"
                : [_data] "+r" (data),
                [_waitCycles] "+r" (waitCycles)
                : [_mask] "r" (txMask),
                [_bitCycles] "r" (bitCycles));
    } while (--bits);
}

This compiles to a perfect djnz loop with all variables loaded into cog RAM before being referenced. The fcache attribute on the function was very important: without it, something wasn't getting loaded until the first call of the function. When that first call happened, the delay between initializing waitCycles and calling waitcnt was too long, so the target count had already passed and waitcnt hung for 53 seconds until the system counter wrapped around. All remaining calls to the function worked great.

David Betz · 2014-04-29 04:35

SwimDude0614 wrote: »

I'm not sure who Eric is... perhaps he'll decide to drop by this thread some time.

Sorry, the Eric I referred to is Eric Smith, who wrote the PropGCC code generator and implemented the code to handle fcache. He often reads these threads, and I thought he might offer some more detailed information about where DJNZ can be used.

SRLM · 2014-04-29 10:54

The I2C driver in libpropeller uses inline fcached assembly with DJNZ:

https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h

/** Output a byte on the I2C bus.
     * 
     * @param   byte the 8 bits to send on the bus.
     * @returns true if the device acknowledges, false otherwise.
     */
    bool SendByte(const unsigned char byte) {
        int result;

        int datamask, nextCNT, temp;

        __asm__ volatile(
                "         fcache #(PutByteEnd - PutByteStart)\n\t"
                "         .compress off                  \n\t"
                /* Setup for transmit loop */
                "PutByteStart: "
                "         mov %[datamask], #256          \n\t" /* 0x100 */
                "         mov %[result],   #0            \n\t"
                "         mov %[nextCNT],  cnt           \n\t"
                "         add %[nextCNT],  %[clockDelay] \n\t"

                /* Transmit Loop (8x) */
                //Output bit of byte
                "PutByteLoop: "
                "         shr  %[datamask], #1                \n\t" // Set up mask
                "         and  %[datamask], %[databyte] wz,nr \n\t" // Move the bit into Z flag
                "         muxz dira,        %[SDAMask]        \n\t"

                //Pulse clock
                "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                "         andn    dira,       %[SCLMask]    \n\t" // Set SCL high
                "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                "         or      dira,       %[SCLMask]    \n\t" // Set SCL low

                //Return for more bits
                "         djnz %[datamask], #__LMM_FCACHE_START+(PutByteLoop-PutByteStart) nr \n\t"

                // Get ACK
                "         andn    dira,       %[SDAMask]    \n\t" // Float SDA high (release SDA)
                "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                "         andn    dira,       %[SCLMask]    \n\t" // SCL high (by float)
                "         waitcnt %[nextCNT], %[clockDelay] \n\t"
                "         mov     %[temp],    ina           \n\t" // Sample input
                "         and     %[SDAMask], %[temp] wz,nr \n\t" // Z set if SDA is low (ack'd); SDA high means nack
                "         muxz    %[result],  #1            \n\t" // Set result equal to the Z flag (1 if ack'd)
                "         or      dira,       %[SCLMask]    \n\t" // Set scl low
                "         or      dira,       %[SDAMask]    \n\t" // Set sda low
                "         jmp     __LMM_RET                 \n\t"
                "PutByteEnd: "
                "         .compress default                 \n\t"
                : // Outputs
                [datamask] "=&r" (datamask),
                [result] "=&r" (result),
                [nextCNT] "=&r" (nextCNT),
                [temp] "=&r" (temp)
                : // Inputs
                [SDAMask] "r" (sda_mask_),
                [SCLMask] "r" (scl_mask_),
                [databyte] "r" (byte),
                [clockDelay] "r" (clock_delay_)
                );

        return result;
    }

DavidZemon · 2014-04-29 18:15

SRLM wrote: »

The I2C driver in libpropeller uses inline fcached assembly with DJNZ:

https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h

That's actually how I got as far as I did, lol. Your code definitely helped a lot. However, I still couldn't get it working even with that example to go off of. It might help if I knew what "=&r" does and how it compares to "+r". I tried Googling it when I first ran across your code but found nothing thorough enough for me to really understand it. Something about the & telling the assembler not to move registers around and reuse them or something? Idk... I know it decreased code size by a couple hundred bytes... but it also broke the code.

DavidZemon · 2015-04-18 18:43

Woo hoo! Sorry to bring this back from the dead, but I finally got it working with djnz. It turns out you can use djnz in fcache, but you cannot use the GCC fcache attribute; instead, the fcache block is set up by hand in the inline assembly.

This method transmits at 4.414 MBaud:

/**
 * @brief       Shift out one word of data (FCache function)
 *
 * @param[in]   data        A fully configured, ready-to-go, data word
 * @param[in]   bits        Number of shiftable bits in the data word
 * @param[in]   bitCycles   Delay between each bit; Unit is clock cycles
 * @param[in]   txMask      Pin mask of the TX pin
 */
inline void shift_out_data (uint32_t data, uint32_t bits, const uint32_t bitCycles,
                            const uint32_t txMask) const {
#ifndef DOXYGEN_IGNORE
    uint32_t waitCycles;
    __asm__ volatile (
            "        fcache #(ShiftOutDataEnd - ShiftOutDataStart)                     \n\t"
            "        .compress off                                                     \n\t"

            "ShiftOutDataStart:                                                        \n\t"
            "        mov %[_waitCycles], %[_bitCycles]                                 \n\t"
            "        add %[_waitCycles], CNT                                           \n\t"

            "loop%=:                                                                   \n\t"
            "        waitcnt %[_waitCycles], %[_bitCycles]                             \n\t"
            "        shr %[_data],#1 wc                                                \n\t"
            "        muxc outa, %[_mask]                                               \n\t"
            "        djnz %[_bits], #__LMM_FCACHE_START+(loop%= - ShiftOutDataStart)   \n\t"

            "        jmp __LMM_RET                                                     \n\t"
            "ShiftOutDataEnd:                                                          \n\t"
            "        .compress default                                                 \n\t"
            : [_data] "+r"(data),
            [_waitCycles] "+r"(waitCycles),
            [_bits] "+r" (bits)
            : [_mask] "r"(txMask),
            [_bitCycles] "r"(bitCycles));
#endif
}