Inline Assembly With DJNZ
DavidZemon
I'm getting a "truncated to fit" error when compiling the following code. Google says this happens when an address is too long to fit in a jmp operand - the djnz in my case. I'm not sure how to fix it. Surely there is a way. It's probably just because I haven't done asm in quite a while.
__asm__ ("mov %[_waitCycles], %[_bitCycles]\n"
"1:\twaitcnt %[_waitCycles], %[_bitCycles]\n\t"
"djnz %[_bits], #1b"
: [_waitCycles] "+r" (waitCycles),
[_bits] "+r" (bits)
: [_bitCycles] "r" (bitCycles));

Comments
This thread says to add .cog_ram to the top of the inline assembly. I'm trying that now...
I also tried surrounding with ".compress off" and ".compress default" which changes the error to "Relocation overflows" 4 times.
In any case, I managed to get my desired result in a slightly different way. Turns out a do-while loop gets compiled differently (correctly) compared to a while loop. Here's the end result (with my functional code as well, instead of just the snippet):
/**
 * @brief       Shift out one word of data
 */
__attribute__ ((fcache)) void shift_out_data (register uint32_t data, register uint32_t bits,
                                              const register uint32_t bitCycles,
                                              const register uint32_t txMask) const {
    uint32_t waitCycles;

    __asm__ volatile (
            "mov %[_waitCycles], %[_bitCycles]\n\t"
            "add %[_waitCycles], CNT \n\t"
            : // Outputs
            [_waitCycles] "+r" (waitCycles)
            : // Inputs
            [_bitCycles] "r" (bitCycles));

    do {
        __asm__ volatile(
                "waitcnt %[_waitCycles], %[_bitCycles]\n\t"
                "shr %[_data],#1 wc \n\t"
                "muxc outa, %[_mask]"
                : [_data] "+r" (data),
                [_waitCycles] "+r" (waitCycles)
                : [_mask] "r" (txMask),
                [_bitCycles] "r" (bitCycles));
    } while (--bits);
}

This compiles to a perfect djnz loop with all variables loaded into cog ram before being referenced. The fcache attribute on the function was very important - without that, something wasn't getting loaded until the first call of the function. When that first call happened, the delay between initializing waitCycles and calling waitcnt was too long, and it would hang for 53 seconds. All remaining calls to the function worked great.
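The 53 seconds fits how waitcnt behaves: it stalls until CNT equals the target exactly, so if the target has already gone by, the 32-bit counter has to wrap all the way around before it matches. Here's a quick check of that arithmetic. The thread never states the clock speed, so the 80 MHz figure is an assumption, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Seconds for a free-running 32-bit cycle counter to wrap once.
 * clock_hz is an assumption supplied by the caller (e.g. 80 MHz). */
static double cnt_wrap_seconds(double clock_hz) {
    return 4294967296.0 / clock_hz;   /* 2^32 cycles */
}

/* Cycles an equality-based wait stalls for, given the counter value
 * now and the target. Unsigned subtraction is modulo 2^32, so a target
 * that is slightly in the past costs almost a full wrap. */
static uint32_t cycles_until_match(uint32_t now, uint32_t target) {
    return target - now;
}
```

With an 80 MHz clock, cnt_wrap_seconds(80e6) comes out at roughly 53.7 s, and a target missed by only 10 cycles still costs 2^32 - 10 cycles of waiting.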
https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h
/** Output a byte on the I2C bus.
 *
 * @param   byte the 8 bits to send on the bus.
 * @returns true if the device acknowledges, false otherwise.
 */
bool SendByte(const unsigned char byte) {
    int result;
    int datamask, nextCNT, temp;

    __asm__ volatile(
            " fcache #(PutByteEnd - PutByteStart)\n\t"
            " .compress off \n\t"

            /* Setup for transmit loop */
            "PutByteStart: "
            " mov %[datamask], #256 \n\t" /* 0x100 */
            " mov %[result], #0 \n\t"
            " mov %[nextCNT], cnt \n\t"
            " add %[nextCNT], %[clockDelay] \n\t"

            /* Transmit Loop (8x) */
            //Output bit of byte
            "PutByteLoop: "
            " shr %[datamask], #1 \n\t" // Set up mask
            " and %[datamask], %[databyte] wz,nr \n\t" // Move the bit into Z flag
            " muxz dira, %[SDAMask] \n\t"

            //Pulse clock
            " waitcnt %[nextCNT], %[clockDelay] \n\t"
            " andn dira, %[SCLMask] \n\t" // Set SCL high
            " waitcnt %[nextCNT], %[clockDelay] \n\t"
            " or dira, %[SCLMask] \n\t" // Set SCL low

            //Return for more bits
            " djnz %[datamask], #__LMM_FCACHE_START+(PutByteLoop-PutByteStart) nr \n\t"

            // Get ACK
            " andn dira, %[SDAMask] \n\t" // Float SDA high (release SDA)
            " waitcnt %[nextCNT], %[clockDelay] \n\t"
            " andn dira, %[SCLMask] \n\t" // SCL high (by float)
            " waitcnt %[nextCNT], %[clockDelay] \n\t"
            " mov %[temp], ina \n\t" //Sample input
            " and %[SDAMask], %[temp] wz,nr \n\t" // If != 0, ack'd, else nack
            " muxz %[result], #1 \n\t" // Set result to equal to Z flag (aka, 1 if ack'd)
            " or dira, %[SCLMask] \n\t" // Set scl low
            " or dira, %[SDAMask] \n\t" // Set sda low
            " jmp __LMM_RET \n\t"
            "PutByteEnd: "
            " .compress default \n\t"
            : // Outputs
            [datamask] "=&r" (datamask),
            [result] "=&r" (result),
            [nextCNT] "=&r" (nextCNT),
            [temp] "=&r" (temp)
            : // Inputs
            [SDAMask] "r" (sda_mask_),
            [SCLMask] "r" (scl_mask_),
            [databyte] "r" (byte),
            [clockDelay] "r" (clock_delay_)
            );

    return result;
}

That's actually how I got as far as I did lol. Your code definitely helped a lot. However, I still couldn't get it working even with that example to go off of.
It might help if I knew what "=&r" does and how it compares to "+r". I tried Googling it when I first ran across your code but found nothing that was thorough enough for me to really understand it. Something about the & telling the assembler not to move registers around and reuse them or something? Idk... I know it decreased code size by a couple hundred bytes... but it also broke the code.
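On the constraint question: going by the GCC extended-asm docs, "+r" marks a register operand that the asm both reads and writes, so the compiler loads the variable's current value first. "=&r" marks a write-only output that is also "early-clobber", meaning the asm may write it before it has finished reading its inputs, so the compiler must not put it in the same register as any input. The & is a hint to the compiler's register allocator, not the assembler. Because "=&r" is write-only, switching an operand that carries a value into the asm (a counter or data word, say) from "+r" to "=&r" drops the initial load, which may well be the kind of breakage described. The sketch below is not Propeller code. It assumes an x86 host with GCC or Clang purely to illustrate the two modifiers, falls back to plain C elsewhere, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper showing "=&r": sum is write-only and early-clobber. */
static uint32_t add_early_clobber(uint32_t a, uint32_t b) {
    uint32_t sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        "movl %[a], %[sum]\n\t"   /* sum is written here...               */
        "addl %[b], %[sum]"       /* ...before input b is read here       */
        : [sum] "=&r" (sum)       /* & keeps sum out of the registers     */
        : [a] "r" (a),            /* holding a and b                      */
          [b] "r" (b)
        : "cc");
#else
    sum = a + b;                  /* non-x86 fallback, same result */
#endif
    return sum;
}

/* Hypothetical helper showing "+r": acc is read, then written. */
static uint32_t add_read_write(uint32_t acc, uint32_t b) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        "addl %[b], %[acc]"       /* uses the incoming value of acc */
        : [acc] "+r" (acc)
        : [b] "r" (b)
        : "cc");
#else
    acc += b;                     /* non-x86 fallback, same result */
#endif
    return acc;
}
```

Without the & in the first helper, the compiler would be free to give sum and b the same register, and the mov would destroy b before the add reads it.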
This method transmits at 4.414 MBaud
/**
 * @brief       Shift out one word of data (FCache function)
 *
 * @param[in]   data        A fully configured, ready-to-go, data word
 * @param[in]   bits        Number of shiftable bits in the data word
 * @param[in]   bitCycles   Delay between each bit; Unit is clock cycles
 * @param[in]   txMask      Pin mask of the TX pin
 */
inline void shift_out_data (uint32_t data, uint32_t bits, const uint32_t bitCycles,
                            const uint32_t txMask) const {
#ifndef DOXYGEN_IGNORE
    uint32_t waitCycles;

    __asm__ volatile (
            " fcache #(ShiftOutDataEnd - ShiftOutDataStart) \n\t"
            " .compress off \n\t"
            "ShiftOutDataStart: \n\t"
            " mov %[_waitCycles], %[_bitCycles] \n\t"
            " add %[_waitCycles], CNT \n\t"
            "loop%=: \n\t"
            " waitcnt %[_waitCycles], %[_bitCycles] \n\t"
            " shr %[_data],#1 wc \n\t"
            " muxc outa, %[_mask] \n\t"
            " djnz %[_bits], #__LMM_FCACHE_START+(loop%= - ShiftOutDataStart) \n\t"
            " jmp __LMM_RET \n\t"
            "ShiftOutDataEnd: \n\t"
            " .compress default \n\t"
            : [_data] "+r"(data),
            [_waitCycles] "+r"(waitCycles),
            [_bits] "+r" (bits)
            : [_mask] "r"(txMask),
            [_bitCycles] "r"(bitCycles));
#endif
}
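For anyone following along without hardware, here is a rough host-side C model of what that loop does per bit: shr with wc moves bit 0 into the carry flag, muxc copies the carry onto the masked pin, and djnz counts the bits down. It leaves out all timing (waitcnt). The names outa_model and shift_out_model are hypothetical, and the zero-bits guard is something I added for the model only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Propeller OUTA register. */
static uint32_t outa_model = 0;

/*
 * Model of the shift-out loop with timing omitted.
 * Stores the TX pin level after each bit in levels[] (LSB first) and
 * returns the number of bits sent. levels needs room for `bits` entries.
 */
static uint32_t shift_out_model(uint32_t data, uint32_t bits,
                                uint32_t tx_mask, uint8_t *levels) {
    uint32_t sent = 0;

    /* Guard for the model only: djnz starting from 0 would decrement to
     * 0xFFFFFFFF and keep looping, so the real code needs bits >= 1. */
    if (bits == 0) {
        return 0;
    }

    do {
        uint32_t carry = data & 1u;      /* shr data, #1 wc: C = old bit 0 */
        data >>= 1;

        if (carry) {                     /* muxc outa, mask */
            outa_model |= tx_mask;
        } else {
            outa_model &= ~tx_mask;
        }

        levels[sent++] = (outa_model & tx_mask) ? 1 : 0;
    } while (--bits);                    /* djnz bits, #loop */

    return sent;
}
```

Feeding it data 0x5 with 3 bits gives pin levels 1, 0, 1, which is the LSB-first order the asm produces.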