CALLD problem with PNut binaries
bradzone
Posts: 6
in Propeller 2
I’ve been transitioning from PNut (33p) to FlexGui/fastspin (4.0.6) and ran into a problem: a CALLD instruction did not work as I expected in a PNut-created binary, while the FlexGui binary worked fine. When I compared the two binaries I noticed that PNut and FlexGui were producing different machine code for the same CALLD instruction. Both PNut and FlexGui used the #rel20 encoding, but the PC offset generated by PNut was in longs instead of bytes. As a result, when the CALLD instruction is executed, the PC is not set to the target label’s address (see code below).
PNut and FlexGui also differ in their handling of the wc/wz/wcz effects on CALLD’s #rel9 encoding format. PNut generates an “effect is not allowed” error when a wc, wz, or wcz effect is requested, while FlexGui does not. From the documentation, the new c and z values are taken from S[31] and S[30] where the S value would be PC + rel9. When I ran the FlexGui binary the c and z flags were set to zero when the wcz effect was requested (see code below). It doesn’t seem that the S[31:30] bits can actually be set using a 9-bit relative offset.
As a relative newcomer, I would appreciate it if those with more experience could confirm this CALLD behavior. Also, does anyone know the reasoning for using the #rel9 encoding for short branches? For relative CALLD instructions, it seems that using the #rel20 encoding in all cases where the D value is PA, PB, PTRA, or PTRB would avoid the confusion over what to do with the wc, wz, and wcz effects.
' Demo for CALLD differences under fastspin and PNut
'   fastspin: all tests pass
'   PNut: change commented instructions to assemble/pass tests
'         removes wcz effect and changes expected values
'   LEDs 56-58: test results (on == pass)
'     56: rel9 encoding with negative value
'     57: rel9 encoding with positive value
'     58: rel20 encoding with positive value
DAT
            org
            hubset  #0                  ' use RCFAST (20Mhz)
            setq    #2                  ' move target code for rel20 branches
            rdlong  cog_r20,##@cog_r20
main
            call    #test_cog
.L1         waitx   ##2_500_000
            jmp     #.L1
            cogstop #0

cog_r9m     mov     arg00,#1            ' CALLD target: negative rel9 value
            rczl    arg00 wcz
            jmp     PA

test_cog    drvl    #56                 ' LEDs 56-58 for test - assume success
            drvl    #57                 ' for PNut change commented instructions
'
            drvl    #58

            modcz   _SET,_SET wcz
            calld   PA,#cog_r9m wcz     ' fastspin: use wcz
            cmp     arg00,#4 wz         ' fastspin: success
'           calld   PA,#cog_r9m         ' PNut: avoid effect is not allowed error
'           cmp     arg00,#7 wz         ' PNut: w/o wcz, arg00 is #7
    if_nz   drvh    #56

            modcz   _SET,_SET wcz
            calld   PA,#cog_r9 wcz      ' fastspin: use wcz
            cmp     arg00,#8 wz         ' fastspin: success
'           calld   PA,#cog_r9          ' PNut: avoid effect is not allowed error
'           cmp     arg00,#11 wz        ' PNut: w/o wcz, arg00 is #11
    if_nz   drvh    #57

            modcz   _CLR,_CLR wcz
            calld   PA,#cog_r20         ' fastspin: FE100400; PNut: FE100100
'           long    $FE100100           ' fastspin: emulate PNut behavior
            cmp     arg00,#$10 wz       ' fastspin arg00 == $10
'           cmp     arg00,#$20 wz       ' PNut: arg00 == $20
    if_nz   drvh    #58
            ret

cog_r9      mov     arg00,#2            ' CALLD target: positive rel9 target
            rczl    arg00
            jmp     PA

            long    0[58]               ' array of NOPs

cog_r20A                                ' CALLD target for PNut FE100100 encoding
            mov     arg00,#8
            rczl    arg00
            jmp     PA

result1
arg00       long    0
arg01       long    0
_pa         long    0

            org     cog_r9+$100-3
cog_r20     mov     arg00,#4            ' CALLD target: rel20 value
            rczl    arg00
            jmp     PA
cog_code_end
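The two machine words quoted in the comment on the `calld PA,#cog_r20` line can be reproduced with a little arithmetic. The sketch below is Python rather than PASM2, and it assumes the bit layout of CALLD PA/PB/PTRA/PTRB,#A as I understand it from the P2 instruction listing (worth checking against the docs). The helper name `encode_calld_rel20` is made up for illustration and is not part of either assembler.

```python
# Assumed layout of CALLD PA/PB/PTRA/PTRB,#A (verify against the P2 docs):
#   bits 31..28  condition (1111 = always)
#   bits 27..23  11100
#   bits 22..21  WW: 0=PA, 1=PB, 2=PTRA, 3=PTRB
#   bit  20      R: 1 = relative
#   bits 19..0   20-bit address/offset
POINTER_REGS = {"PA": 0, "PB": 1, "PTRA": 2, "PTRB": 3}

def encode_calld_rel20(reg, offset):
    """Hypothetical helper: build a relative CALLD with a 20-bit offset field."""
    ww = POINTER_REGS[reg]
    return (0xF << 28) | (0b11100 << 23) | (ww << 21) | (1 << 20) | (offset & 0xFFFFF)

# In the demo, cog_r20 sits $100 longs past the instruction after the CALLD.
distance_in_longs = 0x100

fastspin_word = encode_calld_rel20("PA", distance_in_longs * 4)  # offset in bytes
pnut_word = encode_calld_rel20("PA", distance_in_longs)          # offset in longs

print(f"{fastspin_word:08X}")  # FE100400
print(f"{pnut_word:08X}")      # FE100100
```

The only difference between the two words is the factor of four in the offset field, which matches the bytes-versus-longs discrepancy described in the post.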
Comments
1. If the parser looks for PA/PB/PTRA/PTRB first and if found always encodes for rel20 then the wc, wz, and wcz effects would never be allowed.
2. If the parser sees these as symbols for registers $1F6 to $1F9, and then chooses the encoding based on the reach of the target, you'll get rel9 with effects allowed.
This is because the syntax doesn't specifically disambiguate these cases.
The effects can only clear the flags with rel9, as bits 31 and 30 of the source are clear.
I'd suggest that PNut is calculating the rel20 offset in longs because this is cog code (long addressed).
The routine you use to relocate the rel20 code is reading from hub rather than cog, so is that having an effect?
I can't understand why rczl should shift arg00 by a different amount in cog_r20 depending on the assembler; surely the result for both should be $20?
I agree that for PA/PB/PTRA/PTRB defaulting to rel20 (as PNut does) makes sense.
I'm confident that relocating the rel20 code by reading a copy from the hub is not having an effect on the test. I only posted a small portion of the code from my testing. I also tested and got the same behavior when the code was executing from the lut or hub. Also, both assemblers generate the same CALLD PA,#rel20 instruction apart from the offset (fastspin's offset is in bytes and PNut's is in longs). The code I posted for fastspin is consistent with Chip’s revB documentation and shows that the revB hardware expects the #rel20 offset to be in bytes. One oddity is that the revB document highlights the CALLD instruction as changed for revB, but I couldn't find any differences.
The rczl instruction shifts arg00 left by 2 bits and inserts the current C and Z values in bits [1:0]. I used the rczl instruction to capture the state of the C and Z flags after a CALLD instruction using the wcz effect. The cog_r20 subroutine is not shifting arg00 by a different amount. For PNut-created binaries, the $20 arg00 value results because the cog_r20 subroutine (mov arg00,#4) is not being invoked. Instead, because the relative offset is in longs, CALLD sets the PC to the cog_r20A subroutine (mov arg00,#8).
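The $10 versus $20 results follow from where the CALLD lands. Here is a minimal Python model of that reasoning. It assumes the hardware treats the rel20 field as a byte offset (which is what the fastspin binary suggests), and it uses distances counted from the posted listing: cog_r20A is 64 longs and cog_r20 is 256 longs past the instruction after the CALLD. The names `targets` and `run_rel20_calld` are illustrative only.

```python
def rczl(d, c, z):
    """RCZL: shift D left by two bits and insert C and Z into bits 1 and 0."""
    return ((d << 2) | (c << 1) | z) & 0xFFFFFFFF

# Routines reachable from the rel20 CALLD, keyed by distance in longs from the
# instruction following it, with the value each routine moves into arg00.
targets = {64: ("cog_r20A", 8), 256: ("cog_r20", 4)}

def run_rel20_calld(encoded_offset):
    """Assume the rel20 field is a byte offset, so 4 bytes advance one long."""
    name, initial = targets[encoded_offset // 4]
    # No wcz on this CALLD, so C and Z keep the values from modcz _CLR,_CLR.
    return name, rczl(initial, 0, 0)

print(run_rel20_calld(0x400))  # fastspin encoding: ('cog_r20', 16)
print(run_rel20_calld(0x100))  # PNut encoding:     ('cog_r20A', 32)
```

The same `rczl` model also gives the rel9 expectations in the demo: `rczl(1, 0, 0)` is 4 and `rczl(1, 1, 1)` is 7.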
PNut does not default to rel20 and also uses the rel9 encoding for short branches. I think using longs instead of bytes for the rel20 offset is a serious bug in PNut that needs to be fixed. As a developer, this bug is particularly pernicious because your code works for a long time until you add some code that moves the target subroutine beyond 256 longs (the limit for the rel9 format) and all hell breaks loose.
To demonstrate the PNut behavior, I’ve attached a file with my code modified to compile under PNut. This version compiles under both fastspin and PNut. LED 58 is off under fastspin (cog_r20 invoked), but on for PNut (cog_r20A invoked).
I'm not seeing the indication that CALLD changed for revB. I might have missed it in my searching, so could you point me to it?
So, PEBCAK: I didn't follow your code properly.
Ok, I now understand better what you were saying.
So PNut's error message should really be a warning 'effect will clear relevant flag' or similar.
Rereading the Prop2 docs more carefully, the wording suggests that the relative offset is always meant to be in instructions for rel9, while the examples given suggest that rel20 offsets should always be in bytes.
I'd suggest this is to account for the fact that hubexec code needn't be long aligned.
It also suggests that with rel9, if you have short data segments inline with your hubexec code you might need to ensure instructions before and after the data blocks fall on the same byte offset from long-alignment if you use a rel9 branch to skip over the block, while it seems rel20 could handle any difference.
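To make the alignment point concrete, here is a small Python sketch of that reading of the docs: rel9 counts whole instructions, rel20 counts bytes. This is an interpretation, not confirmed hardware behavior, and both function names are hypothetical.

```python
def rel9_offset(next_pc_bytes, target_bytes):
    """Return the rel9 field (in instructions), or None if rel9 cannot reach."""
    delta = target_bytes - next_pc_bytes
    if delta % 4 != 0:              # not a whole number of instructions away
        return None
    longs = delta // 4
    if not -256 <= longs <= 255:    # outside the signed 9-bit range
        return None
    return longs

def rel20_offset(next_pc_bytes, target_bytes):
    """Return the rel20 field as a byte offset masked to 20 bits."""
    return (target_bytes - next_pc_bytes) & 0xFFFFF

print(rel9_offset(0x1000, 0x1010))   # 4: four instructions ahead
print(rel9_offset(0x1000, 0x1012))   # None: a 2-byte data block broke alignment
print(rel20_offset(0x1000, 0x1012))  # 18: the byte offset still works
```

Under this model, a short inline data block whose length is not a multiple of four makes the target unreachable by rel9, while rel20 is unaffected.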
When using WCZ with CALLD the C and Z flags will always be cleared if an immediate source is used since all the bits above bit 8 are zero. The source would have to specify a cog memory location to be able to set C and Z to nonzero values.
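A quick way to see this, as a Python sketch that assumes the documented behavior C = S[31] and Z = S[30] (the function name is just for illustration):

```python
def calld_flags(s_value):
    """C and Z written by CALLD D,S with WCZ: C = S[31], Z = S[30]."""
    return (s_value >> 31) & 1, (s_value >> 30) & 1

# Every 9-bit immediate leaves both flags clear.
assert all(calld_flags(imm) == (0, 0) for imm in range(512))

# A 20-bit address such as PC + rel9 cannot set them either.
assert calld_flags(0xFFFFF) == (0, 0)

# A register source can carry the flag bits alongside a 20-bit address.
saved_state = (1 << 31) | (1 << 30) | 0x00123
print(calld_flags(saved_state))  # (1, 1)
```

This is consistent with the demo, where the flags set by modcz _SET,_SET come back cleared after a rel9 CALLD with wcz.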
As you discovered, PNut is generating a long address instead of a byte address when using rel20. This is clearly a bug. This bug may have been introduced at the time when there was discussion about how the LOC instruction should work within cog memory. LOC also uses rel20 encoding.
BTW, your cog_r20 routine will be located immediately after _pa in cog memory. So when you call cog_r20, you're not actually jumping to the correct location in memory. You can use ORGF to pad the code with zeros so that cog_r20 is in the correct place. Using multiple ORG instructions is useful if you want to create multiple cog images in the same file, or you want to create an image for LUT memory, or if you want to create labels for cog locations.
While you're at this, please check if you've fixed the old LOC bug as well - https://forums.parallax.com/discussion/comment/1457051/#Comment_1457051
Well, this is embarrassing. I agree that CALLD was not changed in revB. I mistook the green highlights from my search for CALLD occurrences for Chip’s red highlights for revB changes. In my defense, at the beginning of my long troubleshooting effort, I had assumed that PNut’s use of long offsets for rel20 was likely correct and that something must have changed in revB. Also, in searching for revB changes, I noticed some CALLD-related comments in forum discussions of the revB changes. Clearly, I was predisposed towards seeing CALLD changes in the documentation that weren’t really there.
Your alignment observation got me thinking and I wrote some test code with calls to subroutines that are not long aligned.
Under fastspin, the listing shows that the rel20 encoding is used for short and long branches. This binary should work correctly when executed from the hub, but should not be copied to a cog, because the instructions are not long-aligned.
When I enabled the ORG line for cog mode, the instructions were forced to be long aligned and the rel9 encoding was used for the short branch. This code could be executed from the hub and work correctly.
I tried this under PNut and got similar behavior, but was surprised to see that in hub mode (without the ORG line) the rel20 offsets were in bytes. In cog mode the rel20 offset was in longs.
I find all this complexity in the handling of the CALLD instruction somewhat confusing and still think it would make more sense to always use the #rel20 encoding for all cases when the D value is PA, PB, PTRA, or PTRB.