evanh, could you explain in more detail what you're suggesting? The ORG directive currently tells the assembler to go into cog mode, and it sets the starting cog address. What else should it do?
I'm fine with ORG, that was just a passing remark. It's LOC that needs the work.
EDIT: I'd call it a base address rather than start address. "Start" might be mistaken for start of execution.
EDIT2: Hmm, base is wrong too, ORG is not a relative thing at all. Section origin then.
OK, I was wrong. I ran this under spinsim, and I got PA=$1020 and PB=$1030. This may be correct, or p2asm may be wrong, or spinsim might be wrong. I'll have to compare the binary against PNut's.
Oh, oops, I've not been examining the final register content ... and I was convinced I was too, damn ...
I get $40C for both cases when running on the FPGA. However, spinsim seems to be confused. It produces $1020 and $1030. It's shifting the value up by 2 bits, which means it must think it's in cog mode.
If I move the routine to a location other than $400 I get the correct value in relative mode, but an incorrect value in absolute mode. This kind of shows the value of having position-independent code. It appears that my linker isn't adjusting the address for absolute mode. That doesn't surprise me, since I don't recall handling relocation for the LOC instruction.
I'm going to take out the warning message for the LOC instruction.
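The relocation point can be illustrated with a toy model. If a block of hub code is moved, a PC-relative LOC still produces the right address because the PC and the target move together, while an absolute LOC keeps pointing at the old target unless the linker patches its address field. This is a minimal Python sketch, assuming hubexec byte addressing, a next-PC of PC + 4, a 20-bit A field in bits 19..0 and the R (PC-relative) flag in bit 20, as the encodings quoted in this thread suggest. `run_loc` and `relocate_loc` are hypothetical helpers, not code from any of the linkers or assemblers discussed.

```python
FIELD_MASK = 0xFFFFF  # 20-bit A field, bits 19..0 (assumed layout)
R_BIT = 1 << 20       # PC-relative flag (assumed)

def run_loc(word: int, pc: int) -> int:
    """Value a hubexec LOC at byte address pc would load (sketch)."""
    field = word & FIELD_MASK
    if word & R_BIT:
        offset = field - (1 << 20) if field & 0x80000 else field
        return (pc + 4 + offset) & FIELD_MASK
    return field

def relocate_loc(word: int, delta: int) -> int:
    """Patch a LOC whose target moved by delta bytes (hypothetical linker step).

    Only meaningful when the target lies inside the moved block.
    Relative encodings need no change; absolute ones get delta added
    to their 20-bit address field.
    """
    if word & R_BIT:
        return word
    new_field = ((word & FIELD_MASK) + delta) & FIELD_MASK
    return (word & ~FIELD_MASK & 0xFFFFFFFF) | new_field

rel, ab = 0xFE900008, 0xFEA0040C  # LOCs at $400/$404 targeting $40c
delta = 0x200                      # move the routine from $400 to $600
print(hex(run_loc(rel, 0x400 + delta)))                      # relative: follows the move
print(hex(run_loc(ab, 0x404 + delta)))                       # absolute, unpatched: stale
print(hex(run_loc(relocate_loc(ab, delta), 0x404 + delta)))  # absolute, patched
```

The relative form needs no relocation entry at all, which is the practical payoff of position-independent code for a linker.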
Apologies on the PC-relative complaint. I was way off there.
There is still the bug in PNut though. It's in the cogexec encoding of PC-relative LOC instructions below absolute address $400. I guess that's where Cluso came unstuck and got me digging.
Here's another one:
I've just been experimenting with building some diagnostic code and discovered it would be nice to know if the caller code was from cogexec or hubexec. A third status bit in the stacked address maybe.
In this case I'm wanting a subroutine to extract the encoding of the instruction prior to the call. If I don't know whether the caller was in cogexec at the time or not then I can't calculate the relative address from the call stack.
EDIT: Ah, forgot that code can't execute below $400 in hubRAM. That should be enough ...
EDIT2: And working source code:
        pop     char                    'grab caller address
        push    char                    'restack it
        cmp     char, ##$400    wcz     'test if caller was cogexec or hubexec, C = borrow of (D - S)
if_c    sub     char, #2                'was cogexec
if_c    alts    char                    'MOV indirection - get register content of register number in "char"
if_c    mov     pa, 0-0
if_nc   sub     char, #8                'was hubexec
if_nc   rdlong  pa, char
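The branch in that routine can be modelled in Python to make the arithmetic explicit. The sketch assumes, as the code does, that a return address below $400 must come from cogexec, where addresses count longs, so the instruction before the CALL is two longs back. Otherwise the caller is in hubexec, where addresses count bytes, so it is eight bytes back. It also masks the popped value to its low 20 bits, on the assumption that CALL stacks the C and Z flags in the upper bits alongside the 20-bit return address. The PASM above compares the full value, so if that assumption holds, a set flag could push a cogexec return address past $400. `prior_instruction` is a hypothetical name.

```python
COG_LIMIT = 0x400  # code can't execute below hub $400, so lower = cogexec

def prior_instruction(stacked: int) -> tuple[str, int]:
    """Locate the instruction just before a CALL from its stacked return address.

    Returns ("cog", register number) or ("hub", byte address).
    The 20-bit mask assumes flag bits may sit above the address.
    """
    ret = stacked & 0xFFFFF
    if ret < COG_LIMIT:
        return ("cog", ret - 2)  # CALL at ret-1, previous long at ret-2
    return ("hub", ret - 8)      # CALL at ret-4, previous long at ret-8

print(prior_instruction(0x012))
print(prior_instruction(0x408))
print(prior_instruction(0xC0000012))  # flags assumed set in the top bits
```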
Wow, that detail is needed. PNut is making a mess of the PC-relative LOC encodings. Only six of the twelve PC-relative combinations above are correct. Even two of the hubexec encodings ($fe800110 is an absolute encoding) are wrong, because PNut uses absolute encoding below $400 in hubRAM where it should still be PC-relative.
Or is that case intentional because hubexec can't go there?
LOC is also usable to obtain the hub address of a table, which can reside below or above hub $400.
That is one of the problems I found, the hard way: I wasted a whole day trying to find a bug in my program.
Chip,
I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or address, is being garbled if the sharing cog WRLUTs to the same address on the same sysclock.
In my case an instruction stall would also mess things up, but it would have to last a number of clocks to produce the result I'm getting.
PS: I'm very certain. Testing is on a P123 board with the v32i image loaded.
Do you mean that an RDLUT from COGn, on the same address and on the same sysclk as the WRLUT from COGm, is corrupted?
i.e., it is neither the old value nor the new value?
Do you have some test code that reproduces this?
Definitely not the new value. I don't think an old value could upset things the way it has because it will be the same every time and the importance of the data is metronomic ...
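One way to pin down the question asked above (old value, new value, or neither) is to log, for each RDLUT that collides with the partner cog's WRLUT, the value read alongside the old and new contents of that LUT address, and then bucket the results. The Python helper below post-processes such a log. Both the helper and the (old, new, observed) log format are hypothetical; nothing in the thread defines them.

```python
from collections import Counter

def classify_reads(samples: list[tuple[int, int, int]]) -> Counter:
    """Bucket (old, new, observed) triples from a collision test log.

    'old'     - the read returned the value from before the WRLUT
    'new'     - the read returned the value being written
    'neither' - corrupted: matches neither value
    """
    result: Counter = Counter()
    for old, new, observed in samples:
        if observed == old:
            result["old"] += 1
        elif observed == new:
            result["new"] += 1
        else:
            result["neither"] += 1
    return result

log = [(0x10, 0x20, 0x10), (0x10, 0x20, 0x20), (0x10, 0x20, 0x35)]
print(classify_reads(log))
```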
Comments
That really isn't true. One of the things Chip made explicit early on was the fact that data can be intermingled with code.
I still don't quite see the danger of using LOC with a relative address, at least one above $400. After this code:
    orgh    $400
    loc     pa, #@label     ' relative addressing
    loc     pb, #\@label    ' absolute addressing
    cogstop #0
label
    long    1

PA and PB should have the same value. Am I missing something?

dat
00400                   orgh $400
3: WARNING: Relative mode used with LOC instruction
00400 fe900008          loc pa, #@label     ' relative addressing
00404 fea0040c          loc pb, #\@label    ' absolute addressing
00408 fd640003          cogstop #0
0040c               label
0040c 00000001          long 1

As I said before, there might be value in using the difference for position-independent code.
Edit: typo
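For anyone decoding listings like this by hand, here is a minimal Python sketch that splits a LOC long into its fields. The layout it assumes (condition in bits 31..28, bits 27..23 = %11101, bits 22..21 selecting PA/PB/PTRA/PTRB, bit 20 = R for PC-relative, bits 19..0 = the 20-bit address or offset) is inferred from the encodings quoted in this thread, such as $fe900008 and $fea0040c; check it against the official instruction table before relying on it. `decode_loc` and `signed20` are hypothetical helper names.

```python
from typing import NamedTuple

REGS = ("PA", "PB", "PTRA", "PTRB")  # selected by bits 22..21 (assumed layout)

class Loc(NamedTuple):
    cond: int       # bits 31..28, %1111 = always
    reg: str        # destination register
    relative: bool  # R bit 20: True = PC-relative, False = absolute
    field: int      # raw 20-bit A field

def decode_loc(word: int) -> Loc:
    """Split a LOC instruction long into fields (sketch, assumed layout)."""
    if (word >> 23) & 0x1F != 0b11101:
        raise ValueError(f"${word:08x} is not a LOC instruction")
    return Loc(
        cond=(word >> 28) & 0xF,
        reg=REGS[(word >> 21) & 0x3],
        relative=bool((word >> 20) & 1),
        field=word & 0xFFFFF,
    )

def signed20(value: int) -> int:
    """Interpret a 20-bit field as two's complement."""
    return value - (1 << 20) if value & 0x80000 else value

for w in (0xFE900008, 0xFEA0040C):
    loc = decode_loc(w)
    kind = "relative" if loc.relative else "absolute"
    print(f"${w:08x}: {loc.reg} {kind} A=${loc.field:05x} ({signed20(loc.field)})")
```

Decoded this way, the two listing lines differ in both the R bit and the A field, which is why the relative form's A field ($8) looks nothing like the label address ($40c).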
p2asm looks OK, it produces the same thing as fastspin does, and when I run the result on the FPGA both pa and pb have the same value. I've attached the code I used for testing: foo.bas is the original source, foo.spin2 is the raw PASM produced by fastspin, foo.lst is the listing file that p2asm produces when it compiles foo.spin2. The output is:
$ bin/loadp2 foo.binary -t
[ Entering terminal mode. Press ESC to exit. ]
getting values
pa=1064 pb= 1064

which is correct (1064 = $428, which is where the label ends up in memory).

Note that there's a bug in fastspin 3.9.10 such that it cannot handle @ and \ in inline assembly. That's fixed in the current github sources, so you'll need to use those if you want to regenerate foo.spin2.
    orgh    $400
    loc     pa, #@label     ' relative addressing
    loc     pb, #\@label    ' absolute addressing
    cogstop #0
label

shows PA = $8, PB = $40c
00400- 08 00 90 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00 '..........d.....'

but this code shows PA = $40C and PB = $40C

    orgh    $400
    org
    loc     pa, #@label
    loc     pb, #\@label
    cogstop #0
label

shows
00400- 0C 04 80 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00 '..........d.....'

Edit: PNut switches to absolute because the ORG directive causes a domain crossing.
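The PNut hex dumps above are little-endian byte sequences, so each group of four bytes has to be reversed to get the instruction long. The short Python sketch below does this for both dumps and reports bit 20 of each LOC. It assumes that bit 20 is the PC-relative (R) flag, as the $fe900008/$fea0040c encodings in this thread suggest. `dump_to_longs` and `r_bit` are hypothetical helper names.

```python
import struct

def dump_to_longs(hex_bytes: str) -> list[int]:
    """Convert a space-separated little-endian byte dump into 32-bit longs."""
    raw = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces
    if len(raw) % 4:
        raise ValueError("dump length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))

def r_bit(word: int) -> int:
    """Bit 20 of a LOC long: 1 = PC-relative, 0 = absolute (assumed)."""
    return (word >> 20) & 1

first = dump_to_longs("08 00 90 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00")
second = dump_to_longs("0C 04 80 FE 0C 04 A0 FE 03 00 64 FD 00 00 00 00")
for name, words in (("orgh only", first), ("orgh + org", second)):
    print(name, [f"${w:08x}" for w in words[:2]], [r_bit(w) for w in words[:2]])
```

In the second dump the first LOC reads back as $fe80040c: it still targets PA, but bit 20 is clear and the field holds the absolute $40c. That fits the note that the ORG directive makes PNut fall back to absolute encoding.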
I think you're confusing the instruction encoding with what is actually put in the register when the instruction executes. If you execute the "relative addressing" version of the loc instruction, the PC after the instruction ($404) is added to the offset ($8 in this case) to get the final value of $40c. In other words, at run time PA and PB will end up with the same value of $40c in them when the two instructions execute.
(Try it!)
WHAT ?!?!
Look at foo.spin2 and/or foo.lst that I posted a few pages back (that's foo.bas converted to PASM2 by fastspin). The relevant instructions are:
'
' sub getlabelvals()
00408               _getlabelvals
' asm
00408 fe90001c          loc     pa, #@label
0040c fea00428          loc     pb, #\@label
00410 f6006df6          mov     _var_00, pa
00414 f6006ff7          mov     _var_01, pb
' paval = x
00418 fc606c2b          wrlong  _var_00, objptr
' pbval = y
0041c f1045604          add     objptr, #4
00420 fc606e2b          wrlong  _var_01, objptr
00424 f1845604          sub     objptr, #4
' label:
00428               label
00428               _getlabelvals_ret
00428 fd64002e          reta

Note that the first loc is encoded as $fe90001c (relative addressing) whereas the second loc is encoded as $fea00428 (absolute addressing). At runtime they both put $428 into the respective registers, as is proven by the program output.

The reason is simple: the PC relative "loc" instruction adds the next PC (PC+4) to the offset to get the value to put into the register, just like a relative "jmp" adds the next PC to the offset to get the new PC. So the first loc, at address $408, adds $40c to the offset $1c to get the final value $428.
Note that it isn't *just* the offset that is different in the two loc encodings, there's actually a bit in the instruction that says whether the offset is absolute or relative.
You should be able to assemble and run foo.spin2 with PNut to verify this. Actually maybe not, it may use @@@, so you may have to use fastspin or p2asm. But all 3 assemblers agree about the encoding of the LOC instructions, so this isn't some quirk of fastspin or p2asm, it's the way the hardware works.
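The run-time rule explained above can be written as a few lines of Python. This is a behavioural sketch for hubexec only. It assumes byte addressing, a next-PC of PC + 4, a sign-extended 20-bit offset, and a result truncated to 20 bits. `loc_result` is a hypothetical name, and cogexec, where the PC counts longs, is not modelled.

```python
def sign_extend_20(value: int) -> int:
    """Treat a 20-bit field as a two's-complement offset."""
    value &= 0xFFFFF
    return value - (1 << 20) if value & 0x80000 else value

def loc_result(pc: int, field: int, relative: bool) -> int:
    """Value LOC writes to its register when executing from hub (sketch).

    pc is the byte address of the LOC itself; the relative form adds
    the address of the next instruction (pc + 4), like a relative JMP.
    """
    if relative:
        return (pc + 4 + sign_extend_20(field)) & 0xFFFFF
    return field & 0xFFFFF

# Dave's test: LOC at $400 (relative, A=$8) and $404 (absolute, A=$40c)
pa = loc_result(0x400, 0x008, relative=True)
pb = loc_result(0x404, 0x40C, relative=False)
# foo.spin2: LOC at $408 (relative, A=$1c) and $40c (absolute, A=$428)
pa2 = loc_result(0x408, 0x01C, relative=True)
pb2 = loc_result(0x40C, 0x428, relative=False)
print(hex(pa), hex(pb), hex(pa2), hex(pb2))
```

Both pairs agree, matching the FPGA results reported in the thread ($40c for the first test, and 1064 = $428 for foo.bas).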
===================================================
LOC/MOV syntax       PA register results from hubRAM
                     ORG $0F    ORGH $110  ORGH $600
===================================================
loc pa, #label       0000000f   00000110   00000600
loc pa, #@label      0000003c   00000110   00000600
loc pa, #\label      0000000f   00000110   00000600
loc pa, #\@label     0000003c   00000110   00000600
mov pa, ##label      0000000f   00000110   00000600
mov pa, ##@label     0000003c   00000110   00000600
===================================================
LOC/MOV syntax       PA register results from cogRAM
                     ORG $0F    ORGH $110  ORGH $600
===================================================
loc pa, #label       000fffee   000003e9   00000600
loc pa, #@label      00000081   000003c8   00000600
loc pa, #\label      0000000f   00000110   00000600
loc pa, #\@label     0000003c   00000110   00000600
mov pa, ##label      0000000f   00000110   00000600
mov pa, ##@label     0000003c   00000110   00000600

    loc     pa, #\@str1
    call    #puts
    loc     pa, #palette1
    call    #itoh
    call    #putsp
    loc     pa, #palette2
    call    #itoh
    call    #putsp
    loc     pa, #palette3
    call    #itoh
    call    #putnl
LOC/MOV syntax       ORG $00F            ORGH $00110         ORGH $00600
from hubRAM          op-code  PA-data    op-code  PA-data    op-code  PA-data
==============================================================================
loc pa, #label       fe80000f 0000000f   fe800110 00000110   fe900198 00000600
loc pa, #@label      fe80003c 0000003c   fe800110 00000110   fe900174 00000600
loc pa, #\label      fe80000f 0000000f   fe800110 00000110   fe800600 00000600
loc pa, #\@label     fe80003c 0000003c   fe800110 00000110   fe800600 00000600
mov pa, ##label      f607ec0f 0000000f   f607ed10 00000110   f607ec00 00000600
mov pa, ##@label     f607ec3c 0000003c   f607ed10 00000110   f607ec00 00000600

LOC/MOV syntax       ORG $00F            ORGH $00110         ORGH $00600
from cogRAM          op-code  PA-data    op-code  PA-data    op-code  PA-data
==============================================================================
loc pa, #label       fe9fffe0 000ffff7   fe9003dc 000003f5   fe800600 00000600
loc pa, #@label      fe900070 00000090   fe9003b8 000003da   fe800600 00000600
loc pa, #\label      fe80000f 0000000f   fe800110 00000110   fe800600 00000600
loc pa, #\@label     fe80003c 0000003c   fe800110 00000110   fe800600 00000600
mov pa, ##label      f607ec0f 0000000f   f607ed10 00000110   f607ec00 00000600
mov pa, ##@label     f607ec3c 0000003c   f607ed10 00000110   f607ec00 00000600
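To see which LOC encodings in that table PNut emitted as PC-relative, the op-codes can be classified by their R bit. This is a minimal Python sketch, assuming bit 20 is the R flag and bits 19..0 hold the signed 20-bit field, as the $fe900008/$fea0040c examples in this thread indicate. The op-codes are copied from the table; the helper names are illustrative.

```python
COLUMNS = ("ORG $00F", "ORGH $00110", "ORGH $00600")

# Op-codes for the non-backslash LOC forms, copied from the table.
TABLE = {
    ("hubRAM", "loc pa, #label"):  (0xFE80000F, 0xFE800110, 0xFE900198),
    ("hubRAM", "loc pa, #@label"): (0xFE80003C, 0xFE800110, 0xFE900174),
    ("cogRAM", "loc pa, #label"):  (0xFE9FFFE0, 0xFE9003DC, 0xFE800600),
    ("cogRAM", "loc pa, #@label"): (0xFE900070, 0xFE9003B8, 0xFE800600),
}

def is_relative(word: int) -> bool:
    """True if the LOC R bit (bit 20, assumed) selects PC-relative mode."""
    return bool(word & (1 << 20))

def field20(word: int) -> int:
    """Signed 20-bit A field."""
    a = word & 0xFFFFF
    return a - (1 << 20) if a & 0x80000 else a

for (source, syntax), words in TABLE.items():
    cells = []
    for col, w in zip(COLUMNS, words):
        mode = f"rel {field20(w):+d}" if is_relative(w) else f"abs ${w & 0xFFFFF:05x}"
        cells.append(f"{col}: {mode}")
    print(f"{source:7} {syntax:16} " + " | ".join(cells))
```

Run against the table, the hubexec ORGH $00110 column comes out absolute ($fe800110) for both forms, which is the case questioned in the thread. For the cogexec ORG $00F row the relative field is -32. If, as the PA-data value $000ffff7 suggests, the result is next PC + offset truncated to 20 bits, the next PC there would be $017. The correct long offset to $00F would then be -8, and -32 is four times that. This is consistent with the reported cogexec PC-relative encoding bug.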
It's messy and non-specific.
'==================================
' paired mailbox
'==================================
                ORG     $3fe
monitor         res     1
duration        res     1

'==================================
' Sinc3 filter (cogexec, paired)
'==================================
                ORG
start_sinc3
cid             cogid   cid
                testb   cid, #0                 wz
        if_z    jmp     #start_monitor          'identical code on paired cogs

                wrpin   ##%0111_0000_000_0000100000000_00_01111_0, #mpin   'adc/counter mode, bitstream is #tpin, clock input is #mpin (#tpin+1)
                wypin   #0, #mpin               'inc on high
                wxpin   #0, #mpin               'totaliser
                dirh    #mpin                   'enable smart pin

'Sinc3 loop (8 sysclocks)
                rep     @.lend, #0              'loop forever
                rdpin   acc1, #mpin
                add     acc2, acc1
                add     acc3, acc2
                wrlut   acc3, #(monitor & $1ff) 'for the decimator (lut sharing is active)
'               add     acc4, acc3
'               wrlut   acc4, #(monitor & $1ff) 'for the decimator (lut sharing is active)
.lend
                cogstop cid

acc1            long    0
acc2            long    0
acc3            long    0
acc4            long    0
diff1           long    0
diff2           long    0
diff3           long    0
diff4           long    0
period          long    400                     'max 1024 clocks, needs 30-bit registers

'============================
' Monitor (cogexec, paired)
'============================
start_monitor   lutson
samp            wrpin   ##%1010000000000_00_00010_0, #1    'set DAC mode for DAC1/Blue, 16-bit dither smartpin mode
offset          wxpin   #1, #1                  'continuous dither
scale           wypin   #0, #1                  'DAC level
                dirh    #1                      'enable DAC
delay           setse1  #%110_000000|rx_pin     'high IN from smartpin
tick            getct   tick
key             addct1  tick, period            'start the clock!

'keyboard controls
'===================
.keyboard
                rdpin   key, #rx_pin
                shr     key, #32-8              'lsbit align
'scaling recalculation
                encod   scale, period
                shr     scale, #1
                mul     scale, #8
                cmp     key, #"+"               wz
        if_z    add     offset, #511
                cmp     key, #"-"               wz
        if_z    sub     offset, #511
                cmp     key, #"*"               wz
        if_z    add     period, scale
                cmp     key, #"/"               wz
        if_z    sub     period, scale
                fles    period, #504            'max of 1024 clocks, multiple of 8
                fges    period, #32             'fastest monitor loop time, multiple of 8
                mov     delay, period
                sub     delay, #25
                waitx   #6                      'lutRAM sharing flaw!!  Change the #6 to hide it

'monitor loop
'==============
.monl
'               waitct1
'               addct1  tick, period
'               rep     @.monend, #0            'loop forever
                waitx   delay                   '2
                rdlut   samp, #(monitor & $1ff) '5
                sub     samp, diff1             '7
                add     diff1, samp             '9
                sub     samp, diff2             '11
                add     diff2, samp             '13
                sub     samp, diff3             '15
                add     diff3, samp             '17
'               sub     samp, diff4
'               add     diff4, samp
'non-smartpin mode
'               shr     samp, scale             ' scale signal to fit DAC trace
'               add     samp, offset            ' offset centring
'               shl     samp, #8                ' align for 8-bit DAC1
'               setdacs samp
'smartpin mode
'               shr     samp, scale             ' scale signal to fit DAC trace
                add     samp, offset            '19 offset centring
                wypin   samp, #1                '21 smartpin 16-bit dither into DAC1
'               jse1    #.keyboard              '23 branch on event reduces monitor loop by one instruction
.monend
                jnse1   #.monl                  '25 branch on event reduces monitor loop by one instruction
                jmp     #.keyboard
                cogstop cid
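For reference, the filter split across the two cogs above has the structure of a third-order CIC (sinc3) decimator. The first cog runs three integrators (the smart-pin totaliser read into acc1, then acc2 and acc3) and writes acc3 to the shared LUT. The monitor cog samples that value every `period` clocks and applies three comb stages with the sub/add pairs on diff1..diff3. The Python below is a behavioural sketch of that structure, not a cycle-accurate model of the PASM. It integrates one input bit per step, decimates by a plain integer `r`, and wraps every register to `width` bits to mimic fixed-width cog registers. `sinc3_decimate` and its parameters are illustrative names.

```python
def sinc3_decimate(bits: list[int], r: int, width: int = 32) -> list[int]:
    """Behavioural sinc3 (3-stage CIC) decimator sketch.

    bits  : input bitstream, one 0/1 value per integrator step
    r     : decimation ratio (integrator steps per output sample)
    width : register width; all arithmetic wraps modulo 2**width
    """
    mask = (1 << width) - 1
    i1 = i2 = i3 = 0  # integrators (totaliser, acc2, acc3 roles)
    d1 = d2 = d3 = 0  # comb registers (diff1..diff3 roles)
    out = []
    for n, b in enumerate(bits, start=1):
        i1 = (i1 + b) & mask
        i2 = (i2 + i1) & mask
        i3 = (i3 + i2) & mask
        if n % r == 0:  # decimated sample point (the monitor's RDLUT)
            s = i3
            # each comb: s = s - d, then d = d + s, leaving d holding
            # this stage's input, ready for the next sample
            s = (s - d1) & mask
            d1 = (d1 + s) & mask
            s = (s - d2) & mask
            d2 = (d2 + s) & mask
            s = (s - d3) & mask
            d3 = (d3 + s) & mask
            out.append(s)
    return out

print(sinc3_decimate([1] * 40, 4))
print(sinc3_decimate([1] * 40, 4, width=7))  # integrators wrap, output unchanged
```

With a constant all-ones input the output settles at r**3 after the first few samples. This is the usual CIC gain of (R·M)^N with M = 1 and N = 3. Because the arithmetic is modular, the integrators may overflow freely as long as `width` can hold the final output, which grows by about N·log2(R) bits. That appears to be what the "needs 30-bit registers" comment accounts for (3 × log2(1024) = 30).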