P2 Tricks, Traps & Differences between P1 (general discussion)

Comments

  • Thanks, Guys. I've got it now.
  • SKIPF and SKIP
    Special SKIPF Branching Rules
    From the manual...
    Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.

    For non-CALL/CALLPA/CALLPB branches within SKIPF sequences, SKIPF will work through all immediate-relative branches, which are the default for immediate branches within cog/LUT memory. If an absolute-address branch is being used (#\label, register, or RET, for example), you must not skip the first instruction after the branch. This is not a problem with immediate-relative branches, however, since the variable PC stepping works to advantage, by landing the PC at the first instruction of interest at, or beyond, the branch address.

    Today I was testing to see if I could nest subroutines while keeping the skip in place for return.
    Here is what I found (only tested in COG)...
    * SKIPF fails if the CALL is relative and the next instruction is to be skipped
    * SKIP works correctly even if the CALL is relative (at least in my test)
    * SKIPF and SKIP both work correctly if the CALL is absolute (i.e. #\label)
    * When it works, 2-level nesting works (i.e. the CALLed routine makes another CALL)

    Here is an extract of the code I used
    000d8 036 00 C0 07 F6 |               mov       lmm_x, #0
    000dc 037 32 18 64 FD |               skipf     #%0000_1100                     ' SKIPF result is $0000_0C31 - WRONG!!!
    000e0 038             | '             skip      #%0000_1100                     ' SKIP  result is $0000_FE31 - correct
    000e0 038 01 C0 47 F5 |               or        lmm_x, #%0000_0001              ' xxxx xxxx xxx0                              
    000e4 039 1C 00 B0 FD |               call      #sr1                            ' xxxx xxxx xx0x
    000e8 03a 04 C0 47 F5 |               or        lmm_x, #%0000_0100              ' xxxx xxxx x1xx skip
    000ec 03b 08 C0 47 F5 |               or        lmm_x, #%0000_1000              ' xxxx xxxx 1xxx skip
    000f0 03c 10 C0 47 F5 |               or        lmm_x, #%0001_0000              ' xxxx xxx0 xxxx
    000f4 03d 20 C0 47 F5 |               or        lmm_x, #%0010_0000              ' xxxx xx0x xxxx
    000f8 03e             | 
    000f8 03e 28 CB AF FD |               call      #_hubHex8
    000fc 03f E4 CA AF FD |               call      #_hubTxCR
    00100 040 78 CD 8F FD |               jmp       #_hubMonitor
    00104 041             |               
    00104 041             | sr1
    00104 041 1C 00 B0 FD |               call      #sr2                            '\ gets skipped if SKIPF and CALL #sr1 is relative
    00108 042 01 00 00 FF 
    0010c 043 00 C0 47 F5 |               or        lmm_x, ##%0010_0000_0000        '/ gets skipped if SKIPF and CALL #sr1 is relative
    00110 044 02 00 00 FF 
    00114 045 00 C0 47 F5 |               or        lmm_x, ##%0100_0000_0000
    00118 046 04 00 00 FF 
    0011c 047 00 C0 47 F5 |               or        lmm_x, ##%1000_0000_0000
    00120 048 2D 00 64 FD |               ret               
    00124 049             |                                 
    00124 049             | sr2           or        lmm_x, ##%0001_0000_0000_0000
    00124 049 08 00 00 FF 
    00128 04a 00 C0 47 F5 
    0012c 04b 10 00 00 FF 
    00130 04c 00 C0 47 F5 |               or        lmm_x, ##%0010_0000_0000_0000
    00134 04d 20 00 00 FF 
    00138 04e 00 C0 47 F5 |               or        lmm_x, ##%0100_0000_0000_0000
    0013c 04f 40 00 00 FF 
    00140 050 00 C0 47 F5 |               or        lmm_x, ##%1000_0000_0000_0000
    00144 051 2D 00 64 FD |               ret
    00148 052             | 
    00148 052             | ' SKIPF result is $0000_0C31
    00148 052             | ' SKIP  result is $0000_FE31 - correct 
    
    Using absolute addressing works
    000e4 039 41 00 A0 FD |               call      #\sr1                           ' xxxx xxxx xx0x     'SKIPF result is $0000_FE31 - correct
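
    As a cross-check of the "correct" value quoted above, here is a small Python model of the intended behaviour. It is only a sketch: it assumes that skip bits are consumed solely by instructions of the calling sequence and that skipping is suspended inside a CALLed routine until its RET, as the manual text describes. The list layout and function names are made up for illustration.

```python
# Toy model of the *intended* SKIP/SKIPF behaviour in the test above.
# Assumption: skip bits apply only to the calling sequence; skipping is
# suspended inside a CALLed routine until its RET.

SR2 = [0x1000, 0x2000, 0x4000, 0x8000]       # the four ORs in sr2
SR1 = [SR2, 0x0200, 0x0400, 0x0800]          # call #sr2, then three ORs
MAIN = [0x01, SR1, 0x04, 0x08, 0x10, 0x20]   # instructions following SKIP

def run_sub(body):
    """Execute a subroutine body with skipping suspended."""
    acc = 0
    for item in body:
        acc |= run_sub(item) if isinstance(item, list) else item
    return acc

def run_skip(sequence, mask):
    """Bit n of mask set means: skip the n-th instruction after SKIP."""
    acc = 0
    for n, item in enumerate(sequence):
        if (mask >> n) & 1:
            continue
        acc |= run_sub(item) if isinstance(item, list) else item
    return acc

expected = run_skip(MAIN, 0b0000_1100)
missing = expected & ~0x0C31                 # bits absent from the failing run
print(f"expected ${expected:08X}, missing ${missing:08X}")
```

    The bits missing from the failing SKIPF run correspond to the CALL #sr2 (all of sr2's bits) plus the first OR in sr1, which matches the two lines marked as skipped in the listing.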
    
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • evanh Posts: 8,072
    edited 2019-10-08 - 10:55:27
    I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is that the C/Z flags POP'd into the PTRA register are preserved while PTRA is used as a pointer.
    '===============================================
    'Emit string from immediate code in hubRAM
    '  input:  (hardware call stack) - hubRAM address of string
    ' result:  (none)
    'scratch:  pb, temp1
    '
    putsi
    		mov	temp1, ptra		'preserve existing PTRA
    		pop	ptra			'address of immediate data following the CALL (includes the calling C/Z flags)
    .loop
    		rdbyte	pb, ptra++	wz	'get next character, Z sets with null termination
    	if_nz	call	#putch			'emit character
    	if_nz	jmp	#.loop
    		push	ptra			'update return address to instruction following the null character
    		mov	ptra, temp1		'restore prior PTRA
    		ret			wcz	'calling C/Z preserved
    
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • evanh Posts: 8,072
    edited 2019-10-15 - 00:26:59
    I've been getting myself in trouble with concurrent cordic ops. It's quite cool firing it off and coming back later to collect the results ... but if, for example, I add in some debug-type code, I find I'm breaking things too easily now because all my decimal printing uses the cordic divide operation.

    So my first step to tidying this up a little is to at least make the printing routines themselves reliable in this scenario. The trick here is how to know you are getting the newest result, the one from print's own QDIV operation. A little experimenting later and two instructions do it, e.g.:
    emitclkfrq
    		qdiv	clk_freq, ##1_000_000
    		pollqmt					'clear old event
    .flushloop
    		getqx	pa				'MHz whole number - at final pipeline result
    		jnqmt	#.flushloop			'wait for QMT flag - CORDIC pipeline flushed
    
    		getqy	temp2				'six decimal places
    		...
    

    EDIT: PS: I fixed my problem. It was a bug: I wasn't clearing the event flag before using it. I keep forgetting that the things that set the event flags don't reset them.

    EDIT2: This is in contrast to straight-through code that assumes the pipeline is empty before the routine is called, which would be coded like this instead:
    emitclkfrq
    		qdiv	clk_freq, ##1_000_000
    		getqx	pa				'MHz whole number
    		getqy	temp2				'six decimal places
    		...
    

    My first attempt was to check before use, but that immediately annoyed me as bloaty code. eg:
    emitclkfrq
    		pollqmt					'clear old event
    .flushloop
    		getqx	inb
    		jnqmt	#.flushloop			'wait for QMT flag - CORDIC pipeline flushed
    
    		qdiv	clk_freq, ##1_000_000
    		getqx	pa				'MHz whole number - at final pipeline result
    		getqy	temp2				'six decimal places
    		...
    
  • evanh wrote: »
    I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is that the C/Z flags POP'd into the PTRA register are preserved while PTRA is used as a pointer.
    '===============================================
    'Emit string from immediate code in hubRAM
    '  input:  (hardware call stack) - hubRAM address of string
    ' result:  (none)
    'scratch:  pb, temp1
    '
    putsi
    		mov	temp1, ptra		'preserve existing PTRA
    		pop	ptra			'address of immediate data following the CALL (includes the calling C/Z flags)
    .loop
    		rdbyte	pb, ptra++	wz	'get next character, Z sets with null termination
    	if_nz	call	#putch			'emit character
    	if_nz	jmp	#.loop
    		push	ptra			'update return address to instruction following the null character
    		mov	ptra, temp1		'restore prior PTRA
    		ret			wcz	'calling C/Z preserved
    

    Interesting. Because PTRA++ only increments the lower 20 bits, and the upper bits remain unchanged.
    Certainly a nice way to pass parameters.
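
    To make the bit layout concrete, here is a small Python model of the trick. It is a sketch under two assumptions: that the word pushed by CALL holds C in bit 31, Z in bit 30 and the return address in bits 19..0, and that PTRA++ touches only the low 20 bits, as described above. The function names and the start address are hypothetical.

```python
ADDR_MASK = (1 << 20) - 1   # return address / hub pointer lives in bits 19..0

def pack_return(c, z, addr):
    """Word assumed to be pushed by CALL: C in bit 31, Z in bit 30, address below."""
    return (c << 31) | (z << 30) | (addr & ADDR_MASK)

def ptr_postinc(ptr, step=1):
    """Model of PTRA++ as described above: only the low 20 bits change."""
    return (ptr & ~ADDR_MASK) | ((ptr + step) & ADDR_MASK)

def unpack_return(word):
    """Recover (C, Z, address), which is what RET WCZ would restore."""
    return (word >> 31) & 1, (word >> 30) & 1, word & ADDR_MASK

ptra = pack_return(c=1, z=0, addr=0x01000)   # hypothetical caller state
for _ in b"Hello\0":                          # one RDBYTE per character, null included
    ptra = ptr_postinc(ptra)
c, z, addr = unpack_return(ptra)
print(c, z, hex(addr))
```

    The flags come back unchanged while the address has advanced past the terminating null, which is why the PUSH/RET WCZ pair in putsi resumes after the string with the caller's C/Z intact.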
  • evanh Posts: 8,072
    edited 2019-10-29 - 11:15:45
    A trap with the smartpin pulse out modes: This applies to pulse %00100 and transition %00101 out modes at least. It presumably applies to all DAC, NCO and PWM modes as well. It really only affects pulse and transition modes though, because they have an end count of pulses.

    The "base period" is a metronomic clock from when the smartpin mode is first configured. This stays actively ticking within the smartpin even if the smartpin is not generating pulses. EDIT: What this means is that when WYPIN issues more pulses to generate, the smartpin is not instruction aligned but rather will start the pulse generation at the beginning of the next base period.

    Most of the time this detail can be ignored. But I've been playing around with aligning streamer bursts of SPI data to coincide with a smartpin emulating an SPI clock. Because of the base period effect, the SPI clock pin will have an unpleasant alignment dither with respect to the SPI data pin if the smartpin is not reconfigured for each burst. A disable/enable combo is not enough.


    PS: It may be possible to give the streamer the same "base period" and use XCONT instead of XINIT for each burst to duplicate the smartpin's behaviour. Not something I've tried out yet ...

    PPS: Correction: Along with a compensation, clearing out the chaff allowed a DIRL+DIRH combo on the SPI clock smartpin to do the job. XCONT wasn't the answer.
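
    The dither described in this post can be illustrated with a little arithmetic. The Python sketch below is a simplified model, not a description of the silicon: it assumes the base-period counter starts at configuration time and that pulse output can only begin on a period boundary. The function name and the period of 8 clocks are made up for the example.

```python
def pulse_start(t_config, period, t_wypin):
    """Clock on which pulses begin, assuming the base-period counter started
    at t_config and output can only begin on a period boundary."""
    elapsed = t_wypin - t_config
    periods = -(-elapsed // period)          # ceiling division
    return t_config + periods * period

period = 8                                    # hypothetical base period in sysclocks
# Delay between WYPIN and the first pulse for eight consecutive WYPIN times
delays = [pulse_start(0, period, t) - t for t in range(100, 108)]
print(delays)

# Restarting the base period at the moment of the burst removes the variation
aligned = pulse_start(100, period, 100) - 100
print(aligned)
```

    In this model the delay depends on where WYPIN falls inside the free-running period, so it ranges over 0..period-1, whereas restarting the period for each burst gives a fixed delay.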
  • rogloh Posts: 1,446
    edited 2019-10-29 - 05:20:25
    Just found something weird in testing some video driver code and hitting a bug I had to solve which took me a while.

    When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails:
            mov     ptra, ptrb              'make a copy to preserve things
            ...
            getnib  a, ptra, #5             'extract pin group 
    
    which behaves differently to this code below, which works.
            mov     pb, ptrb              'make a copy to preserve things
            ...
            getnib  a, pb, #5             'extract pin group 
    

    The snipped ... code in the middle is innocuous and doesn't ever access ptra.
  • rogloh wrote: »
    When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails:
            mov     ptra, ptrb              'make a copy to preserve things
            ...
            getnib  a, ptra, #5             'extract pin group 
    

    Not seeing that here. Here's my test code:
    		mov	bcdlen, #8
    		mov	count, #10
    
    .loop
    		getrnd	ptrb
    		mov     ptra, ptrb
    
    		getnib	pa, ptra, #5
    		call	#itoh
    		call	#putsp
    
    		mov	pa, ptra
    		call	#itoh
    		call	#putsp
    
    		mov	pa, ptrb
    		call	#itoh
    		call	#putsp
    
    		getnib	pa, ptrb, #5
    		call	#itoh
    		call	#putnl
    
    		djnz	count, #.loop
    		jmp	#$
    

    and output:
    00000003   aa30f2d5   aa30f2d5   00000003
    00000006   c865965d   c865965d   00000006
    00000000   9704b86f   9704b86f   00000000
    00000002   ac2d8dcd   ac2d8dcd   00000002
    0000000c   bdc56ccd   bdc56ccd   0000000c
    0000000f   a8fcdcb5   a8fcdcb5   0000000f
    00000009   909f3ad5   909f3ad5   00000009
    00000001   131cc72a   131cc72a   00000001
    00000006   e9611c4d   e9611c4d   00000006
    0000000e   49e2fc62   49e2fc62   0000000e
    
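
    For anyone wanting to double-check the table, nibble 5 is bits 23..20 of the register. A short Python sketch (the `getnib` helper is just an illustrative stand-in for the instruction) confirms that every row of the output above is consistent:

```python
def getnib(value, n):
    """Nibble n (0 = least significant) of a 32-bit value."""
    return (value >> (4 * n)) & 0xF

# (getnib result, register value) pairs copied from the output above
rows = [
    (0x3, 0xAA30F2D5), (0x6, 0xC865965D), (0x0, 0x9704B86F),
    (0x2, 0xAC2D8DCD), (0xC, 0xBDC56CCD), (0xF, 0xA8FCDCB5),
    (0x9, 0x909F3AD5), (0x1, 0x131CC72A), (0x6, 0xE9611C4D),
    (0xE, 0x49E2FC62),
]
mismatches = [(hex(v), nib) for nib, v in rows if getnib(v, 5) != nib]
print(mismatches)
```

    An empty list means the upper bits of PTRA and PTRB survived the MOV in every iteration of this test.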
  • Well it definitely happens to me.

    I removed all code in the ... part to rule anything else out.

    This works fine:
                mov     pb, ptrb                  'make a copy to preserve things
                getnib  a, pb, #5                 'extract pin group 
    
    This does not
                mov     ptra, ptrb                'make a copy to preserve things
                getnib  a, ptra, #5               'extract pin group 
    
    neither does this...
                getnib  a, ptrb, #5               'extract pin group 
    

    The next time ptra gets accessed later in my code it is overwritten with a new value anyway, so leaving residual data in it is not causing problems. And it doesn't have to be the pb register that somehow inadvertently makes it work; other general registers work too. It just seems that using ptra or ptrb doesn't work here for getting upper nibbles; somehow the upper bits get lost. I thought these registers were meant to still be 32 bits.

    ps. I am executing this code from LUT RAM in case that could possibly make any difference...?
  • Lutexec is fine for me.
  • Are you using rev A or rev B?
  • evanh Posts: 8,072
    edited 2019-10-29 - 08:43:12
    revB at the moment. After earlier confusions with revA vs revB vs FPGA I have it list a few crucial detected parameters on each run. First text emitted of all recent runs:
    Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
    Rev B silicon.  Sysclock 4.0000 MHz
    
  • Not seeing fault here either Roger.
    Might be worth checking compiler output.
    I'm running Pnut and I think evan runs fastspin?
    IIRC you're running P2ASM?
    Melbourne, Australia
  • I am running P2ASM and I have been overclocking somewhat in the 252-308MHz range. I'll check the P2ASM output to make sure it is not generating bad opcodes.
  • Yes, I'm using fastspin almost exclusively these days. I tested mine up to 395 MHz without issue. No issue with the data values at 400 MHz but it does crash as expected on repeated runs.
  • rogloh Posts: 1,446
    edited 2019-10-29 - 09:58:52
    Bad:
    00910 303 f603f1f9             mov     ptra, ptrb                 'make a copy to preserve things
    0094c 312 f86f1500             getnib  a, ptra, #5               'extract pin group 
    
    Good:
    00910 303 f603eff9             mov     pb, ptrb                'make a copy to preserve things
    0094c 312 f86b15f7             getnib  a, pb, #5               'extract pin group 
    
    The S address in "getnib a, ptra, #5" looks a bit weird if it's $100. Seems bad and almost like it's using the GETNIB D form, but not quite.

    Seems this is a bug in P2ASM. @Dave Hein, are you still doing bug fixes? I am actually running v0.016, so I'd better check I'm on the latest.

    Update: Yes, I think it is the latest version on github

    https://github.com/davehein/p2gcc/blob/master/p2asm_src/p2asm.c
  • Dave Hein Posts: 5,958
    edited 2019-10-29 - 13:56:19
    I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.

    EDIT: This is now fixed in version 0.017.
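
    The two GETNIB encodings posted above can be pulled apart to see the difference. The Python sketch below assumes the usual P2 field layout (4-bit condition, 7-bit opcode, C, Z, I, 9-bit D, 9-bit S) and the register addresses PB = $1F7 and PTRA = $1F8; treat those as my reading of the instruction set documentation rather than something verified here. The helper names are made up.

```python
def decode(word):
    """Split a 32-bit P2 instruction into its generic fields."""
    return {
        "cond":   (word >> 28) & 0xF,
        "opcode": (word >> 21) & 0x7F,
        "c":      (word >> 20) & 1,
        "z":      (word >> 19) & 1,
        "i":      (word >> 18) & 1,      # 1 = S field is not a plain register
        "d":      (word >> 9) & 0x1FF,
        "s":      word & 0x1FF,
    }

def getnib_index(f):
    """GETNIB keeps its nibble number in the low opcode bit plus C and Z."""
    return ((f["opcode"] & 1) << 2) | (f["c"] << 1) | f["z"]

PB, PTRA = 0x1F7, 0x1F8                  # cog register addresses (assumed)

bad = decode(0xF86F1500)                 # getnib a, ptra, #5 from p2asm 0.016
good = decode(0xF86B15F7)                # getnib a, pb, #5

for name, f in (("bad", bad), ("good", good)):
    print(name, f"i={f['i']} s=${f['s']:03X} n={getnib_index(f)}")
```

    Both words select nibble 5 and the same destination, but the bad one has the I bit set with S = $100 instead of a register source at $1F8, which fits the description of a pointer encoding being emitted in place of the PTRA register address.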
  • Dave Hein wrote: »
    I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.

    EDIT: This is now fixed in version 0.017.

    Blast! I wish I had read this thread before releasing the latest version of Catalina!

    Serves me right for not keeping up to date :)
    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • evanh Posts: 8,072
    edited 2019-10-31 - 14:08:43
    Is anyone supporting programming of the boot Flash EEPROM on board the Eval boards in their tools? Cluso, Chip, and Peter, I think, worked out a pinout convention for having both SD and SPI bootable components on the same four pins, P58-61. Chip has them documented in the prop2 doc. I presume Peter also uses the same pinout for P2D2 boards.

    PS: I've had very good success in tuning up Brian's demo code to make the booting very fast even for large binaries - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
  • While the P2 can boot from Serial/FLASH/SD, I am not aware of any downloaders that are currently capable of writing to FLASH or SD.
    Perhaps the downloader authors can comment, please?
  • loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.
  • Cluso99 wrote: »
    While the P2 can boot from Serial/FLASH/SD I am not aware if any downloaders are capable of writing to FLASH or SD currently.
    Perhaps the download authors can comment please ???

    Cluso, were any of your ROM based SD init & write sector routines made available in a callable manner? If so it might be more straightforward to load and run some very small SD downloader PASM into the P2 somewhat like it did with its MainLoader1.spin that can access these routines and then we can write directly to a file, instead of developing an entire SD handling object first before that will be possible.

    I know we can yank the SD card and write it in a PC etc, but on the P2-EVAL getting the microSD in and out becomes a chore fast and is not that ideal during development. I think there are some extender cards available that would help with that.
  • Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.

    See the command "flash_payload"
  • evanh Posts: 8,072
    edited 2019-11-01 - 00:29:36
    RossH wrote: »
    Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.

    See the command "flash_payload"

    Good to hear. That should make it easy to integrate what I've done with speeding up the booting loader code.

    I did also rework Brian's low-level serial programming routines but all that can be ignored. It was from when I was trying to figure out why nothing was working on the revB Eval board. Turns out I had a non-soldered CS pin on the Flash chip.
  • ersmith wrote: »
    loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.
    Ah, just noticed this as a request. Oz posted this a while back - https://forums.parallax.com/discussion/169608/prop2-flash-loader/p1
    On the second page I had been reworking the low level reads for booting to get max loading speed - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
  • Cluso,
    Oh, that's not working for me - https://forums.parallax.com/discussion/comment/1482105/#Comment_1482105

    You've got three types of "try"s: a column, 3 lines, and individual grid entries. What are the differences?