P2 Tricks, Traps & Differences between P1 (general discussion)

cgracey · 2019-06-01 05:54

Thanks, Guys. I've got it now.

Cluso99 · 2019-09-19 04:02

SKIPF and SKIP
Special SKIPF Branching Rules
From the manual...

Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.

For non-CALL\CALLPA\CALLPB branches within SKIPF sequences, SKIPF will work through all immediate-relative branches, which are the default for immediate branches within cog/LUT memory. If an absolute-address branch is being used (#\label, register, or RET, for example), you must not skip the first instruction after the branch. This is not a problem with immediate-relative branches, however, since the variable PC stepping works to advantage, by landing the PC at the first instruction of interest at, or beyond, the branch address.

Today I was testing to see if I could nest subroutines while keeping the skip in place for return.
Here is what I found (only tested in COG)...
* SKIPF fails if the CALL is relative and the next instruction is to be skipped
* SKIP works correctly even if the call is relative (at least my test did)
* SKIPF and SKIP both work correctly if the call is absolute (ie #\label)
* When it works, 2 level nesting works (ie the CALLed routine makes another CALL).

Here is an extract of the code I used

000d8 036 00 C0 07 F6 |               mov       lmm_x, #0
000dc 037 32 18 64 FD |               skipf     #%0000_1100                     ' SKIPF result is $0000_0C31 - WRONG!!!
000e0 038             | '             skip      #%0000_1100                     ' SKIP  result is $0000_FE31 - correct
000e0 038 01 C0 47 F5 |               or        lmm_x, #%0000_0001              ' xxxx xxxx xxx0                              
000e4 039 1C 00 B0 FD |               call      #sr1                            ' xxxx xxxx xx0x
000e8 03a 04 C0 47 F5 |               or        lmm_x, #%0000_0100              ' xxxx xxxx x1xx skip
000ec 03b 08 C0 47 F5 |               or        lmm_x, #%0000_1000              ' xxxx xxxx 1xxx skip
000f0 03c 10 C0 47 F5 |               or        lmm_x, #%0001_0000              ' xxxx xxx0 xxxx
000f4 03d 20 C0 47 F5 |               or        lmm_x, #%0010_0000              ' xxxx xx0x xxxx
000f8 03e             | 
000f8 03e 28 CB AF FD |               call      #_hubHex8
000fc 03f E4 CA AF FD |               call      #_hubTxCR
00100 040 78 CD 8F FD |               jmp       #_hubMonitor
00104 041             |               
00104 041             | sr1
00104 041 1C 00 B0 FD |               call      #sr2                            '\ gets skipped if SKIPF and CALL #sr1 is relative
00108 042 01 00 00 FF 
0010c 043 00 C0 47 F5 |               or        lmm_x, ##%0010_0000_0000        '/ gets skipped if SKIPF and CALL #sr1 is relative
00110 044 02 00 00 FF 
00114 045 00 C0 47 F5 |               or        lmm_x, ##%0100_0000_0000
00118 046 04 00 00 FF 
0011c 047 00 C0 47 F5 |               or        lmm_x, ##%1000_0000_0000
00120 048 2D 00 64 FD |               ret               
00124 049             |                                 
00124 049             | sr2           or        lmm_x, ##%0001_0000_0000_0000
00124 049 08 00 00 FF 
00128 04a 00 C0 47 F5 
0012c 04b 10 00 00 FF 
00130 04c 00 C0 47 F5 |               or        lmm_x, ##%0010_0000_0000_0000
00134 04d 20 00 00 FF 
00138 04e 00 C0 47 F5 |               or        lmm_x, ##%0100_0000_0000_0000
0013c 04f 40 00 00 FF 
00140 050 00 C0 47 F5 |               or        lmm_x, ##%1000_0000_0000_0000
00144 051 2D 00 64 FD |               ret
00148 052             | 
00148 052             | ' SKIPF result is $0000_0C31
00148 052             | ' SKIP  result is $0000_FE31 - correct

Using absolute addressing works

000e4 039 41 00 A0 FD |               call      #\sr1                           ' xxxx xxxx xx0x     'SKIPF result is $0000_FE31 - correct

evanh · 2019-10-08 10:48

I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is the POP'd C/Z flags within the PTRA register are preserved across its operational use.

'===============================================
'Emit string from immediate code in hubRAM
'  input:  (hardware call stack) - hubRAM address of string
' result:  (none)
'scratch:  pb, temp1
'
putsi
		mov	temp1, ptra		'preserve existing PTRA
		pop	ptra			'address of immediate data following the CALL (includes the calling C/Z flags)
.loop
		rdbyte	pb, ptra++	wz	'get next character, Z sets with null termination
	if_nz	call	#putch			'emit character
	if_nz	jmp	#.loop
		push	ptra			'update return address to instruction following the null character
		mov	ptra, temp1		'restore prior PTRA
		ret			wcz	'calling C/Z preserved
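
A minimal sketch of a call site for the routine above, assuming the caller runs in hubexec (the popped return address has to be a hubRAM byte address for RDBYTE to read the string). The string contents and the trailing placeholder instruction are illustrative, not from the original post.

```pasm2
		call	#putsi				'pushes the address of the byte after this CALL
		byte	"Hello from hubexec", 13, 10, 0	'string sits inline, null terminated
		mov	pa, #0				'placeholder: putsi returns here, one byte past the null
```

Note there should be no ALIGNL between the string and the next instruction: putsi returns to the byte immediately after the null, and hubexec code does not need long alignment.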

evanh · 2019-10-14 07:36

I've been getting myself in trouble with concurrent cordic ops. It's quite cool firing it off and coming back later to collect the results ... but if, for example, I add in some debug type code, I find I'm breaking things too easily now because all my decimal printing is using the cordic divide operation.

So my first step to tidying this up a little is to at least make the printing routines themselves reliable in this scenario. The trick here is how to know you are getting the newest result - from print's QDIV operation. A little experimenting later and two instructions do it, eg:

emitclkfrq
		qdiv	clk_freq, ##1_000_000
		pollqmt					'clear old event
.flushloop
		getqx	pa				'MHz whole number - at final pipeline result
		jnqmt	#.flushloop			'wait for QMT flag - CORDIC pipeline flushed

		getqy	temp2				'six decimal places
		...

EDIT: PS: I fixed my problem. It was a bug: I wasn't clearing the event flag before using it. I keep forgetting that the things that set the event flags don't reset them.

EDIT2: It's in contrast to the straight through code that assumes the pipeline is empty prior to routine call. Which would be coded like this instead:

emitclkfrq
		qdiv	clk_freq, ##1_000_000
		getqx	pa				'MHz whole number
		getqy	temp2				'six decimal places
		...

My first attempt was to check before use, but that immediately annoyed me as bloaty code. eg:

emitclkfrq
		pollqmt					'clear old event
.flushloop
		getqx	inb
		jnqmt	#.flushloop			'wait for QMT flag - CORDIC pipeline flushed

		qdiv	clk_freq, ##1_000_000
		getqx	pa				'MHz whole number - at final pipeline result
		getqy	temp2				'six decimal places
		...

Cluso99 · 2019-10-14 09:08

evanh wrote: »

I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is the POP'd C/Z flags within the PTRA register are preserved across its operational use.

'===============================================
'Emit string from immediate code in hubRAM
'  input:  (hardware call stack) - hubRAM address of string
' result:  (none)
'scratch:  pb, temp1
'
putsi
		mov	temp1, ptra		'preserve existing PTRA
		pop	ptra			'address of immediate data following the CALL (includes the calling C/Z flags)
.loop
		rdbyte	pb, ptra++	wz	'get next character, Z sets with null termination
	if_nz	call	#putch			'emit character
	if_nz	jmp	#.loop
		push	ptra			'update return address to instruction following the null character
		mov	ptra, temp1		'restore prior PTRA
		ret			wcz	'calling C/Z preserved

Interesting. Because PTRA++ only increments the lower 20 bits, and the upper bits remain unchanged.
Certainly a nice way to pass parameters.
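
A hedged sketch of where those flags live, assuming the stack word layout given in the P2 documentation: CALL pushes {C, Z, 10 zero bits, PC[19:0]}, so after the POP the caller's C is bit 31 and Z is bit 30 of PTRA. The label peekflags is hypothetical; temp1 is the scratch register from evanh's routine.

```pasm2
peekflags
		mov	temp1, ptra		'preserve existing PTRA
		pop	ptra			'ptra = {C, Z, 10 zero bits, return address}
		testb	ptra, #31	wc	'C = caller's C flag
		testb	ptra, #30	wz	'Z = caller's Z flag
		'... act on the caller's flags here ...
		push	ptra			'put the untouched word back for RET
		mov	ptra, temp1		'restore prior PTRA
		ret			wcz	'calling C/Z restored from bits 31 and 30
```

As long as nothing writes bits 31 and 30 of PTRA between the POP and the PUSH, RET WCZ hands the caller back its original flags.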

evanh · 2019-10-29 03:01

A trap with the smartpin pulse out modes: This applies to pulse %00100 and transition %00101 out modes at least. Presumably also applies to all DAC, NCO and PWM modes as well. It really only affects pulse and transition modes though because they have an end count of pulses.

The "base period" is a metronomic clock from when the smartpin mode is first configured. This stays actively ticking within the smartpin even if the smartpin is not generating pulses. EDIT: What this means is that when WYPIN issues more pulses to generate, the smartpin is not instruction aligned but rather will start the pulse generation at the beginning of the next base period.

Most of the time this detail can be ignored. But I've been playing around with aligning streamer bursts of SPI data to coincide with a smartpin emulating an SPI clock. Because of the base period effect, the SPI clock pin will have an unpleasant alignment dither with respect to the SPI data pin if the smartpin is not reconfigured for each burst. A disable/enable combo is not enough.

PS: It may be possible to give the streamer the same "base period" and use XCONT instead of XINIT for each burst to duplicate the smartpin's behaviour. Not something I've tried out yet ...

PPS: Correction: Along with a compensation, clearing out the chaff allowed a DIRL+DIRH combo on the SPI clock smartpin to do the job. XCONT wasn't the answer.
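
A rough sketch of the DIRL+DIRH approach from the PPS, under stated assumptions: spi_clk (pin number), clk_edges (transition count), strm_mode and strm_data are hypothetical names, and the timing compensation mentioned above is not shown because its value is not given in the post.

```pasm2
		dirl	#spi_clk		'reset the smartpin, which also stops its base period
		dirh	#spi_clk		're-enable: the base period restarts from this instruction
		wypin	clk_edges, #spi_clk	'queue the clock transitions for this burst
		xinit	strm_mode, strm_data	'start the streamer burst (any compensation delay goes before this)
```

Because the base period now restarts at a known instruction for every burst, the offset between the clock pin and the streamer data should stay constant rather than dithering.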

rogloh · 2019-10-29 05:19

Just found something weird in testing some video driver code and hitting a bug I had to solve which took me a while.

When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails:

        mov     ptra, ptrb              'make a copy to preserve things
        ...
        getnib  a, ptra, #5             'extract pin group

which behaves differently to this code below, which works.

        mov     pb, ptrb              'make a copy to preserve things
        ...
        getnib  a, pb, #5             'extract pin group

The snipped ... code in the middle is innocuous and doesn't ever access ptra.

evanh · 2019-10-29 06:49

rogloh wrote: »
When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails:
        mov     ptra, ptrb              'make a copy to preserve things
        ...
        getnib  a, ptra, #5             'extract pin group 

Not seeing that here. Here's my test code:

		mov	bcdlen, #8
		mov	count, #10

.loop
		getrnd	ptrb
		mov     ptra, ptrb

		getnib	pa, ptra, #5
		call	#itoh
		call	#putsp

		mov	pa, ptra
		call	#itoh
		call	#putsp

		mov	pa, ptrb
		call	#itoh
		call	#putsp

		getnib	pa, ptrb, #5
		call	#itoh
		call	#putnl

		djnz	count, #.loop
		jmp	#$

and output:

00000003   aa30f2d5   aa30f2d5   00000003
00000006   c865965d   c865965d   00000006
00000000   9704b86f   9704b86f   00000000
00000002   ac2d8dcd   ac2d8dcd   00000002
0000000c   bdc56ccd   bdc56ccd   0000000c
0000000f   a8fcdcb5   a8fcdcb5   0000000f
00000009   909f3ad5   909f3ad5   00000009
00000001   131cc72a   131cc72a   00000001
00000006   e9611c4d   e9611c4d   00000006
0000000e   49e2fc62   49e2fc62   0000000e

rogloh · 2019-10-29 07:37

Well it definitely happens to me.

I removed all code in the ... part to rule anything else out.

This works fine:

            mov     pb, ptrb                  'make a copy to preserve things
            getnib  a, pb, #5                 'extract pin group

This does not

            mov     ptra, ptrb                'make a copy to preserve things
            getnib  a, ptra, #5               'extract pin group

neither does this...

            getnib  a, ptrb, #5               'extract pin group

Next time ptra gets accessed later in my code it is overwritten with a new value anyway, so leaving residual data in it is not causing problems. And it doesn't have to be the pb register that somehow inadvertently makes it work; other general registers work too. It just seems using ptra or ptrb doesn't work here for getting upper nibbles; somehow the upper bits get lost. I thought these registers were meant to still be 32 bits.

ps. I am executing this code from LUT RAM in case that could possibly make any difference...?

evanh · 2019-10-29 07:51

Lutexec is fine for me.

rogloh · 2019-10-29 08:29

Are you using rev A or rev B?

evanh · 2019-10-29 08:37

revB at the moment. After earlier confusions with revA vs revB vs FPGA I have it list a few crucial detected parameters on each run. First text emitted of all recent runs:


Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon.  Sysclock 4.0000 MHz

ozpropdev · 2019-10-29 09:21

Not seeing fault here either Roger.
Might be worth checking compiler output.
I'm running Pnut and I think evan runs fastspin?
IIRC you're running P2ASM?

rogloh · 2019-10-29 09:25

I am running P2ASM and I have been overclocking somewhat in the 252-308MHz range. I'll check the P2ASM output to make sure it is not generating bad opcodes.

evanh · 2019-10-29 09:34

Yes, I'm using fastspin almost exclusively these days. I tested mine up to 395 MHz without issue. No issue with the data values at 400 MHz but it does crash as expected on repeated runs.

rogloh · 2019-10-29 09:38

Bad:

00910 303 f603f1f9             mov     ptra, ptrb                 'make a copy to preserve things
0094c 312 f86f1500             getnib  a, ptra, #5               'extract pin group

Good:

00910 303 f603eff9             mov     pb, ptrb                'make a copy to preserve things
0094c 312 f86b15f7             getnib  a, pb, #5               'extract pin group

The S address in "getnib a, ptra, #5" looks a bit weird if it's $100. Seems bad and almost like it's using the GETNIB D form, but not quite.

Seems this is a bug in P2ASM. @"Dave Hein", are you still doing bug fixes? Actually I am running v0.016. I'd better check I'm up to the latest.

Update: Yes, I think it is the latest version on github

https://github.com/davehein/p2gcc/blob/master/p2asm_src/p2asm.c

Dave Hein · 2019-10-29 13:42

I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.

EDIT: This is now fixed in version 0.017.

RossH · 2019-10-31 10:25

Dave Hein wrote: »

I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.

EDIT: This is now fixed in version 0.017.

Blast! I wish I had read this thread before releasing the latest version of Catalina!

Serves me right for not keeping up to date

evanh · 2019-10-31 14:05

Is anyone supporting programming of the boot Flash EEPROM on board the Eval boards in their tools? Cluso, Chip, and Peter I think, worked out a pinout convention for having both SD and SPI bootable components on same four pins, P58-61. Chip has them documented in the prop2 doc. I presume Peter also uses same pinout for P2D2 boards.

PS: I've had very good success in tuning up Brian's demo code to make the booting very fast even for large binaries - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866

Cluso99 · 2019-10-31 19:38

While the P2 can boot from Serial/FLASH/SD I am not aware if any downloaders are capable of writing to FLASH or SD currently.
Perhaps the download authors can comment please ???

ersmith · 2019-10-31 22:03

loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.

rogloh · 2019-10-31 22:24

Cluso99 wrote: »

While the P2 can boot from Serial/FLASH/SD I am not aware if any downloaders are capable of writing to FLASH or SD currently.
Perhaps the download authors can comment please ???

Cluso, were any of your ROM based SD init & write sector routines made available in a callable manner? If so it might be more straightforward to load and run some very small SD downloader PASM into the P2 somewhat like it did with its MainLoader1.spin that can access these routines and then we can write directly to a file, instead of developing an entire SD handling object first before that will be possible.

I know we can yank the SD card and write it in a PC etc, but on the P2-EVAL getting the microSD in and out becomes a chore fast and is not that ideal during development. I think there are some extender cards available that would help with that.

RossH · 2019-11-01 00:07

Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.

See the command "flash_payload"

evanh · 2019-11-01 00:27

RossH wrote: »

Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.

See the command "flash_payload"

Good to hear. That should make it easy to integrate what I've done with speeding up the booting loader code.

I did also rework Brian's low-level serial programming routines but all that can be ignored. It was from when I was trying to figure out why nothing was working on the revB Eval board. Turns out I had a non-soldered CS pin on the Flash chip.

evanh · 2019-11-08 01:07

ersmith wrote: »

loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.

Ah, just noticed this as a request. Oz posted this a while back - https://forums.parallax.com/discussion/169608/prop2-flash-loader/p1
On the second page I had been reworking the low level reads for booting to get max loading speed - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866

evanh · 2019-11-19 03:25

Cluso,
Oh, that's not working for me - https://forums.parallax.com/discussion/comment/1482105/#Comment_1482105

You've got three types of "try"s. A column, 3 lines, and individual grid entries. What's the differences?

Cluso99 · 2019-11-19 16:52

Follow down left column as code executes...
So from Reset, the code tests for PU on P59, and if yes, Try will "try Serial", and if that fails it will go on to "try FLASH" in the first column
Next "try FLASH", the code tests for PU on P61, and if yes, Try will "try FLASH", and if successful will load/run FLASH, else will go on to "try SD" in the first column
Next "try SD", the code tests for PU on P60, and if yes, Try will "try SD", and if successful will load/run SD, else will go on to "try Serial" in the first column
Next, "try Serial", the code tests for PD on P59, and if yes, Try will "STOP", else will wait for "SERIAL" (after timeout will STOP IIRC)

Certainly could be expressed better - I did it in a hurry to see what pullups and pulldowns I need on a pcb.

evanh · 2019-11-19 23:40

Hmm, I'll add that Chip has six boot combinations listed in the google doc and, of the four boot lines you've got there, the first two, "Reset..." and "try FLASH..." are definitely wrong.

Peter Jakacki · 2019-11-20 04:43

P59 PD was specifically to disable serial boot altogether. So saying STOP is appropriate in this case.

Boot ROM code execution is in this order:
P59 PU - TRY SERIAL
P61 PU - TRY FLASH
P60 PU - TRY SD
P59 PD - IGNORE SERIAL

To force it to check serial first before anything else after reset you would add a pull-up to P59. ( Cluso99's "flow" table is correct )
If I wanted it to always check Flash first then I would have a pull-up on the Flash CS
If an SD card is inserted then it is detected as a pull-up on its CS = P60, and initializing the card can take many 100's of milliseconds. (don't use an external PU)

So with a Flash in the system you would tend to have a PU on its CS so it would get checked first.
If the Flash didn't have valid boot code then it would check the SD card as long as there is one inserted.
Finally it would check serial as long as there isn't a PD there.

evanh · 2019-11-20 04:49

Here's the table from the prop2 google doc
