Is there a better description of the p2 instructions than the Rev B doc and spreadsheet?

Steve_Hatch · 2020-01-14 18:41

I am looking for a better description of the PASM instructions for the P2 than what is in the documentation for the Rev B silicon and the instruction spreadsheet. I'm having a hard time understanding the documentation. Abbreviations are used without explanation, and there is a lot of assumed knowledge that I don't have. In the old P1 book, there was a fair explanation for each instruction that included some examples of usage, more than the ddddddddd sssssssss type of discussion. I assume that because the P2 is so new, this type of documentation hasn't been developed yet, and that it will become available if I wait. I'm just too impatient, so I am wondering if there is something I'm not aware of.

Steve

VonSzarvas · 2020-01-14 18:54

AFAIK there's not. The docs are in flux/progress as Chip finishes SPIN2.

You'll likely find a lot of answers in Propeller 2 forum threads, although sure - that can be a lot of digging sometimes!

Best advice for now is to ask anything you need in the Propeller 2 thread, or use the Search box. There are lots of experts here that have the knowledge you seek!

Rayman · 2020-01-14 19:28

There is a text file that comes with PNut…
It has some details that I don't think are anywhere else...

Actually, I guess that isn't much help...

Rayman · 2020-01-14 21:07

We do need a "Propeller II Manual", like we do for P1.

I guess the good news is that P2 ASM is largely a superset of P1 ASM.
So, the reads and moves and shifts work more or less the same way.

One thing about the spreadsheet is that it often has exact FPGA code for the result. So, if you can read Verilog (I think it was), then you're in good shape.

The documentation file is a pretty good resource though.
What abbreviations need explaining?
I think anybody can make notes in the documentation if more explanation is needed...

Cluso99 · 2020-01-14 21:18

There was a document on google? that detailed many of the instructions. However, as P2 changed this became out of date with the bit definitions although mostly the instructions still worked the same way. Not sure where that document is these days. Perhaps Peter J can chime in as IIRC he did a lot of that work.

Note that many of the P2 base instructions are the same as the P1 base instructions, with just the bit positions being different, and of course there is no 'NR' bit, and the no-op condition '0000' is now an implied _RET_.
Most condition flags are the same, but not all.

evanh · 2020-01-14 22:11

As Cluso says, the simpler instructions are just like the prop1, with the exact behaviour concisely in the google spreadsheet. There won't be full prop1 type documentation for a while. I won't try to guess how long.

The more complex instructions are explicitly documented in the big google doc. Things like how to use all the ALTx instructions, uses of SETQ and its Q register, HUBSET, COGINIT, SKIP, REP, RDFAST, ... SETCMOD colour space config. How to config events and interrupts. Even the Cordic instructions have their section. Streamers and smartpins too of course.

evanh · 2020-01-14 22:22

The instructions.txt file contains the assembler directives like ORG, DAT, FIT. Things like # and @ were first described there too.

Steve_Hatch · 2020-01-14 22:41

I was trying to learn how to use the cog ram to implement a simple stack. So I wanted to simply create a base address and a stack pointer. Store some data on the stack. Read it back and compare what I stored with another temp register. Here is what I have done so far.


' Stack test 
' assembly language stack test program

con

  freq = 160_000_000
  mode = $010007f8
  delay = freq / 10

dat
    org

    ' set up clock
    hubset #0
    hubset ##mode  ' initialize oscillator
    waitx ##20_000_000/100 ' wait for it to settle down
    hubset ##mode + %11   ' enable it


		mov sb, #$100	' Set stack base to $100
		mov sp, #1	' Set some data in sp
		alts sb		' Try to set up sb as a pointer of where to write data
		mov sb, sp 	' Write data ( a 1) onto the stack
		alts sb
                mov x, sb
		mov y, #1
		sub x, y wz	' Subtract the two temp registers
	if_z	jmp #test1
		jmp #test2

test1		mov x, #56	' Set x to blink 1st pin  values are equal
		drvnot x	' Toggle pin 56
		shl x, #18
		waitx x
		jmp #test1
test2		mov x, #63	' Set pin 63  Values are not equal
		drvnot x
		shl x, #18	
		waitx x
		jmp #test2
		
 	


sb	res 1		' stack base
sp	res 1		' stack pointer
x	res 1		' temporary register
y	res 1		' Temporary register

Rayman · 2020-01-14 23:01

Shouldn't that first "alts sb" be "altd sb"?

Also, a lot of people would write "mov 0-0,sp" on the next line where "0-0" just means that it is a value that will be overwritten...

Steve_Hatch · 2020-01-14 23:21

Thank you. That is what I was missing. My little example is now working with those two changes.

Steve

Cluso99 · 2020-01-15 01:13

@Steve_Hatch,
Is this just a test, or are you trying to do something specific? I ask because there is a small 8-deep stack internally in each COG that supports calls/returns as well as push/pop. There are also the registers PTRA/PTRB/PA/PB, which can be used as pointers with auto increment/decrement. These can be useful for creating stacks, especially in HUB and possibly in COG or LUT too.
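
A minimal sketch of the hardware stack push/pop mentioned above, for anyone who wants to try it. The register names val and oval, the label done, and the choice of pin 56 are my own assumptions for illustration, not something from the docs. Keep in mind that this is the same 8-level stack that CALL and RET use for return addresses, so pushes and pops inside a subroutine have to balance.

```spin2
dat
		org

		mov	val, #5			' hypothetical test value
		push	val			' push it onto the cog's 8-level hardware stack
		pop	oval			' pop it straight back off
		cmp	oval, val	wz	' Z is set if the popped value matches
	if_z	drvnot	#56			' match: toggle pin 56 once
done		jmp	#done			' park here

val		res	1			' value to push (assumed name)
oval		res	1			' value popped (assumed name)
```

With only eight levels shared with call/return, this is best kept for a handful of temporaries rather than a general-purpose data stack.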

evanh · 2020-01-15 01:24

PTRA/B can only be a direct address when used with RDLONG/RDLUT type instructions. They can't address cogRAM without use of ALTx prefixes, same as any general register.

I'd guess Steve is just using a stacking mechanism to test out the ALTx instructions.

AJL · 2020-01-15 01:33

Steve_Hatch wrote: »

I was trying to learn how to use the cog ram to implement a simple stack. So I wanted to simply create a base address and a stack pointer. Store some data on the stack. Read it back and compare what I stored with another temp register. Here is what I have done so far.

@Steve_Hatch,

I've reproduced your code with comments updated to cover what actually happens and some efficiencies added


' Stack test 
' assembly language stack test program

con

  freq = 160_000_000
  mode = $010007f8
  delay = freq / 10


dat
    org

    ' set up clock
    hubset #0
    hubset ##mode  ' initialize oscillator
    waitx ##20_000_000/100 ' wait for it to settle down
    hubset ##mode + %11   ' enable it


		mov sb, #$100	' Set stack base to $100
		mov sp, #1	' Set some data in sp; stack pointer is normally added to the stack base to get the current stack entry
'		alts sb		' Would replace the S field of the next instruction with the value held in sb ($100), so the mov would read register $100 instead of sp
		mov sb, sp 	' With the alts removed this copies sp (1) into sb, overwriting the stack base
'		alts sb             ' Would redirect the S field of the following mov in the same way
'                mov x, sb        ' Could be eliminated if a CMP instruction was used in place of the SUB below
'		mov y, #1
'		sub x, y wz	' Subtract the two temp registers
                cmp sb, #1 wz ' Single instruction to replace three
	if_z	jmp #test1
		jmp #test2

test1		'mov x, #56	' Set x to blink 1st pin  values are equal; use of immediate in drvnot eliminates the need to use a variable
		drvnot #56	' Toggle pin 56
		'shl x, #18
		waitx ##56<<18
		jmp #test1
test2		'mov x, #63	' Set pin 63  Values are not equal; use of pin 63 is potentially problematic as this is a serial boot pin
		drvnot #63
'		shl x, #18	
		waitx ##63<<18
		jmp #test2
		
 	


sb	res 1		' stack base
sp	res 1		' stack pointer
x	res 1		' temporary register
y	res 1		' Temporary register

Here's code that is closer to what I think you want, if you specifically want to use cog ram (a limited resource) for your stack:


' Stack test 
' assembly language stack test program

con

  freq = 160_000_000
  mode = $010007f8     ' gives 160MHz from 20MHz crystal
  delay = freq / 10
  sb = $100

dat
    org

    ' set up clock
    hubset #0
    hubset ##mode  ' initialize oscillator
    waitx ##20_000_000/100 ' wait for it to settle down
    hubset ##mode + %11   ' enable it



		mov sp, #0	' Set stack pointer to zero, so that written address is $100
                mov data, #5   ' Initialise data

		 altd sp, #sb	' set stack address as destination of next instruction to push to stack 
		 mov 0-0, data 	' push data to stack
		 add sp, #1        ' Update stack pointer; normally would have some code here to deal with stack over-flow

  ' now pop value from top of stack
                sub sp, #1 wz   ' Point to value to be popped
  ' there should be some code here testing Z to deal with stack under-flow
		alts sp, #sb      ' set stack address as source of next instruction to pop from stack
                mov odata, 0-0
                cmp odata, data wz ' popped data matches pushed data?
	if_nz	jmp #failure

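                rep #3, #0         ' repeat the next 3 instructions forever; ##delay assembles to AUGS + WAITX
        	 drvnot #56	' values match; toggle pin 56
		 waitx ##delay ' toggle every 0.1 s (5 Hz blink)

failure	rep #3, #0         ' 3 instructions again: drvnot, AUGS, waitx
                 drvnot #57       'something went wrong; toggle pin 57
		 waitx ##delay ' toggle every 0.1 s (5 Hz blink)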

		
 	


' sb is defined as a constant in the CON block above, so no register is reserved for it here
sp	res 1		' stack pointer
data	res 1		' push data register
odata	res 1		' pop data register

LUT RAM is also worth considering for stacks, as the RDLUT and WRLUT instructions can update the pointer automatically as part of the operation. And as @Cluso99 has mentioned, there's a hardware stack in each cog, and hub RAM is a viable option for stacks too.
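
Here's a rough sketch of the LUT variant, assuming PTRA is free to use as the stack pointer and that the stack starts at LUT address 0. Both choices, and the pin number, are mine for illustration. I haven't checked the exact step size of the ++/-- expressions for LUT accesses against the docs, but because the push and pop use matching expressions, the pop reads back the location that the push wrote.

```spin2
dat
		org

		mov	ptra, #0		' stack pointer: start of LUT RAM (assumed location)
		mov	data, #5		' initialise data

		wrlut	data, ptra++		' push: write to LUT at ptra, then advance ptra
		' overflow check would go here

		rdlut	odata, --ptra		' pop: step ptra back, then read LUT at ptra
		' underflow check would go here

		cmp	odata, data	wz	' popped data matches pushed data?
	if_z	drvnot	#56			' match: toggle pin 56 once
done		jmp	#done			' park here

data		res	1			' push data register
odata		res	1			' pop data register
```

Compared with the cog RAM version, this needs no ALTD/ALTS prefix and no separate add/sub on the pointer.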

Peter Jakacki · 2020-01-15 02:57

There was this document I was doing up years ago and I guess I could bring it up to date, perhaps with some collaborative effort?
Here is the link to the static pubdoc version too.

wmosscrop · 2020-04-02 15:22

Cluso99 wrote: »

...because there is a small 8 deep stack internally in each COG that supports calls/returns as well as push/pop.

Am I correct in assuming that this replaces the P1's use of the RET instruction to store the return address for a subroutine call?
If so, it would be nice if this was documented in the P2 documentation (if it hasn't already been changed). Searching the forums for this type of information is iffy at best.

Rayman · 2020-04-02 16:23

I'm pretty sure this is true. It's more like the way everybody else does it now...

Cluso99 · 2020-04-02 19:38

Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2. You just cannot replace them, as each case requires understanding the P1 code. I should have pushed for a JMPRET instruction equivalent on P2 much earlier but it’s now too late. As we are now finding, nothing translates easily, so we are basically back to a new code write. P2 is a new animal that shares some common ideas with P1.

wmosscrop · 2020-04-02 20:57

Cluso99 wrote: »

Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

The instruction spreadsheet alludes to a stack "K". But I don't see any definition of that stack.
The docs mention the stack - in passing - as part of the EXECF instruction documentation.

Cluso99 wrote: »

The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2.

Yes, I am learning that the hard way. My 1130 emulator instruction decoding was based on altering a JMP instruction. Boy, that didn't work on the P2!

I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing but have not yet gotten this to work properly (other than my issue with cog initialization, which works fine, the issue is with calling hub subroutines once I've returned to cog execution). I've made sure to use ORGH $400 and the like without any success.
Does anyone have some sample code that shows cog-to-hub subroutine calls?

Wuerfel_21 · 2020-04-02 21:04

Cluso99 wrote: »

Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2. You just cannot replace them, as each case requires understanding the P1 code. I should have pushed for a JMPRET instruction equivalent on P2 much earlier but it’s now too late. As we are now finding, nothing translates easily, so we are basically back to a new code write. P2 is a new animal that shares some common ideas with P1.

Umm, isn't CALLD basically P2's JMPRET ?

ersmith · 2020-04-02 22:28

wmosscrop wrote: »

I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing but have not yet gotten this to work properly (other than my issue with cog initialization, which works fine, the issue is with calling hub subroutines once I've returned to cog execution). I've made sure to use ORGH $400 and the like without any success.
Does anyone have some sample code that shows cog-to-hub subroutine calls?

There's really nothing to switching between cog and hub: the processor does it automatically based on the PC (if the PC is 0-$1ff, then code runs from COG; if it's $200-$3ff then it runs from LUT; if it's $400 and above it runs from HUB). Any program compiled by fastspin does this at startup automatically: the initial setup code is in COG memory, it calls the main user program in HUB memory, and if that returns then the COG program does a cogstop. So the really stripped down version would look like:

    org 0
    ' COG startup code goes here
    call #my_hubcode
    ' more COG code here
...
    ' the very first orgh should give an address of at least $400
    orgh $400
my_hubcode
    ' here's the hub subroutine
    ret
    ' switch back to COG mode for a new COG program
    org 0
my_cog2
    ' here's some more COG code, maybe to run in another COG
    call #my_hubcode2

    ' done with COG code, now go back to HUB code with orgh
    ' don't need to provide an address, it'll just go on from wherever
    ' the HUB pc is now
    orgh
my_hubcode2
    ' more hub code
    ret

rogloh · 2020-04-02 22:58

Wuerfel_21 wrote: »

Umm, isn't CALLD basically P2's JMPRET ?

It's similar but not the same. On a P1 a JMPRET would only patch the S field in the destination register with the return address, and the destination register written would typically already be an actual JMP #S instruction. On a P2 a CALLD will write the PC (and flags) into the entire register and so it cannot remain a JMP instruction after this. You will need an additional indirect JMP instruction to emulate the P1's "RET" pseudo instruction which is actually just the JMP #S anyway.

Now I think Cluso has had problems translating his P1 PASM code because of this particular difference. There are certainly some ways in the P2 to try to emulate it but they all basically take up additional code space.
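
A minimal sketch of the CALLD pattern described above, assuming a plain cog register (called retaddr here, a made-up name) holds the return information. The labels main and blink and the pin number are also just for illustration. The CALLD overwrites the whole register with the flags and return address, so the return is a separate indirect JMP through that register rather than a patched JMP #S as on the P1.

```spin2
dat
		org

main		calld	retaddr, #blink		' write {C, Z, return address} into retaddr, then jump to blink
		waitx	##20_000_000		' arbitrary pause between calls
		jmp	#main

blink		drvnot	#56			' subroutine body: toggle pin 56
		jmp	retaddr			' indirect jump through retaddr, standing in for the P1 "RET"

retaddr		res	1			' holds flags + return address (assumed name)
```

The extra JMP is where the additional code space goes when porting P1 routines that relied on the self-modifying JMPRET/RET pair.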

wmosscrop · 2020-04-03 00:13

@ersmith,

Here's your code, modified to show progress (or lack of it). When I run this (with PNut v34Q), both LEDs 61 & 62 are lit, but 56 is not. In addition, there should be a 1/4 second pause between 61 & 62 being lit; there is none.
Edit: When I run this with FlexGUI 4.1.3, it does work.
So it appears to me that the subroutine "my_hubcode" is never called with PNut.
Note that there is a compile message: "warning: orgh with explicit origin does not work if Spin methods are present".

CON
  _clkfreq = 80_000_000
PUB Start()
  coginit(16, @cogStart, 0)
  repeat
DAT
    org 0
    ' COG startup code goes here
cogStart
    drvl	#60
    call #my_hubcode
    drvl	#61
    ' more COG code here
loop    jmp	#loop
'...
    ' the very first orgh should give an address of at least $400
    orgh $400
my_hubcode
    ' here's the hub subroutine
    drvl	#56
    waitx	##20_000_000
    ret

ersmith · 2020-04-03 00:55

Ah, if you're mixing Spin and PASM then yes, the first orgh does *not* need an address (so leave off the $400 and just say "orgh"). That's only required for pure PASM programs.

If that still doesn't help, then you should probably report a bug to Chip, because I think it should work (and, as you discovered, it does work when compiled with fastspin/flexgui).

Rayman · 2020-04-03 02:05

iirc it took a few days for ersmith to get fastspin working as expected in Spin mode with hubexec and cogexec and maybe even lutexec all together in giant codes...

I wouldn't be surprised if Spin2 needs some tweaks to make it all right.

ozpropdev · 2020-04-03 02:28

When using Pnut with Spin2+PASM the orgh directive seems to ignore addresses.
Therefore your hubcode is being assigned an address of $18.

TYPE: DAT_LONG        VALUE: FFF00018          NAME: MY_HUBCODE

So the P2 assumes a cog address (<$400) which has no valid code at that address.

Your code works fine in PASM only form.
You can also pad out hubram with a LONG statement after the orgh directive.

	orgh
	long	0[$400]	'padding

Pnut seems to put DAT blocks before the spin interpreter code.

Cluso99 · 2020-04-03 03:00

wmosscrop wrote: »

Cluso99 wrote: »

Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

The instruction spreadsheet alludes to a stack "K". But I don't see any definition of that stack.
The docs mention the stack - in passing - as part of the EXECF instruction documentation.

Cluso99 wrote: »

The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2.

Yes, I am learning that the hard way. My 1130 emulator instruction decoding was based on altering a JMP instruction. Boy, that didn't work on the P2!

I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing but have not yet gotten this to work properly (other than my issue with cog initialization, which works fine, the issue is with calling hub subroutines once I've returned to cog execution). I've made sure to use ORGH $400 and the like without any success.
Does anyone have some sample code that shows cog-to-hub subroutine calls?

See this for calling the ROM monitor routines
https://forums.parallax.com/discussion/comment/1450717/#Comment_1450717

wmosscrop · 2020-04-03 07:36

ozpropdev wrote: »
You can also pad out hubram with a LONG statement after the orgh directive.
	orgh
	long	0[$400]	'padding
Pnut seems to put DAT blocks before the spin interpreter code.

Yes, that worked for me. @cgracey, this seems to be an issue with PNut v34Q not handling the orgh correctly for spin2 + pasm. Not a major issue (other than my bald spot is getting larger).

wmosscrop · 2020-04-03 07:38

Cluso99 wrote: »

See this for calling the ROM monitor routines
https://forums.parallax.com/discussion/comment/1450717/#Comment_1450717

Thanks for this info. I didn't realize that something from 2018 was still relevant; I was assuming major changes since then.

cgracey · 2020-04-03 07:45

I'm looking into this ORGH issue in PNut.

cgracey · 2020-04-03 08:40

There's no bug in PNut over this. PNut just doesn't have final runtime addresses at compile time, because the object hierarchy is built progressively and then compacted. Only at runtime can Spin2 know an address using @symbol.

Here's a simple way to do it, by passing the 'hubcode' through PTRA (3rd term in COGINIT):

CON
  _clkfreq = 80_000_000

PUB Start()
  coginit(16, @cogcode, @hubcode)

DAT
    	org		'org automatically at $000

cogcode	drvnot	#60
	call	ptra
	jmp	#cogcode

	orgh		'orgh automatically at $400

hubcode	drvnot	#56
	waitx	##20_000_000
	ret

You can also use PTRB, which will contain the absolute address of 'cogcode' on entry, as a base to which you add the offset of 'hubcode'. Both 'cogcode' and 'hubcode' addresses in PASM are relative to the start of the object they exist in, not the absolute addresses of where they wind up at:

CON
  _clkfreq = 80_000_000

PUB Start()
  coginit(16, @cogcode, 0)

DAT
    	org		'org automatically at $000

cogcode	add	ptrb,#@hubcode-@cogcode

.loop	drvnot	#60
	call	ptrb
	jmp	#.loop

	orgh		'orgh automatically at $400

hubcode	drvnot	#56
	waitx	##20_000_000
	ret

Ariba · 2020-04-03 10:07

And how can this work if you have more than one hubcode subroutine?
You will need to modify the ptrb for every hub-call, which is ugly, slow and needs a lot of unnecessary cog code.

It would all be so much easier if we get the real hub address with @ in DAT blocks.

Andy
