Is there a better description of the P2 instructions than the Rev B doc and spreadsheet?

I am looking for a better description of the P2's PASM instructions than what is in the Rev B silicon documentation and the instruction spreadsheet. I'm having a hard time understanding the documentation: abbreviations are used without explanation, and a lot of knowledge is assumed that I don't have. The old P1 book had a fair explanation of each instruction, including some usage examples, which went beyond the ddddddddd sssssssss type of discussion. I assume that because the P2 is so new this type of documentation hasn't been developed yet, and that it will become available if I wait. I'm just too impatient, so I am wondering if there is something I'm not aware of.

Steve

Comments

  • AFAIK there's not. The docs are in flux/progress as Chip finishes SPIN2.

    You'll likely find a lot of answers in Propeller 2 forum threads, although sure - that can be a lot of digging sometimes!

    The best advice for now is to ask anything you need in the Propeller 2 thread, or use the Search box. There are lots of experts here who have the knowledge you seek!
  • Rayman Posts: 11,072
    edited 2020-01-14 - 21:04:35
    There is a text file that comes with PNut…
    It has some details that I don't think are anywhere else...

    Actually, I guess that isn't much help...
  • Rayman Posts: 11,072
    edited 2020-01-14 - 21:09:35
    We do need a "Propeller II Manual", like we do for P1.

    I guess the good news is that P2 ASM is largely a superset of P1 ASM.
    So, the reads and moves and shifts work more or less the same way.

    One thing about the spreadsheet is that it often has exact FPGA code for the result. So, if you can read Verilog (I think it was), then you're in good shape.

    The documentation file is a pretty good resource though.
    What abbreviations need explaining?
    I think anybody can make notes in the documentation if more explanation is needed...
  • There was a document (on Google?) that detailed many of the instructions. However, as the P2 changed, its bit definitions went out of date, although most of the instructions still worked the same way. I'm not sure where that document is these days. Perhaps Peter J can chime in, as IIRC he did a lot of that work.

    Note that many of the P2 base instructions are the same as the P1 base instructions, with just the bit positions being different, and of course there is no 'NR' bit, and the no-op condition '0000' is now an implied _RET_.
    Most condition flags are the same, but not all.
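
    A minimal sketch of the implied _RET_ mentioned above. The label and pin number are only illustrative:
            call    #setpin         ' call the one-instruction subroutine
            ' ...
    setpin  _ret_   drvh    #56     ' drive pin 56 high, then return without a separate RET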
  • As Cluso says, the simpler instructions are just like the prop1, with the exact behaviour concisely in the google spreadsheet. There won't be full prop1 type documentation for a while. I won't try to guess how long.

    The more complex instructions are explicitly documented in the big google doc. Things like how to use all the ALTx instructions, uses of SETQ and its Q register, HUBSET, COGINIT, SKIP, REP, RDFAST, ... SETCMOD colour space config. How to config events and interrupts. Even the CORDIC instructions have their section. Streamers and smartpins too of course.
  • The instructions.txt file contains the assembler directives like ORG, DAT, FIT. Things like # and @ were first described there too.
  • I was trying to learn how to use the cog ram to implement a simple stack. So I wanted to simply create a base address and a stack pointer. Store some data on the stack. Read it back and compare what I stored with another temp register. Here is what I have done so far.
    ' Stack test 
    ' assembly language stack test program
    
    con
    
      freq = 160_000_000
      mode = $010007f8
      delay = freq / 10
    
    dat
        org
    
        ' set up clock
        hubset #0
        hubset ##mode  ' initialize oscillator
        waitx ##20_000_000/100 ' wait for it to settle down
        hubset ##mode + %11   ' enable it
    
    
    		mov sb, #$100	' Set stack base to $100
    		mov sp, #1	' Set some data in sp
    		alts sb		' Try to set up sb as a pointer of where to write data
    		mov sb, sp 	' Write data ( a 1) onto the stack
    		alts sb
                    mov x, sb
    		mov y, #1
    		sub x, y wz	' Subtract the two temp registers
    	if_z	jmp #test1
    		jmp #test2
    
    test1		mov x, #56	' Set x to blink 1st pin  values are equal
    		drvnot x	' Toggle pin 56
    		shl x, #18
    		waitx x
    		jmp #test1
    test2		mov x, #63	' Set pin 63  Values are not equal
    		drvnot x
    		shl x, #18	
    		waitx x
    		jmp #test2
    		
     	
    
    
    sb	res 1		' stack base
    sp	res 1		' stack pointer
    x	res 1		' temporary register
    y	res 1		' Temporary register
    
  • Shouldn't that first "alts sb" be "altd sb"?

    Also, a lot of people would write "mov 0-0,sp" on the next line where "0-0" just means that it is a value that will be overwritten...
  • Thank you. That is what I was missing. My little example is now working with those two changes.

    Steve
  • @Steve_Hatch,
    Is this just a test, or are you trying to build something? I ask because each COG has a small internal 8-deep stack that supports calls/returns as well as push/pop. There are also the registers PTRA/PTRB/PA/PB, which can be used as pointers to registers with auto increment/decrement; these can be useful for creating stacks, especially in HUB and possibly in COG or LUT too.
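
    A rough sketch of that internal stack in use. The register and label names are made up for illustration:
            mov     x, #5
            push    x               ' push the contents of x onto the hardware stack
            pop     y               ' pop it back into y, so y now holds 5
            call    #toggle         ' CALL pushes the return address onto the same stack
    done    jmp     #done           ' stop here once the subroutine has returned

    toggle  drvnot  #56             ' toggle pin 56
            ret                     ' RET pops the return address and resumes after the CALL

    x       res 1
    y       res 1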
  • PTRA/B can only be a direct address when used with RDLONG/RDLUT type instructions. They can't address cogRAM without use of ALTx prefixes, same as any general register.

    I'd guess Steve is just using a stacking mechanism to test out the ALTx instructions.

  • AJL Posts: 327
    edited 2020-01-15 - 01:35:36
    @Steve_Hatch,

    I've reproduced your code with the comments updated to cover what actually happens, and with some efficiencies added:
    ' Stack test 
    ' assembly language stack test program
    
    con
    
      freq = 160_000_000
      mode = $010007f8
      delay = freq / 10
    
    
    dat
        org
    
        ' set up clock
        hubset #0
        hubset ##mode  ' initialize oscillator
        waitx ##20_000_000/100 ' wait for it to settle down
        hubset ##mode + %11   ' enable it
    
    
    		mov sb, #$100	' Set stack base to $100
    		mov sp, #1	' Set some data in sp; stack pointer is normally added to the stack base to get the current stack entry
    '		alts sb		' Would substitute sb[8:0] ($100) for the S field of the next instruction, so the MOV would read register $100 instead of sp
    		mov sb, sp 	' With the ALTS removed, this copies sp (1) into sb, overwriting the stack base rather than writing to the stack
    '		alts sb             ' Would make the next instruction read the register addressed by sb[8:0] instead of sb itself
    '                mov x, sb        ' Could be eliminated if a CMP instruction was used in place of the SUB below
    '		mov y, #1
    '		sub x, y wz	' Subtract the two temp registers
                    cmp sb, #1 wz ' Single instruction to replace three
    	if_z	jmp #test1
    		jmp #test2
    
    test1		'mov x, #56	' Set x to blink 1st pin  values are equal; use of immediate in drvnot eliminates the need to use a variable
    		drvnot #56	' Toggle pin 56
    		'shl x, #18
    		waitx ##56<<18
    		jmp #test1
    test2		'mov x, #63	' Set pin 63  Values are not equal; use of pin 63 is potentially problematic as this is a serial boot pin
    		drvnot #63
    '		shl x, #18	
    		waitx ##63<<18
    		jmp #test2
    		
     	
    
    
    sb	res 1		' stack base
    sp	res 1		' stack pointer
    x	res 1		' temporary register
    y	res 1		' Temporary register
    

    Here's code that is closer to what I think you want, if you specifically want to use cog ram (a limited resource) for your stack:
    ' Stack test 
    ' assembly language stack test program
    
    con
    
      freq = 160_000_000
      mode = $010007f8     ' gives 160MHz from 20MHz crystal
      delay = freq / 10
      sb = $100
    
    dat
        org
    
        ' set up clock
        hubset #0
        hubset ##mode  ' initialize oscillator
        waitx ##20_000_000/100 ' wait for it to settle down
        hubset ##mode + %11   ' enable it
    
    
    
    		mov sp, #0	' Set stack pointer to zero, so that written address is $100
                    mov data, #5   ' Initialise data
    
    		 altd sp, #sb	' set stack address as destination of next instruction to push to stack 
    		 mov 0-0, data 	' push data to stack
    		 add sp, #1        ' Update stack pointer; normally would have some code here to deal with stack over-flow
    
      ' now pop value from top of stack
                    sub sp, #1 wz   ' Point to value to be popped
      ' there should be some code here testing Z to deal with stack under-flow
    		alts sp, #sb      ' set stack address as source of next instruction to pop from stack
                    mov odata, 0-0
                    cmp odata, data wz ' popped data matches pushed data?
    	if_nz	jmp #failure
    
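                    rep #3, #0        ' repeat the next 3 instructions forever; ## adds an AUGS prefix, which REP counts
            	 drvnot #56	' values match; toggle pin 56
    		 waitx ##delay ' toggle at 5 Hz

    failure	rep #3, #0        ' repeat the next 3 instructions forever; ## adds an AUGS prefix, which REP counts
                     drvnot #57       'something went wrong; toggle pin 57
    		 waitx ##delay ' toggle at 5 Hz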
    
    		
     	
    
    
    ' sb is defined as a constant in the CON block above, so it must not be declared again with RES
    sp	res 1		' stack pointer
    data	res 1		' push data register
    odata	res 1		' pop data register
    

    LUT RAM is also worth considering for stacks, as the RDLUT and WRLUT instructions can update the pointer automatically as part of the operation. And as @Cluso99 has mentioned, there's a hardware stack in each cog, and hub RAM is a viable option for stacks too.
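
    A sketch of such a LUT stack, assuming LUT addresses from 0 upward are free and that a PTRA expression steps by one LUT long per access (worth checking against the current docs). data and odata are the registers from the example above:
            mov     ptra, #0            ' stack pointer: next free LUT address
            mov     data, #5
            wrlut   data, ptra++        ' push: write to LUT, then increment the pointer
            rdlut   odata, --ptra       ' pop: decrement the pointer, then read from LUT
            cmp     odata, data wz      ' Z is set if the popped value matches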
  • There was this document I was doing up years ago and I guess I could bring it up to date, perhaps with some collaborative effort?
    Here is the link to the static pubdoc version too.
  • Cluso99 wrote: »
    ...because there is a small 8 deep stack internally in each COG that supports calls/returns as well as push/pop.
    Am I correct in assuming that this replaces the P1's use of the RET instruction to store the return address for a subroutine call?
    If so, it would be nice if this was documented in the P2 documentation (if it hasn't already been changed). Searching the forums for this type of information is iffy at best.
  • I'm pretty sure this is true. It's more like the way everybody else does it now...
  • Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

    The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2. You just cannot replace them, as each case requires understanding the P1 code. I should have pushed for a JMPRET instruction equivalent on P2 much earlier but it’s now too late. As we are now finding, nothing translates easily, so we are basically back to a new code write. P2 is a new animal that shares some common ideas with P1.
  • Cluso99 wrote: »
    Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

    The instruction spreadsheet alludes to a stack "K". But I don't see any definition of that stack.
    The docs mention the stack - in passing - as part of the EXECF instruction documentation.
    Cluso99 wrote: »
    The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2.
    Yes, I am learning that the hard way. My 1130 emulator instruction decoding was based on altering a JMP instruction. Boy, that didn't work on the P2! :)

    I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing, but I have not yet gotten this to work properly. Cog initialization works fine; the issue is with calling hub subroutines once I've returned to cog execution. I've made sure to use ORGH $400 and the like, without any success.
    Does anyone have some sample code that shows cog-to-hub subroutine calls?
  • Cluso99 wrote: »
    Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

    The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2. You just cannot replace them, as each case requires understanding the P1 code. I should have pushed for a JMPRET instruction equivalent on P2 much earlier but it’s now too late. As we are now finding, nothing translates easily, so we are basically back to a new code write. P2 is a new animal that shares some common ideas with P1.

    Umm, isn't CALLD basically P2's JMPRET ?
  • wmosscrop wrote: »
    I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing, but I have not yet gotten this to work properly. Cog initialization works fine; the issue is with calling hub subroutines once I've returned to cog execution. I've made sure to use ORGH $400 and the like, without any success.
    Does anyone have some sample code that shows cog-to-hub subroutine calls?

    There's really nothing to switching between cog and hub: the processor does it automatically based on the PC (if the PC is 0-$1ff, then code runs from COG; if it's $200-$3ff then it runs from LUT; if it's $400 and above it runs from HUB). Any program compiled by fastspin does this at startup automatically: the initial setup code is in COG memory, it calls the main user program in HUB memory, and if that returns then the COG program does a cogstop. So the really stripped down version would look like:
        org 0
        ' COG startup code goes here
        call #my_hubcode
        ' more COG code here
    ...
        ' the very first orgh should give an address of at least $400
        orgh $400
    my_hubcode
        ' here's the hub subroutine
        ret
        ' switch back to COG mode for a new COG program
        org 0
    my_cog2
        ' here's some more COG code, maybe to run in another COG
        call #my_hubcode2
    
        ' done with COG code, now go back to HUB code with orgh
        ' don't need to provide an address, it'll just go on from wherever
        ' the HUB pc is now
        orgh
    my_hubcode2
        ' more hub code
        ret
    
  • Wuerfel_21 wrote: »

    Umm, isn't CALLD basically P2's JMPRET ?

    It's similar but not the same. On a P1 a JMPRET would only patch the S field in the destination register with the return address, and the destination register written would typically already be an actual JMP #S instruction. On a P2 a CALLD will write the PC (and flags) into the entire register and so it cannot remain a JMP instruction after this. You will need an additional indirect JMP instruction to emulate the P1's "RET" pseudo instruction which is actually just the JMP #S anyway.

    Now I think Cluso has had problems translating his P1 PASM code because of this particular difference. There are certainly some ways in the P2 to try to emulate it but they all basically take up additional code space.
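
    A sketch of that extra indirect jump. retaddr and the labels are hypothetical names:
            calld   retaddr, #mysub     ' write the return PC and flags into retaddr, then jump to mysub
            ' ... execution resumes here after the subroutine
    done    jmp     #done

    mysub   drvnot  #56                 ' subroutine body
            jmp     retaddr             ' indirect jump through the saved address stands in for P1's RET

    retaddr res 1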
  • wmosscrop Posts: 364
    edited 2020-04-03 - 00:33:34
    @ersmith,

    Here's your code, modified to show progress (or lack of it). When I run this (with PNut v34Q), both LEDs 61 & 62 are lit, but 56 is not. In addition, there should be a 1/4 second pause between 61 & 62 being lit; there is none.
    Edit: When I run this with FlexGUI 4.1.3, it does work.
    So it appears to me that the subroutine "my_hubcode" is never called with PNut.
    Note that there is a compile message: "warning: orgh with explicit origin does not work if Spin methods are present".
    CON
      _clkfreq = 80_000_000
    PUB Start()
      coginit(16, @cogStart, 0)
      repeat
    DAT
        org 0
        ' COG startup code goes here
    cogStart
        drvl	#60
        call #my_hubcode
        drvl	#61
        ' more COG code here
    loop    jmp	#loop
    '...
        ' the very first orgh should give an address of at least $400
        orgh $400
    my_hubcode
        ' here's the hub subroutine
        drvl	#56
        waitx	##20_000_000
        ret
    
  • Ah, if you're mixing Spin and PASM then yes, the first orgh does *not* need an address (so leave off the $400 and just say "orgh"). That's only required for pure PASM programs.

    If that still doesn't help, then you should probably report a bug to Chip, because I think it should work (and, as you discovered, it does work when compiled with fastspin/flexgui).
  • IIRC it took a few days for ersmith to get fastspin working as expected in Spin mode with hubexec and cogexec and maybe even lutexec all together in giant codes...

    I wouldn't be surprised if Spin2 needs some tweaks to make it all right.
  • When using PNut with Spin2+PASM, the orgh directive seems to ignore addresses.
    Therefore your hubcode is being assigned an address of $18.
    TYPE: DAT_LONG        VALUE: FFF00018          NAME: MY_HUBCODE
    
    So the P2 assumes a cog address (<$400) which has no valid code at that address.

    Your code works fine in PASM only form.
    You can also pad out hubram with a LONG statement after the orgh directive.
    	orgh
    	long	0[$400]	' padding
    	
    

    Pnut seems to put DAT blocks before the spin interpreter code.
  • wmosscrop wrote: »
    Cluso99 wrote: »
    Yes. BTW I am fairly sure it’s mentioned in Chip’s docs as well as in the instruction set.

    The instruction spreadsheet alludes to a stack "K". But I don't see any definition of that stack.
    The docs mention the stack - in passing - as part of the EXECF instruction documentation.
    Cluso99 wrote: »
    The biggie here is that the P1 JMPRET/CALL/RET (and JMP) instructions are completely different in P2.
    Yes, I am learning that the hard way. My 1130 emulator instruction decoding was based on altering a JMP instruction. Boy, that didn't work on the P2! :)

    I am having major difficulties understanding switching between cog and hub execution. I believe it has something to do with relative vs. absolute addressing, but I have not yet gotten this to work properly. Cog initialization works fine; the issue is with calling hub subroutines once I've returned to cog execution. I've made sure to use ORGH $400 and the like, without any success.
    Does anyone have some sample code that shows cog-to-hub subroutine calls?

    See this for calling the ROM monitor routines
    https://forums.parallax.com/discussion/comment/1450717/#Comment_1450717
  • ozpropdev wrote: »
    You can also pad out hubram with a LONG statement after the orgh directive.
    	orgh
    	long	0[$400]	' padding
    

    Pnut seems to put DAT blocks before the spin interpreter code.

    Yes, that worked for me. @cgracey, this seems to be an issue with PNut v34Q not handling the orgh correctly for spin2 + pasm. Not a major issue (other than my bald spot is getting larger).

  • Cluso99 wrote: »
    Thanks for this info. I didn't realize that something from 2018 was still relevant, I was assuming major changes since then.

  • I'm looking into this ORGH issue in PNut.
  • cgracey Posts: 12,809
    edited 2020-04-03 - 08:41:28
    There's no bug in PNut over this. PNut just doesn't have final runtime addresses at compile time, because the object hierarchy is built progressively and then compacted. Only at runtime can Spin2 know an address using @symbol.

    Here's a simple way to do it, by passing the 'hubcode' through PTRA (3rd term in COGINIT):
    CON
      _clkfreq = 80_000_000
    
    PUB Start()
      coginit(16, @cogcode, @hubcode)
    
    DAT
        	org		'org automatically at $000
    
    cogcode	drvnot	#60
    	call	ptra
    	jmp	#cogcode
    
    	orgh		'orgh automatically at $400
    
    hubcode	drvnot	#56
    	waitx	##20_000_000
    	ret
    

    You can also use PTRB, which will contain the absolute address of 'cogcode' on entry, as a base to which you add the offset of 'hubcode'. Both 'cogcode' and 'hubcode' addresses in PASM are relative to the start of the object they exist in, not the absolute addresses of where they wind up at:
    CON
      _clkfreq = 80_000_000
    
    PUB Start()
      coginit(16, @cogcode, 0)
    
    DAT
        	org		'org automatically at $000
    
    cogcode	add	ptrb,#@hubcode-@cogcode
    
    .loop	drvnot	#60
    	call	ptrb
    	jmp	#.loop
    
    	orgh		'orgh automatically at $400
    
    hubcode	drvnot	#56
    	waitx	##20_000_000
    	ret
    
  • And how can this work if you have more than one hubcode subroutine?
    You will need to modify the ptrb for every hub-call, which is ugly, slow and needs a lot of unnecessary cog code.

    It would all be so much easier if we get the real hub address with @ in DAT blocks.

    Andy
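
    One possible way to limit that overhead, offered only as a sketch built on the PTRB approach above: work out each absolute address once at startup and keep it in a cog register. The names hub1, hub2, sub1 and sub2 are hypothetical, and the offsets use ## on the assumption that they may not fit in 9 bits.
    DAT
            org
    cogcode mov     sub1, ptrb              ' PTRB holds the absolute address of cogcode on entry
            add     sub1, ##@hub1-@cogcode  ' absolute address of hub1
            mov     sub2, ptrb
            add     sub2, ##@hub2-@cogcode  ' absolute address of hub2
    .loop   call    sub1                    ' later calls need no further pointer arithmetic
            call    sub2
            jmp     #.loop

    sub1    res 1
    sub2    res 1

            orgh
    hub1    drvnot  #56
            ret
    hub2    drvnot  #57
            ret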